MSBI
BI (Business Intelligence) is a process of transforming data into a meaningful form, and this meaningful data is termed as information. Information is shown in graphical or tabular format.
In this article, I will try to explain what ETL process is, what BI Life Cycle should be, and how to create a simple project on a sales order system. This system gets data from CSV files and import this data into data warehouse.
Tools Required
- MS SQL
- Visual Studio
- SQL Server Data Tool
BI is a process of converting data into information. However, data and information both seem like similar words but data is different from information.
Data
Data is laying in technical format like CSV, Excel, Binary value. it is very difficult for the end user to understand.
Information
Information takes all this technical format and presents it in a very user-friendly way so the user can understand the data and make meaning out of it. BI transforms the data into meaningful data. This process involves a very important step i.e. "Analysis". Analysis involves lots of complexity because the data can be in CSV, Excel, or binary format and reading and analyzing each piece of data is very difficult.
The best approach is to take data and dump it into a central database; this central database is called a Data warehouse and on this central database analysis algorithms are run. After analysis, Information is Generated.
ETL means to take data from different Data sources (Data format can be CSV, EXCEL, txt) and dump it into Central database (Data warehouse)
Following process are involved in ETL
Extraction
Extraction means retrieving data from data source for further data processing.
Transformation
Means looking at the data source. They have different data types, for example one data source is Male as M and female as F and in another data source Male represents 1 and Female represents 2. It ensures that this type of data is transformed into a common format. The last step is loading data into data warehouse and this complete process in BI is termed ETL
Loading
The final target is to dump the data into data warehouse. After dumping the data into data warehouse, use some reporting tools to represent information in a graphical or tabular manner.
MSBI
STEP 1
- SSIS : ETL section done by Integration services (SSIS)
- analysis : analysis part done by analysis service termed as SSAS
- Data Representation done by report server (SSRS)
STEP 2 Data Base creation
Create database BIProject
Create table SalesOrder(SalesOrderId int primary key, SalesAmount money, Dates date)
STEP 3
Create CSV Dummy Data file.
SalesOrderId,SalesAmount,Date
101,1000,10/02/2017
102,2500,11/02/2017
103,1500,10/02/2017
104,1400,11/03/2017
105,1200,12/03/2017
106,1003,11/04/2017
107,1800,12/04/2017
file and the extension doesn't matter but the format should be CSV (comma separated file)
Drag the Data Flow tool from SSIS Toolbar.
- Data flow - Involves extraction, transformation loading is done by Data Flow Control
- Control Flow - Control Flow is invoking data flow
STEP 5
After double clicking on the Data flow control, Control flow tab moves on data flow tab. You can see three options in SSIS toolbox.
- Transformation - Ensuring the data into common format
- Source - Extraction data from different source.
- Destination - Dump data in destination Datacenter or Data warehouse.
STEP 6
Flat file source is data source which can be CSV, EXCEL, etc. Now I have to use CSV file.
- Drag Flat file source form SSIS tools box. You can see in the picture after that right click on flat file source and select Edit option.
- Connection Manager, select new option.
- Change connection manager name,
- Browser data source file, like CSV or Excel
- Select Delimited
- Select Header type like Comma, semicolon or as per file format.
- To check first line as column name.
- You can check data after selecting column option.
You will see two Arrows on Flat data source. Blue Arrow is the Arrow which comes out with actual data, and the Red Arrow will have an error inside.
STEP 7
The same process will be done in ADO.net Destination option which we have done in Flat data source.
STEP 8
After all configuration of flat data file and data source, execute the project. Remember we didn't do Transformation in this project.
- Green indicator of Flat data source represents that File load without any Error.
- Green indicator of ADO.Net source represents that Data load into data warehouse without any Error.