Creating Cubes in SSAS
It was the 1980s when the first relational database management system was created, and data scientists across the globe began talking about tables and their relationships with one another. The word "table" has been described as a collection of related data held in a structural format within a database. Every piece of meaningful information has been treated as data.
We should thank E.F. Codd for introducing such a concept that revolutionized the industry. Flat files and network file storages are no more. It was history after 1980. If you wonder why I am talking about this in this article, here is the answer! The total article revolves around data. Everything is data from storing your personal information during account creation in Amazon (or any e-marketing or e-commerce website) to purchase and delivery.
Suppose you are a .Net developer and are developing your applications for a large company handling millions or even trillions of records like mine, or your application is manipulating data every day. In that case, this article will be an eye-opener. How the world manipulated data during the 1980s was different from how it is manipulated now. This article is a collection of many reference materials used when working on a real-world application.
Table of Contents
- Introduction
- What a Data Warehouse is, and why is it used
- Single Version of Truth
- Performance
- Simplicity
- Data Persistence
- Creating your first SSAS project
- Data Source
- Data Source Views
- Cubes
- Dimension
- Mining Structures
- Roles
- Assemblies
- Miscellaneous
- Data Source
- Data Source Views
- Dimensions
- Cubes
- What are Measures
- What are Measure Groups
- Conclusion
Introduction
As real-world product developers, our everyday activity involves connections to the database. It is how user information is stored or we retrieve it from the database. Most applications naturally perform the four operations in the common insert, update, delete, and select. We do these four operations from various providers: SQL Server, Azure, MongoDB, etc. These are from various data sources. But to the end-user, this will be an easy experience when using this product (application).
For example, when logging into Amazon, the authentication is verified by a web service connected to a data source from its servers located in Australia (just saying, for example). While you search products (mobiles, furniture, or whatever), the data is retrieved from a data source located in a local server in the USA. We surf Amazon as a single product. If our product is running in production, the number of users who do these preceding four operations from the first day of our launch until the date will not be constant. As our product gains popularity, we gain more users. The solid reason to write this article is performance. The greater the number of users, the slower the performance.
Here are the ways elaborated on how a Data Warehouse helps us solve complex query processing problems. Again, Suppose you are a .Net developer and developing your application for a large company handling millions or trillions of records daily (as does mine), or your application is manipulating data daily. In that case, this article will be an eye-opener.
What a Data Warehouse is, and why is it used?
A Data Warehouse is a database, as the name implies, but it is much more than that in terms of functionality. First of all, data does not originate from a Data Warehouse. Instead, it originates from source systems, such as a Customer Relationship Management (CRM) system, HR System, Financial Management system, and so on, in the form of SQL tables, CSV files, etc. The data from these source systems are brought to a Data Warehouse so that Data Warehouse becomes a central repository with all of the company's relevant information.
That makes the question of why we would do such a thing. Why do we need to copy data when it already exists in the system and place it into the central repository? To answer that question, we need to talk about how decisions are made in organizations and the impact of a decision or a cumulative decision in an organization. We could agree that an organization's success is defined by the cumulative ability to make good decisions for successful outcomes. How are we going to make the decision-makers make their decisions? We are not only talking about the decisions made by top executives but also the operational decisions to be taken on a day-to-day basis.
To make decisions, we need to provide decision-makers with proper information. We have our internal data, such as data from the data sources. We may also require external information such as physical and financial (such as whether it is physical and financial is stock market) information. All of these must be brought together for decision-making to provide meaningful information. This is where a Data Warehouse is essential. It combines all the internal and external data from various data sources and keeps it in a single place called a warehouse. This helps decision-makers compare the environments and make proper decisions. So, why can't we query those from each source separately? There are four solid reasons for that.
Single Version of Truth
We may have user information in our database of the financial system and weather information system. Querying each of them differently for user information adds redundancy, and Data Warehouse solves this problem by having all the data in a single repository.
Performance
Querying millions of data from various data sources is much more complex and gains less performance than querying a single centralized data source.
Simplicity
Our normal database architecture suggests we use normalized data. When the user wants information, it takes time for us to join user information split across tables or data sources. Data Warehouses avoid these problems.
Data Persistence
Organizations want to align their data in a way that sometimes is not the way our data sources align the data. For example, suppose the user or the decision-maker wants to update information about the user when the temperature is above 10 degrees. In that case, it does not persist when we have different data sources. A Data Warehouse helps you avoid this problem.
A Data Warehouse is not a product or technology but a concept. Let's build a Data Warehouse in this article.
SQL Server Analysis Services (SSAS) in MSBI
This will create a multidimensional database from a single-dimensional database.
Had we installed Visual Studio, we would have had Business Intelligence Development Studio (BIDS) that does all these activities.
Creating your first SSAS application
I installed the SQL Server, and the business intelligence suite of vs. suits me. Even though MSBI consists of three main parts, in this article, we will have an introduction to what SSAS is.
Step 1. Open Visual Studio and click on the new project. There is an option named Business intelligence to select a BI-related project on the left pane of the opened window. Click on the Analysis Services Project, name it, and click OK.
Step 2. This project is an SSAS project, and it will create a Data Warehouse for us. We can see eight folders, and let us know what they are. These folders are the layers to create our SSAS Data Warehouse.
Data Sources
Contains a set of data sources from the Oracle or SQL Server databases. SSAS supports a set of data providers that can be chosen when adding data sources.
Data Source Views
A data source view contains a logical model of the schema used by analysis services' multidimensional database objects. They can be built over one or more data sources, allowing you to define multidimensional and data-mining objects that integrate data from multiple sources.
Cubes
SQL Server stores data in the form of tables, but in SSAS, the same happens in the form of cubes. Without creating it, we cannot query a Data Warehouse. We will explain this shortly.
Dimensions
Dimensions contain the necessary information to query the cubes. They contain information about the fact tables to contain measure values to query the cubes.
Mining Structures
This is the least-used function in MSBI nowadays. This feature was not developed much after SQL Server 2005. SSAS has included 9 data mining algorithms. We will see them shortly.
Roles
This helps you to create and manage roles for the specific Data Warehouse.
Assemblies
Assemblies let you create external, user-defined functions using any Common Language Runtime (CLR) language, such as Microsoft Visual Basic .NET or Microsoft Visual C#. Component Object Model (COM) Automation languages such as Microsoft Visual Basic or Microsoft Visual C++ can also be used.
Miscellaneous
If you add documents, images, or any other things apart from all the preceding, you are welcome to add them here.
Let us discuss each one of them.
Creating a Data Source
In an Analysis Services Multidimensional Model, a data source object represents a connection to the source from which we will process the data. A multi-dimensional model should at least contain only one data source object.
Choosing a data provider
We can choose types of data providers, namely SQL Server and Oracle providers. For SQL Server data providers, we can use the SQL Server native client, and for Oracle, we can use some third-party data sources.
Set up credentials
We can set up credentials with either Windows Authentication or SQL Server Authentication, as used in our normal daily applications. In servers, it is advisable to use Windows Authentication.
Let us switch back to our application open in Visual Studio to create a data source.
Right-click on the data source folder and choose the new data source.
From the data source wizard, do the following options.
Click the new button to create a new data source from the SQL Server or Oracle DB.
From the connection, the manager chooses the provider type and server name, and you can set the authentication.
After providing the database information, click OK, and it takes you back to the Data Source Wizard with the connection added. Click next.
The next window asks for impersonation information. We can choose one of four types of accounts, and I am choosing Inherit since it is the safest for this demo.
And now click Next, and it takes you to set the data source name. You can see the connection string for verification in this window.
The data source is finally created!
We can have multiple data sources created in the same way.
The data source can be anything like SQL Server, Oracle, Access, DB2, Informatica, etc. It allows you to edit all the information you have configured. To edit all the information, click on your created data source and click open.
Thus we have created the data source.
Creating a Data Source View
A Data Source View contains the metadata about the data source. A data source view can be used with cubes, dimensions, and mining structures. This data source view is stored in an XML format. They can be built over one or more data sources or metadata. A data source view may contain relationships, primary keys, object names, and calculated columns. The client application cannot query this.
Step 1. Right-click the "Data Sources" view and then click New Data Source View.
Step 2. Select a specific data source.
Step 3. You can also add a data source by clicking the new data source button. Once the data source is selected, click Next. You can see you select tables and options. With this, you can select the view or table you want.
Step 4. After selecting tables and views,s you can see the completing wizard. Click complete.
That's it. We have created a data source view.
Creating Dimensions
Dimensions are business objects or entities that contain a list of attributes that describe the object (dimension itself), and they are usually derived from some code tables.
The following is the procedure to create dimensions.
Step 1. Right-click on the dimension folder and click select a new dimension. Click Next if there are any prompts.
Step 2. A Cube Wizard page appears to "Select Creation Method." In that, select "Use an existing table." Then click on the "Next" Button.
Step 3. The dialog will guide you to choose a data source, table name, and name column.
Step 4. Choose your data source. Choose from the list of tables selected in the data source view and choose a name column. Preferred is the primary key column. Once you click next, it will show you a set of related tables with foreign key constraints to it, if any. Click Next.
Step 5. You need to select all the dimensions that you want.
Step 6. In the next screen, you will see the summary of Dimensions. Here provide the name of the Dimension, in other words, ProductDimension, and then click on the Finish button.
Step 7. Now you can see the dimensions created In the Solution Explorer.
Creating Cubes
A cube includes measures in measure groups, business logic, plus a collection of dimensions that provide context for evaluating the numerical data that a measure offers. Both measures and measure groups are essential components of a cube. A cube cannot exist without at least one of each.
What Measures are?
A measure represents a column that contains quantifiable data, usually numeric, that can be aggregated. Measures represent some aspect of organizational activity, expressed in monetary terms (such as revenue, margins, or costs) as counts (inventory levels, number of employees, customers, or orders) or as a more complex calculation incorporating business logic.
What are Measure Groups?
Measure Groups are the collection of measure values.
Step 1. Right-click on the cubes folder and select new cube.
Step 2. Select the "Use existing table" option and click Next.
Step 3. Select the tables you want for the cube. A Cube Wizard will appear to "Select Measure Group Table" with the table in the data source view.
Step 4. Select the measures you want in the Cube Wizard to "Select Measures" and then click Next.
Step 5. Select the existing dimensions and click Next.
Step 6. Select the new dimensions you want.
Step 7. The complete wizard appears with the cube name.
Step 8. We can see the cube is created.
Conclusion
Thus, cube creation in SSAS is required to reduce the query processing time and improve performance. In the next article, I will cover how to query these cubes and integrate this cube with our .Net application.