Introduction
This article is intended for audiences who are aware of designing the database but are not sure about the Normalization process.
What is Normalization?
When we want to store some piece of information, we store it in a database. Now storing a data in database means it should be stored properly so that while retrieving it, it should be easy. So we say to store it in a Normalized way. In short, Normalization can be defined as the process of organizing the data in a database efficiently. The result of normalization is a logical database design and is called Normal Form.
Why Normalization?
The goals of the Normalization process are
- It helps you to eliminate the redundant data from the same table.
- It ensures the data dependencies between the tables are proper.
- A Normalized database design makes it easy to change in modification is required.
Advantages of Normalization
- Data redundancy is removed
- Faster update as redundant columns from tables are removed.
- Easy understanding of the structure
- Improvement in Index as be achieved
- Long-term maintainability of the database gets easier
Disadvantages of Normalization
- Query, to some extent, gets complicated
- Performance may degrade due to multiple joins
Suppose I have some data with me, say
UnNormalized Table
As you can see above, the table is not properly managed
The normalization process is mainly divided into stages which we call Normal Form. Let’s talk about each Normal Form one by one with an example. Basically, a database can be normalized into various normal forms, such as
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce Codd Normal Form (BCNF)
- Fourth Normal Form (4NF) and so on
But today I would like to talk about only up to Boyce Codd Normal Form because Fourth Normal Form and others are rarely used in the database design.
First Normal Form (1NF)
- Eliminate all repeating groups in individual table
- One cell should contain only one data
- Create a separate table for each set of related data
- Identify each set of related data with a primary key
As you can see above,
- Our UnNormalized table does not have a repeating group. So this is not applicable over here.
- We have eliminated the comma-separated values and shifted to another table, Skill. Each employee in the Employee table is related to the Skill table through the EmployeeId column (Note: It is not a foreign key relationship. We are just replacing the comma-separated values in each cell into rows and creating a separate table for each)
- Each cell is containing a single value
- Skills of the table are identified with primary key
Second Normal Form (2NF)
- Tables should be in First Normal Form (1NF)
- Eliminate partial primary key dependencies
- Nonkey columns must depend on the entire composite primary key and create separate tables for sets of values that apply to multiple records
- Relate these tables with a foreign key
As you can see above,
- We have eliminated the partial primary key dependency of EmployeeId and created a new table that will contain the relation of EmployeeId and SkillId. These columns will be the respective foreign key for the Employee and Skill table.
Third Normal Form (3NF)
- The table should be in Second Normal Form
- Eliminate fields that are not dependent on key i.e. - Eliminate Transitive Dependencies. Create a separate table for it.
Let’s suppose I want to add new columns in Employee Table Manager and Project
Now, adding these 2 columns violates the third Normal Form because the non-key column Project is dependent on another not key column i.e. - Manager. So we will normalize the table by separating the nondependent column to another table. This way, we can achieve the third Normal Form.
Boyce Codd Normal Form (BCNF)
- BCNF is based on the concept of a determinant.
- A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
- A relation in BCNF is, and only if, every determinant is a candidate key.
- The same as 3NF except in 3NF, we only worry about non-key attributes
Note: If there is only one candidate key, then 3NF and BCNF are the same.
Consider another column below the table, say ProjectTechnology
As shown above, each manager will be handling a unique project, so we can say a particular project is determined by a particular manager where Manager and Project depict a candidate key.
If we delete the entry of Manager Ronnie from the table, we lose not only information about Project BCF but also the fact that the project was developed in Asp.Net technology. We cannot make the entry of the fact that the BCF project was developed using Asp.Net.
So let’s break this into separate tables
Fourth Normal Form (4NF)
- Database design should follow the 1NF, 2NF, 3NF and BCNF if possible
- There must not be more than one multivalued dependency other than a candidate key.
I hope you got what actually is the process of Normalization.
Conclusion
Thus concluding it, Normalization is a process which is a must to design any database. Hope you like this article. Please share your comments whether it’s good or bad. Your comments are valuable to me to get better.