A database is a software system that is used to store information about an organization or an enterprise or user's activities, or any other business activities, for the purposes of using that later.
RDBMS stands for Relational Database Management System. Relational database management system is based on a relational model. In RDBMS database, data stores in the form of tables. Tables contain rows and columns and maintain relations to other tables. Most used RDBMSs are SQL Server, Oracle, MySQL, etc. In RDBMS, each table has a unique name. Table has columns with unique names and collections of rows which contain the data related to the same entity.
For example, the Employee table has stored employee’s data, like EmployeeName, Address, Designation, Salary etc.
Database normalization is the process which helps the database designer to design a relational database with minimum redundant data. Normalization is the process of data organization in the form of tables. The normalization process includes dividing data into tables and making relationships among them.
Normalization process follows many steps. We generally call them normalization forms and they come one after another but mostly database gets normalized till 3rd normal form.
1st Normal Form
In the 1st normal form data is divided into tables and each row should have unique data or table assign a primary key to uniquely identify the row. Table is called in 1st normal form if, Repeating Groups of columns Eliminated into the separate tables of related attributes. Each row should be unique in the table, or table has a primary key.
Let’s take example of above employee table, for 1st normal form, add new column in the table as ProjectId and set EmpId and ProjectId as primary key with combination of two column to uniquely identify row. And repeating columns and multi-value columns can be separate in another table. Project column has multi valued data and if we separate it to another column then it will be Project1, Project2 which represent repeating groups, so we must separate the project data to another table.
EmpId
|
EmpName
|
Designation
|
Salary
|
Address
|
City
|
PostalCode
|
Emp001
|
Arvind singh Baghel
|
Consultant
|
1000
|
Warje NDA road house 1
|
Pune
|
411058
|
Emp002
|
John
|
Developer
|
500
|
Avenue du Colonel Henri
|
Paris
|
75014
|
Emp003
|
Vinod Singh
|
Manager
|
1200
|
Rajeev nagar lane 2
|
Delhi
|
110086
|
Emp004
|
Rambo
|
Consultant
|
1000
|
NE 97th st
|
Redmond
|
98052
|
ProjectDetail table: EmpId and ProjectId can set composite primary key.
EmpId
|
ProjectId
|
Project
|
Emp001
|
E1
|
ERP
|
Emp002
|
BS1
|
Billing System
|
Emp003
|
E1
|
ERP
|
Emp003
|
BS1
|
Billing System
|
Emp004
|
BNS1
|
Banking System
|
2nd Normal Form
For the 2nd normal form table must satisfy all the condition of the 1st normal form and redundant data should be eliminated to the separate table. Table is called in 2nd normal form if, table meet all the criteria of 1st normal form and subset of the data or redundant data is separated into another table and created relationship between these tables using the foreign key.
Let’s continue with above table example, The EmployeeProjectDetail table has redundant data and multicolumn primary key. For the 2nd normal form table should not have redundant data and should avoid multi column primary key. Create another table as Project with two columns ProjectName and ProjectId and set ProjectId as primary key.
Project table
ProjectId
|
Project
|
E1
|
ERP
|
BS1
|
Billing System
|
BNS1
|
Banking System
|
To make relation of Employee table with Project table create another table with two columns EmployeeId and Project code. EmpId is foreign key reference to Employee table EmpId primary key and ProjectId is foreign key reference to Project table primary key.
EmployeeProjectDetails
EmpId
|
ProjectId
|
Emp001
|
E1
|
Emp002
|
BS1
|
Emp003
|
BS1
|
Emp003
|
E1
|
Emp004
|
BNS1
|
3rd Normal Form
For the 3rd normal form table must satisfy all the requirement of the 2nd normal form and Columns which are not directly dependent on the primary key is separated to another table. There should not be transitive functional dependency. Transitive dependency should be removed and separated to another table. If a non-key column depends on another non-key column, that is called a transitive dependency.
Let’s continue with Employee table example, Transitive dependency can be found in the employee table. City column does not directly depend on the EmpId, if postal code change then city also change. It means city depend on postal code. These columns can be separate to another table. Create New table with two columns postalcode named it CityDetails and set postalcode as primary key.
CityDetails
PostalCode
|
City
|
411058
|
Pune
|
75014
|
Paris
|
110086
|
Delhi
|
98052
|
Redmond
|
Employee table
Postal code is foreign key in the Employee table to join with CityDetails table.
EmpId
|
EmpName
|
Designation
|
Salary
|
Address
|
PostalCode
|
Emp001
|
Arvind singh Baghel
|
Consultant
|
1000
|
Warje NDA road house 1
|
411058
|
Emp002
|
John
|
Developer
|
500
|
Avenue du Colonel Henri
|
75014
|
Emp003
|
Vinod Singh
|
Manager
|
1200
|
Rajeev nagar lane 2
|
110086
|
Emp004
|
Rambo
|
Consultant
|
1000
|
NE 97th st
|
98052
|
Transitive dependency has been removed, but if city change then the address also changes, so create separate table for address with AddressId, Address, Postalcode and EmpId column. Postal code is foreign key in the Address table to join with CityDetails table and EmpId is foreign key to join Employee table.
AddressDetails
AddressId
|
Address
|
PostalCode
|
EmpId
|
A001
|
Warje NDA road house 1
|
411058
|
Emp001
|
A002
|
Avenue du Colonel Henri
|
75014
|
Emp002
|
A003
|
Rajeev nagar lane 2
|
110086
|
Emp003
|
A004
|
NE 97th st
|
98052
|
Emp004
|
Update Employee table remove Address and postalCode from the table.
Employee table
EmpId
|
EmpName
|
Designation
|
Salary
|
Emp001
|
Arvind singh Baghel
|
Consultant
|
1000
|
Emp002
|
John
|
Developer
|
500
|
Emp003
|
Vinod Singh
|
Manager
|
1200
|
Emp004
|
Rambo
|
Consultant
|
1000
|
The database can normalize more in 4th Normal Form also called Boyce-Code Normal Form, 5th Normal Form, 6th normal form, but these are rarely used.
Denormalization
Denormalization is the opposite process of normalization. Denormalization is used on the normalized database to increase the performance by combining tables to reduce the joins so that the data can be fetched without joins, which increases the performance of data fetching.