Database Normalization

Database normalization

Here you will learn the concept of data base normalization which give a basic idea to understand the process of data base normalization.

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization with (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971.

Concept of normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table ) and ensuring data  dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.

What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.

The Normal Forms

There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form. Third normal form is considered the highest level necessary for most applications." Although other levels of normalization are possible like 4NF,5NF and 6NF which rarely seen and won't be discussed in this article.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

1. Eliminate duplicate columns from the same table.
2. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Lets see the example prior to normalization

itemcolorspricetax
pencilred,blue100.4
scalered,yellow100.4
pencilred,blue100.4
bookblack,blue1508.80

This table is not in first normal form because :

1. There are multiple fields in color lab.
2. Records are repeating (Duplicate records) or no primary key.

First Normal Form (1NF)

itemcolorspricetax
pencilred100.4
pencilblue100.4
scalered100.4
scaleyellow100.4
bookblack1508.80
bookblue1508.80

This table is now in first normal form.

Second Normal Form (2NF)

The concept of remove the delicacy of data comes in the Second Normal Form (2NF)

  1. It should meet all the requirements of the first normal form.
  2. It should remove subsets of data that apply to multiple rows of a table and place them in separate tables. 
  3. It create relationships between these new tables and their predecessors through the use of foreign keys. 
The First Normal form deals with the atomicity whereas the Second Normal Form deals with the relationship between the composite key columns and non-key columns. To achieve the next progressive level your table should satisfy the requirement of First Normal Form then move towards the Second Normal Form.

Let's introduce a Review table as an example :

itemcolorspricetax
pencilred100.4
pencilblue100.4
scalered100.4
scaleyellow100.4
bookblack1508.80
bookblue1508.80

Table is not in Second Normal Form because the price and tax depends on the item, but not color.

itemcolors
pencilred
pencilblue
scalered
scaleyellow
bookblack
bookblue

 

itempricetax
pencil100.4
scale100.4
book108.80

Tables are now in Second Normal Form.

Third Normal Form (3NF)

The Third Normal Form has one more additional requirement :

  1. It should meet all the requirements of the second normal form.
  2. It should remove columns that are not dependent upon the primary key.
  3. There should no transitive dependency between field.

For example

if A->B and B->c then C doesn't belongs to A [where A,B,C are field]
In other word we can say A transitive dependency is a type of functional dependency in which the value in a non-key field is determined by the value in another non-key field and that field is not a candidate key.

In the Third Normal Form all columns depend upon the primary key. When one column depends upon the other column,  table break the rule and turns into the dependency on the primary key.

itemcolors
pencilred
pencilblue
scalered
scaleyellow
bookblack
bookblue

 

itempricetax
pencil100.4
scale100.4
book1508.80

Tables are not in third Normal Form because tax depends on price, not item.

itemcolors
pencilred
pencilblue
scalered
scaleyellow
bookblack
bookblue

 

itemprice
pencil10
scale
10
book150

 

pricetax
100.4
1508.80

Tables are now in third Normal Form.

conclusion

Data base normalization is used to produce the most efficient and functional databases. Normalization is the storage of data where it uniquely belongs. Doing so creates a secure, robust and reliable structure for your application.

Next Recommended Reading List databases, tables, columns in MySQL