Normalization in MySQL


Share

The process of separating your data into tables and creating primary keys is called normalization. Its main goal is to make sure each piece of information appears in the database only once.

Duplicating data is very inefficient, because it makes databases larger than they need to be and therefore slows down access. More importantly, the presence of duplicates creates a strong risk that you’ll update only one row of the duplicated data, creating inconsistencies and potentially causing serious errors.

There are three normalization form in MySQL:


First Normal Form
Second Normal Form
Third Normal Form


First Normal Form


For a database to satisfy the First Normal Form, it must fulfill three requirements:
1. There should be no repeating columns containing the same kind of data.
2. All columns should contain a single value.
3. There should be a primary key to uniquely identify each row.
First normal form also known as 1NF.


Example:-

Second Normal Form


The First Normal Form deals with duplicate data (or redundancy) across multiple columns. The Second Normal Form is all about redundancy across multiple rows.

In order to achieve Second Normal Form, your tables must already be in First Normal Form. Once this has been done, Second Normal Form is achieved by identifying columns whose data repeats in different places and removing them to their own tables.
Second normal form also known as 2NF.


Example:-


Third Normal Form


Once you have a database that complies with both the First and Second Normal Forms, it is in pretty good shape and you might not have to modify it any further. However, if you wish to be very strict with your database, you can ensure that it adheres to the Third Normal Form, which requires that data that is not directly dependent on the primary key but that is dependent on another value in the table should also be moved into separate tables, according to the dependence.


Example:-


Deciding whether to use the Third Normal Form can be tricky. Your evaluation should rest on what additional data you may need to add at a later date. If you are absolutely certain that the name and address of a customer is all that you will ever require, you probably will want to leave out this final normalization stage.
Third normal form also known as 3NF.


When Not to Use Normalization


Now that you know all about normalization, I’m going to tell you why you should throw these rules out of the window on high-traffic sites. Now, I’m not saying you’ve wasted your time reading the last several pages (you most definitely haven’t), but you should never fully normalize your tables on sites that will cause MySQL to thrash.

You see, normalization requires spreading data across multiple tables, and this means making multiple calls to MySQL for each query. On a very popular site, if you have normalized tables, your database access will slow down considerably once you get above a few dozen concurrent users, because they will be creating hundreds of database accesses between them. In fact, I would go so far as to say that you should denormalize any commonly looked-up data as much as you can.

The reason is that if you have data duplicated across your tables, you can substantially reduce the number of additional requests that need to be made, because most of the data you want is available in each table. This means that you can simply add an extra column to a query and that field will be available for all matching results, although (of course) you will have to deal with the previously mentioned downsides, such as using up large amounts of disk space and needing to ensure that you update every single duplicate copy of your data when it needs modifying.