Do you know about Data Normalization ??
Data Science

Do you know about Data Normalization ??

Database Normalization with SQL

Hey SQL enthusiast

This is a guide on Database Normalization and how to implement normalization using SQL, let's get started.

 

What is Database Normalization
Normalization is a technique used for evaluating and reorganizing table structures to minimize redundancies, improve data integrity, improve storage efficiency, and reduce the need to re-design the database if new data is introduced. Database normalization removes data inconsistencies which may cause the analysis of our data to be more complicated. These inconsistencies could come from updating records, inserting, or deleting records, which is known as update anomaly, insertion, and deletion anomaly, normalization also includes the removal of duplicate records which in turn saves storage.
The figure below shows the stages of Database normalization.



Let's take a look at the first normal form, what does it take for a database table to be in the first normal form. Here are the rules for a table to be in first normal form (1NF)

    • Each column should contain an atomic value
    • A column should contain data of the same data type eg (if a column contains an integer all the other elements in the column should be an integer)
    • Each column should have a unique column name
    • The order in which data are stored in the columns does not matter.

let's take a look at a table to better understand the concept of 1NF

Records Table Not Normalize



As you can see from the above table there are columns with duplicate elements in the database table, from the last row, the columns are (student major Std_major, courses title CTitle, and Instructor location Ilocation).
to solve this problem we are going to create a new table and insert only one atomic value into every column in the table.

Here is the code used to create the table and insert the element into the table from the original table.

reating a new table for the schools Record 1NF
CREATE TABLE Schools_Record_1NF (Std_ID VARCHAR(50) NOT NULL, 
                                 Std_Name VARCHAR(50), 
                                 Std_address VARCHAR(50), 
                                 Std_Major VARCHAR(50), 
                                 CID VARCHAR NOT NULL, 
                                 CTitle VARCHAR(50), 
                                 Iname VARCHAR(50), 
                                 Ilocation VARCHAR(100), 
                                 Grade CHAR(5))

--Inserting into the Schools Record 1NF table
INSERT INTO Schools_Record_1NF(Std_ID, Std_Name, 
                               Std_address, Std_Major, 
                               CID, CTitle, Iname, 
                               Ilocation, Grade)

--Using Case statement to deal with columns with duplicate entries
SELECT Std_ID, Std_Name, Std_address,
-- The CASE Statement here checks the first entry in the column Using (SUBSTR AND INSTR)
    CASE WHEN
        SUBSTR(Std_Major, 1, INSTR(Std_Major, ',')-1)='' THEN Std_Major
        ELSE SUBSTR(Std_Major,1,INSTR(Std_Major, ',')-1)
    END AS Std_Major,
    CASE WHEN
        SUBSTR(CID,1,INSTR(CID, ' ')-1)='' THEN CID
        ELSE SUBSTR(CID,1,INSTR(CID, ' ')-1)
    END AS CID,
    CASE WHEN
        SUBSTR(CTitle,1,INSTR(CTitle,',')-1)='' THEN CTitle
        ELSE SUBSTR(CTitle,1,INSTR(CTitle,',')-1)
    END AS CTitle,
    CASE WHEN
        SUBSTR(Iname,1,INSTR(Iname,',')-1)='' THEN Iname
        ELSE SUBSTR(Iname,1,INSTR(Iname,',')-1)
    END AS Iname, 
    CASE WHEN
        SUBSTR(Ilocation,1,INSTR(Ilocation, ',')-1)='' THEN Ilocation
        ELSE SUBSTR(Ilocation,1,INSTR(Ilocation, ',')-1)
    END AS Ilocation,
    CASE WHEN
        SUBSTR(Grade,1,INSTR(Grade, ',')-1)='' THEN Grade
        ELSE SUBSTR(Grade,1,INSTR(Grade, ',')-1)
    END AS Grade
FROM School_Record

-- Using the union operator to join the first and the second entry of the table
UNION

SELECT Std_ID, Std_Name, Std_address, 
-- Using SUBSTR and INSTR to access the second entry inside the columns
    SUBSTR(Std_Major,INSTR(Std_Major,',')+1) As Std_Major,
    SUBSTR(CID, INSTR(CID,' ')+1) AS CID,
    SUBSTR(CTitle, INSTR(CTitle, ',')+1) AS CTitle,
    SUBSTR(Iname, INSTR(Iname, ',')+1) AS Iname, 
    SUBSTR(Ilocation, INSTR(Ilocation, ',')+1) AS Ilocation, 
    SUBSTR(Grade, INSTR(Grade, ',')+1) AS Grade
FROM School_Record
-- Checking the columns with multiple entries
WHERE Std_Major LIKE "%,%" or CID LIKE "% %"


Here is the new table after applying the above code

Records Table in 1NF




one new row is added to the table rather than having columns with non-atomic values.

Next up let's move into the second normal form (2NF)

The Second Normal Form (2NF)


Table not in 2NF



The above table is not in 2NF because not all the columns in the table are depending on the primary keys of the table. e.g. columns like the student name Std_name only depends on the student Id Std_ID column which is one part of the primary key since both Std_ID and CID are the primary keys in this case and also courses title only depends on courses Id CID without depending on student Id Std_ID which means we are having a partial functional dependency.

 To Convert a table to Second Normal Form

  • The table must be in 1NF
  • A table in 2NF should not have a partial functional dependency.

What is a partial functional dependency?
let's understand Functional dependency first
Dependency in a table is when the columns in the table are depending on a particular column like the primary key of a table.
Partial functional dependency is when a particular column is not full depending on all the primary key columns of the table (in this case our primary key is more than one column).
To implement the second normal form 2NF we will start by creating new tables one for the student information, instructor information, and another table for the student grade information.

 


--Creating the Tables for the Second Normal Form 2NF
CREATE TABLE Student_Info_2NF (
                               Std_ID VARCHAR(50) NOT NULL, 
                               Std_Name VARCHAR(50), 
                               Std_address VARCHAR(50),
                               Std_Major VARCHAR(50),
                               PRIMARY KEY(Std_ID));

CREATE TABLE Instructor_Info_2NF(
                                 CID VARCHAR NOT NULL, 
                                 CTitle VARCHAR(50), 
                                 Iname VARCHAR(50), 
                                 Ilocation VARCHAR(100),
                                 PRIMARY KEY(CID));

CREATE TABLE Grade_Info_2NF(
                            Std_ID VARCHAR(50) NOT NULL,
                            CID VARCHAR NOT NULL,
                            Grade CHAR(5),
                            PRIMARY KEY(Std_ID, CID),
                            FOREIGN KEY(Std_ID) REFERENCES Student_Info_2NF(Std_ID) ,
                            FOREIGN KEY(CID) REFERENCES Instructor_Info_2NF(CID));


students Information table




Instructor information table



students grade information table


Next, we will be moving on to the 3NF

Third Normal Form 3NF

 

For Relation to be in 3NF the Relation First must be in 1NF and 2NF

  • They should be no transitive dependencies

Well, transitive dependency simply means that all other columns in the table should only depend on the PRIMARY KEY Column only and primary key alone.

Based on the tables we currently have,
there is only one table that is showing an issue which is the instructor information table because the Primary key here is the course id (CID) and the instructor location can be determined by the CID and it can also be determined by the instructor's name Iname column, so to solve this problem we are going to create a new table for the Instructor information and another table with the courses information.

here is the code to create the table and insert the data into the table from the Instructor information table above.

CREATE TABLE Instructor_Info_3NF (
                                 Iname VARCHAR(50) NOT NULL,
                                 Ilocation VARCHAR(100),
                                 PRIMARY KEY (Iname));

CREATE TABLE Courses_Info_3NF(
                             CID VARCHAR NOT NULL, 
                             CTitle VARCHAR(50), 
                             Iname VARCHAR(50), 
                             PRIMARY KEY(CID)
                             FOREIGN KEY(Iname) REFERENCES Instructor_Info_3NF(Iname));


--Inserting into the Instructor_Info_3NF
INSERT INTO Instructor_Info_3NF(Iname, Ilocation)
SELECT Iname, Ilocation 
FROM Instructor_Info_2NF;

--Inserting into the Courses_Info_3NF
INSERT INTO Courses_Info_3NF(CID, CTitle, Iname )
SELECT CID, CTitle, Iname 
FROM Instructor_Info_2NF;


Here are the new tables


Instructor information table 3NF



Courses information table 3NF



We are going to stop on the 3NF

 

Conclusion
We have now completed the normalization process, In general, most Databases are in 3NF, in this post you've understood what normalization is and also learned how to implement normalization using SQL.

 

Hope this blog helps you understand Database normalization. If you enjoy my blog please don't forget to like or comment on the article. It will encourage me to write more articles.



 


  • Alsadiq Abubakar
  • Mar, 27 2022

Add New Comments

Please login in order to make a comment.

Recent Comments

Be the first to start engaging with the bis blog.