Introduction
In my last article, Discussion on Normalization and Database Design Concepts, I discussed about the normalization concepts. Here, I will take one table in SQL Server and implement different normal forms. As we move up the higher normal forms, we end up on more number of tables.
Implementation of Normal Forms
Let us start with a table, named StudentProject. I added few records in the table as per the defined structure. I will verify if the table satisfies different normal forms. I will modify the table structure or will split the data in multiple tables to fulfill the requirements for each normal form.
IF OBJECT_ID('StudentProject','U') IS NOT NULL DROP TABLE dbo.StudentProject CREATE TABLE dbo.StudentProject ( StudentId INT, ProjectId INT, GuideId INT, HrsWorked INT, Sname VARCHAR(50), --Fname, Lname Saddr VARCHAR(100), -- Saddr1, Saddr2 Pname VARCHAR(100), DeptId INT, Dname VARCHAR(50) ) INSERT INTO dbo.StudentProject (StudentId,ProjectId,GuideId,HrsWorked,Sname,Saddr,Pname,DeptId,Dname) VALUES (101,201,301,40,'xxx zzz','house1,house2','project1',401,'dept1'), (102,201,301,40,'aaa ccc','house3','project1',401,'dept1'), (103,204,304,40,'ddd','house4,house5','project4',402,'dept2'), (101,203,303,40,'xxx zzz','house1,house2','project3',401,'dept1') SELECT StudentId,ProjectId,GuideId,HrsWorked,Sname,Saddr,Pname,DeptId,Dname FROM dbo.StudentProject
First Normal Form (1NF)
The domain of an attribute must include only atomic (simple, indivisible) values and value of any attribute in a tuple must be a single value from the domain of that attribute.
There is a column, named Sname, in StudentProject that is for storing the student name. I want to save last name and/or first name of the student. I entered the last name and first name values with a space separator. To maintain last name and first name separately, I should create two different columns.
The saddress column contains the student address. A student may have multiple addresses. Here, I entered two addresses for a student with a comma separator. To maintain two addresses, two different columns should be created.
I create a new table named StudentProject1NF where I keep two different columns for first name and last name of the student. Also, there are two different columns for address 1 and address 2. That way, each column is having single indivisible value and the table StudentProject1NF is in first normal form.
I will transfer data from StudentProject to StudentProject1NF and will drop the StudentProject table later.
--1NF --StudentProject table structure is modified to create StudentProject1NF IF OBJECT_ID('StudentProject1NF','U') IS NOT NULL DROP TABLE dbo.StudentProject1NF CREATE TABLE dbo.StudentProject1NF ( StudentId INT, ProjectId INT, GuideId INT, HrsWorked INT, Fname VARCHAR(50), Lname VARCHAR(50), Saddr1 VARCHAR(100), Saddr2 VARCHAR(100), Pname VARCHAR(100), DeptId INT, Dname VARCHAR(50), ) --data transfer INSERT INTO dbo.StudentProject1NF (StudentId,ProjectId,GuideId,HrsWorked,Lname,Fname,Saddr1,Saddr2,Pname,DeptId,Dname) SELECT StudentId, ProjectId, GuideId, HrsWorked, PARSENAME(REPLACE (Sname,' ','.'),1) AS Lname, PARSENAME(REPLACE (Sname,' ','.'),2) AS Fname, PARSENAME(REPLACE (Saddr,',','.'),1) AS Saddr1, PARSENAME(REPLACE (Saddr,',','.'),2) AS Saddr2, Pname, DeptId, Dname FROM dbo.StudentProject SELECT StudentId,ProjectId,GuideId,HrsWorked,Lname,Fname,Saddr1,Saddr2,Pname,DeptId,Dname FROM dbo.StudentProject1NF DROP TABLE dbo.StudentProject
Second Normal Form (2NF)
A relation schema or table is in second normal form (2NF) if every non-prime attribute is not partially dependent on any key of the table.
Let us start with StudentProject1NF table. For this table, (StudentId, ProjectId) is the candidate key of the table. Now, we will check on the functional dependencies existing on the table.
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId --> Lname, Fname, Saddr1, Saddr2
- FD3: ProjectId --> Pname
- FD4: StudentId, ProjectId --> DeptId
- FD5: DeptId --> Dname
- FD6: GuideId --> ProjectId
FD2 and FD3 violate the 2NF condition. In FD2, Lname, Fname, Saddr1, and Saddr2 are dependent on StudentId, i.e. these columns are partially dependent on the candidate key (StudentId, ProjectId) of the table. Also, in FD3, Pname is dependent on ProjectId, i.e. Pname is partially dependent on the candidate key.
To achieve 2NF, I will create three new tables: StudentProject2NF, Student2NF, and Project2NF. I will transfer data from StudentProject1NF to these three new tables appropriately and later will drop the StudentProject1NF table.
--2NF --StudentProject1NF table is divided into three tables -- StudentProject2NF, Student2NF, Project2NF --table 1 IF OBJECT_ID('StudentProject2NF','U') IS NOT NULL DROP TABLE dbo.StudentProject2NF CREATE TABLE dbo.StudentProject2NF ( StudentId INT, ProjectId INT, GuideId INT, HrsWorked INT, DeptId INT, Dname VARCHAR(50), ) --table 2 IF OBJECT_ID('Student2NF','U') IS NOT NULL DROP TABLE dbo.Student2NF CREATE TABLE dbo.Student2NF ( StudentId INT, Sname VARCHAR(50), Fname VARCHAR(50), Lname VARCHAR(50), Saddr1 VARCHAR(100), Saddr2 VARCHAR(100), ) --table 3 IF OBJECT_ID('Project2NF','U') IS NOT NULL DROP TABLE dbo.Project2NF CREATE TABLE dbo.Project2NF ( ProjectId INT, Pname VARCHAR(100), ) --data transfer INSERT INTO dbo.StudentProject2NF (StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname) SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname FROM dbo.StudentProject1NF INSERT INTO dbo.Student2NF (StudentId,Fname,Lname,Saddr1,Saddr2) SELECT DISTINCT StudentId,Fname,Lname,Saddr1,Saddr2 FROM dbo.StudentProject1NF INSERT INTO dbo.Project2NF (ProjectId, Pname) SELECT DISTINCT ProjectId, Pname FROM dbo.StudentProject1NF SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname FROM dbo.StudentProject2NF SELECT StudentId,Fname,Lname,Saddr1,Saddr2 FROM dbo.Student2NF SELECT ProjectId, Pname FROM dbo.Project2NF DROP TABLE dbo.StudentProject1NF
Let us check the functional dependencies in the three new tables.
StudentProject2NF
(StudentId, ProjectId) is the candidate key of the table. The functional dependencies are:
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId, ProjectId --> DeptId
- FD3: DeptId --> Dname
- FD4: GuideId --> ProjectId
All these functional dependencies follow 2NF. So, StudentProject2NF table is in 2NF.
Student2NF
StudentId is the candidate key of the table. There is one functional dependency:
- FD1: StudentId --> Lname, Fname, Saddr1, Saddr2
FD1 follows 2NF. So, the table Student2NF is in 2NF.
Project2NF
ProjectId is the candidate key of the table. Functional dependency:
- FD1: ProjectId --> Pname
FD1 follows 2NF. So, the table Project2NF is in 2NF.
All the new three tables are in 2NF.
Third Normal Form (3NF)
A relation schema or table is in Third normal form (3NF), when a nontrivial functional dependency X --> A holds in the table, then either X is a super key of the table or A is a prime attribute of the table.
Let us consider the StudentProject2NF table first. (StudentId, ProjectId) is the candidate key. The functional dependencies are:
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId, ProjectId --> DeptId
- FD3: DeptId --> Dname
- FD4: GuideId --> ProjectId
FD3 violates 3NF condition. Because, neither DeptId is the super key of the table nor Dname is any prime attribute.
I will create two new tables -- StudentProject3NF, Department3NF and will transfer data from StudentProject2NF to these new tables and will drop StudentProject2NF later. Let us now consider Student2NF and Project2NF.
Student2NF
StudentId is the candidate key of the table. Functional dependency:
- FD1: StudentId --> Lname, Fname, Saddr1, Saddr2
FD1 follows 3NF. So, the table Student2NF is in 3NF.
Project2NF
ProjectId is the candidate key of the table. Functional dependency:
- FD1: ProjectId --> Pname
FD1 follows 3NF. So, the table Project2NF is in 3NF.
No change is required for Student2NF and Project2NF.
--3NF -- StudentProject2NF table is further divided in two tables -- StudentProject3NF, Department3NF IF OBJECT_ID('StudentProject3NF','U') IS NOT NULL DROP TABLE dbo.StudentProject3NF CREATE TABLE dbo.StudentProject3NF ( StudentId INT, ProjectId INT, GuideId INT, HrsWorked INT, DeptId INT ) IF OBJECT_ID('Department3NF','U') IS NOT NULL DROP TABLE dbo.Department3NF CREATE TABLE dbo.Department3NF ( DeptId INT, Dname VARCHAR(50), ) --data transfer INSERT INTO dbo.StudentProject3NF (StudentId,ProjectId,GuideId,HrsWorked,DeptId) SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId FROM dbo.StudentProject2NF INSERT INTO dbo.Department3NF (DeptId,Dname) SELECT DISTINCT DeptId,Dname FROM dbo.StudentProject2NF SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId FROM dbo.StudentProject3NF SELECT DeptId,Dname FROM dbo.Department3NF DROP TABLE dbo.StudentProject2NF
Let us check the functional dependencies in the two new tables.
StudentProject3NF
(StudentId, ProjectId) is the candidate key. The functional dependencies are:
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId, ProjectId --> DeptId
- FD3: GuideId --> ProjectId
All the functional dependencies follow 3NF. So, StudentProject3NF table is in 3NF.
Department3NF
DeptId is the candidate key. The functional dependency is:
- FD1: DeptId --> DName
FD1 follows 3NF. So, Department3NF table is in 3NF.
At this stage, we have four tables: StudentProject3NF , Department3NF , Student2NF and Project2NF. All tables are in 3NF.
Boyce Codd Normal Form (BCNF)
A relation schema or table is in Boyce Codd normal form (BCNF), when a nontrivial functional dependency X --> A holds in the table, then X is a super key of the table.
Let us consider the table StudentProject3NF first. (StudentId, ProjectId) is the candidate key. The functional dependencies are:
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId, ProjectId --> DeptId
- FD3: GuideId --> ProjectId
FD1 and FD2 satisfy BCNF. But, FD3 violates BCNF as GuideId is not a super key. To maintain BCNF, the table StudentProject3NF needs to be decomposed.
I will create two new tables -- StudentProjectBCNF, ProjectGuideBCNF. I will transfer data from StudentProject3NF to the new tables and later will drop StudentProject3NF.
Let us consider the other three existing tables to check if they are in BCNF or not.
Department3NF
DeptId is the candidate key. The functional dependency is:
- FD1: DeptId --> DName
FD1 follows BCNF. So, Department3NF table is in BCNF.
Student2NF
StudentId is the candidate key of the table. Functional dependency:
- FD1: StudentId --> Lname, Fname, Saddr1, Saddr2
FD1 follows BCNF. So, the table Student2NF is in BCNF.
Project2NF
ProjectId is the candidate key of the table. Functional dependency:
- FD1: ProjectId --> Pname
FD1 follows BCNF. So, the table Project2NF is in BCNF.
No change is required for Department3NF , Student2NF and Project2NF.
--BCNF --StudentProject3NF table is further divided in two tables -- StudentProjectBCNF, ProjectGuideBCNF IF OBJECT_ID('StudentProjectBCNF','U') IS NOT NULL DROP TABLE dbo.StudentProjectBCNF CREATE TABLE dbo.StudentProjectBCNF ( StudentId INT, ProjectId INT, HrsWorked INT, DeptId INT ) IF OBJECT_ID('ProjectGuideBCNF','U') IS NOT NULL DROP TABLE dbo.ProjectGuideBCNF CREATE TABLE dbo.ProjectGuideBCNF ( ProjectId INT, GuideId INT ) --data transfer INSERT INTO dbo.StudentProjectBCNF (StudentId,ProjectId,HrsWorked,DeptId) SELECT StudentId,ProjectId,HrsWorked,DeptId FROM dbo.StudentProject3NF INSERT INTO dbo.ProjectGuideBCNF (ProjectId,GuideId) SELECT DISTINCT ProjectId,GuideId FROM dbo.StudentProject3NF SELECT StudentId,ProjectId,HrsWorked,DeptId FROM dbo.StudentProjectBCNF SELECT ProjectId,GuideId FROM dbo.ProjectGuideBCNF DROP TABLE dbo.StudentProject3NF
Let us check the functional dependencies in the two new tables.
StudentProjectBCNF
(StudentId, ProjectId) is the candidate key. The functional dependencies are:
- FD1: StudentId, ProjectId --> HrsWorked
- FD2: StudentId, ProjectId --> DeptId
All the functional dependencies follow BCNF. So, StudentProjectBCNF table is in BCNF.
ProjectGuideBCNF
GuideId is the candidate key. The functional dependency is:
- FD1: GuideId --> ProjectId
FD1 follows BCNF. So, ProjectGuideBCNF table is in BCNF.
At this stage, we have five tables: StudentProjectBCNF , ProjectGuideBCNF, Department3NF , Student2NF and Project2NF. All tables are in BCNF.
Conclusion
In this article, we started with a single table and created five tables to fulfill the criteria for the normal forms up to BCNF. The step-by-step process is shown for checking if a table is in a particular normal form. If not, the table data is decomposed in multiple tables to fulfill the requirement of that normal form. For ease of understanding, I added suffix 1NF, 2NF, 3NF, BCNF with the table names.
Users may use any acceptable naming conventions for their tables. Also, the column names can be set as per the choice of the individual user. When we are creating the new tables, it is important to transfer the data from the existing tables correctly. There should not be any data loss or redundant data. At the end of data transfer from existing table to the new decomposed tables, the old table was dropped to clean the database.