Wednesday, May 30, 2007

Normalization

Normalization is the process of efficiently organizing data in a database.

There are two goals of the normalization process:
1) Eliminating redundant data (for example, storing the same data in more than one table)
2) Ensuring data dependencies make sense (only storing related data in a table).

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
________________________________________
1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
2NF Eliminate Redundant Data - If an attribute depends on only part of a composite (multi-column) key, remove it to a separate table.
3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.
BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.
ONF Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF Domain-Key Normal Form - a model free from all modification anomalies.

________________________________________
1. Eliminate Repeating Groups
In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is an inefficient and extremely untidy way to store information.
Moving the known databases into a separate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.
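The split described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name KnownDatabase (standing in for the "database table"), the column names other than MemberID, and the sample members are all made up for the example.

```python
import sqlite3

def build_first_normal_form():
    """Build an in-memory database where known databases live in their own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Member (
            MemberID INTEGER PRIMARY KEY,
            Name     TEXT NOT NULL
        );
        -- One row per member/database pair replaces the repeating group.
        -- KnownDatabase is a hypothetical name for the article's database table.
        CREATE TABLE KnownDatabase (
            MemberID     INTEGER NOT NULL REFERENCES Member(MemberID),
            DatabaseID   INTEGER NOT NULL,
            DatabaseName TEXT NOT NULL,
            PRIMARY KEY (MemberID, DatabaseID)
        );
    """)
    # Sample data (invented): Carol knows no databases at all.
    conn.executemany("INSERT INTO Member VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (3, "Carol")])
    conn.executemany("INSERT INTO KnownDatabase VALUES (?, ?, ?)",
                     [(1, 10, "DB2"), (1, 20, "Oracle"), (2, 10, "DB2")])
    conn.commit()
    return conn

def who_knows(conn, database_name):
    """Return the names of members who list the given database, sorted by name."""
    rows = conn.execute(
        "SELECT m.Name FROM Member m "
        "JOIN KnownDatabase k ON k.MemberID = m.MemberID "
        "WHERE k.DatabaseName = ? ORDER BY m.Name",
        (database_name,),
    ).fetchall()
    return [name for (name,) in rows]

conn = build_first_normal_form()
print(who_knows(conn, "DB2"))
```

The question "Who knows DB2?" becomes a single join on MemberID instead of a scan through every member's list.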


________________________________________
2. Eliminate Redundant Data
In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for attributes like "Where Learned" and "Skill Level", since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.
Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.
Or suppose the last member listing a particular database leaves the group. That member's records will be removed from the system, and the database will no longer be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.
To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member.
Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughout the application.
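As a rough sketch of the two-table design, again using Python's sqlite3 module: the table names "Database" and "MemberDatabase" come from the text, while the column names WhereLearned and SkillLevel, the new name "IBM DB2", and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The name depends only on DatabaseID, so it is stored exactly once.
    CREATE TABLE "Database" (
        DatabaseID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- Attributes here depend on the whole (MemberID, DatabaseID) key.
    CREATE TABLE MemberDatabase (
        MemberID     INTEGER NOT NULL,
        DatabaseID   INTEGER NOT NULL REFERENCES "Database"(DatabaseID),
        WhereLearned TEXT,
        SkillLevel   INTEGER,
        PRIMARY KEY (MemberID, DatabaseID)
    );
""")
# Sample data (invented for the example).
conn.executemany('INSERT INTO "Database" VALUES (?, ?)',
                 [(10, "DB2"), (20, "Oracle")])
conn.executemany("INSERT INTO MemberDatabase VALUES (?, ?, ?, ?)",
                 [(1, 10, "Work", 3), (2, 10, "School", 1), (1, 20, "Work", 2)])

# Changing the name touches one row, no matter how many members list it.
cursor = conn.execute('UPDATE "Database" SET Name = ? WHERE DatabaseID = ?',
                      ("IBM DB2", 10))
rows_changed = cursor.rowcount

# Every member who lists database 10 sees the new name through the join.
names_seen = conn.execute(
    'SELECT md.MemberID, d.Name FROM MemberDatabase md '
    'JOIN "Database" d ON d.DatabaseID = md.DatabaseID '
    'WHERE md.DatabaseID = 10 ORDER BY md.MemberID'
).fetchall()

# Removing the last member who lists Oracle no longer loses the database itself.
conn.execute("DELETE FROM MemberDatabase WHERE DatabaseID = 20")
still_known = conn.execute(
    'SELECT Name FROM "Database" WHERE DatabaseID = 20'
).fetchone()

print(rows_changed, names_seen, still_known)
```

The table name is double-quoted in the SQL because DATABASE is a keyword in SQLite; a differently named table would not need the quotes.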


________________________________________
3. Eliminate Columns Not Dependent On Key
The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a composite key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table.
The motivation for this is the same as for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!
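A minimal sketch of the third normal form split, using Python's sqlite3 module. The "Company" table and its CompanyCode key follow the text; the column names CompanyName and CompanyLocation, the member row, and the location value are assumptions chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Company facts depend on CompanyCode, not on MemberID.
    CREATE TABLE Company (
        CompanyCode     TEXT PRIMARY KEY,
        CompanyName     TEXT NOT NULL,
        CompanyLocation TEXT
    );
    -- Member keeps only a reference to the company.
    CREATE TABLE Member (
        MemberID    INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        CompanyCode TEXT REFERENCES Company(CompanyCode)
    );
""")
# Sample data (invented for the example).
conn.execute("INSERT INTO Company VALUES ('IBM', 'IBM', 'Armonk')")
conn.execute("INSERT INTO Member VALUES (1, 'Alice', 'IBM')")

# The last IBM member leaves the group.
conn.execute("DELETE FROM Member WHERE CompanyCode = 'IBM'")

member_count = conn.execute(
    "SELECT COUNT(*) FROM Member WHERE CompanyCode = 'IBM'"
).fetchone()[0]

# The company record survives even though no current member refers to it.
company = conn.execute(
    "SELECT CompanyName, CompanyLocation FROM Company WHERE CompanyCode = 'IBM'"
).fetchone()

print(member_count, company)
```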
