INFORMIX
Informix Guide to SQL: Tutorial
Chapter 8: Building Your Data Model
Home Contents Index Master Index New Book

Normalizing Your Data Model

The telephone-directory example in this chapter appears to be a good model. You could implement it at this point into a database, but this example might present problems later on with application development and data-manipulation operations. Normalization is a formal approach to applying a set of rules used in associating attributes with entities.

Normalizing your data model can do the following things:

Normalization consists of several steps to reduce the entities to more desirable physical properties. These steps are called normalization rules, also referred to as normal forms. Several normal forms exist; this chapter discusses the first three normal forms. Each normal form constrains the data to be more organized than the last form. Because of this, you must achieve first normal form before you can achieve second normal form, and you must achieve second normal form before you can achieve third normal form.

First Normal Form

An entity is in first normal form if it contains no repeating groups. In relational terms, a table is in first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data. In the telephone-directory example, it appears that the name table contains repeating columns, child1, child2, and child3, as Figure 8-19 shows.

Figure 8-19
Name Entity Before Normalization

You can see some problems in the current table. The table always reserves space on the disk for three child records, whether the person has children or not. The maximum number of children that you can record is three, but some of your acquaintances might have four or more children. To look for a particular child, you would have to search all three columns in every row.

To eliminate the repeating columns and bring the table to first normal form, separate the table into two tables as Figure 8-20 shows. Put the repeating columns into one of the tables. The association between the two tables is established with a primary-key and foreign-key combination. Because a child cannot exist without an association in the name table, you can reference the name table with a foreign key, rec_num.

Figure 8-20
First Normal Form Reached for Name Entity

Now check Figure 8-17 for groups that are not in first normal form. The name-modem relationship is not at the first normal form because the columns b9600, b14400, and b28800 are considered repeating columns. Add a new attribute called b_type to the modem table to contain occurrences of b9600, b14400, and b28800. Figure 8-21 shows the data model normalized through first normal form.

Figure 8-21
The Data Model of a Personal Telephone Directory

Second Normal Form

An entity is in the second normal form if it is in the first normal form, and all its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.

If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:

If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.

Third Normal Form

An entity is in the third normal form if it is in the second normal form, and all its attributes are not transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.

To convert to the third normal form, remove attributes that depend on other descriptor key attributes.

Summary of Normalization Rules

The following normal forms are discussed in this section:

When you follow these rules, the tables of the model are in the third normal form, according to E. F. Codd, the inventor of relational databases. When tables are not in the third normal form, either redundant data exists in the model, or problems exist when you attempt to update the tables.

If you cannot find a place for an attribute that observes these rules, you have probably made one of the following errors:




Informix Guide to SQL: Tutorial, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.