Determining Keys for Tables

The columns of a table are either key columns or descriptor columns. A key column is one that uniquely identifies a particular row in the table. For example, a social security number is unique for each employee. A descriptor column specifies the nonunique characteristics of a particular row in the table. For example, two employees can have the same first name, Sue. The first name Sue is a nonunique characteristic of an employee. The main types of keys in a table are primary keys and foreign keys.

You designate primary and foreign keys when you create your tables. Primary and foreign keys are used to relate tables physically. Your next task is to specify a primary key for each table. That is, you must identify some quantifiable characteristic of the table that distinguishes each row from every other row.

Primary Keys

The primary key of a table is the column whose values are different in every row. Because they are different, they make each row unique. If no one such column exists, the primary key is a composite of two or more columns whose values, taken together, are different in every row.

Every table in the model must have a primary key. This rule follows automatically from the rule that all rows must be unique. If necessary, the primary key is composed of all the columns taken together.

For efficiency, the primary key should be a numeric data type (INT or SMALLINT), SERIAL or SERIAL8 data type, or a short character string (as used for codes). It is recommended that you do not use long character strings as primary keys.

Null values are never allowed in a primary-key column. Null values are not comparable; that is, they cannot be said to be alike or different. Hence, they cannot make a row unique from other rows. If a column permits null values, it cannot be part of a primary key.

Some entities have ready-made primary keys such as catalog codes or identity numbers, which are defined outside the model. Sometimes more than one column or group of columns can be used as the primary key. All columns or groups that qualify to be primary keys are called candidate keys. All candidate keys are worth noting because their property of uniqueness makes them predictable in a SELECT operation.

Composite Keys

Some entities lack features that are reliably unique. Different people can have identical names; different books can have identical titles. You can usually find a composite of attributes that work as a primary key. For example, people rarely have identical names and identical addresses, and different books rarely have identical titles, authors, and publication dates.

System-Assigned Keys

A system-assigned primary key is usually preferable to a composite key. A system-assigned key is a number or code that is attached to each instance of an entity when the entity is first entered into the database. The easiest system-assigned keys to implement are serial numbers because the database server can generate them automatically. Informix database servers offer the SERIAL and SERIAL8 data types for serial numbers. However, the people who use the database might not like a plain numeric code. Other codes can be based on actual data; for example, an employee identification code could be based on a person's initials combined with the digits of the date that they were hired. In the telephone directory example, a system-assigned primary key is used for the name table.

Foreign Keys (Join Columns)

A foreign key is a column or group of columns in one table that contains values that match the primary key in another table. Foreign keys are used to join tables. Figure 16 shows the primary and foreign keys of the customer and order tables from the demonstration database.

For ease in maintaining and using your tables, it is important to choose names for the primary and foreign keys so that the relationship is readily apparent. In Figure 16, both the primary and foreign key columns have the same name, customer_num. Alternatively, you might name the columns in Figure 16 customer_custnum and orders_custnum, so that each column has a distinct name.

Foreign keys are noted wherever they appear in the model because their presence can restrict your ability to delete rows from tables. Before you can delete a row safely, either you must delete all rows that refer to it through foreign keys, or you must define the relationship with special syntax that allows you to delete rows from primary-key and foreign-key columns with a single delete command. The database server disallows deletes that violate referential integrity.

To preserve referential integrity, delete all foreign-key rows before you delete the primary key to which they refer. If you impose referential constraints on your database, the database server does not permit you to delete primary keys with matching foreign keys. It also does not permit you to add a foreign-key value that does not reference an existing primary-key value. For more information about referential integrity, see the IBM Informix: Guide to SQL Tutorial.

Adding Keys to the Telephone Directory Diagram

Figure 17 shows the initial choices of primary and foreign keys. This diagram reflects some important decisions.

For the name table, the primary key rec_num is chosen. The data type for rec_num is SERIAL. The values for rec_num are system generated. If you look at the other columns (or attributes) in the name table, you see that the data types that are associated with the columns are mostly character-based. None of these columns alone is a good candidate for a primary key. If you combine elements of the table into a composite key, you create a cumbersome key. The SERIAL data type gives you a key that you can also use to join other tables to the name table.

The voice, fax, modem, and address tables are each joined to name through the rec_num key.

For the voice, fax, and modem tables the telephone numbers are used as primary keys. The address table contains a special column (id_num) that serves no other purpose than to act as a primary key. This is done because if id_num did not exist then all of the other columns would have to be used together as a composite key in order to guarantee that no duplicate primary keys existed. Using all of the columns as a primary key would be very inefficient and confusing.