Home | Previous Page | Next Page   Basics of Database Design and Implementation > Building a Relational Data Model > Translating E-R Data Objects into Relational Constructs >

Determining Keys for Tables

The columns of a table are either key columns or descriptor columns. A key column is one that uniquely identifies a particular row in the table. For example, a social security number is unique for each employee. A descriptor column specifies the nonunique characteristics of a particular row in the table. For example, two employees can have the same first name, Sue. The first name Sue is a nonunique characteristic of an employee. The main types of keys in a table are primary keys and foreign keys.

You designate primary and foreign keys when you create your tables. Primary and foreign keys are used to relate tables physically. Your next task is to specify a primary key for each table. That is, you must identify some quantifiable characteristic of the table that distinguishes each row from every other row.

Primary Keys

The primary key of a table is the column whose values are different in every row. Because they are different, they make each row unique. If no one such column exists, the primary key is a composite of two or more columns whose values, taken together, are different in every row.

Every table in the model must have a primary key. This rule follows automatically from the rule that all rows must be unique. If necessary, the primary key is composed of all the columns taken together.

For efficiency, the primary key should be a numeric data type (INT or SMALLINT), SERIAL or SERIAL8 data type, or a short character string (as used for codes). It is recommended that you do not use long character strings as primary keys.

Null values are never allowed in a primary-key column. Null values are not comparable; that is, they cannot be said to be alike or different. Hence, they cannot make a row unique from other rows. If a column permits null values, it cannot be part of a primary key.

Some entities have ready-made primary keys such as catalog codes or identity numbers, which are defined outside the model. Sometimes more than one column or group of columns can be used as the primary key. All columns or groups that qualify to be primary keys are called candidate keys. All candidate keys are worth noting because their property of uniqueness makes them predictable in a SELECT operation.

Composite Keys

Some entities lack features that are reliably unique. Different people can have identical names; different books can have identical titles. You can usually find a composite of attributes that work as a primary key. For example, people rarely have identical names and identical addresses, and different books rarely have identical titles, authors, and publication dates.

System-Assigned Keys

A system-assigned primary key is usually preferable to a composite key. A system-assigned key is a number or code that is attached to each instance of an entity when the entity is first entered into the database. The easiest system-assigned keys to implement are serial numbers because the database server can generate them automatically. Informix database servers offer the SERIAL and SERIAL8 data types for serial numbers. However, the people who use the database might not like a plain numeric code. Other codes can be based on actual data; for example, an employee identification code could be based on a person's initials combined with the digits of the date that they were hired. In the telephone directory example, a system-assigned primary key is used for the name table.

Foreign Keys (Join Columns)

A foreign key is a column or group of columns in one table that contains values that match the primary key in another table. Foreign keys are used to join tables. Figure 16 shows the primary and foreign keys of the customer and order tables from the demonstration database.

Figure 16. Primary and Foreign Keys in the Customer-Order Relationships
begin figure description - Portions of two database tables are shown graphically as collections of rectanglular cells like a table or a matrix. Some of the columns are labeled with the names of the represented column in the datbase table. For the table named "customer" only the customer_num column is labled. The customer_num column is shaded and marked as the primary key of the table. For the table named "orders" two columns are labeled: order_num, and customer_num. The customer_num column is shaded and marked as the foreign key. A grey arrow points from one of the cells in the customer_num column of the orders table to one of the cells in the customer_num column in the customer table. This shows that the value in foreign key column refers to an existing value in the primary key column. - end figure description

Tip:
For ease in maintaining and using your tables, it is important to choose names for the primary and foreign keys so that the relationship is readily apparent. In Figure 16, both the primary and foreign key columns have the same name, customer_num. Alternatively, you might name the columns in Figure 16 customer_custnum and orders_custnum, so that each column has a distinct name.

Foreign keys are noted wherever they appear in the model because their presence can restrict your ability to delete rows from tables. Before you can delete a row safely, either you must delete all rows that refer to it through foreign keys, or you must define the relationship with special syntax that allows you to delete rows from primary-key and foreign-key columns with a single delete command. The database server disallows deletes that violate referential integrity.

To preserve referential integrity, delete all foreign-key rows before you delete the primary key to which they refer. If you impose referential constraints on your database, the database server does not permit you to delete primary keys with matching foreign keys. It also does not permit you to add a foreign-key value that does not reference an existing primary-key value. For more information about referential integrity, see the IBM Informix: Guide to SQL Tutorial.

Adding Keys to the Telephone Directory Diagram

Figure 17 shows the initial choices of primary and foreign keys. This diagram reflects some important decisions.

For the name table, the primary key rec_num is chosen. The data type for rec_num is SERIAL. The values for rec_num are system generated. If you look at the other columns (or attributes) in the name table, you see that the data types that are associated with the columns are mostly character-based. None of these columns alone is a good candidate for a primary key. If you combine elements of the table into a composite key, you create a cumbersome key. The SERIAL data type gives you a key that you can also use to join other tables to the name table.

The voice, fax, modem, and address tables are each joined to name through the rec_num key.

For the voice, fax, and modem tables the telephone numbers are used as primary keys. The address table contains a special column (id_num) that serves no other purpose than to act as a primary key. This is done because if id_num did not exist then all of the other columns would have to be used together as a composite key in order to guarantee that no duplicate primary keys existed. Using all of the columns as a primary key would be very inefficient and confusing.

Figure 17. Telephone Directory Diagram with Primary and Foreign Keys Added
begin figure description - There are five entities in the diagram. Each entity is represented by a rectangle. The name of each entity is above its rectangle. The attributes of each entity are inside its rectangle. The names of the five entities are: name, voice, fax, modem, and address. A legend in the lower right of the diagram shows that the letters PK after an attribute indicates that it is a primary key (or is part of a composite primary key) and the letter FK after an attribute indicates that it is a foreign key. The PK and FK are not part of the attribute name. The attributes of the "name" entity are: rec_num PK, lname, fname, bdate, anniv, email, child1, child2, and child3. The attributes of the "address" entity are: id_num PK, rec_num FK, street, city, state, and zipcode. The attributes of the "voice" entity are: vce_num PK, rec_num FK, and vce_type. The attributes of the "fax" entity are: fax_num PK, rec_num FK, oper_num, and oper_till. The attributes of the "modem" entity are: mdm_num PK, rec_num FK, b9600, b14400, and b28800. The "name" entity is connected to each of the other four entities by relationships. None of the other four entities are connected to each other. The relationship between "name" and "voice" has no special symbols on the end near "name". The end near "voice" has the symbol for "optional" and the symbol for "many". The relationship between "name" and "fax" has the symbol for "many" on the end near "name". The end near "fax" has the smbols for "optional" and "many". The relationship between "name" and "modem" has the symbol for "exactly one" on the end near "name". The end near "modem" has the symbols for "optional" and "many". The relationship between "name" and "address" has the symbols for "optional" and "many" on the end near "name". The end near "address" has the symbols for "optional" and "exactly one". - end figure description
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]