Informix Guide to SQL: Tutorial

Informix Guide to SQL: Tutorial
Chapter 8: Building Your Data Model

Home Contents Index Master Index New Book

Translating E-R Data Objects into Relational Constructs

All the data objects you have learned about so far, entities, relationships, attributes, and entity occurrences, will be translated into SQL tables, joins between tables, columns, and rows. The tables, columns, and rows of your database must fit the rules found in "Rules for Defining Tables, Rows, and Columns."

Your data objects should fit these rules before you normalize your data objects. To normalize your data objects, analyze the dependencies between your entities, relationships, and attributes. Normalization is discussed in "Normalizing Your Data Model".

After you normalize the data model, you can use SQL statements to create a database that is based on your data model. Chapter 9, "Implementing Your Data Model," describes how to create your database and provides the database schema for the telephone-directory example.

Each entity that you choose is represented as a table in the model. The table stands for the entity as an abstract concept, and each row represents a specific, individual occurrence of the entity. In addition, each attribute of an entity is represented by a column in the table.

Universal Server is an object-relational database server. Support for extensible data types and inheritance define the object-oriented capabilities of Universal Server. (For information about extensible data types, see "Extended Data Types". For information about inheritance, see "What Is Inheritance?".) Support for SQL and many concepts that are fundamental to relational data-model methods, including the E-R data model, define the relational capabilities of Universal Server.

You can apply the following rules, which represent the relational aspect of Universal Server, to help you design your data model. Following these rules will save you time and effort when you normalize your model.

Rules for Defining Tables, Rows, and Columns

You are already familiar with the idea of a table that is composed of rows and columns. But you must respect the following rules when you define the tables of a formal data model:

Rows must stand alone.

After the database is implemented, you can tell the database server to store rows in a certain order for the sake of efficiency, but that order does not affect the model.

Rows must be unique.

In every row, some column must contain a unique value. If no single column has this property, the values of some group of columns taken as a whole must be different in every row.

Columns must stand alone.

The order of columns within a table has no meaning in the model. The model should still be correct even if the columns are rearranged.

After the database is implemented, programs and stored queries that use an asterisk to mean all columns are dependent on the final order of columns, but that order does not affect the model.

Each column must have a unique name.

If your previous experience is only with data organized as arrays or sequential files, these rules might seem unnatural. However, Universal Server requires that you use only tables, rows, and columns (that follow these rules) to represent all types of data. With a little practice, these rules become automatic.

Placing Constraints on Columns

When you define your table and columns with the CREATE TABLE statement, you constrain each column. These constraints specify whether you want the column to contain characters or numbers, the form that you want dates to use, and other constraints. The column-specific properties describe the constraints and identify the set of valid values that attributes can assume. The column-specific properties of a column can consist of the following items:

Data type (INTEGER, CHAR, DATE, and so on)
Format (for example, yy/mm/dd)
Range (for example, 1,000 to 5,400)
Meaning (for example, personnel number)
Allowable values (for example, only grades S or U)
Uniqueness
Null support
Default value
Referential constraints

You define the column-specific properties of columns when you create your tables. Defining column-specific properties and creating your tables and database are discussed in Chapter 9, "Implementing Your Data Model."

Determining Keys for Tables

The columns of a table are either key columns or descriptor columns. A key column is one that uniquely identifies a particular row in the table. For example, a social-security number is unique for each employee. A descriptor column specifies the nonunique characteristics of a particular row in the table. For example, two employees can have the same first name, Sue. The first name Sue is a nonunique characteristic of an employee. The main types of keys in a table are primary keys and foreign keys.

You designate primary and foreign keys when you create your tables. Primary and foreign keys are used to relate tables physically. Your next task is to specify a primary key for each table. That is, you must identify some quantifiable characteristic of the table that distinguishes each row from every other row.

Primary Keys

The primary key of a table is the column whose values are different in every row. Because they are different, they make each row unique. If no one such column exists, the primary key is a composite of two or more columns whose values, taken together, are different in every row.

Every table in the model must have a primary key. This rule follows automatically from the rule that all rows must be unique. If necessary, the primary key is composed of all the columns taken together.

The primary key should be a numeric data type (INT or SMALLINT), SERIAL data type, or a short character string (as used for codes). Informix recommends that you avoid using long character strings as primary keys.

Null values are never allowed in a primary-key column. Null values are not comparable; that is, they cannot be said to be alike or different. Hence, they cannot make a row unique from other rows. If a column permits null values, it cannot be part of a primary key.

Some entities have ready-made primary keys such as catalog codes or identity numbers, which are defined outside the model. These are
user-assigned keys.

Sometimes more than one column or group of columns can be used as the primary key. All columns or groups that qualify to be primary keys are called candidate keys. All candidate keys are worth noting because their property of uniqueness makes them predictable in a SELECT operation. When you select the column of a candidate key, you know the result does not contain any duplicate rows, therefore, the result of a SELECT operation can be a table in its own right, with the selected candidate key as its primary key.

Composite Keys

A composite key is used when the values of two or more columns are required to uniquely identify each row. Some entities lack features that are reliably unique. Different people can have identical names; different books can have identical titles. You can usually find a composite of attributes that work as a primary key. For example, people rarely have identical names and identical addresses, and different books rarely have identical titles, authors, and publication dates.

System-Assigned Keys

A system-assigned primary key is usually preferable to a composite key. A system-assigned key is a number or code that is attached to each instance of an entity when the entity is first entered into the database. The easiest system-assigned keys to implement are serial numbers because the database server can generate them automatically. Informix offers the SERIAL data type for serial numbers. However, the people who use the database might not like a plain numeric code. Other codes can be based on actual data; for example, an employee identification code could be based on a person's initials combined with the digits of the date that they were hired. In the telephone-directory example, a system-assigned primary key is used for the name table.

Foreign Keys (Join Columns)

A foreign key is simply a column or group of columns in one table that contains values that match the primary key in another table. Foreign keys are used to join tables; in fact, most of the join columns referred to earlier in this book are foreign-key columns. Figure 8-16 shows the primary and foreign keys of the customer and order tables from the stores7 database.


	Figure 8-16 Primary and Foreign Keys in the Customer-Order Relationships

Foreign keys are noted wherever they appear in the model because their presence can restrict your ability to delete rows from tables. Before you can delete a row safely, either you must delete all rows that refer to it through foreign keys, or you must define the relationship using special syntax that allows you to delete rows from primary-key and foreign-key columns with a single delete command. The database server disallows deletes that violate referential integrity.

You can always preserve referential integrity by deleting all foreign-key rows before you delete the primary key to which they refer. If you are imposing referential constraints on your database, the database server does not permit you to delete primary keys with matching foreign keys. It also does not permit you to add a foreign-key value that does not reference an existing primary-key value. Referential integrity is discussed in Chapter 4, "Modifying Data."

Adding Keys to the Telephone-Directory Diagram

The initial choices of primary and foreign keys are as Figure 8-17 shows. This diagram reflects some important decisions.

For the name table, the primary key rec_num is chosen. Note that the data type for rec_num is SERIAL. The values for rec_num are system generated. If you look at the other columns (or attributes) in the name table, you see that the data types that are associated with the columns are mostly character- based. None of these columns alone is a good candidate for a primary key. If you combine elements of the table into a composite key, you create an exceedingly cumbersome key. The SERIAL data type gives you a key that you can also use to join other tables to the name table.

For the voice, fax, and modem tables, the telephone numbers are shown as primary keys. These tables are joined to the name table through the rec_num key.

The address table also uses a system-generated primary key, id_num. The address table must have a primary key because the business rules state that an address can exist when no names use it. If the business rules prevent an address from existing unless a name is associated with it, then the address table could be joined to the name table with the foreign key rec_num only.

Figure 8-17
Telephone-Directory Diagram with Primary and Foreign Keys Added

Informix Guide to SQL: Tutorial, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.