INFORMIX
Informix Guide to SQL: Tutorial
Chapter 8: Building Your Data Model
Home Contents Index Master Index New Book

Identifying and Defining Your Principal Data Objects

The first step in building an entity-relationship data model is to identify and define your principal data objects. The principal data objects are entities, relationships, and attributes.

Discovering Entities

An entity is a principal data object that is of significant interest to the user. It is usually a person, place, thing, or event to be recorded in the database. If the data model were a language, entities would be its nouns. The stores7 database contains the following entities: customer, orders, items, stock, catalog, cust_calls, call_type, manufact, and state.

The first step in modeling is to choose the entities to record. Most of the entities that you choose will become tables in the model.

Choosing Possible Entities

If you have an idea for your database, you can probably list several entities immediately. However, if other people use the database, you should poll them for their understanding of what fundamental things the database should contain. Make a preliminary list of all the entities you can identify. Interview the potential users of the database for their opinions about what must be recorded in the database. Determine basic characteristics for each entity, such as "at least one address must be associated with a name." All the decisions you make in determining your entities become your business rules. "The Telephone-Directory Example" provides some of the business rules for the example in this chapter.

Later, when you normalize your data model, some of the entities can expand or become other data objects. See "Normalizing Your Data Model" for additional information.

Pruning Your List of Entities

When the list of entities seems complete, prune it by making sure that each entity has the following qualities:

    List only entities that are important to the users of the database and worth the trouble and expense of computer tabulation.

    List only types of things, not individual instances. For instance, symphony might be an entity, but Beethoven's Fifth would be an entity instance or entity occurrence.

    List only entities that exist independently, without needing something else to explain them. Anything you could call a trait, a feature, or a description is not an entity. For example, a part number is a feature of the fundamental entity called part. Also, do not list things that you can derive from other entities; for example, avoid any sum, average, or other quantity that you can calculate in a SELECT expression.

These choices are neither simple nor automatic. To discover the best choice of entities, you must think deeply about the nature of the data you want to store. Of course, that is exactly the point of making a formal data model. The following section describes this chapter's example in further detail.

The Telephone-Directory Example

Suppose that you create a database that computerizes a personal telephone directory. The database model must record the names, addresses, and telephone numbers of people and organizations that its user deals with for business and pleasure.

The first step is to define the entities, and the first thing you might do is look carefully at a page from a telephone directory to see what entities are there.

Figure 8-1
Partial Page from a Telephone Directory

The physical form of the existing data can be misleading. Do not let the layout of pages and entries in the telephone directory mislead you into trying to specify an entity that represents one entry in the book-some kind of alphabetized record with fields for name, number, and address. You want to model the data, not the medium.

At first glance, the entities that are recorded in a telephone directory include the following items:

Do these entities meet the earlier criteria? They are clearly significant to the model and are generic.

Are they fundamental? A good test is to ask if an entity can vary in number independently of any other entity. After you think about it, you realize that a telephone directory sometimes lists people who have no number or current address (people who move or change jobs). A telephone directory also can list both addresses and numbers that are used by more than one person. All three of these entities can vary in number independently; this fact strongly suggests that they are fundamental, not dependent.

Are they unitary? Names can be split into personal names and corporate names. After thinking about it, you decide that all names should have the same features in this model; that is, you do not plan to record different information about a company than you would about a person. Likewise, you decide only one kind of address exists; no need exists to treat home addresses differently from business ones.

However, you also realize that more than one kind of telephone number exists. Voice numbers are answered by a person, fax numbers connect to a fax machine, and modem numbers connect to a computer. You decide that you want to record different information about each kind of number, so these three are different entities.

For the personal telephone-directory example, you decide that you want to keep track of the following entities:

Diagramming Your Entities

A section in this chapter will teach you how to use the entity-relationship diagrams. For now, create a separate, rectangular box for each entity in the telephone-directory example. You will learn how to put the entities together with relationships in "Diagramming Your Data Objects".

Figure 8-2
Entities in the Personal Telephone- Directory Example

Defining the Relationships

After you choose your entities, you need to consider the relationships between them. Relationships are not always obvious, but all the ones worth recording must be found. The only way to ensure that all the relationships are found is to list all possible relationships exhaustively. Consider every pair of entities A and B and ask, "What is the relationship between an A and a B?"

A relationship is an association between two entities. Usually, a verb or preposition that connects two entities implies a relationship. A relationship between entities is described in terms of connectivity, existence dependency, and cardinality.

Connectivity

Connectivity refers to the number of entity instances. An entity instance is a particular occurrence of an entity. The three types of connectivity are one-to-one (written 1:1), one-to-many (written 1:n), and many-to-many (written m:n) as Figure 8-3 shows.

Figure 8-3
Connectivity in Relationships

For example, in the telephone-directory example, an address can be associated with more than one name. The connectivity for the relationship between the name and address entities is one-to-many (1:n).

Existence Dependency

Existence dependency describes whether an entity in a relationship is optional or mandatory. Analyze your business rules to identify whether an entity must exist in a relationship. For example, your business rules might dictate that an address must be associated with a name. Such an association makes the existence dependency for the relationship between the name and address entities mandatory. An example of an optional existence dependency could be a business rule that says a person might or might not have children.

Cardinality

Cardinality places a constraint on the number of times an entity can appear in a relationship. The cardinality of a 1:1 relationship is always one. But the cardinality of a 1:n relationship is open; n could be any number. If you need to place an upper limit on n, you do it by specifying a cardinality for the relationship. For example, in a store sale example, you could limit the number of sale items that a customer can purchase at one time. You usually place cardinality constraints through your application program or through stored procedures.

For additional information about cardinality, see any entity-relationship data-modeling text. See the "Summary" for references to two data-modeling books.

Discovering the Relationships

A compact way to discover the relationships is to prepare a matrix that names all the entities on the rows and again on the columns. The matrix in Figure 8-4 reflects the entities for the personal telephone directory.

Figure 8-4
A Matrix That Reflects the Entities for a Personal Telephone Directory

You can ignore the lower triangle of the matrix, which is shaded. You must consider the diagonal cells; that is, you must ask the question "What is the relationship between an A and another A?" In this model, the answer is always none. No relationship exists between a name and a name or an address and another address, at least none that is worth recording in this model. When a relationship exists between an A and another A, you have found a recursive relationship. (See "Resolving Other Special Relationships".)

For all cells for which the answer is clearly none, write none in the matrix. Figure 8-5 shows the current matrix.

Figure 8-5
A Matrix with Initial Relationships Included

Although no entities relate to themselves in this model, this is not always true in other models. A typical example is an employee who is the manager of another employee. Another example occurs in manufacturing, when a part entity is a component of another part.

In the remaining cells, you write the connectivity relationship that exists between the entity on the row and the entity on the column. The following kinds of relationships are possible:

One-to-many relationships are the most common. The telephone-directory model examples shows one-to-many and many-to-many relationships.

As Figure 8-5 shows, the first unfilled cell represents the relationship between names and addresses. What connectivity lies between these entities? You might ask yourself, "How many names can be associated with an address?" You decide that a name can have zero or one address but no more than one. You write 0-1 opposite name and below address, as Figure 8-6 shows.

Figure 8-6
Relationship Between Name and Address

Ask yourself how many addresses can be associated with a name. You decide that an address can be associated with more than one name. For example, you can know several people at one company or more than two people who live at the same address.

Can an address be associated with zero names? That is, should it be possible for an address to exist when no names use it? You decide that yes, it can. Below address and opposite name, you write 0-n, as Figure 8-7 shows.

Figure 8-7
Relationship Between Address and Name

If you decide that an address cannot exist unless it is associated with at least one name, you write 1-n instead of 0-n.

When the cardinality of a relationship is limited on either side to 1, it is a 1:n relationship. In this case, the relationship between names and addresses is a 1:n relationship.

Now consider the next cell, the relationship between a name and a voice number. How many voice numbers can a name be associated with, one or more than one? Glancing at your telephone directory, you see that you have often noted more than one telephone number for a person. For a busy salesperson you have a home number, an office number, a paging number, and a car phone number. But you might also have names without associated numbers. You write 0-n opposite name and below number (voice), as Figure 8-8 shows.

Figure 8-8
Relationship Between Name and Number

What is the other side of this relationship? How many names can be associated with a voice number? You decide that only one name can be associated with a voice number. Can a number be associated with zero names? No, you decide there is no point to recording a number unless someone uses it. You write 1 under number (voice) and opposite name, as Figure 8-9 shows.

Figure 8-9
Relationship Between Number and Name

Fill out the rest of the matrix in the same fashion, using the following decisions:

Figure 8-10 shows a completed matrix.

Figure 8-10
A Completed Matrix for a Telephone Directory

The matrix also reflects the following decisions:

You might disagree with some of these decisions (for example, that a relationship between voice numbers and modem numbers is not supported). For the sake of this example, these are our business rules.

Diagramming Your Relationships

For now, save the matrix that you created in this section. You will learn how to create an entity-relationship diagram in "Diagramming Your Data Objects".

Identifying Attributes

Entities contain attributes, which are characteristics or modifiers, qualities, amounts, or features. An attribute is a fact or nondecomposable piece of information about an entity. Later, when you represent an entity as a table, its attributes are added to the model as new columns.

Before you can identify your attributes, you must identify your entities. After you determine your entities, ask yourself, "What characteristics do I need to know about each entity?" For example, in an address entity, you probably need information about street, city, state, and zipcode. Each of these characteristics of the address entity becomes an attribute.

Selecting Attributes for Your Entities

In selecting attributes, choose ones that have the following qualities:

    Include only attributes that are useful to the database users.

    An attribute that can be derived from existing attributes (for instance, through an expression in a SELECT statement) should not be made part of the model. The presence of derived data greatly complicates the maintenance of a database.

    An attribute can contain only single values, never lists or repeating groups. Composite values must be broken into separate attributes.

    For example, you would want to enter only date values in a birthday attribute, not names or telephone numbers.

The rules for defining attributes are the same as those for defining columns. For more information about defining columns, see "Placing Constraints on Columns".

The following attributes are added to the telephone-directory example to produce the diagram shown in Figure 8-15:

Listing Your Attributes

For now, simply list the attributes for the telephone-directory example with the entities with which you think they belong. Your list should look something like Figure 8-11.

Figure 8-11
Attributes for the Telephone-Directory Example

About Entity Occurrences

An additional data object that you need to know about is the entity occurrence. Each row in a table represents a specific, single occurrence of the entity. For example, if customer is an entity, a customer table represents the idea of customer; in it, each row represents one specific customer, such as Sue Smith. Keep in mind that entities will become tables, attributes will become columns, and rows will become entity occurrences.




Informix Guide to SQL: Tutorial, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.