|
Home | Contents | Index | Master Index | New Book |
Defining Column-Specific PropertiesTo complete the data model described in Chapter 8, "Building Your Data Model," you must define column-specific properties for each column. The column-specific properties describe the constraints and identify the set of valid values that attributes (or columns) can assume.The purpose of column-specific properties is to guard the semantic integrity of the data in the model; that is, to ensure that it reflects reality in a sensible way. The integrity of the data model is at risk if you can substitute a name for a telephone number or if you can enter a fraction where only integers are allowed. To define column-specific properties, first define the constraints that a data value must satisfy before it can be part of the column. Use the following constraints to specify column-specific properties: You can identify the primary and foreign keys in each table to place referential constraints on columns. For more information on primary and foreign keys, see Chapter 8, "Building Your Data Model."
Built-In Data TypesA built-in data type is a data type that the database server defines. INTEGER, CHAR, DATE, and DECIMAL are examples of built-in data types. To help you choose the appropriate data types for implementing your data model, this section provides a description of the built-in data types. For additional information on the built-in data types, see Chapter 3, "Environment Variables," in the Informix Guide to SQL: Reference.
Every column in a table must have a data type that is chosen from the built-in or extended types that the database server supports. The choice of data type is important for the following reasons:
Using Data Types in Referential ConstraintsAlmost all data type combinations must match when you are trying to pick columns for primary and foreign keys. For example, if you define a primary key as a CHAR data type, you must also define the foreign key as a CHAR data type. However, when you specify a SERIAL data type on a primary key in one table, you specify an INTEGER on the foreign key of the relationship. Similarly, when you specify a SERIAL8 data type on a primary key in one table, you specify an INT8 on the foreign key of the relationship.The only data type combinations that you can mix in a relationship are as follows:Figure 9-1 shows the decision tree that summarizes the choices among built-in data types. The choices are explained in the following sections.
Both types are stored as signed binary integers. INTEGER values have 32 bits and can represent whole numbers from -(231 -1) through 231-1; that is, from SMALLINT values have only 16 bits. They can represent whole numbers from -32,767 through 32,767. (The maximum negative number, -32,768, is reserved and cannot be used.) The INTEGER and SMALLINT data types have the following advantages:
The INT8 data type is stored as a signed binary integer, which uses 8 bytes per value. Although INT8 takes up twice the space as the INTEGER data type, INT8 has the advantage of a significantly larger range of data representation. INT8 can represent integers ranging from - (2 63 -1) through 2 63 -1; that is from - 9,223,372,036,854,775,807 through 9,223,372,036,854,775,807. (The maximum negative number, - 9,223,372,036,854,775,808, is reserved and cannot be used.)
The SERIAL data type can yield up to 231-1 positive integers. Consequently, the database server uses all the positive serial numbers by the time it inserts 231-1 rows in a table. For most users the exhaustion of the positive serial numbers is not a concern, however, because a single application would need to insert a row every second for 68 years, or 68 applications would need to insert a row every second for a year, to use all the positive serial numbers. However, if all the positive serial numbers were used, the database server would continue to generate new numbers. It would treat the next serial quantity as a signed integer. Because the database server uses only positive values, it would simply wrap around and start to generate integer values that begin with a 1. The SERIAL8 data type can yield up to 263 -1 positive integers. With a reasonable starting value, it is virtually impossible to cause a SERIAL8 value to wrap around during insertions. For SERIAL and SERIAL8 data types, the sequence of generated numbers always increases. When rows are deleted from the table, their serial numbers are not reused. Rows that are sorted on a SERIAL or SERIAL8 column are returned in the order in which they were created. That cannot be said of any other data types. You can specify the initial value in a SERIAL or SERIAL8 column in the CREATE TABLE statement. This makes it possible to generate different subsequences of system-assigned keys in different tables. The stores7 database uses this technique. In stores7, the customer numbers begin at 101, and the order numbers start at 1001. As long as this small business does not register more than 899 customers, all customer numbers have three digits, and order numbers have four. A SERIAL or SERIAL8 column is not automatically a unique column. If you want to be perfectly sure that no duplicate serial numbers occur, you must apply a unique constraint (see "Using CREATE TABLE"). If you define the table using the interactive schema editor in DB-Access or SQL Editor, it automatically applies a unique constraint to any SERIAL or SERIAL8 column. The SERIAL and SERIAL8 data types have the following advantages:
Altering the next SERIAL or SERIAL8 numberThe starting value for a SERIAL or SERIAL8 column is set when the column is created (see "Using CREATE TABLE"). You can use the ALTER TABLE statement later to reset the next value, the value that is used for the next-inserted row.You cannot set the next value below the current maximum value in the column because doing so causes the database server to generate duplicate numbers in certain situations. However, you can set the next value to any value higher than the current maximum, thus creating gaps in the sequence.
The floating-point data types are designed for these applications. They can represent any numerical quantity, fractional or whole, over a wide range of magnitudes from the cosmic to the microscopic. For example, they can easily represent both the average distance from the Earth to the Sun (1.5 ¥ 109 meters) or Planck's constant (6.625 ¥ 10-27). Their only restriction is their limited precision. Floating-point numbers retain only the most significant digits of their value. If a value has no more digits than a floating-point number can store, the value is stored exactly. If it has more digits, it is stored in approximate form, with its least-significant digits treated as zeros. This lack of exactitude is fine for many uses, but you should never use a floating-point data type to record money or any other quantity whose least significant digits should not be changed to zero. Two sizes of floating-point data types exist. The FLOAT type is a double-precision, binary floating-point number as implemented in the C language on your computer. A FLOAT data type value usually takes up 8 bytes. The SMALLFLOAT (also known as REAL) data type is a single-precision, binary floating-point number that usually takes up 4 bytes. The main difference between the two data types is their precision. A FLOAT column retains about 16 digits of its values; a SMALLFLOAT column retains only about 8 digits. Floating-point numbers have the following advantages:
The magnitude of a DECIMAL(p) number ranges from 10-130 to 10124. It is easy to be confused about decimal data types. The one under discussion is DECIMAL(p); that is, DECIMAL with only a precision specified. The size of DECIMAL(p) numbers depends on their precision; they occupy 1+p/2 bytes (rounded up to a whole number, if necessary). DECIMAL(p) has the following advantages over FLOAT:
Fixed-Point Numbers: DECIMAL and MONEYMost commercial applications need to store numbers that have fixed numbers of digits on the right and left of the decimal point. Amounts of money are the most common examples. Amounts in U.S. and other currencies are written with two digits to the right of the decimal point. Normally, you also know the number of digits needed on the left, depending on the kind of transactions that are recorded: perhaps 5 digits for a personal budget, 7 digits for a small business, and 12 or 13 digits for a national budget.These numbers are fixed-point numbers because the decimal point is fixed at a specific place, regardless of the value of the number. The DECIMAL(p,s) data type is designed to hold them. When you specify a column of this type, you write its precision (p) as the total number of digits that it can store, from 1 to 32. You write its scale (s) as the number of those digits that fall to the right of the decimal point. (Figure 9-2 shows the relation between precision and scale.) Scale can be zero, meaning it stores only whole numbers. When only whole numbers are stored, DECIMAL(p,s) provides a way of storing integers of up to 32 digits.
Like the DECIMAL(p) data type, DECIMAL(p,s) takes up space in proportion to its precision. One value occupies 1+p/2 bytes, rounded up to a whole number of bytes. The MONEY type is identical to DECIMAL(p,s), but with one extra feature. Whenever the database server converts a MONEY value to characters for display, it automatically includes a currency symbol. The advantages of DECIMAL(p,s) over INTEGER and FLOAT are that much greater precision is available (up to 32 digits as compared with 10 digits for INTEGER and 16 digits for FLOAT), and both the precision and the amount of storage required can be adjusted to suit the application. The disadvantages are that arithmetic operations are less efficient and that many programming languages do not support numbers in this form. Therefore, when a program extracts a number, it usually must convert the number to another numeric form for processing. (However, INFORMIX-4GL programs can use DECIMAL(p,s) and MONEY values directly.)
Choosing a currency formatEach nation has its own way of displaying money values. When an Informix database server displays a MONEY value, it refers to a currency format that the user specifies. The default locale specifies a U.S. English currency format of the following form:For non-English locales, you can change the current format by means of the MONETARY category of the locale file. For more information on using locales, refer to Chapter 1 of the Guide to GLS Functionality. To customize this currency format, choose your locale appropriately or set the DBMONEY environment variable. For more information, see Chapter 3, "Environment Variables" of the Informix Guide to SQL: Reference.
The DATE format has ample precision to carry dates into the far future (58,000 centuries). Negative DATE values are interpreted as counts of days prior to the epoch date; that is, a DATE value of - Because DATE values are integers, Informix database servers permit them to be used in arithmetic expressions. For example, you can take the average of a DATE column, or you can add 7 or 365 to a DATE column. In addition, a rich set of functions exists specifically for manipulating DATE values. (See the Informix Guide to SQL: Syntax.) The DATE data type is compact, at 4 bytes per item. Arithmetic functions and comparisons execute quickly on a DATE column.
Choosing a date formatYou can punctuate and order the components of a date in many ways. When an Informix database server displays a DATE value, it refers to a date format that the user specifies. The default locale specifies a U.S. English date format of the form:To customize this date format, choose your locale appropriately or set the DBDATE environment variable. For more information, see Chapter 3 of the Informix Guide to SQL: Reference. For languages other than English, you can also change the date format by means of the TIME category of the locale file. For more information on using locales, refer to the Guide to GLS Functionality.
The advantage of DATETIME is that it can store dates more precisely than to the nearest day, and it can store time values. Its sole disadvantage is an inflexible display format, but you can circumvent this advantage. (See "Forcing the format of a DATETIME or INTERVAL value".)
INTERVAL values can be negative as well as positive. You can add or subtract them, and you can scale them by multiplying or dividing by a number. This is not true of either DATE or DATETIME. You can reasonably ask, "What is one-half the number of days until April 23?" but not, "What is one-half of April 23?"
You can write a SELECT statement that displays the date part of a DATETIME value in the system-defined format. The trick is to isolate the component fields using the EXTEND function and pass them through the MDY() function, which converts them to a DATE. The following code shows a partial example:
Choosing a DATETIME FormatWhen an Informix database server displays a DATETIME value, it refers to a DATETIME format that the user specifies. The default locale specifies a U.S. English DATETIME format of the following form:For languages other than English, you change the DATETIME format by means of the TIME category of the locale file. For more information on using locales, refer to the Guide to GLS Functionality. To customize this DATETIME format, choose your locale appropriately or set the GL_DATETIME or DBTIME environment variable. For more information, see the Guide to GLS Functionality.
The following table shows how the BOOLEAN data type is represented.
You can compare a BOOLEAN column against another BOOLEAN column, or against Boolean values ( The following query returns rows from the emp_info table where bool_col values are true. The following query returns rows from the emp_info table where bool_col values are null. You can also use a column that is assigned the BOOLEAN data type to capture the results of an expression as shown in the following example:
Character Data TypesThe database server supports the NCHAR data type and NVARCHAR, the special-use character data type.
Data in CHAR columns is sorted in code-set order. For example, in the ASCII code set, the character a has a code-set value of 97, b has 98, and so forth. The database server sorts CHAR(n) data in this order. The NCHAR(n) data type also contains a sequence of n bytes. These characters can be a mixture of English and non-English characters and can be either single byte or multibyte (Asian). The length of n has the same limits as the CHAR(n) data type. Whenever an NCHAR(n) value is retrieved or stored, exactly n bytes are transferred. The number of characters transferred can be less than the number of bytes if the data contains multibyte characters. If an inserted value is shorter than n, the database server extends the value by using single byte ASCII space characters to make up n bytes.
A CHAR(n) or NCHAR(n) value can include tabs and spaces but normally contains no other nonprinting characters. When rows are inserted using INSERT or UPDATE, or when rows are loaded with a utility program, no means exists for entering nonprintable characters. However, when rows are created by a program using embedded SQL, the program can insert any character except the null (binary zero) character. It is not a good idea to store nonprintable characters in a character column because standard programs and utilities do not expect them. The advantage of the CHAR(n) or NCHAR(n) data type is its availability on all database servers. The only disadvantage of CHAR(n) or NCHAR(n) is its fixed length. When the length of data values varies widely from row to row, space is wasted.
The second parameter, r, is an optional reserve length that sets a lower limit on the number of bytes required by the value that is being stored on disk. Even if a value requires fewer than r bytes, r bytes are nevertheless allocated to hold it. The purpose is to save time when rows are updated. (See "Varying-Length Execution Time".) The advantages of the CHARACTER VARYING(m,r) or VARCHAR(m,r) data type over the CHAR(n) data type are as follows:
The following list describes the disadvantages of using CHARACTER VARYING(m,r), VARCHAR(m,r), and NVARCHAR(m,r) data types:
Varying-Length Execution TimeWhen you use the CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data types, the rows of a table have a varying number of bytes instead of a fixed number of bytes. The speed of database operations is affected when the rows of a table have a varying number of bytes.Because more rows fit in a disk page, the database server can search the table with fewer disk operations than if the rows were of a fixed number of bytes. As a result, queries can execute more quickly. Insert and delete operations can be a little quicker for the same reason. When you update a row, the amount of work the database server must do depends on the number of bytes in the new row as compared with the number of bytes in the old row. If the new row uses the same number of bytes or fewer, the execution time is not significantly different than it is with fixed-length rows. However, if the new row requires a greater number of bytes than the old one, the database server might have to perform several times as many disk operations. Thus, updates of a table that use CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data can sometimes be slower than updates of a fixed-length field. To mitigate this effect, specify r as a number of bytes that covers a high proportion of the data items. Then most rows use the reserve number of bytes, and padding wastes only a little space. Updates are slow only when a value using the reserve number of bytes is replaced with a value that uses more than the reserve number of bytes.
Smart large objects refer to columns that are assigned a BLOB or CLOB data type. A smart large object allows an application program to randomly access column data, which means you can read or write to any part of a BLOB or CLOB column in any arbitrary order. Simple large objects refer to columns that are assigned a TEXT or BYTE data type. A simple large object can store and retrieve character data or binary data, but cannot randomly access portions of the column data. In other words, TEXT or BYTE data can be inserted or deleted but cannot be modified. The database server simply stores or retrieves the TEXT or BYTE data in a single SQL statement. The following sections describe additional differences between simple large objects and smart large objects.
CLOB values are not stored with the rows of which they are a part. They are allocated in whole disk pages, usually areas away from rows. (For more information, see the INFORMIX-Universal Server Administrator's Guide.) The CLOB data type is similar to the TEXT data type except that the CLOB data type provides the following advantages:
Smart Large Objects: BLOBThe BLOB datatype is designed to hold any data that a program can generate: graphic images, satellite images, video clips, audio clips, or formatted documents saved by any word processor or spreadsheet. The database server permits any kind of data of any length in a BLOB column.BLOB data items are stored in whole disk pages in separate disk areas from normal row data. The advantage of the BLOB data type, as opposed to CLOB, is that it accepts any data. Otherwise, the advantages and disadvantages of the BLOB data type are the same as for the CLOB data type.
In any SQL statement, interactive or programmed, a CLOB or BLOB column cannot be used in the following ways:
In a SELECT statement entered interactively, a CLOB or BLOB column can:
Copying smart large objectsUniversal Server provides functions that you can call from within an SQL statement to import and export smart large objects. Figure 9-5 shows the smart-large-object functions. For detailed information and the syntax of smart-large-object functions, see the Expression segment in the Informix Guide to SQL: Syntax.
Figure 9-5
FILETOBLOB()
Copies a file into a BLOB column.
FILETOCLOB()
Copies a file into a CLOB column.
LOCOPY()
Copies BLOB or CLOB data into another BLOB or CLOB column.
LOTOFILE()
Copies a BLOB or CLOB into a file.
Suppose you create the following inmate and fbi_list tables: The following INSERT statement uses the FILETOBLOB() and FILETOCLOB() functions to insert a row of the inmate table. In the preceding example, the first argument for the FILETOBLOB() and FILETOCLOB() functions specifies the path of the source file to be copied into the BLOB and CLOB columns respectively. The second argument for each function specifies whether the source file is located on the client computer ('client') or server computer ('server'). The following rules apply for specifying the path of a filename in a function argument, depending on whether the file resides on the client or server computer:
The first argument for LOCOPY() specifies the column (mugshot) from which the large object is exported. The second and third arguments specify the name of the table (fbi_list) and column (mugshot) whose storage characteristics are used for the newly created large object. After execution of the UPDATE statement, the picture column contains data from the mugshot column. Because LOCOPY() uses the storage defaults of the column that it exports, this instance of the picture column is stored in sbspace3, which is the default storage specified for the mugshot column of the fbi_list table. The following SELECT statement uses the LOTOFILE() function to copy data from the felony column into the felon_322.txt file that is located on the client computer: The first argument for LOTOFILE() specifies the name of the column from which data is to be exported. The second argument specifies the name of the file into which data is to be copied. The third argument specifies whether the target file is located on the client computer ('client') or server computer ('server'). (See the previous discussion for the rules that apply to specifying the path of a filename for client and server computers.)
ESQL/C programs that access CLOB or BLOB data from a row can override some column-level defaults at the time that the program creates a BLOB or CLOB instance. For information about how to override column-level defaults when you create an instance of a smart large object from an ESQL/C program, see the INFORMIX-ESQL/C Programmer's Manual. Figure 9-6 shows the precedence rules that Universal Server uses to determine which characteristics a smart large object inherits.
For information about Universal Server defaults and sbspace defaults, see the INFORMIX-Universal Server Administrator's Guide.
TEXT values are not stored with the rows of which they are a part. They are allocated in whole disk pages, usually in areas away from rows. (See the INFORMIX-Universal Server Administrator's Guide.) The advantage of the TEXT data type over CHAR(n) and VARCHAR(m,r) is that the size of a TEXT data item has no limit except the capacity of disk storage to hold it. The disadvantages of the TEXT data type are as follows:
As with TEXT, BYTE data items are stored in whole disk pages in separate disk areas from normal row data. The advantage of the BYTE data type, as opposed to TEXT or CHAR(n), is that it accepts any data. Its disadvantages are the same as those of the TEXT data type.
In any SQL statement, interactive or programmed, a TEXT or BYTE column cannot be used in the following ways:
In an interactive UPDATE statement, you can update a simple-large-object column to null or to a subquery that returns a simple-large-object column.
Columns that are collection types cannot contain null elements. For more information, see "Collection Data Types".
Not all columns need default values, but as you work with your data model, you might discover instances where the use of a default value saves data-entry time or prevents data-entry error. For example, the telephone-directory model has a State column. While you are looking at the data for this column, you discover that more than 50 percent of the addresses list California as the state. To save time, you specify the string "CA" as the default value for the State column.
To express constraints on character-based domains, use the MATCHES predicate and the regular-expression syntax that it supports. For example, the following constraint restricts a telephone domain to the form of a U.S. local telephone number: For additional information about check constraints, see the CREATE TABLE and ALTER TABLE statements in the Informix Guide to SQL: Syntax.
You cannot use the CREATE DOMAIN statement to create an alias for the following data types:
Creating a DomainTo create or drop a domain you use the CREATE DOMAIN or DROP DOMAIN statement. For example, suppose you want to use the following collection data type to define columns in different tables:The following statement creates a domain name for the collection data type. You can use the domain anywhere you might use the typename. For example, the following statement use the d_employee domain to define the data type of a column in a table: For more information, see the description of the CREATE DOMAIN statement in the Informix Guide to SQL: Syntax.
After you drop a domain, any columns that currently are defined on the domain continue to retain the original data type assigned to the column. For example, consider the following sequence of SQL statements: The first SELECT statement returns the employee column, which is of type, SET(VARCHAR(30)NOT NULL), even though the d_employee domain has been dropped.The second SELECT statement returns the employee column, which is also of type SET(VARCHAR(30)NOT NULL) even though the d_employee domain has been recreated as a different data type. As the examples illustrate, once you define a column on a domain, the type of the column does not change. For more information, see the description of the DROP DOMAIN statement in the Informix Guide to SQL: Syntax.
For example, to change the data type of the employees column of the department table (shown in the preceding examples), you might construct the following statements: Although execution of the DROP DOMAIN and CREATE DOMAIN statements changes the domain definition, to change the data type of the employees column, which has been defined on the d_employee domain, you must use the ALTER TABLE statement.
|