INFORMIX
Informix Guide to SQL: Tutorial
Chapter 9: Implementing Your Data Model
Home Contents Index Master Index New Book

Defining Column-Specific Properties

To complete the data model described in Chapter 8, "Building Your Data Model," you must define column-specific properties for each column. The column-specific properties describe the constraints and identify the set of valid values that attributes (or columns) can assume.

The purpose of column-specific properties is to guard the semantic integrity of the data in the model; that is, to ensure that it reflects reality in a sensible way. The integrity of the data model is at risk if you can substitute a name for a telephone number or if you can enter a fraction where only integers are allowed.

To define column-specific properties, first define the constraints that a data value must satisfy before it can be part of the column. Use the following constraints to specify column-specific properties:

  • Data types (user-defined types and built-in types)
  • Default values
  • Check constraints
You can identify the primary and foreign keys in each table to place referential constraints on columns. For more information on primary and foreign keys, see Chapter 8, "Building Your Data Model."

Extended Data Types

In addition to the built-in data types that Universal Server supports, you can create and use the following data types to specify the data type of a column:

  • Opaque data types. You can use these encapsulated data types to define columns in the same way that you use built-in types. When you create an opaque data type, you also define the functions, operators, and aggregates to operate on the type. For information about opaque types, see the CREATE OPAQUE TYPE statement in the Informix Guide to SQL: Syntax and the user guide Extending INFORMIX-Universal Server: Data Types.
  • Distinct data types. These data types have the same representation as, but are distinct from, existing data types. You can create a distinct type from a built-in type, opaque type, named row type, or other distinct type. For information about distinct types, see Chapter 3, "Environment Variables" in the Informix Guide to SQL: Reference and the CREATE DISTINCT TYPE statement in the Informix Guide to SQL: Syntax.
  • Complex data types. These data types combine one or more existing data types to create a new data type. A complex data type allows access to each of its component data types. For information about complex types, see Chapter 10, "Understanding Complex Data Types."

Built-In Data Types

A built-in data type is a data type that the database server defines. INTEGER, CHAR, DATE, and DECIMAL are examples of built-in data types. To help you choose the appropriate data types for implementing your data model, this section provides a description of the built-in data types. For additional information on the built-in data types, see Chapter 3, "Environment Variables," in the Informix Guide to SQL: Reference.

Choosing a Data Type

The first constraint on any column is the one that is implicit in the data type for the column. When you choose a data type, you constrain the column so that it contains only values that can be represented by that type.

Every column in a table must have a data type that is chosen from the built-in or extended types that the database server supports. The choice of data type is important for the following reasons:

  • It establishes the basic properties of the column; that is, the set of valid data items that the column can store.
  • It determines the kinds of operations that you can perform on the data. For example, you cannot apply aggregate functions, such as SUM, to columns with a character data type.
  • It determines how much space each data item occupies on disk. The space required to accommodate data items is not as important for small tables as it for tables with tens or hundreds of thousands of rows. When a table reaches that many rows, the difference between a 4-byte and an 8-byte type can be crucial.
Using Data Types in Referential Constraints
Almost all data type combinations must match when you are trying to pick columns for primary and foreign keys. For example, if you define a primary key as a CHAR data type, you must also define the foreign key as a CHAR data type. However, when you specify a SERIAL data type on a primary key in one table, you specify an INTEGER on the foreign key of the relationship. Similarly, when you specify a SERIAL8 data type on a primary key in one table, you specify an INT8 on the foreign key of the relationship.The only data type combinations that you can mix in a relationship are as follows:

  • SERIAL and INTEGER
  • SERIAL8 and INT8
Figure 9-1 shows the decision tree that summarizes the choices among built-in data types. The choices are explained in the following sections.

(1 of 2)

Figure 9-1
Choosing a Data Type

(2 of 2)

Numeric Data Types

Informix database servers support eight numeric data types. Some are best suited for counters and codes, some for engineering quantities, and some for money.

Counters and Codes: INTEGER, SMALLINT, and INT8
The INTEGER and SMALLINT data types hold small whole numbers. They are suited for columns that contain counts, sequence numbers, numeric identity codes, or any range of whole numbers when you know in advance the maximum and minimum values to be stored.

Both types are stored as signed binary integers. INTEGER values have 32 bits and can represent whole numbers from -(231 -1) through 231-1; that is, from
-2,147,483,647 through 2,147,483,647. (The maximum negative number,
-2,147,483,248, is reserved and cannot be used.)

SMALLINT values have only 16 bits. They can represent whole numbers from -32,767 through 32,767. (The maximum negative number, -32,768, is reserved and cannot be used.)

The INTEGER and SMALLINT data types have the following advantages:

  • They take up little space (2 bytes per value for SMALLINT and 4 bytes per value for INTEGER).
  • Arithmetic expressions such as SUM and MAX as well as sort comparisons can be done very efficiently on them.
The disadvantage to using INTEGER and SMALLINT is the limited range of values that they can store. The database server does not store a value that exceeds the capacity of an integer. Of course, such excess is not a problem when you know the maximum and minimum values to be stored.

The INT8 data type is stored as a signed binary integer, which uses 8 bytes per value. Although INT8 takes up twice the space as the INTEGER data type, INT8 has the advantage of a significantly larger range of data representation. INT8 can represent integers ranging from - (2 63 -1) through 2 63 -1; that is from - 9,223,372,036,854,775,807 through 9,223,372,036,854,775,807. (The maximum negative number, - 9,223,372,036,854,775,808, is reserved and cannot be used.)

Automatic Sequences: SERIAL and SERIAL8
The SERIAL data type is simply INTEGER with a special feature. Similarly, the SERIAL8 data type is INT8 with a special feature. Whenever a new row is inserted into a table, the database server automatically generates a new value for a SERIAL or SERIAL8 column. A table can have only one SERIAL or SERIAL8 column, but it can have both a SERIAL column and a SERIAL8 column. Because the database server generates them, the serial values in new rows are always different even when multiple users are adding rows at the same time. This service is useful, because it is quite difficult for an ordinary program to coin unique numeric codes under those conditions.

The SERIAL data type can yield up to 231-1 positive integers. Consequently, the database server uses all the positive serial numbers by the time it inserts 231-1 rows in a table. For most users the exhaustion of the positive serial numbers is not a concern, however, because a single application would need to insert a row every second for 68 years, or 68 applications would need to insert a row every second for a year, to use all the positive serial numbers. However, if all the positive serial numbers were used, the database server would continue to generate new numbers. It would treat the next serial quantity as a signed integer. Because the database server uses only positive values, it would simply wrap around and start to generate integer values that begin with a 1.

The SERIAL8 data type can yield up to 263 -1 positive integers. With a reasonable starting value, it is virtually impossible to cause a SERIAL8 value to wrap around during insertions.

For SERIAL and SERIAL8 data types, the sequence of generated numbers always increases. When rows are deleted from the table, their serial numbers are not reused. Rows that are sorted on a SERIAL or SERIAL8 column are returned in the order in which they were created. That cannot be said of any other data types.

You can specify the initial value in a SERIAL or SERIAL8 column in the CREATE TABLE statement. This makes it possible to generate different subsequences of system-assigned keys in different tables. The stores7 database uses this technique. In stores7, the customer numbers begin at 101, and the order numbers start at 1001. As long as this small business does not register more than 899 customers, all customer numbers have three digits, and order numbers have four.

A SERIAL or SERIAL8 column is not automatically a unique column. If you want to be perfectly sure that no duplicate serial numbers occur, you must apply a unique constraint (see "Using CREATE TABLE"). If you define the table using the interactive schema editor in DB-Access or SQL Editor, it automatically applies a unique constraint to any SERIAL or SERIAL8 column.

The SERIAL and SERIAL8 data types have the following advantages:

  • They provide a convenient way to generate system-assigned keys.
  • They produce unique numeric codes even when multiple users are updating the table.
  • Different tables can use different ranges of numbers.
The SERIAL and SERIAL8 data types have the following disadvantages:

  • Only one SERIAL or SERIAL8 column is permitted in a table.
  • They can produce only arbitrary numbers.

Altering the next SERIAL or SERIAL8 number

The starting value for a SERIAL or SERIAL8 column is set when the column is created (see "Using CREATE TABLE"). You can use the ALTER TABLE statement later to reset the next value, the value that is used for the next-inserted row.

You cannot set the next value below the current maximum value in the column because doing so causes the database server to generate duplicate numbers in certain situations. However, you can set the next value to any value higher than the current maximum, thus creating gaps in the sequence.

Approximate Numbers: FLOAT and SMALLFLOAT
In scientific, engineering, and statistical applications, numbers are often known to only a few digits of accuracy, and the magnitude of a number is as important as its exact digits.

The floating-point data types are designed for these applications. They can represent any numerical quantity, fractional or whole, over a wide range of magnitudes from the cosmic to the microscopic. For example, they can easily represent both the average distance from the Earth to the Sun (1.5 ¥ 109 meters) or Planck's constant (6.625 ¥ 10-27). Their only restriction is their limited precision. Floating-point numbers retain only the most significant digits of their value. If a value has no more digits than a floating-point number can store, the value is stored exactly. If it has more digits, it is stored in approximate form, with its least-significant digits treated as zeros.

This lack of exactitude is fine for many uses, but you should never use a floating-point data type to record money or any other quantity whose least significant digits should not be changed to zero.

Two sizes of floating-point data types exist. The FLOAT type is a double-precision, binary floating-point number as implemented in the C language on your computer. A FLOAT data type value usually takes up 8 bytes. The SMALLFLOAT (also known as REAL) data type is a single-precision, binary floating-point number that usually takes up 4 bytes. The main difference between the two data types is their precision. A FLOAT column retains about 16 digits of its values; a SMALLFLOAT column retains only about 8 digits.

Floating-point numbers have the following advantages:

  • They store very large and very small numbers, including fractional ones.
  • They represent numbers compactly in 4 or 8 bytes.
  • Arithmetic functions such as AVG, MIN, and sort comparisons are efficient on these data types.
The main disadvantage of floating-point numbers is that digits outside their range of precision are treated as zeros.

Adjustable-Precision Floating Point: DECIMAL(p)
The DECIMAL(p) data type is a floating-point data type similar to FLOAT and SMALLFLOAT. The important difference is that you specify how many significant digits it retains. The precision you write as p can range from 1 to 32, from fewer than SMALLFLOAT up to twice the precision of FLOAT.

The magnitude of a DECIMAL(p) number ranges from 10-130 to 10124.

It is easy to be confused about decimal data types. The one under discussion is DECIMAL(p); that is, DECIMAL with only a precision specified. The size of DECIMAL(p) numbers depends on their precision; they occupy 1+p/2 bytes (rounded up to a whole number, if necessary).

DECIMAL(p) has the following advantages over FLOAT:

  • Precision can be set to suit the application, from highly approximate to highly precise.
  • Numbers with as many as 32 digits can be represented exactly.
  • Storage is used in proportion to the precision of the number.
  • Every Informix database server supports the same precision and range of magnitudes, regardless of the host operating system.
The DECIMAL(p) data type has the following disadvantages compared to FLOAT:

  • Performing arithmetic and sorts on DECIMAL(p) values is somewhat slower than on FLOAT values.
  • Many programming languages do not support the DECIMAL(p) data format the way that they support FLOAT and INTEGER. When a program extracts a DECIMAL(p) value from the database, it might have to convert the value to another format for processing.
Fixed-Point Numbers: DECIMAL and MONEY
Most commercial applications need to store numbers that have fixed numbers of digits on the right and left of the decimal point. Amounts of money are the most common examples. Amounts in U.S. and other currencies are written with two digits to the right of the decimal point. Normally, you also know the number of digits needed on the left, depending on the kind of transactions that are recorded: perhaps 5 digits for a personal budget, 7 digits for a small business, and 12 or 13 digits for a national budget.

These numbers are fixed-point numbers because the decimal point is fixed at a specific place, regardless of the value of the number. The DECIMAL(p,s) data type is designed to hold them. When you specify a column of this type, you write its precision (p) as the total number of digits that it can store, from 1 to 32. You write its scale (s) as the number of those digits that fall to the right of the decimal point. (Figure 9-2 shows the relation between precision and scale.) Scale can be zero, meaning it stores only whole numbers. When only whole numbers are stored, DECIMAL(p,s) provides a way of storing integers of up to 32 digits.

Figure 9-2
The Relation Between Precision and Scale in a Fixed-Point Number

Like the DECIMAL(p) data type, DECIMAL(p,s) takes up space in proportion to its precision. One value occupies 1+p/2 bytes, rounded up to a whole number of bytes.

The MONEY type is identical to DECIMAL(p,s), but with one extra feature. Whenever the database server converts a MONEY value to characters for display, it automatically includes a currency symbol.

The advantages of DECIMAL(p,s) over INTEGER and FLOAT are that much greater precision is available (up to 32 digits as compared with 10 digits for INTEGER and 16 digits for FLOAT), and both the precision and the amount of storage required can be adjusted to suit the application.

The disadvantages are that arithmetic operations are less efficient and that many programming languages do not support numbers in this form. Therefore, when a program extracts a number, it usually must convert the number to another numeric form for processing. (However, INFORMIX-4GL programs can use DECIMAL(p,s) and MONEY values directly.)

GLS

Choosing a currency format

Each nation has its own way of displaying money values. When an Informix database server displays a MONEY value, it refers to a currency format that the user specifies. The default locale specifies a U.S. English currency format of the following form:

For non-English locales, you can change the current format by means of the MONETARY category of the locale file. For more information on using locales, refer to Chapter 1 of the Guide to GLS Functionality.

To customize this currency format, choose your locale appropriately or set the DBMONEY environment variable. For more information, see Chapter 3, "Environment Variables" of the Informix Guide to SQL: Reference.

Chronological Data Types

Informix database servers support three data types for recording time. The DATE data type stores a calendar date. DATETIME records a point in time to any degree of precision from a year to a fraction of a second. The INTERVAL data type stores a span of time; that is, a duration.

Calendar Dates: DATE
The DATE data type stores a calendar date. A DATE value is actually a signed integer whose contents are interpreted as a count of full days since midnight on December 31, 1899. Most often it holds a positive count of days into the current century.

The DATE format has ample precision to carry dates into the far future (58,000 centuries). Negative DATE values are interpreted as counts of days prior to the epoch date; that is, a DATE value of -1 represents the day December 30, 1899.

Because DATE values are integers, Informix database servers permit them to be used in arithmetic expressions. For example, you can take the average of a DATE column, or you can add 7 or 365 to a DATE column. In addition, a rich set of functions exists specifically for manipulating DATE values. (See the Informix Guide to SQL: Syntax.)

The DATE data type is compact, at 4 bytes per item. Arithmetic functions and comparisons execute quickly on a DATE column.

GLS

Choosing a date format

You can punctuate and order the components of a date in many ways. When an Informix database server displays a DATE value, it refers to a date format that the user specifies. The default locale specifies a U.S. English date format of the form:

To customize this date format, choose your locale appropriately or set the DBDATE environment variable. For more information, see Chapter 3 of the Informix Guide to SQL: Reference.

For languages other than English, you can also change the date format by means of the TIME category of the locale file. For more information on using locales, refer to the Guide to GLS Functionality.

Exact Points in Time: DATETIME
The DATETIME data type stores any moment in time in the era that begins
1 A.D. In fact, DATETIME is really a family of 28 data types, each with a different precision. When you define a DATETIME column, you specify its precision. The column can contain any sequence from the list year, month, day, hour, minute, second, and fraction. Thus, you can define a DATETIME column that stores only a year, only a month and day, or a date and time that is exact to the hour or even to the millisecond. The size of a DATETIME value ranges from 2 to 11 bytes depending on its precision, as Figure 9-3 shows.

The advantage of DATETIME is that it can store dates more precisely than to the nearest day, and it can store time values. Its sole disadvantage is an inflexible display format, but you can circumvent this advantage. (See "Forcing the format of a DATETIME or INTERVAL value".)

Figure 9-3
Precisions for the DATETIME Data Type

Durations: INTERVAL
The INTERVAL data type stores a duration, that is, a length of time. The difference between two DATETIME values is an INTERVAL, which represents the span of time that separates them. The following examples might help to clarify the differences:

  • An employee began working on January 21, 1994 (either a DATE or a DATETIME).
  • She has worked for 254 days (an INTERVAL value, the difference between the TODAY function and the starting DATE or DATETIME value).
  • She begins work each day at 0900 hours (a DATETIME value).
  • She works 8 hours (an INTERVAL value) with 45 minutes for lunch (another INTERVAL value).
  • Her quitting time is 1745 hours (the sum of the DATETIME when she begins work and the two INTERVALs).
Like DATETIME, INTERVAL is a family of types with different precisions. An INTERVAL value can represent a count of years and months; or it can represent a count of days, hours, minutes, seconds, or fractions of seconds; 18 precisions are possible. The size of an INTERVAL value ranges from 2 to 12 bytes, depending on the formulas that Figure 9-4 shows.

Figure 9-4
Precisions for the INTERVAL Data Type

INTERVAL values can be negative as well as positive. You can add or subtract them, and you can scale them by multiplying or dividing by a number. This is not true of either DATE or DATETIME. You can reasonably ask, "What is one-half the number of days until April 23?" but not, "What is one-half of April 23?"

Forcing the format of a DATETIME or INTERVAL value

The database server always displays the components of an INTERVAL or DATETIME value in the order year-month-day hour:minute:second.fraction. It does not refer to the date format that is defined to the operating system, as it does when it formats a DATE value.

You can write a SELECT statement that displays the date part of a DATETIME value in the system-defined format. The trick is to isolate the component fields using the EXTEND function and pass them through the MDY() function, which converts them to a DATE. The following code shows a partial example:

GLS
Choosing a DATETIME Format
When an Informix database server displays a DATETIME value, it refers to a DATETIME format that the user specifies. The default locale specifies a U.S. English DATETIME format of the following form:

For languages other than English, you change the DATETIME format by means of the TIME category of the locale file. For more information on using locales, refer to the Guide to GLS Functionality.

To customize this DATETIME format, choose your locale appropriately or set the GL_DATETIME or DBTIME environment variable. For more information, see the Guide to GLS Functionality.

Boolean Data Type

The BOOLEAN data type is a one byte data type. In DB-Access or SQL Editor, legal values are true ('t'), false ('f') or NULL. The values are case insensitive.

The following table shows how the BOOLEAN data type is represented.

BOOLEAN
Representation
Internal
Representation
Literal
Representation

TRUE

\1

't', 'T'

FALSE

\0

'f', 'F'

NULL

For internal use only

NULL

You can compare a BOOLEAN column against another BOOLEAN column, or against Boolean values ('t','f'). For example, suppose you create the following table:

The following query returns rows from the emp_info table where bool_col values are true.

The following query returns rows from the emp_info table where bool_col values are null.

You can also use a column that is assigned the BOOLEAN data type to capture the results of an expression as shown in the following example:

GLS

Character Data Types

The database server supports the NCHAR data type and NVARCHAR, the special-use character data type.

Character Data: CHAR(n) and NCHAR(n)
The CHAR(n) data type contains a sequence of n bytes. These characters can be a mixture of English and non-English characters and can be either single byte or multibyte (Asian). The length n ranges from 1 to 32,767. Whenever a CHAR(n) value is retrieved or stored, exactly n bytes are transferred. If an inserted value is shorter than n, the database server extends the value by using single byte ASCII space characters to make up n bytes.

Data in CHAR columns is sorted in code-set order. For example, in the ASCII code set, the character a has a code-set value of 97, b has 98, and so forth. The database server sorts CHAR(n) data in this order.

The NCHAR(n) data type also contains a sequence of n bytes. These characters can be a mixture of English and non-English characters and can be either single byte or multibyte (Asian). The length of n has the same limits as the CHAR(n) data type. Whenever an NCHAR(n) value is retrieved or stored, exactly n bytes are transferred. The number of characters transferred can be less than the number of bytes if the data contains multibyte characters. If an inserted value is shorter than n, the database server extends the value by using single byte ASCII space characters to make up n bytes.

Tip: The database server accepts values from the user that are extended with either single-byte or multibyte spaces as the locale defines.
The database server sorts data in NCHAR(n) columns according to the order that the locale specifies. For example, the French locale specifies that the character ê is sorted after the value e but before the value f. In other words, the sort order dictated by the French locale is e, ê, f, and so on. For more information on using locales, refer to the Guide to GLS Functionality.

Tip: The only difference between CHAR(n) and NCHAR(n) data is the data sorting and comparison. You can store non-English characters in a CHAR(n) column. However, because the database server uses code-set order to perform any sorting or comparison on CHAR(n) columns, you might not obtain the results in the order that you expected.
A CHAR(n) or NCHAR(n) value can include tabs and spaces but normally contains no other nonprinting characters. When rows are inserted using INSERT or UPDATE, or when rows are loaded with a utility program, no means exists for entering nonprintable characters. However, when rows are created by a program using embedded SQL, the program can insert any character except the null (binary zero) character. It is not a good idea to store nonprintable characters in a character column because standard programs and utilities do not expect them.

The advantage of the CHAR(n) or NCHAR(n) data type is its availability on all database servers. The only disadvantage of CHAR(n) or NCHAR(n) is its fixed length. When the length of data values varies widely from row to row, space is wasted.

Varying-Length Strings: CHARACTER VARYING(m,r), VARCHAR(m,r), NVARCHAR(m,r), and LVARCHAR
For the following data types, m represents the maximum number of bytes and r represents the minimum number of bytes.

Tip: The data type CHARACTER VARYING (m,r) is ANSI compliant. VARCHAR(m,r) is an Informix data type.
Often the items in a character column have different lengths; that is, many have an average length, and only a few have the maximum length. The following data types are designed to save disk space when you store such data:

  • CHARACTER VARYING (m,r). The CHARACTER VARYING (m,r) data type contains a sequence of, at most, m bytes or at the least, r bytes. This data type is the ANSI-compliant format for character data of varying length. CHARACTER VARYING (m,r) supports code-set order for comparisons of its character data.
  • VARCHAR (m,r). VARCHAR (m,r) is an Informix-specific data type for storing character data of varying length. In functionality, it is the same as CHARACTER VARYING(m,r).
  • NVARCHAR (m,r). NVARCHAR (m,r) is also an Informix-specific data type for storing character data of varying length. It compares character data in the order that the locale specifies.
  • LVARCHAR. LVARCHAR is an Informix-specific data type for storing character data of varying length for values greater than 256 bytes but less than 32 kilobytes. LVARCHAR supports code-set order for comparisons of its character data
Tip: The difference in the way data is compared distinguishes NVARCHAR(m,r) data from CHARACTER VARYING(m,r) or VARCHAR(m,r) data. For more information on code set and sort order determined by the locale, see "Character Data: CHAR(n) and NCHAR(n)".
When you define columns of VARCHAR(m,r), CHARACTER VARYING(m,r), or VARCHAR(m,r) data types, you specify m as the maximum number of bytes. If an inserted value consists of fewer than m bytes, the database server does not extend the value with single-byte spaces (as with CHAR(n) and NCHAR(n) values.) Instead, it stores only the actual contents on disk, with a 1-byte length field. The limit on m is 254 bytes for indexed columns and 255 bytes for non-indexed columns.

The second parameter, r, is an optional reserve length that sets a lower limit on the number of bytes required by the value that is being stored on disk. Even if a value requires fewer than r bytes, r bytes are nevertheless allocated to hold it. The purpose is to save time when rows are updated. (See "Varying-Length Execution Time".)

The advantages of the CHARACTER VARYING(m,r) or VARCHAR(m,r) data type over the CHAR(n) data type are as follows:

  • It conserves disk space when the number of bytes that data items require vary widely or when only a few items require more bytes than average.
  • Queries on the more compact tables can be faster.
These advantages also apply to the NVARCHAR(m,r) data type in comparison to the NCHAR(n) data type.

The following list describes the disadvantages of using CHARACTER VARYING(m,r), VARCHAR(m,r), and NVARCHAR(m,r) data types:

  • They do not allow lengths that exceed 255 bytes.
  • Table updates can be slower in some circumstances.
  • They are not available with all Informix database servers.
Varying-Length Execution Time
When you use the CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data types, the rows of a table have a varying number of bytes instead of a fixed number of bytes. The speed of database operations is affected when the rows of a table have a varying number of bytes.

Because more rows fit in a disk page, the database server can search the table with fewer disk operations than if the rows were of a fixed number of bytes. As a result, queries can execute more quickly. Insert and delete operations can be a little quicker for the same reason.

When you update a row, the amount of work the database server must do depends on the number of bytes in the new row as compared with the number of bytes in the old row. If the new row uses the same number of bytes or fewer, the execution time is not significantly different than it is with fixed-length rows. However, if the new row requires a greater number of bytes than the old one, the database server might have to perform several times as many disk operations. Thus, updates of a table that use CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data can sometimes be slower than updates of a fixed-length field.

To mitigate this effect, specify r as a number of bytes that covers a high proportion of the data items. Then most rows use the reserve number of bytes, and padding wastes only a little space. Updates are slow only when a value using the reserve number of bytes is replaced with a value that uses more than the reserve number of bytes.

Large Object Data Types

Universal Server supports both simple large objects and smart large objects to handle data that exceeds a length of 255 bytes and non-ASCII character data.

Smart large objects refer to columns that are assigned a BLOB or CLOB data type. A smart large object allows an application program to randomly access column data, which means you can read or write to any part of a BLOB or CLOB column in any arbitrary order.

Simple large objects refer to columns that are assigned a TEXT or BYTE data type. A simple large object can store and retrieve character data or binary data, but cannot randomly access portions of the column data. In other words, TEXT or BYTE data can be inserted or deleted but cannot be modified. The database server simply stores or retrieves the TEXT or BYTE data in a single SQL statement.

The following sections describe additional differences between simple large objects and smart large objects.

Smart Large Objects: CLOB
The CLOB data type stores a block of text. It is designed to store ASCII text data, including formatted text such as HTML or PostScript. Although you can store any data in a CLOB object, Informix tools expect a CLOB object to be printable, so restrict this data type to printable ASCII text.

CLOB values are not stored with the rows of which they are a part. They are allocated in whole disk pages, usually areas away from rows. (For more information, see the INFORMIX-Universal Server Administrator's Guide.)

The CLOB data type is similar to the TEXT data type except that the CLOB data type provides the following advantages:

  • An application program can read from or write to any portion of the CLOB object.
  • Access times can be significantly faster because an application program can access any portion of a CLOB object.
  • Default characteristics are relatively easy to override. Database administrators can override default characteristics for sbspace at the column level. Application programmers can override some default characteristics for the column when they create a CLOB object.
  • You can use the equals operator (=) to test whether two CLOB values are equal.
  • A CLOB object is recoverable in the event of a system crash and obeys transaction isolation modes (when specified by the DBA or application programmer). (Recovery of CLOB objects requires that your database system has the necessary resources to provide buffers large enough to handle CLOB objects.)
  • You can use the CLOB data type to provide large storage for a user-defined data type.
  • DataBlade developers can create indexes on CLOB data types.
The disadvantages of the CLOB data type are as follows:

  • It is allocated in whole disk pages, so a short item wastes space.
  • Restrictions apply on how you can use a CLOB column in an SQL statement. (See "Using Smart Large Objects".)
  • It is not available with all Informix database servers.
Smart Large Objects: BLOB
The BLOB datatype is designed to hold any data that a program can generate: graphic images, satellite images, video clips, audio clips, or formatted documents saved by any word processor or spreadsheet. The database server permits any kind of data of any length in a BLOB column.

BLOB data items are stored in whole disk pages in separate disk areas from normal row data.

The advantage of the BLOB data type, as opposed to CLOB, is that it accepts any data. Otherwise, the advantages and disadvantages of the BLOB data type are the same as for the CLOB data type.

Using Smart Large Objects
To store columns of a CLOB or BLOB data type, you must allocate an sbspace. An sbspace is a logical storage unit that stores BLOB and CLOB data in the most efficient way possible. You can write INFORMIX-ESQL/C programs that allow users to fetch and store CLOB and BLOB data. Application programmers who want to access and manipulate large objects directly can consult the INFORMIX-ESQL/C Programmer's Manual.

In any SQL statement, interactive or programmed, a CLOB or BLOB column cannot be used in the following ways:

  • In arithmetic or Boolean expressions
  • In a GROUP BY or ORDER BY clause
  • In a UNIQUE test
  • For indexing, as part of an Informix B+ tree index
    However, DataBlade developers have the capability to create indexes on CLOB columns.

In a SELECT statement entered interactively, a CLOB or BLOB column can:

  • specify null values as a default when you create a table with the DEFAULT NULL clause.
  • disallow null values using the NOT NULL constraint when you create a table.
  • be tested with the IS [NOT] NULL predicate.
From an ESQL/C program, you can use the ifx_lo_stat() function to determine the length of CLOB or BLOB data.

Important: Casts between CLOB and BLOB data types are not permitted.

Copying smart large objects

Universal Server provides functions that you can call from within an SQL statement to import and export smart large objects. Figure 9-5 shows the smart-large-object functions. For detailed information and the syntax of smart-large-object functions, see the Expression segment in the Informix Guide to SQL: Syntax.

Figure 9-5
SQL Functions for Smart Large Objects
Function Name Purpose

FILETOBLOB()

Copies a file into a BLOB column.

FILETOCLOB()

Copies a file into a CLOB column.

LOCOPY()

Copies BLOB or CLOB data into another BLOB or CLOB column.

LOTOFILE()

Copies a BLOB or CLOB into a file.

You can use any of the functions that Figure 9-5 shows in the SELECT, UPDATE, and INSERT statements. (The following examples assume that the SBSPACENAME parameter has been specified as sbspace1.)

Suppose you create the following inmate and fbi_list tables:

The following INSERT statement uses the FILETOBLOB() and FILETOCLOB() functions to insert a row of the inmate table.

In the preceding example, the first argument for the FILETOBLOB() and FILETOCLOB() functions specifies the path of the source file to be copied into the BLOB and CLOB columns respectively. The second argument for each function specifies whether the source file is located on the client computer ('client') or server computer ('server'). The following rules apply for specifying the path of a filename in a function argument, depending on whether the file resides on the client or server computer:

  • If the source file resides on the server computer, you must specify the full pathname to the file (not the pathname relative to the current working directory).
  • If the source file resides on the client computer, you can specify either the full or relative pathname to the file.
The following UPDATE statement uses the LOCOPY() function to copy BLOB data from the mugshot column of the fbi_list table into the picture column of the inmate table:

The first argument for LOCOPY() specifies the column (mugshot) from which the large object is exported. The second and third arguments specify the name of the table (fbi_list) and column (mugshot) whose storage characteristics are used for the newly created large object. After execution of the UPDATE statement, the picture column contains data from the mugshot column. Because LOCOPY() uses the storage defaults of the column that it exports, this instance of the picture column is stored in sbspace3, which is the default storage specified for the mugshot column of the fbi_list table.

The following SELECT statement uses the LOTOFILE() function to copy data from the felony column into the felon_322.txt file that is located on the client computer:

The first argument for LOTOFILE() specifies the name of the column from which data is to be exported. The second argument specifies the name of the file into which data is to be copied. The third argument specifies whether the target file is located on the client computer ('client') or server computer ('server'). (See the previous discussion for the rules that apply to specifying the path of a filename for client and server computers.)

Inheritance of characteristics for smart large objects

The database administrator can specify, at the column level, the estimated extent size for CLOB or BLOB data to override sbspace defaults. An extent is the unit of storage allocation that is used when a large object needs additional storage. For information about how to specify an extent size when you create a table, see the CREATE TABLE statement in the Informix Guide to SQL: Syntax. If the size of the smart large object is not specified at the column-level, the CLOB or BLOB column inherits characteristics from the sbspace defaults or (if sbspace defaults are not specified) from the default values in the database server.

ESQL/C programs that access CLOB or BLOB data from a row can override some column-level defaults at the time that the program creates a BLOB or CLOB instance. For information about how to override column-level defaults when you create an instance of a smart large object from an ESQL/C program, see the INFORMIX-ESQL/C Programmer's Manual.

Figure 9-6 shows the precedence rules that Universal Server uses to determine which characteristics a smart large object inherits.

Figure 9-6
Precedence Rules That Determine How a Smart Large Object Inherits Characteristics

For information about Universal Server defaults and sbspace defaults, see the INFORMIX-Universal Server Administrator's Guide.

Simple Large Objects: TEXT
The TEXT data type stores a block of text. It is designed to store self-contained documents: business forms, program source or data files, or memos. Although you can store any data in a TEXT item, Informix tools expect a TEXT item to be printable, so restrict this data type to printable ASCII text.

TEXT values are not stored with the rows of which they are a part. They are allocated in whole disk pages, usually in areas away from rows. (See the INFORMIX-Universal Server Administrator's Guide.)

The advantage of the TEXT data type over CHAR(n) and VARCHAR(m,r) is that the size of a TEXT data item has no limit except the capacity of disk storage to hold it. The disadvantages of the TEXT data type are as follows:

  • You cannot write to a portion of a TEXT column. (However, you can read from and write to any portion of a CLOB column.)
  • It is allocated in whole disk pages, so a short item wastes space.
  • Restrictions apply on how you can use a TEXT column in an SQL statement. (See "Using Simple Large Objects".)
  • A system crash under certain circumstance can result in a loss of data.
  • Overriding default characterisitcs for TEXT columns can be a time-intensive task (in comparison with CLOB columns).
  • It is not available with all Informix database servers.
You can display TEXT values in reports that you generate with INFORMIX-4GL programs or the ACE report writer. You can display TEXT values on a screen and edit them using screen forms generated with INFORMIX-4GL programs or with the PERFORM screen-form processor.

Simple Large Objects: BYTE
The BYTE data type is designed to hold any data that a program can generate: graphic images, program object files, and formatted documents saved by any word processor or spreadsheet. The database server permits any kind of data of any length in a BYTE column.

As with TEXT, BYTE data items are stored in whole disk pages in separate disk areas from normal row data.

The advantage of the BYTE data type, as opposed to TEXT or CHAR(n), is that it accepts any data. Its disadvantages are the same as those of the TEXT data type.

Using Simple Large Objects
To store columns of a TEXT or BYTE data type, you must allocate a blobspace. A blobspace is a logical storage unit that stores TEXT and BYTE data in the most efficient way possible. Normally, you use INFORMIX-ESQL/C or NewEra programs to fetch and store TEXT and BYTE data. In such a program, you can fetch, insert, or update a simple large object value in a manner similar to the way that you read or write a sequential file.

In any SQL statement, interactive or programmed, a TEXT or BYTE column cannot be used in the following ways:

  • In arithmetic or Boolean expressions
  • In a GROUP BY or ORDER BY clause
  • In a UNIQUE test
  • For indexing, either by itself or as part of a composite index
In a SELECT statement entered interactively, or in a form or report, a TEXT or BYTE column can:

  • be selected by name, optionally with a subscript to extract part of it.
  • have its length returned by selecting LENGTH(column).
  • be tested with the IS [NOT] NULL predicate.
In an interactive INSERT statement, you can use the VALUES clause to insert a simple-large-object value, but the only value that you can give that column is null. However, you can use the SELECT form of the INSERT statement to copy a simple large object value from another table.

In an interactive UPDATE statement, you can update a simple-large-object column to null or to a subquery that returns a simple-large-object column.

Changing the Data Type

After the table is built, you can use the ALTER TABLE statement to change the data type that is assigned to a column. Although such alterations are sometimes necessary, you should avoid them for the following reasons:

  • To change a data type, the database server must copy and rebuild the table. For large tables, copying and rebuilding can take a lot of time and disk space.
  • Some data type changes can cause a loss of information. For example, when you change a column from a longer to a shorter character type, long values are truncated; when you change to a less-precise numeric type, low-order digits are truncated.
  • Existing programs, forms, reports, and stored queries might also have to be changed.
Restrictions apply for using the ALTER TABLE statement to change the data type that is assigned to a column of a table in an inheritance hierarchy. For information about altering a table in an inheritance hierarchy, see "Altering the Structure of a Table in a Table Hierarchy".

Null Values

Columns in a table can be designated as containing null values. A null value means that the value for the column can be unknown or not applicable. For example, in the telephone-directory example in Chapter 8, the anniv column of the name table can contain null values; if you do not know the person's anniversary, you do not specify it. Do not confuse null value with zero or blank value. To specify that the value of a column is null, you use the NULL keyword. For example, the following statement inserts a row into the manufact table and specifies that the value for the lead_time column is null:

Columns that are collection types cannot contain null elements. For more information, see "Collection Data Types".

Default Values

A default value is the value that is inserted into a column when an explicit value is not specified in an INSERT statement. A default value can be a literal character string that either you define or one of the following SQL null, constant expressions defines:

  • USER
  • CURRENT
  • TODAY
  • DBSERVERNAME
Not all columns need default values, but as you work with your data model, you might discover instances where the use of a default value saves data-entry time or prevents data-entry error. For example, the telephone-directory model has a State column. While you are looking at the data for this column, you discover that more than 50 percent of the addresses list California as the state. To save time, you specify the string "CA" as the default value for the State column.

Check Constraints

Check constraints specify a condition or requirement on a data value before data can be assigned to a column during an INSERT or UPDATE statement. If a row evaluates to false for any of the check constraints that are defined on a table during an insert or update, the database server returns an error. To define a constraint, use the CREATE TABLE or ALTER TABLE statements. For example, the following requirement constrains the values of an integer domain to a certain range:

To express constraints on character-based domains, use the MATCHES predicate and the regular-expression syntax that it supports. For example, the following constraint restricts a telephone domain to the form of a U.S. local telephone number:

For additional information about check constraints, see the CREATE TABLE and ALTER TABLE statements in the Informix Guide to SQL: Syntax.

Domains

A domain is an alias that you create to substitute for the name of a data type. In particular, domains provide a useful shorthand notation for collection data types that have long typenames. (For information about collection data types, see "Collection Data Types".) Once you create a domain you can use it anywhere the typename would be used. You can use a domain to specify a data type only; a domain does not have any other properties such as constraints or default values.

You cannot use the CREATE DOMAIN statement to create an alias for the following data types:

  • User-defined opaque data types
  • User-defined distinct data types
  • BOOLEAN
  • CLOB
  • BLOB

Creating a Domain

To create or drop a domain you use the CREATE DOMAIN or DROP DOMAIN statement. For example, suppose you want to use the following collection data type to define columns in different tables:

The following statement creates a domain name for the collection data type.

You can use the domain anywhere you might use the typename. For example, the following statement use the d_employee domain to define the data type of a column in a table:

For more information, see the description of the CREATE DOMAIN statement in the Informix Guide to SQL: Syntax.

Dropping a Domain

To drop a domain you use the DROP DOMAIN statement. You can drop a domain that is currently being used to specify the data type of a column. For example, suppose you want to drop the d_employee domain that was used to define a column of the department table in the preceding section. The following statement shows how to drop a domain:

After you drop a domain, any columns that currently are defined on the domain continue to retain the original data type assigned to the column. For example, consider the following sequence of SQL statements:

The first SELECT statement returns the employee column, which is of type, SET(VARCHAR(30)NOT NULL), even though the d_employee domain has been dropped.The second SELECT statement returns the employee column, which is also of type SET(VARCHAR(30)NOT NULL) even though the d_employee domain has been recreated as a different data type. As the examples illustrate, once you define a column on a domain, the type of the column does not change.

For more information, see the description of the DROP DOMAIN statement in the Informix Guide to SQL: Syntax.

To change the data type of a column defined on a domain

1. Drop the domain

    2. Create a new domain

    3. Use the ALTER TABLE statement to modify the column data type

For example, to change the data type of the employees column of the department table (shown in the preceding examples), you might construct the following statements:

Although execution of the DROP DOMAIN and CREATE DOMAIN statements changes the domain definition, to change the data type of the employees column, which has been defined on the d_employee domain, you must use the ALTER TABLE statement.




Informix Guide to SQL: Tutorial, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.