Informix Guide to SQL: Tutorial
Modifying Data

Data Integrity

The INSERT, UPDATE, and DELETE statements modify data in an existing database. Whenever you modify existing data, the integrity of the data can be affected. For example, an order for a nonexistent product could be entered into the orders table, a customer with outstanding orders could be deleted from the customer table, or the order number could be updated in the orders table and not in the items table. In each of these cases, the integrity of the stored data is lost.

Data integrity has the following parts:

Entity integrity

Each row of a table has a unique identifier.

Semantic integrity

The data in each column properly reflects the type of information that the column was designed to hold.

Referential integrity

The relationships between tables are enforced.

Well-designed databases incorporate these principles so that when you modify data, the database itself prevents you from doing anything that might harm the integrity of the data.

Entity Integrity

An entity is any person, place, or thing to be recorded in a database. Each table represents an entity, and each row of a table represents an instance of that entity. For example, if order is an entity, the orders table represents the idea of an order and each row in the table represents a specific order.

To identify each row in a table, the table must have a primary key. The primary key is a unique value that identifies each row. This requirement is called the entity integrity constraint.

For example, the orders table primary key is order_num. The order_num column holds a unique system-generated order number for each row in the table. To access a row of data in the orders table, use the following SELECT statement:

SELECT * FROM orders WHERE order_num = 1001

Using the order number in the WHERE clause of this statement enables you to access a row easily because the order number uniquely identifies that row. If the table allowed duplicate order numbers, it would be almost impossible to access one single row because all other columns of this table allow duplicate values.
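The effect of a primary key can be sketched with a small table. This is only an illustration: the demo_orders table and its columns are hypothetical names, not part of the demonstration database, and the sketch assumes that the SERIAL column starts numbering at 1.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo_orders
(
    order_num     SERIAL PRIMARY KEY,   -- unique, system-generated identifier
    order_date    DATE,
    ship_instruct CHAR(40)
);

-- Supplying 0 for a SERIAL column asks the database server to generate
-- the next value (assumed to be 1 for a new table).
INSERT INTO demo_orders (order_num, order_date) VALUES (0, TODAY);

-- Reusing an existing key value should be rejected, because the
-- primary-key constraint does not allow duplicate order numbers.
INSERT INTO demo_orders (order_num, order_date) VALUES (1, TODAY);

-- The key identifies at most one row.
SELECT * FROM demo_orders WHERE order_num = 1;
```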

For more information on primary keys and entity integrity, see the Informix Guide to Database Design and Implementation.

Semantic Integrity

Semantic integrity ensures that data entered into a column reflects an allowable value for that column. The value must be within the domain, or allowable set of values, for that column. For example, the quantity column of the items table permits only numbers. If a value outside the domain can be entered into a column, the semantic integrity of the data is violated.

The following constraints enforce semantic integrity:

Data type

The data type defines the types of values that you can store in a column. For example, the data type SMALLINT allows you to enter values from -32,767 to 32,767 into a column.

Default value

The default value is the value inserted into the column when an explicit value is not specified. For example, the user_id column of the cust_calls table defaults to the login name of the user if no name is entered.

Check constraint

The check constraint specifies conditions on data inserted into a column. Each row inserted into a table must meet these conditions. For example, the quantity column of the items table might check for quantities greater than or equal to one.

For more information on how to use semantic integrity constraints in database design, see the Informix Guide to Database Design and Implementation.
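The three kinds of semantic constraint can appear together in one table definition. The following is a minimal sketch under stated assumptions: demo_items and its columns are hypothetical, and the definitions are not those of the items or cust_calls tables in the demonstration database.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo_items
(
    item_num   SMALLINT,                -- data type: restricts the column to small whole numbers
    quantity   SMALLINT DEFAULT 1       -- default value: used when no quantity is supplied
               CHECK (quantity >= 1),   -- check constraint: quantity must be one or more
    entered_by CHAR(32) DEFAULT USER    -- default value: login name of the current user
);

-- quantity and entered_by are omitted, so their defaults apply.
INSERT INTO demo_items (item_num) VALUES (1);

-- Should be rejected: a quantity of 0 does not satisfy the check constraint.
INSERT INTO demo_items (item_num, quantity) VALUES (2, 0);
```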

Referential Integrity

Referential integrity refers to the relationship between tables. Because each table in a database must have a primary key, this primary key can appear in other tables because of its relationship to data within those tables. When a primary key from one table appears in another table, it is called a foreign key.

Foreign keys join tables and establish dependencies between tables. Tables can form a hierarchy of dependencies in such a way that if you change or delete a row in one table, you destroy the meaning of rows in other tables. For example, Figure 6-4 shows that the customer_num column of the customer table is a primary key for that table and a foreign key in the orders and cust_calls tables. Customer number 106, George Watson, is referenced in both the orders and cust_calls tables. If customer 106 is deleted from the customer table, the link between the three tables and this particular customer is destroyed.


Figure 6-4 Referential Integrity in the Demonstration Database

When you delete a row that contains a primary key or update it with a different primary key, you destroy the meaning of any rows that contain that value as a foreign key. Referential integrity is the logical dependency of a foreign key on a primary key. The integrity of a row that contains a foreign key depends on the integrity of the row that it references, that is, the row that contains the matching primary key.

By default, the database server does not allow you to violate referential integrity and gives you an error message if you attempt to delete rows from the parent table before you delete rows from the child table. You can, however, use the ON DELETE CASCADE option to cause deletes from a parent table to cascade to child tables. See Using the ON DELETE CASCADE Option.

To define primary and foreign keys, and the relationship between them, use the CREATE TABLE and ALTER TABLE statements. For more information on these statements, see the Informix Guide to SQL: Syntax. For information about how to build a data model with primary and foreign keys, see the Informix Guide to Database Design and Implementation.
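As a rough sketch of the default behavior described above, the following statements define a parent and a child table and then attempt two operations that referential integrity should reject. The ri_customer and ri_orders tables are hypothetical stand-ins, not the customer and orders tables of the demonstration database.

```sql
-- Hypothetical parent table.
CREATE TABLE ri_customer
(
    customer_num SERIAL PRIMARY KEY,
    lname        CHAR(15)
);

-- Hypothetical child table; customer_num is a foreign key.
CREATE TABLE ri_orders
(
    order_num    SERIAL PRIMARY KEY,
    customer_num INTEGER REFERENCES ri_customer (customer_num)
);

INSERT INTO ri_customer (customer_num, lname) VALUES (106, 'Watson');
INSERT INTO ri_orders (order_num, customer_num) VALUES (0, 106);

-- Should fail by default: a row in ri_orders still references customer 106.
DELETE FROM ri_customer WHERE customer_num = 106;

-- Should also fail: no parent row with customer_num 999 exists.
INSERT INTO ri_orders (order_num, customer_num) VALUES (0, 999);
```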

Using the ON DELETE CASCADE Option

To maintain referential integrity when you delete rows from a parent table, use the ON DELETE CASCADE option in the REFERENCES clause of the CREATE TABLE and ALTER TABLE statements. This option allows you to delete a row from a parent table and its corresponding rows in matching child tables with a single DELETE statement.

Locking During Cascading Deletes

During deletes, locks are held on all qualifying rows of the parent and child tables. When you specify a delete, the delete that is requested from the parent table occurs before any referential actions are performed.

What Happens to Multiple Children Tables

If you have a parent table with two child constraints, one child with cascading deletes specified and one child without cascading deletes, and you attempt to delete a row from the parent table that applies to both child tables, the DELETE statement fails, and no rows are deleted from either the parent or child tables.
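The situation can be sketched as follows. The three mc_ tables are hypothetical, and the sketch assumes a database with logging turned on.

```sql
-- Hypothetical parent table.
CREATE TABLE mc_parent (p_id INTEGER PRIMARY KEY);

-- Child with cascading deletes.
CREATE TABLE mc_child_cascade
(
    c_id INTEGER PRIMARY KEY,
    p_id INTEGER REFERENCES mc_parent (p_id) ON DELETE CASCADE
);

-- Child without cascading deletes.
CREATE TABLE mc_child_plain
(
    c_id INTEGER PRIMARY KEY,
    p_id INTEGER REFERENCES mc_parent (p_id)
);

INSERT INTO mc_parent VALUES (1);
INSERT INTO mc_child_cascade VALUES (10, 1);
INSERT INTO mc_child_plain VALUES (20, 1);

-- Expected to fail: mc_child_plain still references parent row 1 and has
-- no cascade, so no rows should be removed from any of the three tables.
DELETE FROM mc_parent WHERE p_id = 1;
```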

Logging Must Be Turned On

You must turn on logging in your current database for cascading deletes to work. Logging and cascading deletes are discussed in Transaction Logging.

Example of Cascading Deletes

Suppose you have two tables with referential integrity rules applied, a parent table, accounts, and a child table, sub_accounts. The following CREATE TABLE statements define the referential constraints:

CREATE TABLE accounts (
    acc_num SERIAL PRIMARY KEY,
    acc_type INT,
    acc_descr CHAR(20));

CREATE TABLE sub_accounts (
    sub_acc INTEGER PRIMARY KEY,
    ref_num INTEGER REFERENCES accounts (acc_num) ON DELETE CASCADE,
    sub_descr CHAR(20));

The primary key of the accounts table, the acc_num column, uses a SERIAL data type, and the foreign key of the sub_accounts table, the ref_num column, uses an INTEGER data type. Combining the SERIAL data type on the primary key and the INTEGER data type on the foreign key is allowed. Only in this condition can you mix and match data types. The SERIAL data type is an INTEGER, and the database automatically generates the values for the column. All other primary and foreign key combinations must match explicitly. For example, a primary key that is defined as CHAR must match a foreign key that is defined as CHAR.

The definition of the foreign key of the sub_accounts table, the ref_num column, includes the ON DELETE CASCADE option. This option specifies that a delete of any row in the parent table accounts will automatically cause the corresponding rows of the child table sub_accounts to be deleted.

To delete a row from the accounts table that will cascade a delete to the sub_accounts table, you must turn on logging. After logging is turned on, you can delete the account number 2 from both tables, as the following example shows:

DELETE FROM accounts WHERE acc_num = 2
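To see the cascade at work, the accounts and sub_accounts tables need some rows. The sketch below uses sample values that are purely illustrative, and it assumes a database with logging that is not ANSI compliant, so that BEGIN WORK is valid.

```sql
BEGIN WORK;

-- Illustrative sample rows; explicit nonzero values are supplied for the SERIAL column.
INSERT INTO accounts (acc_num, acc_type, acc_descr) VALUES (1, 10, 'checking');
INSERT INTO accounts (acc_num, acc_type, acc_descr) VALUES (2, 20, 'savings');
INSERT INTO sub_accounts (sub_acc, ref_num, sub_descr) VALUES (201, 2, 'holiday fund');
INSERT INTO sub_accounts (sub_acc, ref_num, sub_descr) VALUES (202, 2, 'emergency fund');

-- Deletes the parent row; the child rows 201 and 202 should be deleted with it.
DELETE FROM accounts WHERE acc_num = 2;

-- With the cascade in effect, no child rows for account 2 should remain.
SELECT COUNT(*) FROM sub_accounts WHERE ref_num = 2;

COMMIT WORK;
```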

Restrictions on Cascading Deletes

You can use cascading deletes for most deletes, including deletes on self-referencing and cyclic queries. The only exception is correlated subqueries. A subquery (or inner SELECT) is correlated when the value it produces depends on a value produced by the outer statement that contains it. If you have implemented cascading deletes, you cannot write deletes that use a child table in the correlated subquery. The database server returns an error when you attempt such a delete.
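As an illustration of the restricted form, the following sketch reuses the accounts and sub_accounts tables from the cascading-delete example. The first statement is an assumed instance of the pattern described above, not a statement taken from the product documentation.

```sql
-- Expected to be rejected: the subquery is correlated with the outer
-- DELETE (it refers to accounts.acc_num) and reads the child table
-- sub_accounts, which participates in the cascading delete.
DELETE FROM accounts
    WHERE EXISTS
        (SELECT * FROM sub_accounts
            WHERE sub_accounts.ref_num = accounts.acc_num);

-- A delete that does not involve the child table in a correlated
-- subquery, such as this one, is not affected by the restriction.
DELETE FROM accounts WHERE acc_num = 2;
```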

Important: You cannot define a DELETE trigger event on a table if the table defines a referential constraint with ON DELETE CASCADE.

Object Modes and Violation Detection

The object modes and violation detection features of the database can help you monitor data integrity. These features are particularly powerful when they are combined during schema changes or when insert, delete, and update operations are performed on large volumes of data over short periods.

Database objects, within the context of a discussion of the object modes feature, are constraints, indexes, and triggers, each of which has different modes. Do not confuse database objects that are relevant to the object modes feature with generic database objects. Generic database objects are things like tables and synonyms.

Definitions of Object Modes

You can set disabled, enabled, or filtering modes for a constraint or a unique index. You can set disabled or enabled modes for a trigger or a duplicate index. You can use database object modes to control the effects of INSERT, DELETE, and UPDATE statements.

Enabled Mode

Constraints, indexes, and triggers are enabled by default.

When a database object is enabled, the database server recognizes the existence of the database object and takes the database object into consideration while it executes an INSERT, DELETE, or UPDATE statement. Thus, an enabled constraint is enforced, an enabled index is updated, and an enabled trigger is executed when the trigger event takes place.

When constraints and unique indexes are enabled, if a violating row exists, the data manipulation statement fails (that is, no rows change) and the database server returns an error message.

You can identify the reason for the failure when you analyze the information in the violations and diagnostics tables. You can then take corrective action or roll back the operation.

Disabled Mode

When a database object is disabled, the database server does not take it into consideration during the execution of an INSERT, DELETE, or UPDATE statement. A disabled constraint is not enforced, a disabled index is not updated, and a disabled trigger is not executed when the trigger event takes place. When constraints and unique indexes are disabled, any data manipulation statement that violates the restriction of the constraint or unique index succeeds (that is, the target row is changed), and the database server does not return an error message.

Filtering Mode

When a constraint or unique index is in filtering mode, the statement succeeds and the database server enforces the constraint or the unique index requirement during an INSERT, DELETE, or UPDATE statement by writing the failed rows to the violations table associated with the target table. Diagnostic information about the constraint violation is written to the diagnostics table associated with the target table.
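The three modes are set with the SET Database Object Mode statement. The following is a minimal sketch; the object names chk_quantity, ix_demo_unique, and upd_audit are hypothetical, and the exact syntax should be checked against the Informix Guide to SQL: Syntax for your server version.

```sql
-- Stop enforcing a (hypothetical) check constraint.
SET CONSTRAINTS chk_quantity DISABLED;

-- Put a (hypothetical) unique index in filtering mode, so that rows that
-- violate it are written to the violations table instead of the target
-- table. A violations table must already be started for the target table.
SET INDEXES ix_demo_unique FILTERING;

-- Stop a (hypothetical) trigger from firing; triggers accept only the
-- disabled and enabled modes.
SET TRIGGERS upd_audit DISABLED;

-- Enforce the constraint again. This is expected to fail if rows that
-- violate the constraint entered the table while it was disabled.
SET CONSTRAINTS chk_quantity ENABLED;
```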

Example of Modes with Data Manipulation Statements

An example with the INSERT statement can illustrate the differences between the enabled, disabled, and filtering modes. Consider an INSERT statement in which a user tries to add a row that does not satisfy an integrity constraint on a table. For example, assume that user joe created a table named cust_subset, and this table consists of the following columns: ssn (customer's social security number), fname (customer's first name), lname (customer's last name), and city (city in which the customer lives). The ssn column has the INT data type. The other three columns have the CHAR data type.

Assume that user joe defined the lname column as not null but has not assigned a name to the not null constraint, so the database server has implicitly assigned the name n104_7 to this constraint. Finally, assume that user joe created a unique index named unq_ssn on the ssn column.

Now user linda who has the Insert privilege on the cust_subset table enters the following INSERT statement on this table:

INSERT INTO cust_subset (ssn, fname, city)
	VALUES (973824499, "jane", "los altos")

To better understand the distinctions among enabled, disabled, and filtering modes, you can view the results of the preceding INSERT statement in the following three sections.

Results of the Insert Operation When the Constraint Is Enabled

If the not null constraint on the cust_subset table is enabled, the INSERT statement fails to insert the new row in this table. Instead user linda receives the following error message when she enters the INSERT statement:

-292 An implied insert column lname does not accept NULLs.

Results of the Insert Operation When the Constraint Is Disabled

If the not null constraint on the cust_subset table is disabled, the INSERT statement that user linda issues successfully inserts the new row in this table. The new row of the cust_subset table has the following column values.


ssn	fname	lname	city
973824499	jane	NULL	los altos

Results of the Insert Operation When the Constraint Is in Filtering Mode

If the not null constraint on the cust_subset table is set to the filtering mode, the INSERT statement that user linda issues fails to insert the new row in this table. Instead the new row is inserted into the violations table, and a diagnostic row that describes the integrity violation is added to the diagnostics table.

Assume that user joe has started a violations and diagnostics table for the cust_subset table. The violations table is named cust_subset_vio, and the diagnostics table is named cust_subset_dia. The new row added to the cust_subset_vio violations table when user linda issues the INSERT statement on the cust_subset target table has the following column values.


ssn	fname	lname	city	informix_tupleid	informix_optype	informix_recowner
973824499	jane	NULL	los altos	1	I	linda

This new row in the cust_subset_vio violations table has the following characteristics:

The first four columns of the violations table exactly match the columns of the target table. These four columns have the same names and the same data types as the corresponding columns of the target table, and they have the column values that were supplied by the INSERT statement that user linda entered.

The value 1 in the informix_tupleid column is a unique serial identifier that is assigned to the nonconforming row.

The value I in the informix_optype column is a code that identifies the type of operation that has caused this nonconforming row to be created. Specifically, I stands for an insert operation.

The value linda in the informix_recowner column identifies the user who issued the statement that caused this nonconforming row to be created.

The INSERT statement that user linda issued on the cust_subset target table also causes a diagnostic row to be added to the cust_subset_dia diagnostics table. The new diagnostic row added to the diagnostics table has the following column values.


informix_tupleid	objtype	objowner	objname
1	C	joe	n104_7

This new diagnostic row in the cust_subset_dia diagnostics table has the following characteristics:

This row of the diagnostics table is linked to the corresponding row of the violations table by means of the informix_tupleid column that appears in both tables. The value 1 appears in this column in both tables.

The value C in the objtype column identifies the type of integrity violation that the corresponding row in the violations table caused. Specifically, the value C stands for a constraint violation.

The value joe in the objowner column identifies the owner of the constraint for which an integrity violation was detected.

The value n104_7 in the objname column gives the name of the constraint for which an integrity violation was detected.

By joining the violations and diagnostics tables, user joe (who owns the cust_subset target table and its associated special tables) or the DBA can find out that the row in the violations table whose informix_tupleid value is 1 was created after an INSERT statement and that this row is violating a constraint. The table owner or DBA can query the sysconstraints system catalog table to determine that this constraint is a not null constraint. Now that the reason for the failure of the INSERT statement is known, user joe or the DBA can take corrective action.
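The join and the catalog lookup described above might be written as follows. This is a sketch that uses the table, column, and constraint names from the example; the constrname and constrtype columns of sysconstraints, and the code N for a not null constraint, are stated from general knowledge of the system catalog rather than from this section.

```sql
-- Pair each nonconforming row with the object or objects it violated.
SELECT v.informix_tupleid, v.ssn, v.fname, v.lname, v.city,
       v.informix_optype, v.informix_recowner,
       d.objtype, d.objowner, d.objname
    FROM cust_subset_vio v, cust_subset_dia d
    WHERE v.informix_tupleid = d.informix_tupleid
    ORDER BY v.informix_tupleid;

-- Look up the kind of constraint that was violated. A constrtype value
-- of N is assumed here to denote a not null constraint.
SELECT constrname, constrtype
    FROM sysconstraints
    WHERE constrname = 'n104_7';
```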

Multiple Diagnostic Rows for One Violations Row

In the preceding example, only one row in the diagnostics table corresponds to the new row in the violations table. However, more than one diagnostic row can be added to the diagnostics table when a single new row is added to the violations table. For example, if the ssn value (973824499) that user linda entered in the INSERT statement had been the same as an existing value in the ssn column of the cust_subset target table, only one new row would appear in the violations table, but the following two diagnostic rows would be present in the cust_subset_dia diagnostics table.


informix_tupleid	objtype	objowner	objname
1	C	joe	n104_7
1	I	joe	unq_ssn

Both rows in the diagnostics table correspond to the same row of the violations table because both of these rows have the value 1 in the informix_tupleid column. However, the first diagnostic row identifies the constraint violation caused by the INSERT statement that user linda issued, while the second diagnostic row identifies the unique-index violation caused by the same INSERT statement. In this second diagnostic row, the value I in the objtype column stands for a unique-index violation, and the value unq_ssn in the objname column gives the name of the index for which the integrity violation was detected.

For more information about how to set database object modes, see the SET Database Object Mode statement in the Informix Guide to SQL: Syntax.

Violations and Diagnostics Tables

When you start a violations table for a target table, any rows that violate constraints and unique indexes during insert, update, and delete operations on the target table do not cause the entire operation to fail, but are filtered out to the violations table. The diagnostics table contains information about the integrity violations caused by each row in the violations table. By examining these tables, you can identify the cause of failure and take corrective action by either fixing the violation or rolling back the operation.

After you create a violations table for a target table, you cannot alter the columns or the fragmentation of the target table or the violations table. If you alter the constraints on a target table after you have started the violations table, nonconforming rows will be filtered to the violations table.

When you create a violations table for a target table on Enterprise Decision Server, all constraints are in filtering mode. The violations table contains fields that record the diagnostic information, so no separate diagnostics table exists.

For information about how to start and stop the violations tables, see the START VIOLATIONS TABLE and STOP VIOLATIONS TABLE statements in the Informix Guide to SQL: Syntax.

Relationship of Violations Tables and Database Object Modes

If you set the constraints or unique indexes defined on a table to the filtering mode, but you do not create the violations and diagnostics tables for this target table, any rows that violate a constraint or unique-index requirement during an insert, update, or delete operation are not filtered to a violations table. Instead, you receive an error message that indicates that you must start a violations table for the target table.
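The order of operations that this implies can be sketched with the cust_subset example from earlier in this section. The sequence below is an assumed workflow, not a prescribed procedure: start the violations table first, then switch the objects to filtering mode, and reverse the steps when the work is done.

```sql
-- 1. Start the violations and diagnostics tables for the target table.
START VIOLATIONS TABLE FOR cust_subset
    USING cust_subset_vio, cust_subset_dia;

-- 2. Switch the not null constraint and the unique index to filtering mode.
SET CONSTRAINTS n104_7 FILTERING;
SET INDEXES unq_ssn FILTERING;

-- 3. Modify data. A row that violates n104_7 or unq_ssn should be
--    written to cust_subset_vio instead of cust_subset.
INSERT INTO cust_subset (ssn, fname, city)
    VALUES (973824499, 'jane', 'los altos');

-- 4. Inspect what was filtered out.
SELECT COUNT(*) FROM cust_subset_vio;

-- 5. Return the objects to enabled mode, then stop the violations table.
SET CONSTRAINTS n104_7 ENABLED;
SET INDEXES unq_ssn ENABLED;
STOP VIOLATIONS TABLE FOR cust_subset;
```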

Examples of START VIOLATIONS TABLE Statements

The following examples show different ways to execute the START VIOLATIONS TABLE statement.

Starting Violations and Diagnostics Tables Without Specifying Their Names

To start a violations and diagnostics table for the target table named customer in the demonstration database, enter the following statement:

START VIOLATIONS TABLE FOR customer

Because your START VIOLATIONS TABLE statement does not include a USING clause, the violations table is named customer_vio by default, and the diagnostics table is named customer_dia by default. The customer_vio table includes the following columns:

customer_num

fname

lname

company

address1

address2

city

state

zipcode

phone

informix_tupleid

informix_optype

informix_recowner

The customer_vio table has the same table definition as the customer table except that the customer_vio table has three additional columns that contain information about the operation that caused the bad row.

The customer_dia table includes the following columns:

informix_tupleid

objtype

objowner

objname

This list of columns shows an important difference between the diagnostics table and violations table for a target table. Whereas the violations table has a matching column for every column in the target table, the columns of the diagnostics table do not match any columns in the target table. The diagnostics table created by any START VIOLATIONS TABLE statement always has the same columns with the same column names and data types.

Starting Violations and Diagnostics Tables and Specifying Their Names

The following statement starts a violations and diagnostics table for the target table named items. The USING clause assigns explicit names to the violations and diagnostics tables. The violations table is to be named exceptions, and the diagnostics table is to be named reasons.

START VIOLATIONS TABLE FOR items
USING exceptions, reasons

Specifying the Maximum Number of Rows in the Diagnostics Table

The following statement starts violations and diagnostics tables for the target table named orders. The MAX ROWS clause specifies the maximum number of rows that can be inserted into the diagnostics table when a single statement, such as an INSERT or SET DATABASE OBJECT MODE statement, is executed on the target table.

START VIOLATIONS TABLE FOR orders MAX ROWS 50000

If you do not specify a value for MAX ROWS when you create a violations table, no maximum (other than disk space) will be imposed.

Example of Privileges on the Violations Table

The following example illustrates how the initial set of privileges on a violations table is derived from the current set of privileges on the target table.

Assume that a table named cust_subset consists of the following columns: ssn (customer's social security number), fname (customer's first name), lname (customer's last name), and city (city in which the customer lives).

The following set of privileges exists on the cust_subset table:

User alvin is the owner of the table.

User barbara has the Insert and Index privileges on the table. She also has the Select privilege on the ssn and lname columns.

User carrie has the Update privilege on the city column. She also has the Select privilege on the ssn column.

User danny has the Alter privilege on the table.
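For reference, user alvin could establish this set of privileges with statements along the following lines. This is a sketch under one assumption: that the privileges the database server may grant to PUBLIC by default when a table is created are revoked first, so that only the listed privileges remain.

```sql
-- Assumption: remove any privileges granted to PUBLIC by default.
REVOKE ALL ON cust_subset FROM PUBLIC;

-- barbara: Insert and Index on the table, Select on two columns.
GRANT INSERT, INDEX ON cust_subset TO barbara;
GRANT SELECT (ssn, lname) ON cust_subset TO barbara;

-- carrie: Update on the city column, Select on the ssn column.
GRANT UPDATE (city) ON cust_subset TO carrie;
GRANT SELECT (ssn) ON cust_subset TO carrie;

-- danny: Alter on the table.
GRANT ALTER ON cust_subset TO danny;
```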

Now user alvin starts a violations table named cust_subset_viols and a diagnostics table named cust_subset_diags for the cust_subset table, as follows:

START VIOLATIONS TABLE FOR cust_subset
USING cust_subset_viols, cust_subset_diags

The database server grants the following set of initial privileges on the cust_subset_viols violations table:

User alvin is the owner of the violations table, so he has all table-level privileges on the table.

User barbara has the Insert, Delete, and Index privileges on the violations table. She also has the Select privilege on the following columns of the violations table: the ssn column, the lname column, the informix_tupleid column, the informix_optype column, and the informix_recowner column.

User carrie has the Insert and Delete privileges on the violations table. She has the Update privilege on the following columns of the violations table: the city column, the informix_tupleid column, the informix_optype column, and the informix_recowner column. She has the Select privilege on the following columns of the violations table: the ssn column, the informix_tupleid column, the informix_optype column, and the informix_recowner column.

User danny has no privileges on the violations table.

Example of Privileges on the Diagnostics Table

The following example illustrates how the initial set of privileges on a diagnostics table is derived from the current set of privileges on the target table.

Assume that a table named cust_subset consists of the following columns: ssn (customer's social security number), fname (customer's first name), lname (customer's last name), and city (city in which the customer lives).

The following set of privileges exists on the cust_subset table:

User alvin is the owner of the table.

User barbara has the Insert and Index privileges on the table. She also has the Select privilege on the ssn and lname columns.

User carrie has the Update privilege on the city column. She also has the Select privilege on the ssn column.

User danny has the Alter privilege on the table.

Now user alvin starts a violations table named cust_subset_viols and a diagnostics table named cust_subset_diags for the cust_subset table, as follows:

START VIOLATIONS TABLE FOR cust_subset
USING cust_subset_viols, cust_subset_diags

The database server grants the following set of initial privileges on the cust_subset_diags diagnostics table:

User alvin is the owner of the diagnostics table, so he has all table-level privileges on the table.

User barbara has the Insert, Delete, Select, and Index privileges on the diagnostics table.

User carrie has the Insert, Delete, Select, and Update privileges on the diagnostics table.

User danny has no privileges on the diagnostics table.
