INFORMIX
Informix Guide to SQL: Syntax
Chapter 1: SQL Statements
Home Contents Index Master Index New Book

SQL Statements

UPDATE STATISTICS

Use the UPDATE STATISTICS statement to:

Syntax

Element Purpose Restrictions Syntax

column name

The name of a column in the specified table

The column must exist. You cannot use the name of a BYTE or TEXT column with the MEDIUM or HIGH keywords.

Identifier, p. 1-966

conf

A measure of confidence in the accuracy of medium distribution data. Confidence level is expressed as the proportion of values obtained with the MEDIUM keyword that you project you would also obtain with the HIGH keyword. The default confidence level is 0.95.

Valid values range from the minimum confidence level of 0.80 to the maximum value of 0.99.

Literal Number,
p.
1-1001

percent

The percentage of samples in each bin of a distribution, or bin resolution. With the MEDIUM keyword, the default value of percent is 2.5. With the HIGH keyword, the default value of percent is 0.5. For further information on this parameter, see "Using the MEDIUM or HIGH Keyword".

The minimum resolution possible for a table is 1/nrows, where nrows is the number of rows in the table.

Literal Number,
p.
1-1001

Usage

Use the UPDATE STATISTICS statement to distribute data values in table columns and to optimize execution plans for routines. The information produced by the UPDATE STATISTICS statement is restricted to objects in the database from which you execute the statement.

Tables and Columns

The database server evaluates statistics to determine the optimal execution plan for queries

The UPDATE STATISTICS statement stores statistics in the systables, syscolumns, sysindexes, and sysdistrib system catalog tables. These stored statistics describe the distribution of data values in tables, columns, and indexes.

The server does not automatically update systables, syscolumns, sysindexes, and sysdistrib if a change to the database obsoletes the corresponding statistics in these system catalog tables. Issue an UPDATE STATISTICS statement to ensure that the stored distribution information reflects the state of the database.

Routines

The sysprocplan system catalog table stores execution plans for routines. Two actions update the sysprocplan system catalog table:

If you change an object, such as a table, you can run UPDATE STATISTICS to reoptimize on demand, rather than waiting until the routine next executes.

When to Update Statistics

Update the statistics in the following situations:

    You can choose to convert the indexes table by table or for the entire database at one time. Follow the conversion guidelines in the Informix Migration Guide.

Updating Statistics for Tables

The optimizer estimates the effect of a WHERE clause by examining, for each column included in the WHERE clause, the proportionate occurrence of data values contained in the column. To prepare for the optimizer, the UPDATE STATISTICS statement distributes cell values from a one column into ranges, each of which contains an equal portion of the column data.

A distribution is a mapping of the data in a column into a set of column values. The contents of the column are divided into bins, each of which represents a percentage of data. For example, if one bin holds 2 percent of the data, 50 of these 2-percent bins hold all the data. A bin contains the particular range of data values that reflects the appropriate percentage of entries in the column.

Using the FOR TABLE Keywords

Without a FOR TABLE clause, UPDATE STATISTICS updates data for every table in the current database, including the system tables.

Use FOR TABLE to exclude statistics on system tables.

The FOR TABLE clause without a table name updates the statistics for all tables, including temporary tables, in the current database.

Specify a table name or synonym name to update statistics for only that table. You can explicitly update the statistics for a temporary table or build distributions for a temporary table by specifying the name of the table.

To narrow the scope of UPDATE STATISTICS further, specify column names.

Important: You cannot create distributions for TEXT or BYTE columns. If you include a TEXT or BYTE column in UPDATE STATISTICS (MEDIUM) or UPDATE STATISTICS (HIGH), the statement does not return an error or create distributions for those columns, but it does construct distributions for other columns in the list.

Using the FOR TABLE ONLY Keywords

If the table specified in the UPDATE STATISTICS ON TABLE statement has subtables, the database server creates distributions for that table and every table under it in the hierarchy. For example, assume your database has the typed table hierarchy that appears in Figure 1-2, which shows a supertable named employee that has a subtable named sales_rep. The sales_rep table, in turn, has a subtable named us_sales_rep.

Figure 1-2

To update statistics on both tables sales_rep and us_sales_rep, you would use the following statement:

Use the ONLY keyword to collect data for one table in a hierarchy of typed tables. The following example creates a distribution of data for each column in table sales_rep but does not act on tables employee or us_sales_rep:

Tip: The more specific you make the list of objects that UPDATE STATISTICS examines, the faster it completes execution. For help in deciding what is needed, see "When to Update Statistics".
Use the FOR TABLE and ONLY keywords to specify that you want statistics updated for only some of the tables and columns in the database. Use the LOW, MEDIUM, or HIGH keyword to indicate the precision of every distribution that a single UPDATE STATISTICS statement creates. Limiting the number of columns distributed speeds the update. Similarly, precision effects the speed of the update. If all other keywords are the same, LOW works fastest, but HIGH examines the most data.

Using the LOW Keyword

To create a low distribution, use the LOW keyword or issue the UPDATE STATISTICS statement without a distribution level keyword. A low distribution update does the following:

Tip: If you want the UPDATE STATISTICS statement to do minimal work, specify a column that is not part of an index.
The following example updates statistics on the customer_num column of the customer table. All distributions associated with the customer table remain intact, even those that already exist on the customer_num column.

Using LOW with DROP DISTRIBUTIONS
The DROP DISTRIBUTIONS keywords force the removal of distribution information from the sysdistrib system catalog table to accompany the construction of a low distribution.

When you issue the statement with a table name but no column names, all the distributions for the table name are dropped. When you specify column names in the UPDATE STATISTICS, only the distribution data for those columns is dropped from sysdistrib.

You must have the DBA privilege or be the owner of the table in order to drop distributions.

The following example shows how to remove distributions for the customer_num column in the customer table:

Using the MEDIUM or HIGH Keyword

A medium or high distribution update does the following:

You must have the DBA privilege or be the owner of the table in order to create high or medium distributions.

Both the MEDIUM or HIGH keywords provide a RESOLUTION percent clause with which you can set the percentage of data distributed to every bin. Unless you change the percent value with a RESOLUTION clause:

The HIGH keyword collects data from every row to construct an exact distribution for each column. UPDATE STATISTICS (MEDIUM) samples a percentage of data rows to construct statistically significant, but not exact, distribution data. The medium distribution usually contains significantly less data and takes less time to construct than a high distribution on the same table.

You can specify a confidence ratio with the MEDIUM keyword. If you do not specify a value for conf, the default confidence is 0.95, which means that for approximately 95 percent of samples, the estimate is equivalent to using high distributions.

The HIGH keyword can take considerable time to gather the information across the database, particularly a database with large tables. The HIGH keyword might scan each table several times. The MEDIUM keyword scans tables at least once and takes longer to execute on a given table than the LOW keyword. To minimize processing time, specify a table name and column names within that table.

Tip: The amount of space that the DBUPSPACE environment variable designates determines the number of times that the database server scans a table. For information about DBUPSPACE, see Chapter 3 of the Informix Guide to SQL: Syntax.

MEDIUM or HIGH with DISTRIBUTIONS ONLY
The UPDATE STATISTICS statement reads through index pages to:

    Keys in these pages are removed from the btree cleaner list. For information on the btree cleaner list, see the INFORMIX-Universal Server Administrator's Guide.

Examining index information can consume considerable processing time. If you specify the DISTRIBUTIONS ONLY option with the MEDIUM or HIGH keywords, you do not collect statistics for index information.

The DISTRIBUTIONS ONLY keyword has no effect on information about tables, such as the number of pages used, the number of rows, and fragment information. UPDATE STATISTICS needs this data to construct accurate column distributions and requires little time and system resources to collect it.

In the following example, the UPDATE STATISTICS statement gathers distributions information, index information, and table information for the customer table:

However, in the following example, only distributions information and table information are gathered for the customer table. The DISTRIBUTIONS ONLY option prevents the construction of index information.

Updating Statistics for Routines

Use the FOR ROUTINE, FOR FUNCTION, FOR PROCEDURE, or FOR SPECIFIC clause to avoid updating tables. You specify a routine name or specific name, or a routine parameter list to limit the scope of the UPDATE STATISTICS statement to a particular definition of a routine.

Element Purpose Restrictions Syntax

routine name

The name given to the routine in a CREATE FUNCTION or CREATE PROCEDURE statement

The identifier must refer to an existing user-defined routine.

In an ANSI-compliant database, specify the owner as the prefix to the routine name.

Function Name, p. 1-963 or Procedure Name, p. 1-1008

Keyword Function

SPECIFIC

Reoptimizes the execution plan for a routine identified by specific name.

FUNCTION

Reoptimizes the execution plan for any function with the specified routine name (and parameter types that match routine parameter list, if supplied).

PROCEDURE

Reoptimizes the execution plan for any procedure with the specified routine name (and parameter types that match routine parameter list, if supplied).

ROUTINE

Reoptimizes the execution plan for functions and procedures with the specified routine name (and parameter types that match routine parameter list, if supplied).

Recommended Procedure for Updating Statistics

Informix recommends the following procedure for giving the optimizer the best possible information while incurring the lowest performance penalty:

    1. Run UPDATE STATISTICS in medium mode with the DISTRIBUTIONS ONLY option for each table. (If you are the database owner or DBA, and you want to gather statistics for the entire database, you can do that with a single command instead.). The default parameters are sufficient unless the table is very large. In this case, use a resolution of 1.00 and a confidence level of 0.99.

    2. Run UPDATE STATISTICS in high mode for all columns that head an index. For the fastest execution time of the UPDATE STATISTICS statement, you must execute one UPDATE STATISTICS statement in the high mode for each such column.

    3. For each multicolumn index, run UPDATE STATISTICS in low mode for all its columns.

This procedure executes rapidly because it constructs the index-information statistics only once for each index.

References

In the INFORMIX-Universal Server Performance Guide, see the discussion of UPDATE STATISTICS. In the Informix Migration Guide, see the discussion of how to use the dbschema utility to view distributions created with UPDATE STATISTICS.

WHENEVER

Use the WHENEVER statement to trap exceptions that occur during the execution of SQL statements.

Syntax

Element Purpose Restrictions Syntax

function name

Function or procedure that is called when an exception occurs

Function or procedure must exist at compile time.

Function or procedure name must conform to language-specific rules for functions or procedures.

label

Statement label to which program control transfers when an exception occurs

The label must be defined in same source file.

Label must conform to language-specific rules for statement labels.

Usage

Use of the WHENEVER statement is equivalent to placing an exception-checking routine after every SQL statement. The following table summarizes the types of exceptions for which you can check with the WHENEVER statement.

Type of Exception WHENEVER Clause For More Information

Errors

SQLERROR

page 1-821

Warnings

SQLWARNING

page 1-821

Not Found Condition
End of Data Condition

NOT FOUND

page 1-822

If you do not use the WHENEVER statement in a program, the program does not automatically abort when an exception occurs. Your program must explicitly check for exceptions and take whatever corrective action you desire. If you do not check for exceptions, the program simply continues running. However, as a result of the errors, the program might not perform its intended purpose.

In addition to specifying the type of exception for which to check, the WHENEVER statement also specifies what action to take when the specified exception occurs. The following table summarizes possible actions that WHENEVER can specify.

Type of Action WHENEVER Keyword For More Information

Continue program execution

CONTINUE

page 1-822

Stop program execution

STOP

page 1-822

Transfer control to a specified label

GOTO
GO TO

page 1-822

Transfer control to a named function or procedure

CALL

page 1-823

Scope of WHENEVER

The ESQL preprocessor, not the database server, handles the interpretation of the WHENEVER statement. When the preprocessor encounters a WHENEVER statement in an ESQL source file, it inserts the appropriate code into the preprocessed code after each SQL statement based on the exception and the action that WHENEVER lists. The preprocessor defines the scope of a WHENEVER statement as from the point that it encounters the statement in the source module until it encounters one of the following conditions:

Whichever condition the preprocessor encounters first as it sequentially processes the source module marks the end of the scope of the WHENEVER statement.

The following ESQL/C example program has three WHENEVER statements, two of which are WHENEVER SQLERROR statements. Line 4 uses STOP with SQLERROR to override the default CONTINUE action for errors. Line 8 specifies the CONTINUE keyword to return the handling of errors to the default behavior. For all SQL statements between lines 4 and 8, the preprocessor inserts code that checks for errors and halts program execution if an error occurs. Therefore, any errors that the INSERT statement on line 6 generates cause the program to stop.

After line 8, the preprocessor does not insert code to check for errors after SQL statements. Therefore, any errors that the INSERT statement (line 10), the SELECT statement (line 11), and DISCONNECT statement (line 12) generate are ignored. However, the SELECT statement does not stop program execution if it does not locate any rows; the WHENEVER statement on line 7 tells the program to continue if such an exception occurs.

SQLERROR Keyword

If you use the SQLERROR keyword, any SQL statement that encounters an error is handled as the WHENEVER SQLERROR statement directs. If an error occurs, the SQLCODE variable (sqlca.sqlcode) is less than zero and the SQLSTATE variable has a class code with a value greater than 02. The following statement causes a program to stop execution if an SQL error exists:

If you do not use any WHENEVER SQLERROR statements in a program, the default for WHENEVER SQLERROR is CONTINUE.

SQLWARNING Keyword

If you use the SQLWARNING keyword, any SQL statement that generates a warning is handled as the WHENEVER SQLWARNING statement directs. If a warning occurs, the first field of the warning structure in SQLCA (sqlca.sqlwarn.sqlwarn0) is set to W, and the SQLSTATE variable has a class code of 01.

In addition to setting the first field of the warning structure, a warning also sets an additional field to W. The field that is set indicates the type of warning that occurred. For more information, see the chapter on exception checking in the INFORMIX-ESQL/C Programmer's Manual.

The following statement causes a program to stop execution if a warning condition exists:

If you do not use any WHENEVER SQLWARNING statements in a program, the default for WHENEVER SQLWARNING is CONTINUE.

NOT FOUND Keywords

If you use the NOT FOUND keywords, exception handling for SELECT and FETCH statements is treated differently than for other SQL statements. The NOT FOUND keyword checks for the following cases:

In each case, the SQLCODE variable (sqlca.sqlcode) is set to 100, and the SQLSTATE variable has a class code of 02.

The following statement calls the no_rows() function each time the NOT FOUND condition exists:

If you do not use any WHENEVER NOT FOUND statements in a program, the default for WHENEVER NOT FOUND is CONTINUE.

CONTINUE Keyword

Use the CONTINUE keyword to instruct the program to ignore the exception and to continue execution at the next statement after the SQL statement. The default action for all exceptions is CONTINUE. You can use this keyword to turn off a previously specified option.

STOP Keyword

Use the STOP keyword to instruct the program to stop execution when the specified exception occurs. The following statement halts execution of an ESQL/C program each time that an SQL statement generates a warning:

GOTO Keyword

Use the GOTO clause to transfer control to the statement that the label identifies when a particular exception occurs. The GOTO keyword is the ANSI-compliant syntax of the clause. The GO TO keywords are a non-ANSI synonym for GOTO.

The following example shows a WHENEVER statement in INFORMIX-ESQL/C code that transfers control to the label missing each time that the NOT FOUND condition occurs:

You must define the labeled statement in each program block that contains SQL statements. If your program contains more than one function, you might need to include the labeled statement and its code in each function. When the preprocessor reaches the function that does not contain the labeled statement, it tries to insert the code associated with the labeled statement. However, if you do not define this labeled statement within the function, the preprocessor generates an error.

To correct this error, either put a labeled statement with the same label name in each function, issue another WHENEVER statement to reset the error condition, or use the CALL clause to call a separate function.

CALL Clause

Use the CALL clause to transfer program control to the named function or procedure when a particular exception occurs. Do not include parentheses after the function or procedure name. The following WHENEVER statement causes the program to call the error_recovery() function if the program detects an error:

When the named function completes, execution resumes at the next statement after the line that is causing the error. If you want to halt execution when an error occurs, include statements that terminate the program as part of the named function.

Observe the following restrictions on the named function:

References

See the EXECUTE FUNCTION, FETCH and GET DIAGNOSTICS statements in this manual.

See the chapter on exception checking and error checking in your SQL API product manual.




Informix Guide to SQL: Syntax, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.