informix
Extending Informix Dynamic Server 2000
Improving UDR Performance

Extending UPDATE STATISTICS

The UPDATE STATISTICS statement collects statistics about the data in your database. The optimizer uses these statistics to determine the best path for an SQL statement.

For SQL statements that use user-defined data types, the optimizer can call custom selectivity and cost functions. (For more information on creating selectivity and cost functions, refer to Optimizing Functions in SQL Statements.) Selectivity and cost functions might need to use statistics about the nature of the data in a column. When you create the statcollect() function that collects statistics for a UDT, the database server executes this function automatically when a user runs the UPDATE STATISTICS statement with the MEDIUM or HIGH keyword.

Using UPDATE STATISTICS

The syntax of UPDATE STATISTICS is the same for user-defined data types as for built-in data types. Because the data distributions provide the optimizer with equivalent statistics, the database server does not calculate colmin and colmax for user-defined data types.

The statcollect() function executes once for every row that the database server scans during UPDATE STATISTICS. The number of rows that the database server scans depends on the mode and the confidence level. Executing UPDATE STATISTICS in HIGH mode causes the database server to scan all rows in the table. In MEDIUM mode the database server chooses the number of rows to scan based on the confidence level. The higher the confidence level, the higher the number of rows that the database server scans. For general information about UPDATE STATISTICS, refer to the Informix Guide to SQL: Syntax.

The statistics that the database server collects might require a smart large object for storage. The configuration parameter SBSSPACENAME specifies an sbspace for storing this information. If SBSSPACENAME is not set, the database server might not be able to collect the specified statistics.

Support Functions for UPDATE STATISTICS

The statcollect() and statprint() functions support the collection of statistics. If you want UPDATE STATISTICS to generate statistics for a user-defined data type, you must create these functions.

The stat Data Type

The statcollect() and statprint() functions use an SQL data type called stat.

The corresponding C language structure is called mi_statretval. For an exact description of mi_statretval, see the libmi header file.

Most of the information in mi_statretval is manipulated internally. However, two fields must be filled in by statcollect():

The statcollect() Function

When you run UPDATE STATISTICS, the database server calls the appropriate statcollect() function for each column that the database server scans.

The statcollect() function takes four arguments:

On the first call to statcollect(), MI_FPARAM contains a SET_INIT value. Check for this value in statcollect() and perform any initialization operations, such as allocating memory and initializing values.

On subsequent calls to statcollect(), MI_FPARAM contains a SET_RETONE value. At this point, statcollect() should read the column value from the first argument and place it in your distribution structure.

After all rows have been processed, the last call to statcollect() puts a value of SET_END in MI_FPARAM. For this final call, statcollect() should put the statistics in the stat data type and perform any memory deallocation.

You must declare the statcollect() function with HANDLESNULLS, but the function itself can ignore nulls if desired.

Allocate any memory used across multiple invocations of statcollect() from the PER_COMMAND pool and free it as soon as possible. Any memory not used across multiple invocations of statcollect() should be allocated from the PER_ROUTINE pool.

The statprint() Function

The statprint() function converts the statistics data collected by the statcollect() function to an LVARCHAR value that the database server can use to display information. The dbschema utility executes the statprint() function.

The statprint() function has two arguments. The first argument is a dummy argument of the required data type. The database server uses this argument to resolve the function. The first time the database server executes this function, it sets the first parameter to null.

The second argument is a value of the stat data type. The stat data type is a multirepresentational data type that the database server uses to store data collected by the statcollect() function.

The statprint() function must take the histogram, which is stored in multi-representational form, and convert it to a printable form.

After you register the functions, make sure those with DBA privilege or the table owner can execute the statcollect() and statprint() UDRs.

Example of User-Defined Statistics Functions

For examples of statprint() and statcollect() functions written in C, refer to in the \%INFORMIXDIR\dbdk\examples\Types\dapi\Statistics\Box\src\c directory, after you install the DataBlade Developers Kit.


Extending Informix Dynamic Server 2000, Version 9.2
Copyright © 1999, Informix Software, Inc. All rights reserved