Writing Selectivity and Cost Functions

The query optimizer uses the selectivity and cost of a query to help select the best query plan. To help the optimizer select the best query plan, you can provide the query optimizer with information about the selectivity and cost of your UDR.

This information is extremely useful for an expensive UDR, a UDR that requires a lot of execution time or resources to execute. When the query optimizer can obtain the selectivity and cost for an expensive UDR, it can better determine when to execute the UDR in the query plan.

When you write an expensive UDR, you can indicate the following performance-related characteristics of the UDR to assist the query optimizer in developing a query plan:

This section describes how to create selectivity and cost functions for an expensive UDR. For a general description of how the query optimizer uses cost and selectivity for UDRs, see the IBM Informix: User-Defined Routines and Data Types Developer's Guide.

The IBM Informix BladeSmith development tool automatically generates C source code for selectivity and cost functions as well as the SQL statements to register the these functions. For more information, see the IBM Informix: DataBlade Developer's Kit User's Guide.

Query Selectivity

If your UDR is a Boolean function, it can be used as a filter in a query. (For a list of Boolean functions that are useful as query filters, see Table 106.) The query optimizer uses the selectivity of a query to estimate the number of rows that the query will return. The selectivity is a floating-point value between zero (0) and one (1) that represents the percentage of rows for which the query (and each filter in the query) is expected to return a true value.

For a Boolean function likely to be used as a query filter, you can use the following routine modifiers to specify a selectivity for the function.

Selectivity	Routine Modifier
Selectivity is constant for every invocation of the Boolean function.	SELCONST = selectivity_value selectivity_value is a floating-point value between 0 and 1.
Selectivity varies according to some execution conditions.	SELFUNC = selectivity_func selectivity_func is the name of a selectivity function that returns a floating-point value between 0 and 1 to indicate the selectivity of the Boolean function.

When the query optimizer needs to determine the selectivity of the Boolean function, it either uses the constant selectivity value or calls the selectivity function, depending whether the Boolean function was registered with the SELCONST or SELFUNC routine modifier.

If you need to calculate the selectivity for a Boolean function at runtime, create a selectivity function.

To create a selectivity function

Write a C user-defined function to implement the selectivity function.
The selectivity function has the following coding requirements:
- The selectivity function must take the same number of arguments as its companion Boolean function.
- Each argument of the selectivity function must be declared of type MI_FUNCARG.
- The selectivity function must return the selectivity as a floating-point value (mi_real or mi_double_precision) that is between zero and one.
Register the selectivity function with the CREATE FUNCTION statement.
The SQL selectivity function has the following registration requirements:
- The selectivity function must take the same number of arguments as its companion Boolean function.
- Each argument of the selectivity function must be declared of type SELFUNCARG.
- The selectivity function must return the selectivity as a FLOAT value.
Associate the selectivity function with its companion UDR with the SELFUNC routine modifier when you register the companion UDR.

Query Cost

If your UDR requires a lot of system resources (such as a large number of disk accesses or network accesses), you can define a cost-of-execution for the UDR. The query optimizer uses the cost value to determine the total cost of executing a query.

You can define a cost for either a user-defined procedure or a user-defined function. However, user-defined functions can appear in queries because they return a value. Because user-defined procedures do not appear in queries, the query optimizer is not usually concerned with their cost.

When you register a UDR, you can specify its cost with one of the following routine modifiers.

Cost	Routine Modifier
Cost is constant for every invocation of the UDR.	PERCALL_COST = cost_value cost_value is a floating-point value between 0 and 1.
Cost varies according to some execution conditions.	COSTFUNC = cost_func cost_func is the name of a cost UDR that returns a floating-point value between 0 and 1 to indicate the selectivity of the UDR.

When the query optimizer needs to determine the cost of the UDR, it either uses the constant cost value or calls the cost function, depending whether the UDR was registered with the PERCALL_COST or COSTFUNC routine modifier.

To create a cost function

Write a C user-defined function to implement the cost function.
The cost function has the following coding requirements:
- The cost function must take the same number of arguments as its companion UDR.
- Each argument of the cost function must be declared of type MI_FUNCARG.
  For more information, see MI_FUNCARG Data Type.
- The cost function must return the cost as an integer value (mi_integer or mi_smallint).
Register the cost function with the CREATE FUNCTION statement.
The SQL cost function has the following registration requirements:
- The cost function must take the same number of arguments as its companion UDR.
- Each argument of the cost function must be declared of type SELFUNCARG.
- The cost function must return the selectivity as an INTEGER or SMALLINT value.
Associate the cost function with its companion UDR with the COSTFUNC routine modifier when you register the companion UDR.

MI_FUNCARG Data Type

The MI_FUNCARG data type is an Informix-defined opaque type that contains information about the companion UDR of a selectivity or cost function. Selectivity and cost functions both have the same number of arguments as their companion UDRs. To calculate selectivity or cost effectively, however, your user-defined function might need to know additional information about the context in which the UDR was called. The DataBlade API provides this contextual information in the MI_FUNCARG structure.

Each argument of a cost or selectivity function is of type MI_FUNCARG. The DataBlade API provides accessor functions for the MI_FUNCARG structure. You can use any of these functions to extract information about the companion-UDR arguments from the selectivity or cost function. Table 107 lists the DataBlade API accessor functions that obtain information from the MI_FUNCARG structure.

Table 107. Argument Information in the MI_FUNCARG Structure
MI_FUNCARG Information	DataBlade API Function
Information about the companion UDR:
	The identifier of the companion UDR	mi_funcarg_get_routine_id( )
The name of the companion UDR	mi_funcarg_get_routine_name( )
General expensive-UDR argument information:
	Whether the expensive-UDR argument is a column, constant, or parameter	mi_funcarg_get_argtype( )
The data type of expensive-UDR argument	mi_funcarg_get_datatype( )
The length of the expensive-UDR argument	mi_funcarg_get_datalen( )
Constant argument (MI_FUNCARG_CONSTANT):
	The constant value of the expensive-UDR argument	mi_funcarg_get_constant( )
Whether the value of the expensive-UDR argument is the SQL NULL value	mi_funcarg_isnull( )
Column-value argument (MI_FUNCARG_COLUMN):
	The column number of the column associated with the expensive-UDR argument	mi_funcarg_get_colno( )
The table identifier of the table that contains the column associated with the expensive-UDR argument	mi_funcarg_get_tabid( )
The distribution information for the column associated with the expensive-UDR argument	mi_funcarg_get_distrib( )

Table 108 lists the DataBlade API accessor functions that obtain general information about a companion UDR from the MI_FUNCARG structure.

Table 108. General Companion-UDR Information in the MI_FUNCARG Structure
MI_FUNCARG Information	DataBlade API Function
Information about the companion UDR:
	The identifier of the companion UDR	mi_funcarg_get_routine_id( )
The name of the companion UDR	mi_funcarg_get_routine_name( )
General companion-UDR argument information:
	Whether the companion-UDR argument is a column, constant, or parameter	mi_funcarg_get_argtype( )
The data type of companion-UDR argument	mi_funcarg_get_datatype( )
The length of the expensive-UDR argument	mi_funcarg_get_datalen( )

To a DataBlade API module, the MI_FUNCARG data type is an opaque data type. Do not access its internal fields directly. The internal structure of this opaque data type may change in future releases. Therefore, to create portable code, always use the accessor functions in Table 107 to obtain values in this data type.

The MI_FUNCARG structure categorizes each argument of the companion UDR arguments. The MI_FUNCARG data type identifies the following kinds of arguments in the companion UDR.

Companion-UDR Argument Type	Argument-Type Constant
Argument is a constant value	MI_FUNCARG_CONSTANT
Argument is a column value	MI_FUNCARG_COLUMN
Argument is a parameter	MI_FUNCARG_PARAM

In addition to the general companion-UDR information that the functions in Table 108 obtain, you can also obtain information about the arguments themselves. The information that you can obtain depends on the particular category of the companion-UDR argument. Table 109 lists the DataBlade API accessor functions that obtain argument information from the MI_FUNCARG structure.

Table 109. Argument Information in the MI_FUNCARG Structure
MI_FUNCARG Information	DataBlade API Function
Constant argument (MI_FUNCARG_CONSTANT):
	The constant value of the companion-UDR argument	mi_funcarg_get_constant( )
Determines if the value of the expensive-UDR argument is the SQL NULL value	mi_funcarg_isnull( )
Column-value argument (MI_FUNCARG_COLUMN):
	The column number of the column associated with the expensive-UDR argument	mi_funcarg_get_colno( )
The table identifier of the table that contains the column associated with the expensive-UDR argument	mi_funcarg_get_tabid( )
The distribution information for the column associated with the expensive-UDR argument	mi_funcarg_get_distrib( )

Suppose you register the meets_cost( ) function with a selectivity function named meets_cost_selfunc( ), as follows:

Because the meets_cost( ) function returns a BOOLEAN value, you can write a selectivity function for the function. You write meets_cost_selfunc( ) so that it expects two arguments of the data type MI_FUNCARG. The following table shows what different MI_FUNCARG accessor functions return when you invoke them for each of the arguments of the meets_cost( ) function.

Obtaining Information About Constant Arguments

DataBlade API Function	Argument 1	Argument 2
mi_funcarg_get_argtype( )	MI_FUNCARG_COLUMN	MI_FUNCARG_CONSTANT
mi_funcarg_get_datatype( )	Type identifier for data type of tab1.int_col	Type identifier for INTEGER data type
mi_funcarg_get_datalen( )	Length of tab1.int_col	Length of INTEGER
mi_funcarg_get_tabid( )	Table identifier of tab1	Undefined
mi_funcarg_get_colno( )	Column number of int_col	Undefined
mi_funcarg_isnull( )	FALSE	FALSE
mi_funcarg_get_constant( )	Undefined	An MI_DATUM structure that holds the value of `20`

When the companion UDR receives an argument that is a constant, you can obtain the following information about this constant from within the cost or selectivity function.

Obtaining Information About Column Arguments

MI_FUNCARG Information	DataBlade API Function
The constant value of the companion-UDR argument	mi_funcarg_get_constant( )
Determines if the value of the companion-UDR argument is the SQL NULL value	mi_funcarg_isnull( )

When the companion UDR receives an argument that is a column, you can obtain the following information about this column from the associated MI_FUNCARG argument of the cost or selectivity function.

MI_FUNCARG Information	DataBlade API Function
The column number of the column associated with the companion-UDR argument	mi_funcarg_get_colno( )
The table identifier of the table that contains the column associated with the companion-UDR argument	mi_funcarg_get_tabid( )
The data-distribution information for the column associated with the companion-UDR argument	mi_funcarg_get_distrib( )

The column number and table identifier are useful in a selectivity or cost function to obtain additional information about the column argument from the syscolumns or systables system catalog tables. The data distribution is useful if the determination of selectivity or cost depends on how the column values are distributed; that is, how many values in each range of values. Data distributions only make sense for data types that can be ordered.

The mi_funcarg_get_distrib( ) function obtains the contents of the encdat column of the sysdistrib system catalog table. The encdat column stores the data distribution for the column associated with the companion-UDR argument, as follows:

The mi_funcarg_get_distrib( ) function returns the data distribution in an mi_bitvarying structure as an mi_statret structure. The mi_statret structure can store the data distribution either directly in the structure (in the statdata.buffer field) or in a smart large object (in the statdata.mr field).