Selectivity Functions

Home | Previous Page | Next Page Creating User-Defined Routines > Extending Data Types > Optimizing Queries >

Selectivity Functions

The optimizer bases query-cost estimates on the number of rows to be retrieved from each table. In turn, the estimated number of rows is based on the selectivity of each conditional expression that is used within the WHERE clause. A conditional expression that is used to select rows is termed a filter.

The optimizer can use data distributions to calculate selectivities for the filters in a query. However, in the absence of data distributions, the database server calculates selectivities for filters of different types based on table indexes. The following table lists some of the selectivities that the optimizer assigns to filters of different types.

Filter Expression	Selectivity (F)
any-col IS NULL	F = 1/10
any-col = any-expression	F = 1/10
any-col > any-expression	F = 1/3
any-col < any-expression	F = 1/3
any-col MATCHES any-expression	F = 1/5
any-col LIKE any-expression	F = 1/5
...

Selectivities calculated using data distributions are even more accurate than the ones that the preceding table shows, as follows:

Your IBM Informix: Performance Guide describes the filter expressions that can appear in WHERE clauses with their selectivities when no data distributions exist for a column (any-col). These selectivities are those that the database server calculates by default.
The UPDATE STATISTICS statement can generate statistics (data distributions) for columns of built-in data types. However, it cannot generate data distributions for columns of user-defined data types.
Columns of user-defined types require implementation of user-defined statistics for UPDATE STATISTICS to generate statistics (for example, for it to store data distributions in sysdistrib).

Query filters can include user-defined functions. You can improve selectivity of filters that include user-defined functions with the following features:

Functional indexes
You can create a functional index on the resulting values of a user-defined function on one or more columns. The function can be a built-in function or a user-defined function. When you create a functional index, the database server computes the return values of the function and stores them in the index. The database server can locate the return value of the function in an appropriate index without executing the function for each qualifying column.
User-defined selectivity functions
You can write a user-defined selectivity function that calculates the expected fraction of rows that qualify for a particular user-defined function that acts as a filter.
An end-user function
For queries that use an end-user function as a filter, you can improve performance by writing a selectivity function for this end-user function.
An operator function
For queries that use relational operators (<,>, ...) as filters, you can improve performance by writing a selectivity function for the associated operator function (lessthan( ), greaterthan( ), ...). For built-in types, the relational-operator functions are built-in functions. They have selectivity functions that can use data distributions, which the UPDATE STATISTICS statement can automatically generate.

For user-defined types, relational-operator functions do not automatically exist. You must write versions of these functions that handle your user-defined type. In addition, you must write any selectivity functions. If you want these selectivity functions to use data distributions, you must take the following actions:
- Provide user-defined statistics so that UPDATE STATISTICS saves the data distributions in the sysdistrib system catalog table.
- Access the sysdistrib system catalog table from within the selectivity function to obtain the data distributions for the column.