The optimizer bases query-cost estimates on the number of rows to be retrieved from each table. In turn, the estimated number of rows is based on the selectivity of each conditional expression that is used within the WHERE clause. A conditional expression that is used to select rows is termed a filter.
The optimizer can use data distributions to calculate selectivities for the filters in a query. However, in the absence of data distributions, the database server calculates selectivities for filters of different types based on table indexes. The following table lists some of the selectivities that the optimizer assigns to filters of different types.
Filter Expression | Selectivity (F) |
---|---|
any-col IS NULL | F = 1/10 |
any-col = any-expression | F = 1/10 |
any-col > any-expression | F = 1/3 |
any-col < any-expression | F = 1/3 |
any-col MATCHES any-expression | F = 1/5 |
any-col LIKE any-expression | F = 1/5 |
... |
Selectivities calculated using data distributions are even more accurate than the ones that the preceding table shows, as follows:
Query filters can include user-defined functions. You can improve selectivity of filters that include user-defined functions with the following features:
You can create a functional index on the resulting values of a user-defined function on one or more columns. The function can be a built-in function or a user-defined function. When you create a functional index, the database server computes the return values of the function and stores them in the index. The database server can locate the return value of the function in an appropriate index without executing the function for each qualifying column.
You can write a user-defined selectivity function that calculates the expected fraction of rows that qualify for a particular user-defined function that acts as a filter.
For queries that use an end-user function as a filter, you can improve performance by writing a selectivity function for this end-user function.
For queries that use relational operators (<,>, ...) as filters, you can improve performance by writing a selectivity function for the associated operator function (lessthan( ), greaterthan( ), ...). For built-in types, the relational-operator functions are built-in functions. They have selectivity functions that can use data distributions, which the UPDATE STATISTICS statement can automatically generate.
For user-defined types, relational-operator functions do not automatically exist. You must write versions of these functions that handle your user-defined type. In addition, you must write any selectivity functions. If you want these selectivity functions to use data distributions, you must take the following actions: