informix
Informix DataBlade API Programmer's Manual
Creating Special-Purpose UDRs

Optimizing a UDR

The DataBlade API provides support for you to create the following kinds of special-purpose UDRs to optimize UDR performance:

Creating Selectivity and Cost Functions

The query optimizer uses the selectivity and cost of a query to help select the best query plan. To help the optimizer select the best query plan, you can provide the query optimizer with information about the selectivity and cost of your UDR, This information is extremely useful for an expensive UDR, a UDR that requires a lot of execution time or resources to execute. When the query optimizer can obtain the selectivity and cost for an expensive UDR, it can better determine when to execute the UDR in the query plan.

When you write an expensive UDR, you can indicate the following performance-related characteristics of the UDR to assist the query optimizer in developing a query plan:

This section describes how to create selectivity and cost functions for an expensive UDR.

Tip: The Informix BladeSmith development tool automatically generates C source code for selectivity and cost functions as well as the SQL statements to register the these functions. For more information, see the "DataBlade Developers Kit User's Guide."

Query Selectivity

The query optimizer uses the selectivity to estimate the number of rows that a query will return. The selectivity is a floating-point value between zero (0) and one that represents the percentage of rows for which the UDR is expected to return a true value (MI_TRUE). If your UDR returns a BOOLEAN value (mi_boolean), you can specify the selectivity of a UDR when you register it, as follows.

Selectivity Routine Modifier
Selectivity is constant for every invocation of the expensive UDR SELCONST = selectivity_value selectivity_value is a floating-point value between 0 and 1
Selectivity varies according to some execution conditions SELFUNC = seletivity_func selectivity_func is the name of a selectivity function that returns a floating-point value between 0 and 1 to indicate the selectivity of the expensive UDR

To determine the selectivity of the expensive UDR, the query optimizer either uses the constant selectivity value or calls the selectivity function. A selectivity function is a user-defined function that is associated with an expensive UDR. It has the following restrictions:

Query Cost

The query optimizer uses a cost value to determine the total cost of executing a query. If your UDR returns a BOOLEAN value (mi_boolean), you can specify the cost of a UDR when you register it, as follows.

Cost Routine Modifier
Cost is constant for every invocation of the UDR PERCALL_COST = cost_value cost_value is a floating-point value between
0 and 1
Cost varies according to some execution conditions COSTFUNC = cost_func cost_func is the name of a cost UDR that returns a floating-point value between 0 and 1 to indicate the selectivity of the UDR

To determine the cost of the expensive UDR, the query optimizer either uses the constant cost value or calls the cost function. A cost function is a user-defined function that is associated with an expensive UDR. It has the following restrictions:

MI_FUNCARG Data Type

The MI_FUNCARG data type is an Informix-defined opaque type that contains information about the expensive UDR of a selectivity or cost function. This data type can provide information about the following kinds of arguments in the expensive UDR.

Expensive-UDR Argument Type Argument-Type Constant
Argument is a constant value MI_FUNCARG_CONSTANT
Argument is a column value MI_FUNCARG_COLUMN
Argument is parameter MI_FUNCARG_PARAM

The DataBlade API provides the following accessor functions for the MI_FUNCARG data type. You can use any of these functions to extract information about the expensive-UDR arguments from the selectivity or cost function.

Figure 13-27 lists the DataBlade API accessor functions that obtain information from the MI_FUNCARG data type.

Figure 13-27
Argument Information in the MI_FUNCARG Data Type

MI_FUNCARG Information DataBlade API Function
Information about the expensive UDR
The routine identifier for the expensive UDR mi_funcarg_get_routine_id()
The routine name for the expensive UDR mi_funcarg_get_routine_name()
General expensive-UDR argument information
Whether the expensive-UDR argument is a column, constant, or parameter mi_funcarg_get_argtype()
The data type of expensive-UDR argument mi_funcarg_get_datatype()
The length of the expensive-UDR argument mi_funcarg_get_datalen()
Constant argument (MI_FUNCARG_CONSTANT)
The constant value of the expensive-UDR argument mi_funcarg_get_constant()
Determines if the value of the expensive-UDR argument is the SQL NULL value mi_funcarg_isnull()
Column-value argument (MI_FUNCARG_COLUMN)
The column number of the column associated with the expensive-UDR argument mi_funcarg_get_colno()
The table identifier of the table that contains the column associated with the expensive-UDR argument mi_funcarg_get_tabid()
The distribution information for the column associated with the expensive-UDR argument mi_funcarg_get_distrib()

Important: To a DataBlade API module, the MI_FUNCARG data type is an opaque data type. Do not access its internal fields directly. Informix does not guarantee that the internal structure of this opaque data type will not change in future releases. Therefore, to create portable code, always use the accessor functions in Figure 13-27 to obtain values in this data type.

For example, you can write the following query:

Suppose you register the meets_cost() function with a selectivity function named meets_cost_selfunc(), as follows:

Because the meets_cost() function returns a BOOLEAN value, you can write a selectivity function for the function. You write meets_cost_selfunc() so that it expects two arguments of the data type MI_FUNCARG. The following table shows what different MI_FUNCARG accessor function return when you invoke them for each of the arguments of the meets_cost() function.

DataBlade API Function Argument 1 Argument 2
mi_funcarg_get_argtype() MI_FUNCARG_COLUMN MI_FUNCARG_CONSTANT
mi_funcarg_get_tabid() Table identifier of tab1 Undefined
mi_funcarg_get_colno() Column number of int_col Undefined
mi_funcarg_get_datatype() Type identifier for data type of tab1.int_col Type identifier for INTEGER data type
mi_funcarg_get_datalen() Length of tab1.int_col Length of INTEGER
mi_funcarg_isnull() FALSE FALSE
mi_funcarg_get_constant() Undefined An MI_DATUM that holds the value of 20

Creating Negator Functions

A negator function is a special UDR that is associated with a Boolean user-defined function. It evaluates the Boolean NOT condition for its associated user-defined function. For example, if an expression in a WHERE clause invokes a Boolean user-defined function (UDR-Boolfunc), the SQL optimizer can decide whether it is more efficient to replace occurrences of the expression:

NOT (UDR-Boolfunc)

with a call to the negator function (UDR-func-negator).

To implement a negator function with a C user-defined function, follow these steps:

  1. Declare the negator function so that its parameters are exactly the same as its associated user-defined function and its return value is BOOLEAN (mi_boolean).
  2. Within the negator function, perform the tasks to evaluate the NOT condition of the associated Boolean user-defined function.
  3. Register the negator function as a user-defined function with the CREATE FUNCTION statement.
  4. Associate the Boolean user-defined function and its negator function when you register the user-defined function.
  5. Specify the name of the negator function with the NEGATOR routine modifier in the CREATE FUNCTION statement that registers the user-defined function.

For more information about Boolean user-defined functions and negator functions, see Extending Informix Dynamic Server 2000.

Creating Parallelizable UDRs

The Parallel Database Query (PDQ) feature allows the database server to run a single SQL statement in parallel. When you send a query to the database server, it breaks your request into a set of discrete subqueries, each of which can be assigned to a different CPU virtual processor. A parallelizable query is a query that can be executed in parallel. PDQ is especially effective when your tables are fragmented and your server computer has more than one CPU.

A parallelizable UDR is a C UDR that can be executed in parallel when it is invoke within a parallelizable query. If you write your C UDR to be parallelizable, it can be executed in parallel when the query that invokes it is executed in parallel. That is, the C UDR can execute on subsets of table data just as the query itself can. A query that invokes a non-parallelizable UDR can still run in parallel. However, the calls to the UDR do not run in parallel. Similarly, prepared queries that invoke a parallelizable query do not run the UDR in parallel.

To create a parallelizable C UDR, take the following steps:

Writing the Parallelizable UDR

To write a parallelizable C UDR, you must ensure that the UDR does not include any calls to the non-PDQ-thread-safe DataBlade API functions that Figure 13-28 lists.

Figure 13-28
Non-PDQ-Thread-Safe DataBlade API Function

Category of Non-PDQ-Thread-Safe Function DataBlade API Function
Statement processing
Statement execution A parallelizable UDR cannot parse an SQL statement. mi_exec(), mi_prepare()
Current-statement processing No current statement exists in a parallelizable UDR. Therefore, these functions are not useful. mi_binary_query(), mi_command_is_finished(), mi_get_result(), mi_get_row_desc_without_row(), mi_next_row(), mi_query_finish(), mi_query_interrupt(), mi_result_command_name(), mi_result_row_count(), mi_value(), mi_value_by_name()
Prepared statements No prepared statement exists because you cannot prepare one in a parallelizable UDR. Therefore, these functions are not useful. mi_close_statement(), mi_drop_prepared_statement(), mi_exec_prepared_statement(), mi_fetch_statement(), mi_get_statement_row_desc(), mi_open_prepared_statement(), mi_statement_command_name() All input-parameter accessor functions: mi_parameter_*
(see
Figure 8-8 on page 8-21)
Transfer of data Even though these type-transfer functions are PDQ-thread-safe, they are usually called within the send and receive support functions of an opaque type and are likely to be called during statement processing. All type-transfer functions: mi_get_*, mi_put_*
(see
Figure 14-5 on page 14-19)
Other mi_current_command_name()
Save-set handling All save-set functions: mi_save_set_*
(see
Figure 8-18 on page 8-84)
Complex-type (collections and row types) handling
Collection processing All collection functions: mi_collection_*
(see
Collections)
Row-type processing mi_get_row_desc(), mi_get_row_desc_from_type_desc(), mi_get_row_desc_without_row(), mi_get_statement_row_desc() mi_row_create(), mi_row_free(), mi_row_desc_create(), mi_row_desc_free()
Complex-type processing Type-descriptor accessor functions if they access a complex type: mi_type_*
(see
Figure 2-1 on page 2-6) Column functions if they access a complex type: mi_column_*
(see
Figure 5-11 on page 5-24)
Operating-system file access All file-access functions: mi_file_*
(see
Figure 12-22 on page 12-76)
Tracing
Even though the files listed here are not PDQ-thread-safe, you can include most statements that generate trace output in a parallelizable UDR. mi_tracefile_set(), mi_tracelevel_set() GL_DPRINTF
Miscellaneous mi_get_connection_option(), mi_get_database_info(), mi_get_type_source_type()
s

A C UDR cannot call (either explicitly or implicitly) any of the DataBlade API functions in Figure 13-28. If you attempt to run a UDRthat contains a non-PDQ-thread-safe function, the database server generates an error. If your UDR must call one of the functions in Figure 13-28, it cannot be parallelizable.

Keep the following considerations in mind when you write a UDR to be parallelizable:

Registering the Parallelizable UDR

When you register a UDR with the PARALLELIZABLE routine modifier, you tell the database server that the UDR was written according to the guidelines in Writing the Parallelizable UDR. That is, the UDR does not call any DataBlade API functions that are non-PDQ-thread-safe. However, registering the UDR with PARALLELIZABLE modifier does not guarantee that every invocation of the UDR executes in parallel. The decision whether to parallelize a query and any accompanying UDRs is made when the query is parsed and optimized.

Executing the Parallelizable UDR

When a query with a parallelizable UDR executes in parallel, each routine instance might have more than one routine sequence. For a parallelized UDR, the routine manager creates a routine sequence for each scan thread of the query.

For example, suppose you have the following query:

Suppose also that the table1 table in the preceding query is fragmented into four fragments and the a_func() user-defined function was registered with the PARALLELIZABLE routine modifier. When this query executes in serial, it contains two routine instances (one in the select list and one in the WHERE clause) and two routine sequences. However, when this query executes in parallel over table1, it still contains two routine instances but it now has six routine sequences:

The MI_FPARAM structure holds the information of the routine sequence. Therefore, the routine manager allocates an MI_FPARAM structure for each scan thread of the parallelized query. All invocations of the UDR that execute in the scan thread can share the information in an MI_FPARAM structure. However, UDRs in different scan threads cannot share MI_FPARAM information across scan threads.

Tip: The DataBlade API also supports memory locking for a parallelizable UDR that uses data that is shared with other UDRs or shared by multiple instances of the same routine. This memory locking allows the UDR to implement concurrency control on its data. However, the memory-locking feature is only available to DataBlade partners for special DataBlade functionality. For more information on the memory-locking feature, see your Informix sales representative.

For more information about how the routine manager creates a routine sequence, see Creating the Routine Sequence.

Debugging the Parallelizable UDR

You can use the SQL statement SET EXPLAIN to determine whether a parallelizable query is actually being executed in parallel. The SET EXPLAIN statement executes when the database server optimizes a statement. It creates a file that contains:

For more information on SET EXPLAIN, see its description in the Informix Guide to SQL: Syntax and the Performance Guide.

The following onstat options are useful to track execution of parallel activities:

For more information on the onstat utility, see your Administrator's Reference.


Informix DataBlade API Programmer's Manual, Version 9.2
Copyright © 1999, Informix Software, Inc. All rights reserved