|
The DataBlade API provides support for you to create the following kinds of special-purpose UDRs to optimize UDR performance:
The query optimizer uses the selectivity and cost of a query to help select the best query plan. To help the optimizer select the best query plan, you can provide the query optimizer with information about the selectivity and cost of your UDR, This information is extremely useful for an expensive UDR, a UDR that requires a lot of execution time or resources to execute. When the query optimizer can obtain the selectivity and cost for an expensive UDR, it can better determine when to execute the UDR in the query plan.
When you write an expensive UDR, you can indicate the following performance-related characteristics of the UDR to assist the query optimizer in developing a query plan:
This section describes how to create selectivity and cost functions for an expensive UDR.
Tip: The Informix BladeSmith development tool automatically generates C source code for selectivity and cost functions as well as the SQL statements to register the these functions. For more information, see the "DataBlade Developers Kit User's Guide."
Query Selectivity
The query optimizer uses the selectivity to estimate the number of rows that a query will return. The selectivity is a floating-point value between zero (0) and one that represents the percentage of rows for which the UDR is expected to return a true value (MI_TRUE). If your UDR returns a BOOLEAN value (mi_boolean), you can specify the selectivity of a UDR when you register it, as follows.
To determine the selectivity of the expensive UDR, the query optimizer either uses the constant selectivity value or calls the selectivity function. A selectivity function is a user-defined function that is associated with an expensive UDR. It has the following restrictions:
The query optimizer uses a cost value to determine the total cost of executing a query. If your UDR returns a BOOLEAN value (mi_boolean), you can specify the cost of a UDR when you register it, as follows.
To determine the cost of the expensive UDR, the query optimizer either uses the constant cost value or calls the cost function. A cost function is a user-defined function that is associated with an expensive UDR. It has the following restrictions:
The MI_FUNCARG data type is an Informix-defined opaque type that contains information about the expensive UDR of a selectivity or cost function. This data type can provide information about the following kinds of arguments in the expensive UDR.
Expensive-UDR Argument Type | Argument-Type Constant |
---|---|
Argument is a constant value | MI_FUNCARG_CONSTANT |
Argument is a column value | MI_FUNCARG_COLUMN |
Argument is parameter | MI_FUNCARG_PARAM |
The DataBlade API provides the following accessor functions for the MI_FUNCARG data type. You can use any of these functions to extract information about the expensive-UDR arguments from the selectivity or cost function.
Figure 13-27 lists the DataBlade API accessor functions that obtain information from the MI_FUNCARG data type.
Figure 13-27
Important: To a DataBlade API module, the MI_FUNCARG data type is an opaque data type. Do not access its internal fields directly. Informix does not guarantee that the internal structure of this opaque data type will not change in future releases. Therefore, to create portable code, always use the accessor functions in Figure 13-27 to obtain values in this data type.
For example, you can write the following query:
Suppose you register the meets_cost() function with a selectivity function named meets_cost_selfunc(), as follows:
Because the meets_cost() function returns a BOOLEAN value, you can write a selectivity function for the function. You write meets_cost_selfunc() so that it expects two arguments of the data type MI_FUNCARG. The following table shows what different MI_FUNCARG accessor function return when you invoke them for each of the arguments of the meets_cost() function.
A negator function is a special UDR that is associated with a Boolean user-defined function. It evaluates the Boolean NOT condition for its associated user-defined function. For example, if an expression in a WHERE clause invokes a Boolean user-defined function (UDR-Boolfunc), the SQL optimizer can decide whether it is more efficient to replace occurrences of the expression:
with a call to the negator function (UDR-func-negator).
To implement a negator function with a C user-defined function, follow these steps:
For more information about Boolean user-defined functions and negator functions, see Extending Informix Dynamic Server 2000.
The Parallel Database Query (PDQ) feature allows the database server to run a single SQL statement in parallel. When you send a query to the database server, it breaks your request into a set of discrete subqueries, each of which can be assigned to a different CPU virtual processor. A parallelizable query is a query that can be executed in parallel. PDQ is especially effective when your tables are fragmented and your server computer has more than one CPU.
A parallelizable UDR is a C UDR that can be executed in parallel when it is invoke within a parallelizable query. If you write your C UDR to be parallelizable, it can be executed in parallel when the query that invokes it is executed in parallel. That is, the C UDR can execute on subsets of table data just as the query itself can. A query that invokes a non-parallelizable UDR can still run in parallel. However, the calls to the UDR do not run in parallel. Similarly, prepared queries that invoke a parallelizable query do not run the UDR in parallel.
To create a parallelizable C UDR, take the following steps:
To write a parallelizable C UDR, you must ensure that the UDR does not include any calls to the non-PDQ-thread-safe DataBlade API functions that Figure 13-28 lists.
Figure 13-28
A C UDR cannot call (either explicitly or implicitly) any of the DataBlade API functions in Figure 13-28. If you attempt to run a UDRthat contains a non-PDQ-thread-safe function, the database server generates an error. If your UDR must call one of the functions in Figure 13-28, it cannot be parallelizable.
Keep the following considerations in mind when you write a UDR to be parallelizable:
When you register a UDR with the PARALLELIZABLE routine modifier, you tell the database server that the UDR was written according to the guidelines in Writing the Parallelizable UDR. That is, the UDR does not call any DataBlade API functions that are non-PDQ-thread-safe. However, registering the UDR with PARALLELIZABLE modifier does not guarantee that every invocation of the UDR executes in parallel. The decision whether to parallelize a query and any accompanying UDRs is made when the query is parsed and optimized.
When a query with a parallelizable UDR executes in parallel, each routine instance might have more than one routine sequence. For a parallelized UDR, the routine manager creates a routine sequence for each scan thread of the query.
For example, suppose you have the following query:
Suppose also that the table1 table in the preceding query is fragmented into four fragments and the a_func() user-defined function was registered with the PARALLELIZABLE routine modifier. When this query executes in serial, it contains two routine instances (one in the select list and one in the WHERE clause) and two routine sequences. However, when this query executes in parallel over table1, it still contains two routine instances but it now has six routine sequences:
The MI_FPARAM structure holds the information of the routine sequence. Therefore, the routine manager allocates an MI_FPARAM structure for each scan thread of the parallelized query. All invocations of the UDR that execute in the scan thread can share the information in an MI_FPARAM structure. However, UDRs in different scan threads cannot share MI_FPARAM information across scan threads.
Tip: The DataBlade API also supports memory locking for a parallelizable UDR that uses data that is shared with other UDRs or shared by multiple instances of the same routine. This memory locking allows the UDR to implement concurrency control on its data. However, the memory-locking feature is only available to DataBlade partners for special DataBlade functionality. For more information on the memory-locking feature, see your Informix sales representative.
For more information about how the routine manager creates a routine sequence, see Creating the Routine Sequence.
You can use the SQL statement SET EXPLAIN to determine whether a parallelizable query is actually being executed in parallel. The SET EXPLAIN statement executes when the database server optimizes a statement. It creates a file that contains:
For more information on SET EXPLAIN, see its description in the Informix Guide to SQL: Syntax and the Performance Guide.
The following onstat options are useful to track execution of parallel activities:
For more information on the onstat utility, see your Administrator's Reference.