The built-in aggregates provide basic aggregations. However, if your data requires some special aggregation, you can create a custom aggregate function, called a user-defined aggregate. To implement your custom aggregation, you design an aggregate algorithm, which consists of the following parts:
After you complete these steps, you can use the aggregate in an SQL statement.
The following sections describe each of these development steps in more detail and use the SQSUM1 user-defined aggregate (which Table 103 describes) as an example.
User-Defined Aggregate | Description | Definition |
---|---|---|
SQSUM1 | Sums all values and calculates the square of this sum | (x1 + x2 + x3 + ... )2 |
An aggregate is a series of iterations. Each iteration processes one aggregate argument (which contains one column value) and performs the necessary computations to merge it into a partial result. The partial result is a snapshot of the aggregate arguments that the aggregate has merged so far. Once the aggregate has received all column values, it returns a value to the calling statement, based on the final partial result.
Each iteration of the aggregate is a separate invocation of a user-defined function. If user-allocated memory has the default PER_ROUTINE memory duration, the database server automatically deallocates it after only one iteration of the iteration function. (For more information, see Choosing the Memory Duration.) Therefore, while an iteration executes, it can access only the following information:
During its invocation, each aggregate iteration merges the aggregate argument into the aggregate state and returns the updated state. The database server preserves the updated aggregate state and passes it into the next iteration of the aggregate. When you create a user-defined aggregate, you must determine what nonlocal information each iteration needs and then define an aggregate state to contain this information. Otherwise, the aggregate iterations cannot obtain the information they need to perform their computations.
As a starting point, try using the data type of the aggregate argument for the aggregate state. Such a state is called a simple state. For more information, see Aggregate Support Functions for the Aggregate State.
For example, to determine the state for the SQSUM1 user-defined aggregate (which Table 103 describes), assess what tasks need to be performed in each iteration of SQSUM1. For each aggregate argument, SQSUM1 needs to add together the following values:
The data type that you choose for the aggregate state affects how the state must be managed. When the SQSUM1 aggregate receives INTEGER values as its aggregate arguments, the sum of these values is also an INTEGER value. Therefore, the SQSUM1 aggregate has an integer state, which holds the partial sum.
An aggregate support function is a special-purpose user-defined function that implements some task in the aggregate algorithm. You write aggregate support functions to initialize, calculate, and return the aggregate result to the calling code. An aggregate algorithm can include the following kinds of aggregate support functions.
The following sections summarize each of the aggregate support functions.
The INIT aggregate support function performs the initialization for the user-defined aggregate. Table 104 summarizes possible initialization tasks.
Initialization Task | More Information |
---|---|
Set-up for any additional resources outside the state that the aggregation might need | Aggregate Support Functions That the Algorithm Requires |
Initial calculations that the user-defined aggregate might need | Aggregate Support Functions That the Algorithm Requires |
Allocation and initialization of the aggregate state that the rest of the aggregation computation might need | Aggregate Support Functions for the Aggregate State |
Handling of an optional set-up argument | Implementing a Set-Up Argument |
The INIT support function is an optional aggregate support function. If your aggregate algorithm does not require any of the tasks in Table 104, you do not need to define an INIT function. When you omit the INIT function from your user-defined aggregate, the database server performs the state management for the aggregate state. For more information, see Handling a Simple State.
To declare an INIT support function as a C function, use the following syntax:
agg_state init_func(dummy_arg, set_up_arg) agg_arg_type dummy_arg; set_up_type set_up_arg; /* optional */
In the execution of a UDA, the database server calls the INIT function before it begins the actual aggregation computation. It passes in any optional set-up argument (set_up_arg) from the user-defined aggregate and copies any initialized aggregate state that INIT returns into the state buffer. For more information on the state buffer, see Aggregate Support Functions for the Aggregate State.
The first argument of the INIT function serves only to identify the data type of the aggregate argument that the user-defined aggregate handles. The routine manager uses this argument in routine resolution to determine the correct version of the overloaded INIT function. At the time of invocation, the routine manager just passes in a NULL value for the first argument of the INIT function, as the following syntax shows:
agg_state init_func(NULL, optional set-up argument)
Figure 96 shows the INIT aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Table 103 describes).
/* SQSUM1 INIT support function on INTEGER */ mi_integer init_sqsum1(dummy_arg) mi_integer dummy_arg; { return (0); }
For other aggregate support functions of SQSUM1, see Figure 97, Figure 98, and Figure 99.
The ITER aggregate support function performs the sequential aggregation or iteration for the user-defined aggregation. It merges a single aggregate argument into the partial result, which the aggregate state contains. The ITER function is a required aggregate support function; that is, you must define an ITER function for every user-defined aggregate. If you do not define an ITER function for your user-defined aggregate, the database server generates an error.
If the UDA was registered with the HANDLESNULLS modifier in the CREATE AGGREGATE statement, the database server calls the ITER support function once for each aggregate argument that passed to the user-defined aggregate. Each aggregate argument is one column value. If you omit the HANDLESNULLS modifier from CREATE AGGREGATE, the database server does not call ITER for any NULL-valued aggregate arguments. Therefore, NULL-valued aggregate arguments do not contribute to the aggregate result.
For example, suppose you execute the SQSUM1 user-defined aggregate (which Table 103 describes) in the following query:
SELECT SQSUM1(col3) FROM tab1;
The tab1 table (which Figure 95 defines) contains 6 rows. Therefore, the preceding query (which contains no WHERE clause) causes 6 invocations of the SQSUM1 ITER function. Each invocation of this ITER function processes one value from the col3 column.
To declare an ITER function as a C function, use the following syntax:
agg_state iter_func(current_state, agg_argument) agg_state current_state; agg_arg_type agg_argument;
Figure 97 shows the ITER aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Table 103 describes).
/* SQSUM1 ITER support function on INTEGER */ mi_integer iter_sqsum1(state, value) mi_integer state; mi_integer value; { /* add 'state' and 'value' together */ return (state + value); }
For other aggregate support functions of SQSUM1, see Figure 96, Figure 98, and Figure 99.
The COMBINE aggregate support function allows your user-defined aggregate to execute in a parallel query. When a query that contains a user-defined aggregate is processed in parallel, each parallel thread operates on a one subset of selected rows. The COMBINE function merges the partial results from two such subsets. This aggregate support function ensures that the result of aggregating over a group of rows sequentially is the same as aggregating over two subsets of the rows in parallel and then combining the results.
The COMBINE function is required for parallel execution. When a query includes a user-defined aggregate, the database server uses parallel execution when the query includes only aggregates. However, the COMBINE function might be used even when a query is not parallelized. For example, when a query contains both distinct and non-distinct aggregates, the database server can decompose the computation of the non-distinct aggregate into sub-aggregates based on the distinct column values. Therefore, you must provide a COMBINE function for every user-defined aggregate.
If you do not define an COMBINE function for your user-defined aggregate, the database server generates an error. However, if your user-defined aggregate uses a simple state, the COMBINE function can be the same as the ITER function. For more information, see Handling a Simple State.
To declare a COMBINE function as a C function, use the following syntax:
agg_state combine_func(agg_state1, agg_state2) agg_state agg_state1, agg_state2;
In the execution of a UDA, the database server calls the COMBINE once for each pair of threads (agg_state1 and agg_state2) that execute a parallel query that contains the user-defined aggregate. When the COMBINE function combines two partial results, it might also need to release resources associated with one of the partial results.
Figure 98 shows the COMBINE aggregate support function that handles an INTEGER argument for the SQSUM user-defined aggregate (which Table 103 describes).
/* SQSUM1 COMBINE support function on INTEGER */ mi_integer combine_sqsum1(state1, state2) mi_integer state1, state2; { /* Return the new partial sum from two parallel partial * sums */ state1 += state2; return (state1); }
For other aggregate support functions of SQSUM1, see Figure 96, Figure 97, and Figure 99. For more information on parallel execution of a UDA, see Executing a User-Defined Aggregate in Parallel Queries.
The FINAL aggregate support function performs the post-iteration tasks for the user-defined aggregate. Table 105 summarizes possible post-iteration tasks.
Post-Iteration Task | More Information |
---|---|
Type conversion of the final state into the return type of the user-defined aggregate | Returning an Aggregate Result Different from the Aggregate State |
Post-iteration calculations that the user-defined aggregate might need | Aggregate Support Functions That the Algorithm Requires |
Deallocation of memory that the INIT aggregate support function has allocated | Managing a Pointer-Valued State |
The FINAL function is an optional aggregate support function. If your user-defined aggregate does not require one of the tasks in Table 105, you do not need to define a FINAL function. When you omit the FINAL function from your user-defined aggregate, the database server returns the final aggregate state as the return value of the user-defined aggregate. If this state does not match the expected data type of the aggregate return value, the database server generates a data type mismatch error.
To declare a FINAL function as a C function, use the following syntax:
agg_type final_func(final_state) agg_state final_state;
In the execution of a UDA, the database server calls the FINAL function after all iterations of the ITER function are complete.
Figure 99 shows the aggregate support functions that handle an INTEGER argument for the SQSUM user-defined aggregate (which Table 103 describes).
/* SQSUM1 FINAL support function on INTEGER */ mi_integer final_sqsum1(state) mi_integer state; { /* Calculate square of sum */ state *= state; return (state); }
For other aggregate support functions of SQSUM1, see Figure 96, Figure 97, and Figure 98.
You can define the user-defined aggregate before you create the C implementation of the aggregate support functions. However, you must ensure that the names of the C functions match the names in the CREATE FUNCTION statements that register them.
The development steps for a UDA list the definition of the UDA after the aggregate support functions are written. However, the CREATE AGGREGATE statement does not verify that the aggregate support functions it lists were registered nor does the CREATE FUNCTION statement verify that the executable C code exists.
Figure 100 shows the CREATE AGGREGATE statement that defines the SQSUM1 user-defined aggregate (which Table 103 describes) in the database.
CREATE AGGREGATE sqsum1 WITH (INIT = init_sqsum1, ITER = iter_sqsum1, COMBINE = combine_sqsum1, FINAL = final_sqsum1);
Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the SQSUM1 aggregate are compiled and linked into a shared-object module named sqsum.
On UNIX or Linux, the executable code for the SQSUM1 aggregate support functions would be in a shared library named sqsum.so.
Figure 101 shows the CREATE FUNCTION statements that register the aggregate support functions for the SQSUM1 user-defined aggregate to handle the INTEGER data type.
CREATE FUNCTION init_sqsum1(dummy_arg INTEGER) RETURNS INTEGER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C; CREATE FUNCTION iter_sqsum1(state INTEGER, one_value INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C; CREATE FUNCTION combine_sqsum1(state1 INTEGER, state2 INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C; CREATE FUNCTION final_sqsum1(state INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C;
The registered names of the aggregate support functions must match the names that the CREATE AGGREGATE statement lists. For example, in Figure 101, a CREATE FUNCTION statement registers the INIT support function as init_sqsum1, which is the same name that the INIT option lists in the CREATE AGGREGATE statement (see Figure 100).
In addition, the CREATE FUNCTION for the INIT support function must include the HANDLESNULLS routine modifier so that the database server can pass the INIT function the dummy NULL-valued argument.
For more information on the use of the CREATE AGGREGATE and CREATE FUNCTION statements to define user-defined aggregates, see the chapter on aggregates in the IBM Informix: User-Defined Routines and Data Types Developer's Guide. For the syntax of these statements, see the IBM Informix: Guide to SQL Syntax.
After you complete the aggregate-development steps, the end user can use the user-defined aggregate on the defined data type in SQL statements. However, use of the user-defined aggregate does assume the appropriate privileges.
For the tab1 table, which Figure 95 defines, the following query uses the new SQSUM1 aggregate function on the INTEGER column, col3:
SELECT SQSUM1(col3) FROM tab1;
With the rows that Figure 95 has inserted, the preceding query yields an INTEGER value of 10201.
To be able to use SQSUM1 on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type. For example, SQSUM2 User-Defined Aggregate shows the definition of a version of the SQSUM aggregate on both an INTEGER and a named row type.
Figure 102 shows the execution sequence of aggregate support functions for a user-defined aggregate that is not executed in a parallel query.
These aggregate support functions use the aggregate state to pass shared information between themselves.
As you design your aggregate algorithm, you must determine which of the support functions the algorithm requires. As a minimum, the user-defined aggregate must have an ITER function. It is the ITER function that performs a single iteration of the aggregate on one aggregate argument. Although Figure 102 shows the execution of both the INIT and FINAL support functions, these functions are optional for a user-defined aggregate. In addition, the COMBINE function, though required, often does not require separate code; it can simply call the ITER function.
Writing aggregate support functions that your user-defined aggregate does not require means unnecessary coding and execution time. Therefore, it is important to assess your aggregate for required functions. The following table shows the design decisions in the determination of required aggregate support functions.
Design Decision | Aggregate Support
Functions Involved |
More Information |
---|---|---|
Does the algorithm require initialization or clean-up tasks? | INIT and FINAL | Aggregate Support Functions That the Algorithm Requires |
Does the aggregate have a simple aggregate state? | INIT, COMBINE,
and FINAL |
Yes: Handling a Simple State |
Does the aggregate have a
set-up argument? |
INIT | Implementing a Set-Up Argument |
Does the aggregate return a value whose data type is different from the aggregate state? | FINAL | Returning an Aggregate Result Different from the Aggregate State |
Does the aggregate have special needs to run in a parallel query? | COMBINE | Executing a User-Defined Aggregate in Parallel Queries |
You can overload the aggregate support functions to provide support for different data types. Any overloaded version of the UDA, however, cannot omit any of the aggregate support functions that the CREATE AGGREGATE statement has listed or use any support function that CREATE AGGREGATE has not specified.
When the database server executes a UDA (regardless of the data type of the aggregate argument), the database expects to find all the aggregate support functions that the CREATE AGGREGATE statement has registered. Therefore, if you omit a support function for one of the reasons in the preceding table, all versions of the aggregate for all data types must be able to execute using only the aggregate support functions that CREATE AGGREGATE specifies.
To implement a user-defined aggregate, you must develop an algorithm that calculates an aggregate return value based on the aggregate-argument values. You need to break this algorithm into the following steps.
All aggregate algorithms must include an ITER function to handle each aggregate argument. However, the INIT and FINAL support functions are optional. To determine whether your algorithm requires an INIT or FINAL function, make the following design assessments:
If the algorithm requires additional resources to perform its task (such as operating-system files or smart large objects), use the INIT function to set up these resources. The INIT function can also initialize the partial result.
If the INIT function has set up resources to perform the aggregation, the FINAL function can deallocate or close resources so that they are free for other users. In addition, if the aggregation requires calculations that must be performed on the final partial result, use the FINAL function to perform these calculations.
For example, the following algorithm defines the SQSUM1 aggregate (which Table 103 describes):
(x1 + x2 + ... )2
where each xi is one column value; that is, one aggregate argument. The ITER function for SQSUM1 takes a single aggregate argument and adds it to a partial sum (see Figure 97). The algorithm does not require initialization of additional resources. Therefore, no INIT function is required for this task. However, the INIT function can initialize the partial result (see Figure 96).
The SQSUM1 user-defined aggregate does require post-iteration calculations. When the last iteration is reached, the partial sum needs to be squared to obtain the aggregate return value. This final calculation is performed in a FINAL function and returned as the return value for the SQSUM1 aggregate (see Figure 99).
The SUMSQ user-defined aggregate (described on page SUMSQ User-Defined Aggregate) is an example of a user-defined aggregate that requires neither initialization nor post-iteration tasks. Therefore, it does not require the INIT and FINAL support functions.
The aggregate support functions pass information about the aggregate among themselves in the aggregate state.
The database server invokes all aggregate support functions as regular UDRs. Therefore, each support function has a default memory duration of PER_ROUTINE, which means that the database server frees any memory that the support function allocates when that function terminates. However, the aggregate state must be valid across all invocations of all aggregate support functions, including possible multiple iterations of the ITER function. Therefore, a special state buffer (with a PER_COMMAND memory duration) must exist so that the aggregate state is available to all aggregate support functions. The database server then passes this state buffer to each invocation of an aggregate support function.
The purpose of the INIT support function is to return a pointer to the initialized aggregate state. To determine whether your aggregate state requires an INIT support function for state management, assess the data type of this state. The aggregate-state data type determines whether the INIT function must handle state management, as follows:
If your user-defined aggregate uses a simple state, the database server can perform the state management. It can automatically allocate the state buffer for the aggregate state. Therefore, the INIT support function does not need to handle this allocation.
There are several types of non-simple states possible. The database server cannot perform state management for these non-simple states. Instead, the INIT support functions must perform the state-management tasks to handle non-simple states.
Once the user-defined aggregate has an aggregate state, the database server passes a pointer to this state buffer to each invocation of the ITER function and to the FINAL function.
A simple state is an aggregate state whose data type is the same as the aggregate argument. At any point in the iteration, a simple state contains only the partial result of the aggregation. For example, the SUM built-in aggregate uses a simple state because its state contains only the partial result: the running total of the aggregate arguments. When the SUM aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum for these arguments. Therefore, the data type of its aggregate argument and aggregate state is the same.
However, the AVG built-in aggregate does not use a simple state. Because it must divide the total by the number of values processed, its state requires two values: the running total and the number of arguments processed. When the AVG aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum and an integer count for these arguments. Therefore, the data type of its aggregate argument (INTEGER) cannot be not the same as its aggregate state (two INTEGER values).
When a user-defined aggregate has a simple state, the following items apply:
In this case, the database server automatically performs the state management. If a UDA with a simple state does not include any other tasks that require an INIT support function (see Table 104), you can omit the INIT function from the definition of the UDA. The only possible state-management task you might want to perform in the INIT function is state initialization. For more information, see When to Allocate and Deallocate a State.
In this case, you do not have to create special code in the COMBINE function for the handling of parallel execution. Instead, the ITER function can perform the merge of two partial results. For more information, see Executing a User-Defined Aggregate in Parallel Queries.
In this case, the aggregate argument, aggregate state, and aggregate result have the same data type. Such a user-defined aggregate is called a simple binary operator. If a UDA is a simple binary operator and does not include any other tasks that require a FINAL support function (see Table 105), you can omit the FINAL function from the definition of the UDA. For more information, see Returning an Aggregate Result Different from the Aggregate State.
When a UDA does not include an INIT function, the database server takes the following state-management steps:
The database server can determine the size of the state buffer from the data type of the aggregate argument, which is passed into the user-defined aggregate.
The ITER support function can just use the system-allocated state buffer to hold the state information. If you have some minor initialization tasks that you need to perform, the ITER function can check for a NULL-valued aggregate state on its first iteration. If the state is NULL, ITER can initialize the state to its appropriate value. In this way, you can perform minor state initialization without the overhead of a separate invocation of the INIT function.
When a UDA does not include a FINAL function, the database server passes the final state as the aggregate result of the user-defined aggregate.
The implementation on the SQSUM1 aggregate includes an INIT support function that initializes the aggregate state (see Figure 96). However, because SQSUM1 has a simple state, this INIT function is not required. Instead, an ITER function can check for a NULL-valued state and perform the state initialization. The ITER support function of the SQSUM2 aggregate (see Figure 106) shows this type of implementation.
The SUMSQ user-defined aggregate (described on page SUMSQ User-Defined Aggregate) also has a simple state and therefore does not require an INIT support function for state management.
When the data type of the aggregate argument is not adequate for the state information, you must use a nonsimple state for your UDA. A nonsimple state is an aggregate state whose data type is not the same as the aggregate argument. Possible uses for a nonsimple state include an aggregate state that contains:
When the aggregate-argument data type is not adequate for the state information, you must determine the appropriate data structure to hold the state information. This data structure depends upon the size of the aggregate state that you need to maintain, as the following table shows.
Nonsimple State | Description | More Information |
---|---|---|
Single-valued state | Consists of information that can be stored in an Informix built-in data type | Managing a Single-Valued State |
Opaque-type state | Consists of several values, but these values do not exceed the maximum size of an opaque data type | Managing an Opaque-Type State. |
Pointer-valued state | Consists of values whose size does exceed the maximum size of an opaque data type | Managing a Pointer-Valued State. |
If your user-defined aggregate uses a single-valued or opaque-type state, the database server can still perform the state management. However, it cannot provide all necessary state management for a pointer-valued state.
A single-valued state uses a built-in SQL data type to hold the aggregate state. Use a single-valued state for an aggregate state that can fit into a built-in data type but whose data type does not match that of the aggregate argument.
Declare the state parameters and return values of the aggregate support functions to use the DataBlade API data type that corresponds to the built-in SQL data type that your state requires. For a list of these data type correspondences, see Table 4. Information on how to handle state management for a single-valued state is provided below.
Specify the built-in SQL data type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.
The database server can perform memory management for a single-valued state because it can determine the size of a built-in data type. Therefore, you do not need to allocate memory for a single-valued state in the INIT support function.
In the ITER function, you can initialize or update a single-valued state in either of the following ways:
To modify the state, change the value (or values) of the DataBlade API variable that the database server has passed into the ITER function as the state argument.
To modify the state, declare a local variable or allocate PER_ROUTINE memory for a new variable and put the new values into this variable.
For more information, see When to Allocate and Deallocate a State.
For a single-valued state, the FINAL support function does not need to perform any state-management tasks. However, it must convert to the data type of the final aggregate state to that of the aggregate result. For more information, see Returning an Aggregate Result Different from the Aggregate State.
An opaque-type state uses an opaque data type to hold the aggregate state. A possible use for an opaque-type state is to include an aggregate state that contains more information than the aggregate-argument data type or a built-in data type can hold. The size of an aggregate state that can be implemented as an opaque-type state is limited by the maximum size of an opaque type.
To use an opaque-type state for a UDA, write the appropriate aggregate support functions so that they handle an opaque-type state.
Declare the state parameters and return values of the aggregate support functions to use the internal format of the opaque type. This internal format is usually a C struct structure. For more information, see Determining Internal Representation.
Register the opaque data type in the database with the CREATE OPAQUE TYPE statement.
After you register the opaque data type, the database server can obtain information about the data type of the state value when the CREATE FUNCTION statement registers the function signatures of the aggregate support functions.
You need to write the opaque-type support functions only if you need such functionality in the aggregate support functions. For example, the input and output support functions might be useful when you debug your UDA. If you do write opaque-type support functions, you must compile and link them into a shared-object module, as Compiling a C UDR describes.
Register the aggregate support functions with the CREATE FUNCTION statement.
Specify the registered opaque type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.
The database server can perform memory management for an opaque-type state because it can determine the size of the opaque data type. The CREATE OPAQUE TYPE statement registers the opaque type, including its size, in the system catalog tables of the database. From the system catalog tables, the database server can determine the size of the aggregate state to allocate. Therefore, you do not need to allocate the opaque-type state in the INIT support function.
In the ITER function, you can initialize or update an opaque-type state in either of the following ways:
To modify the state, change the values in the internal opaque-type structure that the database server has passed into the ITER function as an argument.
To modify the state, allocate PER_ROUTINE memory for a new internal opaque-type structure and put the new values into this structure.
If you need to initialize the internal structure of the opaque type, the INIT or ITER function can allocate PER_ROUTINE memory for the structure perform the appropriate initializations. When the support function exits, the database server copies the contents of this PER_ROUTINE structure into the PER_COMMAND system-allocated state buffer.
For more information, see When to Allocate and Deallocate a State. For an example of a user-defined aggregate that uses an opaque-type state, see the description of the PERCENT_GTR aggregate on page PERCENT_GTR User-Defined Aggregate.
A pointer-valued state uses the POINTER data type as the aggregate state. The mi_pointer data type is the DataBlade API type that represents the SQL data type, POINTER. (For more information, see Pointer Data Types (Server).) Use a pointer-valued state when an aggregate state might contain more information than can fit into the maximum opaque-type size.
Declare the state parameters and return values of the aggregate support functions to use the mi_pointer data type. Information on how to handle state management of a pointer-valued state follows.
Specify the POINTER data type for the state parameters and return values in the function signatures of the aggregate support functions.
The database server cannot perform state management for a pointer-valued state because it cannot determine the size of the state. The DataBlade API data type mi_pointer is a typedef for the following C data type:
void *
Because this data type is only a pointer, the database server cannot determine how large the aggregate state is. Therefore, it cannot allocate the PER_COMMAND system-allocated state buffer. In this case, the INIT and FINAL aggregate support functions are not optional. They must perform state management of the nonsimple aggregate state, as follows:
The INIT function must also allocate any related resources that the aggregate state might need. Keep in mind that the database server does not interpret the contents of the pointer-valued state. It cannot manage any objects that the state type might reference. Therefore, use states with embedded pointers with caution.
Once you allocate the pointer-valued state, the database server passes a pointer to this state to the other aggregate support functions. Initialize or update the pointer-valued state only with an in-place update. For more information, see When to Allocate and Deallocate a State.
For a pointer-valued state, the FINAL function must always deallocate the aggregate state. If your INIT support function has allocated related resources that the aggregate state uses, make sure that the FINAL function deallocates these resources.
To each invocation of the ITER support function, the database server automatically passes a pointer to the state buffer. When you need to initialize or update the state information, the ITER function can handle the modification in either of two ways, as the following table describes.
Changing the State | State Memory Duration | Results |
---|---|---|
Merge the aggregate argument into the existing state in-place and return the existing state. | The existing state has a PER_COMMAND memory
duration:
|
The new state value is at the address that the database server has passed into the ITER function. The ITER function then returns this address as the updated state. Because the state memory has a PER_COMMAND memory duration, the database server can re-use the same state for subsequent invocations of ITER. |
Allocate fresh memory for a new state, merge the existing state with the new aggregate argument into this state, and return this new state. | The new state has a PER_ROUTINE memory duration:
|
The new state value is at the address of the new state. The ITER function then returns the address of the new state as the updated state. Because this memory has a PER_ROUTINE memory duration, the database server must copy the returned state back into the PER_COMMAND buffer. |
The new state method can be slower than the in-place method. Design your ITER support function to use the in-place method whenever possible. When the database server can skip the copy operation, you can improve performance of your UDA.
To determine which of these methods was used in the ITER support function, the database server compares the state value that ITER returns and the state value that was passed into ITER. If these two pointers identify the same memory location, the ITER function has modified the state in-place. Therefore, the database server does not need to perform the copy operation. If these two pointers identify different memory locations, the database server proceeds with the copy operation.
Aggregate support functions have the following restrictions on the deallocation of an aggregate state:
You can define a UDA so that the end user can supply a set-up argument to the aggregate. The set-up argument can customize the aggregate for a particular invocation. For example, the PERCENT_GTR user-defined aggregate (see page PERCENT_GTR User-Defined Aggregate) determines the percentage of numbers greater than a particular value. The UDA could have been implemented so that the value to compare against is hardcoded into the UDA. However, this would mean a separate user-defined aggregate that checks for values greater than 10, another that checks for values greater than 15, and so on.
Instead, the PERCENT_GTR aggregate accepts the value to compare against as a set-up argument. In this way, the end user can determine what values are needed, as follows:
SELECT PERCENT_GTR(col1, 10) FROM tab1; -- values > 10; SELECT PERCENT_GTR(col1, 15) FROM tab1; -- values > 15;
The database server passes in the set-up argument as the second argument to the INIT function. Therefore, the INIT support function must handle the set-up argument. Usually, this handling involves performing any initial processing required for the value and then saving this value in the aggregate state. It might also check for a possible SQL NULL value as a set-up argument.
This set-up argument is optional, in the sense that you can define a UDA with one or without one. However, if you define your UDA to include a set-up argument, the end user must provide a value for this argument. When the UDA is invoked with two arguments (aggregate argument and set-up argument), the database server looks for an INIT function with two arguments. If you omit the set-up argument when you invoke the UDA, the database server looks for an INIT function with just one argument.
To indicate no set-up argument, the end user can provide the SQL NULL value as a set-up value. However, if you really want to make the set-up argument truly optional for the end user, you must create and register two INIT functions:
In this case, you could assign the set-up argument some known default value.
As the writer of the UDA, you need to decide whether this feature is useful.
The aggregate result is the value that the user-defined aggregate returns to the calling SQL statement. If the user-defined aggregate does not include a FINAL support function, the database server returns the final aggregate state; that is, it returns the value of the aggregate state after the last aggregate iteration. However, if your UDA needs to return a value whose data type is different from the aggregate state, use a FINAL support function to convert the final aggregate state to the data type that you want to return from the aggregate.
For example, the PERCENT_GTR user-defined aggregate (see page PERCENT_GTR User-Defined Aggregate) returns the percentage of values greater than some value as a percentage; that is, as a fixed-point number in the range 0.00 to 100.00. To handle integer values, the user-defined aggregate would require an aggregate state that holds the following values:
However, the aggregate result of PERCENT_GTR is a fixed-point number. Therefore, you would not want the aggregate to return the final state to the calling SQL statement. Instead, the FINAL support function needs to perform the following steps:
For the complete example of the PERCENT_GTR user-defined aggregate, see page PERCENT_GTR User-Defined Aggregate.
The database server can break up the aggregation computation into several pieces and compute them in parallel. Each piece is computed sequentially as follows:
The database server then calls the COMBINE support function to merge the partial states, two at a time, into a final state. For example, for the AVG built-in aggregate, the COMBINE function would add the two partial sums and adds the two partial counts. Finally, the database server calls the FINAL support function on the final state to generate the aggregate result.
Figure 103 shows the execution sequence of aggregate support functions for a user-defined aggregate that is executed in two parallel threads.
Figure 103 shows how the COMBINE function is used to execute a user-defined aggregate with two parallel threads. For more than two parallel threads, the database server calls COMBINE on two thread states to obtain one, combines this state with another thread state, and so on until it has processed all parallel threads. The database server makes the decision whether to parallelize execution of a user-defined aggregate and the degree of such parallelism. However, these decisions are invisible to the end user.
Parallel aggregation must give the same results as an aggregate that is not computed in parallel. Therefore, you must write the COMBINE function so that the result of aggregating over the entire group of selected rows is the same as aggregating over two partitions of the group separately and then combining the results.
For an example of COMBINE functions in user-defined aggregates, see Sample User-Defined Aggregates.
This section provides the sample user-defined aggregates that the following table describes.
User-Defined
Aggregate |
Description |
---|---|
SUMSQ | Squares each value and calculates the sum of these squared values |
SQSUM2 | Sums all values and calculates the square of this sum |
PERCENT_GTR | Determines the percentage of values greater than a user-specified value |
X_PERCENTILE | Determines the value in a group of values that is the x-percentile, where x is a percent that the end user specifies. |
Each description includes the aggregate support functions written in C and the SQL statements to define the user-defined aggregate in the database.
The SUMSQ user-defined aggregate squares each value and calculates the sum of these squared values. It has the following algorithm:
x12 + x22 + ...
where each xi is one column value; that is, one aggregate argument.
To determine the aggregate state for SUMSQ, examine what information needs to be available for each iteration of the aggregate. To perform one iteration of SUMSQ, the ITER function must:
The ITER function has access to the aggregate argument because the database server passes it in. Therefore, ITER does not require additional information to perform this step.
To add in the squared argument, the aggregate must keep a partial sum of the previous squared values. For the ITER function to have access to the partial sum from the previous iterations, the aggregate state must contain it.
The SUMSQ has a simple state because the data type of the partial sum is the same as that of the aggregate argument. For example, when the SUMSQ aggregate receives INTEGER values, this partial sum is also an INTEGER value. Therefore, SUMSQ can allow the database server to manage this state, which has the following effect on the design of the aggregate support functions:
An aggregate with a simple state does not need to explicitly handle the allocation and deallocation of the aggregate state. Instead, the database server automatically allocates the aggregate state and initializes it to NULL. Therefore, the INIT function does not require other INIT-function tasks (see Table 104). Therefore, this support function can safely be omitted from the aggregate definition.
No special processing is required to merge two partial states. The ITER function can adequately perform this merge.
Before the iterations begin, the partial sum needs to be initialized to zero (0). However, because the INIT function is not required for state management, this aggregate initializes the state in the first invocation of its ITER function. The ITER function then calculates the square of a single aggregate argument, and adds this value to a partial sum. When the last iteration is reached, the final partial sum is the value that the SUMSQ aggregate returns. Therefore, the SUMSQ algorithm does not require a FINAL function for post-iteration tasks.
Figure 104 shows the required aggregate support functions that handle an INTEGER argument for the SUMSQ user-defined aggregate.
/* SUMSQ ITER support function on INTEGER */ mi_integer iter_sumsq(state, value, fparam) mi_integer state; mi_integer value; MI_FPARAM *fparam; { /* If 'state' is NULL, this is the first invocation. * Just return square of 'value'. */ if ( mi_fp_argisnull(fparam, 0) ) return (value * value); else /* add 'state' and square of 'value' together */ return (state + (value * value)); } /* SUMSQ COMBINE support function on INTEGER */ mi_integer combine_sumsq(state1, state2) mi_integer state1, state2; { /* Return the new partial sum from two parallel partial * sums */ return (iter_sumsq(state1, state2)); }
The following SQL statement registers the SUMSQ user-defined aggregate in the database:
CREATE AGGREGATE sumsq WITH (ITER = iter_sumsq, COMBINE = combine_sumsq);
This CREATE AGGREGATE statement lists only the aggregate support functions that are required to implement SUMSQ: ITER and COMBINE.
Suppose that the ITER and COMBINE aggregate support functions for the SUMSQ aggregate are compiled and linked into a shared-object module named sumsq.
On UNIX or Linux, the executable code for the SUMSQ aggregate support functions would be in a shared library named sumsq.so.
Figure 105 shows the CREATE FUNCTION statements that register the aggregate support functions for SUMSQ to handle INTEGER aggregate arguments.
CREATE FUNCTION iter_sumsq(state INTEGER, one_value INTEGER) RETURNS INTEGER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/sums/sumsq.so' LANGUAGE C; CREATE FUNCTION combine_sumsq(state1 INTEGER, state2 INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sumsq.so' LANGUAGE C;
For the tab1 table, which Figure 95 defines, the following query uses the new SUMSQ aggregate function on the INTEGER column, col3:
SELECT SUMSQ(col3) FROM tab1;
With the rows that Figure 95 has inserted, the preceding query yields an INTEGER value of 2173. To be able to use SUMSQ on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type.
The SQSUM2 user-defined aggregate is another version of the SQSUM1 aggregate, which Table 103 describes. Its algorithm is the same as SQSUM1:
(x1 + x2 + ... )2
where each xi is one column value; that is, one aggregate argument.
However, the SQSUM2 aggregate takes advantage of the fact that this aggregate has a simple state. Because the database server automatically handles state management, the SQSUM2 aggregate can safely omit the INIT function.
Figure 106 shows the aggregate support functions that handle an INTEGER argument for the SQSUM2 user-defined aggregate.
/* SQSUM2 ITER support function on INTEGER */ mi_integer iter_sqsum2(state, value, fparam) mi_integer state; mi_integer value; MI_FPARAM *fparam; { /* If 'state' is NULL, this is the first invocation. * Just return 'value'. */ if ( mi_fp_argisnull(fparam, 0) ) return (value); else /* add 'state' and 'value' together */ return (state + value); } /* SQSUM2 COMBINE support function on INTEGER */ mi_integer combine_sqsum2(state1, state2) mi_integer state1, state2; { /* Return the new partial sum from two parallel partial * sums */ return (iter_sqsum2(state1, state2)); } /* SQSUM2 FINAL support function on INTEGER */ mi_integer final_sqsum2(state) mi_integer state; { /* Calculate square of sum */ state *= state; return (state); }
In its first invocation, the ITER function performs the state initialization. It then takes a single aggregate argument and adds it to a partial sum. For aggregates with a simple state, the COMBINE function can be the same as the ITER function. Therefore, this COMBINE function just calls iter_sumsq2( ) to perform the merge of two partial states.
The data type of the aggregate result is also the same as the aggregate state. Therefore, SQSUM2 is a simple binary operator and the FINAL support function is not needed to convert the data type of the final state. However, the SQSUM2 aggregate still does require a FINAL support function. The SQSUM2 algorithm involves a post-iteration calculation: it must square the final sum to obtain the aggregate return value. The FINAL function performs this final calculation and returns it as the aggregate result for the SQSUM2 aggregate.
Suppose that the ITER, COMBINE, and FINAL aggregate support functions for the SQSUM2 aggregate are compiled and linked into a shared-object module named sqsum.
On UNIX or Linux, the executable code for the SQSUM2 aggregate support functions would be in a shared library named sqsum.so.
Once you have successfully compiled and linked the aggregate support functions, you can define the SQSUM2 aggregate in the database. Figure 107 shows the CREATE AGGREGATE statement that registers the SQSUM2 user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions.
CREATE AGGREGATE sqsum2 WITH (ITER = iter_sqsum2, COMBINE = combine_sqsum2, FINAL = final_sqsum2);
Figure 108 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for the aggregate argument of the INTEGER data type.
CREATE FUNCTION iter_sqsum2(state INTEGER, one_value INTEGER) RETURNS INTEGER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C; CREATE FUNCTION combine_sqsum2(state1 INTEGER, state2 INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C; CREATE FUNCTION final_sqsum2(state INTEGER) RETURNS INTEGER EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so' LANGUAGE C;
In Figure 108, the CREATE FUNCTION statement that registers the ITER support function requires the HANDLESNULLS routine modifier because the aggregate does not have an INIT support function.
For the tab1 table, which Figure 95 defines, the following query uses the new SQSUM2 aggregate function on the INTEGER column, col3:
SELECT SQSUM2(col3) FROM tab1;
With the rows that Figure 95 has inserted, the preceding query yields an INTEGER value of 10201, which is the same value that the SQSUM1 aggregate returned for these same rows.
Now, suppose that you want to define the SQSUM2 user-defined aggregate on the complexnum_t named row type, which Figure 93 defines. This version of SQSUM2 must have the same aggregate support functions as the version that handles INTEGER (see Figure 107).
Aggregate Support
Function |
SQL Function Name | C Function Name |
---|---|---|
ITER | iter_sqsum2( ) | iter_sqsum2_complexnum( ) |
COMBINE | combine_sqsum2( ) | combine_sqsum2_complexnum( ) |
FINAL | final_sqsum2( ) | final_sqsum2_complexnum( ) |
The following code shows the aggregate support functions that handle a complexnum_t named row type as an argument for the SQSUM2 user-defined aggregate:
/* SQSUM2 ITER support function for complexnum_t */ MI_ROW *iter_sqsum2_complexnum(state, value, fparam) MI_ROW *state; MI_ROW *value; MI_FPARAM *fparam; { /* Compute the new partial sum using the complex_plus( ) * function. Put the sum in a new MI_ROW, which * complex_plus( ) allocates (and returns a pointer to) */ return (complex_plus(state, value, fparam)); } /* SQSUM2 COMBINE support function for complexnum_t */ MI_ROW *combine_sqsum2_complexnum(state1, state2, fparam) MI_ROW *state1, *state2; MI_FPARAM *fparam; { MI_ROW *ret_state; ret_state = iter2_sqsum2_complexnum(state1, state2, fparam); mi_free(state1); mi_free(state2); return (ret_state); } /* SQSUM2 FINAL support function for complexnum_t */ MI_ROW *final_sqsum2_complexnum(state) MI_ROW *state; { MI_CONNECTION *conn; MI_TYPEID *type_id; MI_ROW_DESC *row_desc; MI_ROW *ret_row; MI_DATUM values[2]; mi_boolean nulls[2] = {MI_FALSE, MI_FALSE}; mi_real *real_value, *imag_value; mi_integer real_len, imag_len; mi_real sqsum_real, sqsum_imag; /* Extract complex values from state row structure */ mi_value_by_name(state, "real_part", (MI_DATUM *)&real_value, &real_len); mi_value_by_name(state, "imaginary_part", (MI_DATUM *)&imag_value, &imag_len); /* Calculate square of sum */ sqsum_real = (*real_value) * (*real_value); sqsum_imag = (*imag_value) * (*imag_value); /* Put final result into 'values' array */ values[0] = (MI_DATUM)&sqsum_real; values[1] = (MI_DATUM)&sqsum_imag; /* Generate return row type */ conn = mi_open(NULL, NULL, NULL); type_id = mi_typestring_to_id(conn, "complexnum_t"); row_desc = mi_row_desc_create(type_id); ret_row = mi_row_create(conn, row_desc, values, nulls); return (ret_row); }
Figure 109 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for an aggregate argument of the complexnum_t data type.
CREATE FUNCTION iter_sqsum2(state complexnum_t, one_value complexnum_t) RETURNS complexnum_t WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so(iter_sqsum2_complexnum)' LANGUAGE C; CREATE FUNCTION combine_sqsum2(state1 complexnum_t, state2 complexnum_t) RETURNS complexnum_t EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so(combine_sqsum2_complexnum)' LANGUAGE C; CREATE FUNCTION final_sqsum2(state complexnum_t) RETURNS complexnum_t EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so(final_sqsum2_complexnum)' LANGUAGE C;
The following query uses the SQSUM2 aggregate function on the complexnum_t column, col2:
SELECT SQSUM2(col2) FROM tab1;
With the rows that Figure 95 has inserted, the preceding query yields a complexnum_t value of:
ROW(817.96, 1204.09)
Suppose you want to create a user-defined aggregate that determines the percentage of values greater than some user-specified value and returns this percentage as a fixed-point number in the range 0 to 100. The implementation of this UDA uses the following aggregate features:
The PERCENT_GTR user-defined aggregate needs the following state information:
Therefore, it uses the following C structure, named percent_state_t, to hold the aggregate state:
typedef struct percent_state { mi_integer gtr_than; mi_integer total_gtr; mi_integer total; } percent_state_t;
Because the size of the percent_state_t structure never exceeds the maximum opaque-type size, PERCENT_GTR can use an opaque-type state to hold its aggregate state. The following code shows the INIT aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:
/* PERCENT_GTR INIT support function for INTEGER */ percent_state_t *init_percentgtr(dummy_arg, gtr_than, fparam) mi_integer dummy_arg; mi_integer gtr_than; MI_FPARAM *fparam; { percent_state_t *state; /* Allocate PER_ROUTINE memory for state and initialize it */ state = mi_alloc(sizeof(percent_state_t)); /* Check for a NULL-valued set-up argument */ if ( mi_fp_argisnull(fparam, 1) ) state->gtr_than = 0; else state->gtr_than = gtr_than; state->total_gtr = 0; state->total = 0; return (state); }
This INIT function performs the following tasks:
This set-up argument is the value that the end user specifies so that the aggregate knows which value to compare the aggregate arguments against. If the end user provides a NULL value for the set-up argument, PERCENT_GTR checks for values greater than zero (0).
The INIT function does not need to allocate memory for an opaque-type state because the database server can perform the state management. However, because PERCENT_GTR already requires an INIT function to handle the set-up argument, INIT allocates a PER_ROUTINE percent_state_t structure so that it can initialize the opaque-type state.
The following code implements the ITER aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:
/* PERCENT_GTR ITER support function for INTEGER */ percent_state_t *iter_percentgtr(curr_state, agg_arg, fparam) percent_state_t *curr_state; mi_integer agg_arg; MI_FPARAM *fparam; { if ( mi_fp_argisnull(fparam, 1) == MI_TRUE ) agg_arg = 0; if ( agg_arg > curr_state->gtr_than ) curr_state->total_gtr += 1; curr_state->total += 1; return (curr_state); }
The PERCENT_GTR aggregate is defined to handle NULL values (see Figure 110). This ITER function must check for a possible NULL aggregate argument. The function converts any NULL value to a zero (0) so that the numeric comparison can occur.
The following COMBINE aggregate support function handles an INTEGER argument for the PERCENT_GTR aggregate:
/* PERCENT_GTR COMBINE support function for INTEGER */ percent_state_t *combine_percentgtr(state1, state2) percent_state_t *state1; percent_state_t *state2; { state1->total += state2->total; state1->total_gtr += state2->total_gtr; mi_free(state2); return(state1); }
Because PERCENT_GTR does not have a simple state, its COMBINE function must explicitly perform the merging of two parallel threads, as follows:
The following code shows the FINAL aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:
/* PERCENT_GTR FINAL support function for INTEGER */ mi_decimal *final_percentgtr(final_state) percent_state_t *final_state; { mi_double_precision quotient; mi_decimal return_val; mi_integer ret; quotient = ((mi_double_precision)(final_state->total_gtr)) / ((mi_double_precision)(final_state->total)) * 100; if ( (ret = deccvdbl(quotient, &return_val)) < 0 ) ret = deccvasc("0.00", 4, &return_val); return (&return_val); }
The PERCENT_GTR aggregate returns a data type different from the aggregate state. The FINAL function must convert the final state from the aggregate-state data type (percent_state_t) to the aggregate-result data type (DECIMAL).
Once you have successfully compiled and linked the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. For a user-defined aggregate that uses an opaque-type state, this definition includes the following steps:
The PERCENT_GTR aggregate uses a fixed-length opaque type, percent_state_t, to hold its opaque-type state. The following CREATE OPAQUE TYPE statement registers this opaque type:
CREATE OPAQUE TYPE percent_state_t (INTERNALLENGTH = 12);
The INTERNALLENGTH modifier specifies the size of the fixed-length C data structure, percent_state_t, that holds the opaque-type state.
Figure 110 shows the CREATE AGGREGATE statement that defines the PERCENT_GTR user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.
CREATE AGGREGATE percent_gtr WITH (INIT = init_percent_gtr, ITER = iter_percent_gtr, COMBINE = combine_percent_gtr, FINAL = final_percent_gtr, HANDLESNULLS);
Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the PERCENT_GTR aggregate are compiled and linked into a shared-object module named percent.
On UNIX or Linux, the executable code for the PERCENT_GTR aggregate support functions would be in a shared library named percent.so.
The following CREATE FUNCTION statements register the PERCENT_GTR aggregate support functions for an aggregate argument of the INTEGER data type:
CREATE FUNCTION init_percent_gtr(dummy INTEGER, gtr_val INTEGER) RETURNING percent_state_t WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(init_percentgtr)' LANGUAGE C; CREATE FUNCTION iter_percent_gtr(state percent_state_t, one_value INTEGER) RETURNS percent_state_t WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(iter_percentgtr)' LANGUAGE C; CREATE FUNCTION combine_percent_gtr(state1 percent_state_t, state2 percent_state_t) RETURNS percent_state_t WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(combine_percentgtr)' LANGUAGE C; CREATE FUNCTION final_percent_gtr (state percent_state_t) RETURNS DECIMAL(5,2) WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(final_percentgtr)' LANGUAGE C;
These CREATE FUNCTION statements register an SQL name for each of the aggregate support functions that you have written in C. They must all include the HANDLESNULLS routine modifier because the PERCENT_GTR aggregate handles NULL values.
The following query uses the PERCENT_GTR aggregate function on the INTEGER column, col3, to determine the percentage of values greater than 25:
SELECT PERCENT_GTR(col3, 20) FROM tab1;
With the rows that Figure 95 has inserted, the preceding query yields a DECIMAL(5,2) value of 33.33 percent: 2 of the 6 values are greater than 20 (24 and 31).
Suppose you want to create a user-defined aggregate that calculates the x-percentile for a group of values. The x-percentile is the number within the group of values that separates x percent of the values below and (100-x) percent above. The median is a special case of the x-percentile. It represents the 50th-percentile:
X_PERCENTILE(y, 50)
That is, the above aggregate returns the value within a sample of y values that has an equal number of values (50 percent) above and below it in the sample.
The implementation of this UDA uses the following aggregate features:
The X_PERCENTILe user-defined aggregate needs the following state information:
Therefore, X_PERCENTILE uses a C structure named percentile_state_t to hold the aggregate state:
#define MAX_N 1000 typedef struct percentile_state { mi_integer percentile; mi_integer count; mi_integer value_array[MAX_N]; mi_integer value_is_null[MAX_N]; } percentile_state_t;
The size of the percentile_state_t structure depends on the number of aggregate arguments stored in the value_array array; that is, values less then or equal to the MAX_N constant. On a system with four-byte mi_integer values, the size of this structure is:
8 + 4(MAX_N)
If X_PERCENTILE used an opaque-type state, this structure must be less than the maximum opaque-type size. For systems that have a 32 kilobyte maximum opaque-type size, the X_PERCENTILE aggregate could use an opaque-type state as long as it is called in a query that finds 8190 or fewer rows. If the query finds more than 8190 rows, the state would not fit into an opaque type. To avoid this restriction, X_PERCENTILE implements the aggregate state as a pointer-valued state.
The following code shows the INIT aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE INIT support function on INTEGER */ mi_pointer init_xprcnt(dummy, prcntile, fparam) mi_integer dummy; mi_integer prcntile; MI_FPARAM *fparam; { percentile_state_t *state; /* Allocate memory for the state from the PER_COMMAND * pool */ state = (percentile_state_t *) mi_dalloc(sizeof(percentile_state_t), PER_COMMAND); /* Initialize the aggregate state */ if ( mi_fp_argisnull(fparam, 1) ) state->percentile = 50; /* median */ else state->percentile = prcntile; state->count = 0; return ((mi_pointer)state); }
This INIT support function performs the following tasks:
This set-up argument is the value that the end user specifies so that the aggregate can determine the value that has x percent values below and (100-x) percent above. If the end user provides an SQL NULL for the set-up argument, X_PERCENTILE assumes a value of 50 and therefore calculates the median.
The database server does not perform state management for pointer-valued states. Therefore, the INIT function must allocate the memory for the state. It also assigns the appropriate values to the percentile_state_t structure to initialize the state.
The following code implements the ITER aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE ITER support function on INTEGER */ mi_pointer iter_xprcnt(state_ptr, value, fparam) mi_pointer state_ptr; mi_integer value; MI_FPARAM *fparam; { mi_integer i, j; mi_integer is_null = 0; percentile_state_t *state = (percentile_state_t *)state_ptr; /* Check for NULL-valued 'value' */ if ( mi_fp_argisnull(fparam, 1) ) { value = 0; is_null = 1; } /* Find position of 'value' in ordered 'value_array' */ for ( i=0; i < state->count; i++ ) { if ( state->value_array[i] > value ) break; } /* Increment number of values (count) */ ++state->count; /* Put value into ordered list of existing values */ for (j=state->count - 1; j > i; j--) { state->value_array[j] = state->value_array[j-1]; state->value_is_null[j] = state->value_is_null[j-1]; } state->value_array[i] = value; state->value_is_null[i] = is_null; return ((mi_pointer)state); }
The ITER support function updates the aggregate state in-place with the following information:
The ITER function also handles a possible NULL-valued aggregate argument. Because the X_PERCENTILE aggregate is defined to handle NULL values (see Figure 111), the database server calls ITER for NULL-valued aggregate arguments.
The following COMBINE aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE COMBINE support function on INTEGER */ mi_pointer combine_xprcnt(state1_ptr, state2_ptr) mi_pointer state1_ptr, state2_ptr; { mi_integer i; percentile_state_t *state1 = (percentile_state_t *)state1_ptr; percentile_state_t *state2 = (percentile_state_t *)state2_ptr; /* Merge the two ordered value arrays */ for ( i=0; i < state2->count; i++ ) (void) iter_xprcnt(state1_ptr, state2->value_array[i]); /* Free the PER_COMMAND memory allocated to the state of * the 2nd parallel thread (state2). The memory * allocated to the state of the 1st parallel thread * (state1) holds the updated state. It is in the FINAL * support function. */ mi_free(state2); return (state1_ptr); }
This COMBINE support function merges two aggregate states, as follows:
The following FINAL aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE FINAL support function on INTEGER */ mi_integer final_xprcnt(state_ptr, fparam) mi_pointer state_ptr; MI_FPARAM *fparam; { mi_integer index, trunc_int; mi_integer x_prcntile; percentile_state_t *state = (percentile_state_t *)state_ptr; /* Obtain index position of x-percentile value */ trunc_int = (state->count) * (state->percentile); index = trunc_int/100; if ( (trunc_int % 100) >= 50 ) index++; /* Obtain x-percentile value from sorted 'value_array' */ x_prcntile = state->value_array[index]; /* Check for NULL value so it can be returned as such */ if ( state->value_is_null[index] ) mi_fp_setreturnisnull(fparam, 0, MI_TRUE); /* Free the PER_COMMAND memory allocated to the state */ mi_free(state); /* Return retrieved x-percentile value */ return (x_prcntile); }
This FINAL support function performs the following tasks:
The FINAL function must obtain the index position for the value that represents the x-percentile, where x is the percentage that the end user has passed in as a set-up argument.
The database server does not perform any state management for pointer-valued states. Therefore, the FINAL function must deallocate the PER_COMMAND state memory that the INIT function has allocated.
After you successfully compile and link the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. Figure 111 shows the CREATE AGGREGATE statement that defines the X_PERCENTILE user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.
CREATE AGGREGATE x_percentile WITH (INIT = init_x_prcntile, ITER = iter_x_prcntile, COMBINE = combine_x_prcntile, FINAL = final_x_prcntile, HANDLESNULLS);
Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the X_PERCENTILE aggregate are compiled and linked into a shared-object module named percent.
On UNIX or Linux, the executable code for the X_PERCENTILE aggregate support functions would be in a shared library named percent.so.
The following CREATE FUNCTION statements register the X_PERCENTILE aggregate support functions for an aggregate argument of the INTEGER data type:
CREATE FUNCTION init_x_prcntile(dummy INTEGER, x_percent INTEGER) RETURNING POINTER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(init_xprcnt)' LANGUAGE C; CREATE FUNCTION iter_x_prcntile(agg_state POINTER, one_value INTEGER) RETURNS POINTER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(iter_xprcnt)' LANGUAGE C; CREATE FUNCTION combine_x_prcntile(agg_state1 POINTER, agg_state2 POINTER) RETURNS POINTER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(combine_xprcnt)' LANGUAGE C; CREATE FUNCTION final_x_prcntile(agg_state POINTER) RETURNS INTEGER WITH (HANDLESNULLS) EXTERNAL NAME '/usr/udrs/aggs/percent/percent.so(final_xprcnt)' LANGUAGE C;
These CREATE FUNCTION statements use the SQL data type, POINTER, to indicate that the aggregate support functions accept a generic C pointer and perform their own memory management. They must all include the HANDLESNULLS routine modifier because the X_PERCENTILE aggregate handles NULL values.
The following query uses the X_PERCENTILE aggregate function on the INTEGER column, col3, to determine the quartile (the 25th percentile) for the values in col3:
SELECT X_PERCENTILE(col3, 25) FROM tab1;
For the tab1 rows that Figure 95 has inserted, X_PERCENTILE creates the following sorted list for the col3 values:
5, 9, 13, 19, 24, 31
Because 25 percent of 6 values is 1.5, X_PERCENTILE obtains the list item that has 2 values (1.5 rounded up to the nearest integer) below it. The preceding query returns 13 as the quartile for col3.
Suppose you add the following row to the tab1 table:
INSERT INTO tab1 (7, NULL:complexnum_t, NULL);
This INSERT statement adds a NULL value to the col3 column. Because X_PERCENTILE handles NULLs, the database server calls the X_PERCENTILE aggregate on this new row as well. After this seventh row is inserted, X_PERCENTILE would generate the following sorted list for col3:
(NULL), 5, 9, 13, 19, 24, 31
Twenty-five percent of 7 values is 1.75, so X_PERCENTILE obtains the list item that has 2 (1.75 truncated to the nearest integer) values below it. Now the quartile for col3 would be 9. If X_PERCENTILE was not registered with the HANDLESNULLS modifier, however, the database server would not call X_PERCENTILE for this newest row and the quartile for col3 would have been 13 (the quartile for 6 rows, even though col3 actually has 7 rows).
If you called the X_PERCENTILE aggregate with an x-percentile that would return the first value in the list (the NULL value), the FINAL support function uses the DataBlade API function mi_fp_setreturnisnull( ) to set the aggregate result to NULL. For example, suppose you execute the following query on the col3:
SELECT X_PERCENTILE(col3, 5) FROM tab1;
This query asks for the 5th percentile for the seven values in col3. Because 5 percent of 7 values is 0.35, X_PERCENTILE obtains the list item that has zero values (0.35 truncated to the nearest integer) below it. The preceding query returns NULL as the quartile for col3. The ITER function has stored NULL values as zeros in the sorted value_array. For the FINAL support function to determine when a value of zero indicates a NULL and when it indicates zero, it checks the value_is_null array. If the zero indicates a NULL value, FINAL calls the DataBlade API function mi_fp_setreturnisnull( ) to set the aggregate result to NULL.
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]