informix
Informix DataBlade API Programmer's Manual
Creating Special-Purpose UDRs

Writing an Aggregate Function

An aggregate is a function that returns one value for a group of queried rows. The aggregate function performs one iteration for each of the queried rows. For each row, an aggregate iteration receives one column value (called the aggregate argument). The value that the aggregate returns to the SQL statement is called the aggregate result. For example, the following query calls the built-in SUM aggregate to determine the total cost of item numbers in order 1002:

For this invocation of the SUM aggregate, each value of the total_price column that is passed into SUM is one aggregate argument. The total sum of all total_price values, which SUM returns to the SELECT, is the aggregate result.

The database server supports extension of aggregates in the following ways:

You can write C user-defined functions to implement these aggregate extensions. For an overview of how to create aggregate functions and how to write them in an SPL routine, see the chapter on this topic in the Extending Informix Dynamic Server 2000 manual. The following sections provide information specific to the creation of aggregate functions as C user-defined functions.

Tip: The Informix BladeSmith development tool automatically generates C source code for a user-defined aggregate as well as the SQL statements to register the aggregate function. For more information, see the "DataBlade Developers Kit User's Guide."

Extending a Built-In Aggregate

To extend a built-in aggregate function with a C user-defined function, follow these steps:

  1. Determine the appropriate operator function that you must overload to implement the desired built-in aggregate function.
  2. For a list of built-in aggregate functions and the associated operator functions to overload, see the chapter on aggregate functions in the Extending Informix Dynamic Server 2000 manual.

  3. Write the C UDR that implements the required operator function for the data type that you want the aggregate to handle.
  4. To extend built-in aggregates so that they handle user-defined data types, write an operator function that accepts the user-defined data type as an argument. Compile the C UDR and link it into a shared-object file.

  5. Register the overloaded operator function with the CREATE FUNCTION statement.
  6. Use the newly extended aggregate on the data.

Suppose you want to be able to use the SUM aggregate on complex numbers, which are stored in the following user-defined data type: a named row type called complexnum_t. Figure 13-5 shows the CREATE ROW TYPE statement that registers the complexnum_t named row type.

Figure 13-5
A Named Row Type to Hold a
Complex Number

The following sections show how to extend the SUM aggregate on the complexnum_t named row type.

Choosing the Operator Function

The SUM built-in aggregate function uses the plus operator (+), which the plus() user-defined function implements. The database server provides implementations of the plus() function over the built-in data types. Therefore, the SUM aggregate function works over built-in data types. To have the SUM aggregate operate on the complexnum_t row type, you implement a plus() function that handles this named row type; that is, it adds the two parts of the complex number and returns a complex number with the summed parts.

The following C function, complex_plus(), defines such a plus() function:

Writing the Operator Function

The code segment shows the implementation of the complex_plus() function, which implements a plus() function for the complexnum_t data type:

This version of the plus() function performs the following tasks:

Once the complex_plus() function is written, you compile it and put it into a shared-object file. Suppose that complex_plus() is compiled and linked into a shared-object module called sqsum.

On a Solaris system, the executable code for the complex_plus() operator function would be in a shared library called sqsum.so.

For more information, see Compiling a C UDR.

To extend a built-in aggregate over a user-defined data type, you overload the appropriate operator function to handle the user-defined type. However, operator functions can also be used as part of an expression that does not involve aggregates. Therefore, aggregate support functions for built-in aggregates on user-defined data types (opaque types, distinct types, and named row types) must allocate a new state when they need to modify the state.

For example, the following SUM aggregate uses the overloaded plus() operator to calculate the sum of values in the col1 column:

For each aggregate argument, the SUM aggregate invokes the plus() operator to add the aggregate argument (agg_arg) into the sum of the previous values in the aggregate state (agg_state), as follows:

When you modify the aggregate state in-place, the value of the agg_state argument to plus() changes. When plus() exits, the agg_state argument holds the new sum of the aggregate arguments, which includes the agg_arg value.

However, the plus() function is also valid in expressions that do not involve aggregates, as in the following query:

In this WHERE clause, the database server invokes the plus() operator to add 4 to the col1 value, as follows:

If the plus() operator modifies the aggregate state in-place, the value of its first argument changes to hold the sum of col1 and 4. It is not safe to modify arguments in place because the values of arguments (col1 and 4) must not change. Therefore, when you modify the aggregate state in an operator function of a built-in aggregate, you must be careful not to use the "in-place" modification method.

Registering the Overloaded Operator Function

With the operator function written, compiled, and linked into a shared-object file, you can register this function in the database with the CREATE FUNCTION statement. You must have the appropriate privileges for this registration to be successful. For more information, see the chapter on user-defined aggregates in the Extending Informix Dynamic Server 2000.

Figure 13-6 shows the CREATE FUNCTION statement that overloads the plus() function with a new version that handles the complexnum_t named row type.

Figure 13-6
Registering the Overloaded plus() Function

Tip: Because SUM is a built-in aggregate, you do not have to use the CREATE AGGREGATE statement to define the SUM aggregate.

Using the Extended Aggregate

Once you execute the CREATE FUNCTION statement in Figure 13-6 on page 13-22, you can use the SUM aggregate on complexnum_t columns. For example, suppose you create the tab1 table as Figure 13-7 shows.

Figure 13-7
A Table With a complexnum_t Column

The following query uses the SUM aggregate function on the complexnum_t column, col2:

With the rows that Figure 13-7 has inserted, the preceding query yields a complexnum_t value of:

As a side effect of the new plus() function, you can also add two complexnum_t columns in an SQL expression, as follows:

Creating a User-Defined Aggregate

The built-in aggregates provide basic aggregations. However, if your data requires some special aggregation, you can create a custom aggregate function, called a user-defined aggregate. To implement your custom aggregation, you design an aggregate algorithm, which consists of the following parts:

To implement a user-defined aggregate function with C user-defined functions, follow these steps:

  1. Determine the content and data type of the aggregate state.
  2. Write the C UDRs that implements the required aggregate support functions for the data type on which you want to implement the user-defined aggregate.
  3. Define the user-defined aggregate in the database with the CREATE AGGREGATE and CREATE FUNCTION statements.

Once you have completed these steps, you can use the aggregate in an SQL statement.

The following sections describe each of these development steps in more detail and use the SQSUM1 user-defined aggregate (which Figure 13-8 describes) as an example.

Figure 13-8
A Sample User-Defined Aggregate

User-Defined Aggregate Description Definition
SQSUM1 Sums all values and calculates the square of this sum (x1 + x2 + x3 + ... )2

Determining the Aggregate State

An aggregate is a series of iterations. Each iteration processes one aggregate argument (which contains one column value) and performs the necessary computations to merge it into a partial result. The partial result is a snapshot of the aggregate arguments that the aggregate has merged so far. Once the aggregate has received all column values, it returns a value to the calling statement, based on the final partial result.

Each iteration of the aggregate is a separate invocation of a user-defined function. If user-allocated memory has the default PER_ROUTINE memory duration, the database server automatically deallocates it after only one iteration of the iteration function. (For more information, see Determining Memory Duration.) Therefore, while an iteration executes, it can access only the following information:

During its invocation, each aggregate iteration merges the aggregate argument into the aggregate state and returns the updated state. The database server preserves the updated aggregate state and passes it into the next iteration of the aggregate. When you create a user-defined aggregate, you must determine what non-local information each iteration needs and then define an aggregate state to contain this information. Otherwise, the aggregate iterations cannot obtain the information they need to perform their computations.

Important: Design the aggregate state so that each aggregate support function can obtain all the state information that it needs.

As a starting point, try using the data type of the aggregate argument for the aggregate state. Such a state is called a simple state. For more information, see Aggregate Support Functions for the Aggregate State.

For example, to determine the state for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes), assess what tasks need to be performed in each iteration of SQSUM1. For each aggregate argument, SQSUM1 needs to add together the following values:

The data type that you choose for the aggregate state affects how the state must be managed. When the SQSUM1 aggregate receives INTEGER values as its aggregate arguments, the sum of these values is also an INTEGER value. Therefore, the SQSUM1 aggregate has an integer state, which holds the partial sum.

Writing the Aggregate Support Functions

An aggregate support function is a special-purpose user-defined function that implements some task in the aggregate algorithm. You write aggregate support functions to initialize, calculate, and return the aggregate result to the calling code. An aggregate algorithm can include the following kinds of aggregate support functions.

Aggregate Support Function Algorithm Step
INIT Possible initialization tasks that must be performed before the iterations can begin
ITER An iteration step, which is performed on each aggregate argument and merges this argument into a partial result
COMBINE Merging one partial result with another partial result, thus allowing parallel execution of the user-defined aggregate
FINAL Post-iteration tasks that must be performed after all aggregate arguments were merged into a partial result

The following sections summarize each of the aggregate support functions.

INIT Function

The INIT aggregate support function performs the initialization for the user-defined aggregate. Figure 13-9 summarizes possible initialization tasks.

Figure 13-9
Initializations Tasks for the INIT Aggregate Support Function

Initialization Task For More Information
Set-up for any additional resources outside the state that the aggregation might need Aggregate Support Functions that the Algorithm Requires
Initial calculations that the user-defined aggregate might need Aggregate Support Functions that the Algorithm Requires
Allocation and initialization of the aggregate state that the rest of the aggregation computation might need Aggregate Support Functions for the Aggregate State
Handling of an optional set-up argument Implementing a Set-Up Argument

The INIT support function is an optional aggregate support function. If your aggregate algorithm does not require any of the tasks in Figure 13-9, you do not need to define an INIT function. When you omit the INIT function from your user-defined aggregate, the database server performs the state management for the aggregate state. For more information, see Handling a Simple State.

To declare an INIT support function as a C function, use the following syntax:

In the execution of a UDA, the database server calls the INIT function before it begins the actual aggregation computation. It passes in any optional set-up argument (set_up_arg) from the user-defined aggregate and copies any initialized aggregate state that INIT returns into the state buffer. For more information on the state buffer, see Aggregate Support Functions for the Aggregate State.

The first argument of the INIT function serves only to identify the data type of the aggregate argument that the user-defined aggregate handles. The routine manager uses this argument in routine resolution to determine the correct version of the overloaded INIT function. At the time of invocation, the routine manager just passes in a NULL value for the first argument of the INIT function, as the following syntax shows:

Tip: For more information on how the routine manager resolves the overloaded aggregate support functions, see the chapter on aggregates in the "Extending Informix Dynamic Server 2000."

Figure 13-10 shows the INIT aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes).

Figure 13-10
INIT Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-11 on page 13-30, Figure 13-12 on page 13-32, and Figure 13-14 on page 13-34.

ITER Function

The ITER aggregate support function performs the sequential aggregation or iteration for the user-defined aggregation. It merges a single aggregate argument into the partial result, which the aggregate state contains. The ITER function is a required aggregate support function; that is, you must define an ITER function for every user-defined aggregate. If you do not define an ITER function for your user-defined aggregate, the database server generates an error.

Tip: If the UDA does not have an INIT support function, the ITER support function can initialize the aggregate state. For more information, see Handling a Simple State.

If the UDA was registered with the HANDLESNULLS modifier in the CREATE AGGREGATE statement, the database server calls the ITER support function once for each aggregate argument that passed to the user-defined aggregate. Each aggregate argument is one column value. If you omit the HANDLESNULLS modifier from CREATE AGGREGATE, the database server does not call ITER for any NULL-valued aggregate arguments. Therefore, NULL-valued aggregate arguments do not contribute to the aggregate result.

For example, suppose you execute the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes) in the following query:

The tab1 table (which Figure 13-7 on page 13-23 defines) contains 6 rows. Therefore, the preceding query (which contains no WHERE clause) causes 6 invocations of the SQSUM1 ITER function. Each invocation of this ITER function processes one value from the col3 column.

To declare a ITER function as a C function, use the following syntax:

Important: Make sure that the ITER support function obtains all state information that it needs from its "current_state" argument. The INIT function cannot maintain additional state information as user data in its MI_FPARAM structure because MI_FPARAM is not shared among the other aggregate support functions. However, the ITER function can store user data in MI_FPARAM that is not part of the aggregate result.

Figure 13-11 shows the ITER aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes).

Figure 13-11
ITER Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-12 on page 13-32, and Figure 13-14 on page 13-34.

COMBINE Function

The COMBINE aggregate support function allows your user-defined aggregate to execute in a parallel query. When a query that contains a user-defined aggregate is processed in parallel, each parallel thread operates on a one subset of selected rows. The COMBINE function merges the partial results from two such subsets. This aggregate support function ensures that the result of aggregating over a group of rows sequentially is the same as aggregating over two subsets of the rows in parallel and then combining the results.

The COMBINE function is required for parallel execution. When a query includes a user-defined aggregate, the database server uses parallel execution when the query includes only aggregates. However, the COMBINE function might be used even when a query is not parallelized. For example, when a query contains both distinct and non-distinct aggregates, the database server can decompose the computation of the non-distinct aggregate into sub-aggregates based on the distinct column values. Therefore, you must provide a COMBINE function for every user-defined aggregate.

If you do not define an COMBINE function for your user-defined aggregate, the database server generates an error. However, if your user-defined aggregate uses a simple state, the COMBINE function can be the same as the ITER function. For more information, see Handling a Simple State.

To declare a COMBINE function as a C function, use the following syntax:

In the execution of a UDA, the database server calls the COMBINE once for each pair of threads (agg_state1 and agg_state2) that execute a parallel query that contains the user-defined aggregate. When the COMBINE function combines two partial results, it might also need to release resources associated with one of the partial results.

Figure 13-12 shows the COMBINE aggregate support function that handles an INTEGER argument for the SQSUM user-defined aggregate (which Figure 13-8 on page 13-24 describes).

Figure 13-12
COMBINE Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-11 on page 13-30, and Figure 13-14 on page 13-34. For more information on parallel execution of a UDA, see Executing a User-Defined Aggregate in Parallel Queries.

FINAL Function

The FINAL aggregate support function performs the post-iteration tasks for the user-defined aggregate. Figure 13-13 summarizes possible post-iteration tasks.

Figure 13-13
Post-Iteration Tasks for the FINAL Aggregate Support Function

Post-Iteration Task For More Information
Type conversion of the final state into the return type of the user-defined aggregate Returning an Aggregate Result Different from the Aggregate State
Post-iteration calculations that the user-defined aggregate might need Aggregate Support Functions that the Algorithm Requires
Deallocation of memory that the INIT aggregate support function has allocated Managing a Pointer-Valued State

The FINAL function is an optional aggregate support function. If your user-defined aggregate does not require one of the tasks in Figure 13-13, you do not need to define a FINAL function. When you omit the FINAL function from your user-defined aggregate, the database server returns the final aggregate state as the return value of the user-defined aggregate. If this state does not match the expected data type of the aggregate return value, the database server generates a data-type-mismatch error.

Important: In general, the FINAL support function must not deallocate the aggregate state. Only for a pointer-valued state (in which the aggregate support functions must handle all state management) does the FINAL support function need to deallocate the state. For more information, see Managing a Pointer-Valued State.

To declare a FINAL function as a C function, use the following syntax:

In the execution of a UDA, the database server calls the FINAL function after all iterations of the ITER function are complete.

Figure 13-14 shows the aggregate support functions that handle an INTEGER argument for the SQSUM user-defined aggregate (which Figure 13-8 on page 13-24 describes).

Figure 13-14
FINAL Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-11 on page 13-30, and Figure 13-12 on page 13-32.

Defining the User-Defined Aggregate

To define a user-defined aggregate in a database, you perform these SQL steps:

The development steps for a UDA list the definition of the UDA after the aggregate support functions are written. However, the CREATE AGGREGATE statement does not verify that the aggregate support functions it lists were registered nor does the CREATE FUNCTION statement verify that the executable C code exists. Therefore, you can define the UDA before you create the C implementation of the aggregate support functions. However, you must ensure that the names of the C functions match the names in the CREATE FUNCTION statements that register them.

Figure 13-15 shows the CREATE AGGREGATE statement that defines the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes) in the database.

Figure 13-15
Registering the SQSUM1
User-Defined Aggregate

Suppose that the INIT, ITER, COMBINE and FINAL aggregate support functions for the SQSUM1 aggregate are compiled and linked into a shared-object module called sqsum.

On a Solaris system, the executable code for the SQSUM1 aggregate support functions would be in a shared library called sqsum.so.

Figure 13-16 shows the CREATE FUNCTION statements that register the aggregate support functions for the SQSUM1 user-defined aggregate to handle the INTEGER data type.

Figure 13-16
Registering the Aggregate Support Functions for SQSUM1
to Handle INTEGER

Notice that the registered names of the aggregate support functions must match the names that the CREATE AGGREGATE statement lists. For example, in Figure 13-16, a CREATE FUNCTION statement registers the INIT support function as init_sqsum1, which is the same name that the INIT option lists in the CREATE AGGREGATE statement (see Figure 13-15 on page 13-35).

In addition, the CREATE FUNCTION for the INIT support function must include the HANDLESNULLS routine modifier so that the database server can pass the INIT function the dummy NULL-valued argument.

For more information on the use of the CREATE AGGREGATE and CREATE FUNCTION statements to define user-defined aggregates, see the chapter on aggregates in the Extending Informix Dynamic Server 2000. For the syntax of these statements, see the Informix Guide to SQL: Syntax.

Using the User-Defined Aggregate

Once you have completed the aggregate-development steps, the end user can use the user-defined aggregate on the defined data type in SQL statements. However, use of the user-defined aggregate does assume the appropriate privileges.

For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SQSUM1 aggregate function on the INTEGER column, col3:

With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 10201.

To be able to use SQSUM1 on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type. For example, SQSUM2 User-Defined Aggregate shows the definition of a version of the SQSUM aggregate on both an INTEGER and a named row type.

Determining Required Aggregate Support Functions

Figure 13-17 shows the execution sequence of aggregate support functions for a user-defined aggregate that is not executed in a parallel query.

Figure 13-17
Execution of a
Non-Parallel UDA

Tip: For information about how to execute a user-defined aggregate in parallel queries, see Executing a User-Defined Aggregate in Parallel Queries.

Notice how these aggregate support functions use the aggregate state to pass shared information between themselves.

As you design your aggregate algorithm, you must determine which of the support functions the algorithm requires. As a minimum, the user-defined aggregate must have an ITER function. It is the ITER function that performs a single iteration of the aggregate on one aggregate argument. Although Figure 13-17 on page 13-37 shows the execution of both the INIT and FINAL support functions, these functions are optional for a user-defined aggregate. In addition, the COMBINE function, though required, often does not require separate code; it can simply call the ITER function.

Writing aggregate support functions that your user-defined aggregate does not require means unnecessary coding and execution time. Therefore, it is important to assess your aggregate for required functions. The following table shows the design decisions in the determination of required aggregate support functions.

Design Decision Aggregate Support Functions Involved For More Information
Does the algorithm require initialization or clean-up tasks? INIT and FINAL Aggregate Support Functions that the Algorithm Requires
Does the aggregate have a simple aggregate state? INIT, COMBINE, and FINAL YES: Handling a Simple State NO: Handling a Non-Simple State
Does the aggregate have a set-up argument? INIT Implementing a Set-Up Argument
Does the aggregate return a value whose data type is different from the aggregate state? FINAL Returning an Aggregate Result Different from the Aggregate State
Does the aggregate have special needs to run in a parallel query? COMBINE Executing a User-Defined Aggregate in Parallel Queries

You can overload the aggregate support functions to provide support for different data types. However, any overloaded version of the UDA cannot:

When the database server executes a UDA (regardless of the data type of the aggregate argument), the database expects to find all the aggregate support functions that the CREATE AGGREGATE statement has registered. Therefore, if you omit a support function for one of the reasons in the preceding table, all versions of the aggregate for all data types must be able to execute using only the aggregate support functions that CREATE AGGREGATE specifies.

Aggregate Support Functions that the Algorithm Requires

To implement a user-defined aggregate, you must develop an algorithm that calculates an aggregate return value based on the aggregate-argument values. You need to break this algorithm into the following steps.

Algorithm Step Aggregate Support Function
Calculations or initializations that must be done before the iterations can begin INIT
Calculations that must be done on each aggregate argument to merge this argument into a partial result ITER
Post-iteration tasks must be performed after all aggregate arguments were merged into a partial result FINAL

All aggregate algorithms must include an ITER function to handle each aggregate argument. However, the INIT and FINAL support functions are optional. To determine whether your algorithm requires an INIT or FINAL function, make the following design assessments:

For example, the following algorithm defines the SQSUM1 aggregate (which Figure 13-8 on page 13-24 describes):

where each xi is one column value; that is, one aggregate argument. The ITER function for SQSUM1 takes a single aggregate argument and adds it to a partial sum (see Figure 13-11 on page 13-30). The algorithm does not require initialization of additional resources. Therefore, no INIT function is required for this task. However, the INIT function can initialize the partial result (see Figure 13-10 on page 13-28).

The SQSUM1 user-defined aggregate does require post-iteration calculations. When the last iteration is reached, the partial sum needs to be squared to obtain the aggregate return value. This final calculation is performed in a FINAL function and returned as the return value for the SQSUM1 aggregate (see Figure 13-14 on page 13-34).

The SUMSQ user-defined aggregate (described on page 13-56) is an example of a user-defined aggregate that requires neither initialization nor post-iteration tasks. Therefore, it does not require the INIT and FINAL support functions.

Aggregate Support Functions for the Aggregate State

The aggregate support functions pass information about the aggregate among themselves in the aggregate state.

Tip: For an explanation of the aggregate state, see Determining the Aggregate State.

The database server invokes all aggregate support functions as regular user-defined routines. Therefore, each support function has a default memory duration of PER_ROUTINE, which means that the database server frees any memory that the support function allocates when that function terminates. However, the aggregate state must be valid across all invocations of all aggregate support functions, including possible multiple iterations of the ITER function. Therefore, a special state buffer (with a PER_COMMAND memory duration) must exist so that the aggregate state is available to all aggregate support functions. The database server then passes this state buffer to each invocation of an aggregate support function.

The purpose of the INIT support function is to return a pointer to the initialized aggregate state. To determine whether your aggregate state requires an INIT support function for state management, assess the data type of this state. The aggregate-state data type determines whether the INIT function must handle state management, as follows:

Once the user-defined aggregate has an aggregate state, the database server passes a pointer to this state buffer to each invocation of the ITER function and to the FINAL function.

Handling a Simple State

A simple state is an aggregate state whose data type is the same as the aggregate argument. At any point in the iteration, a simple state contains only the partial result of the aggregation. For example, the SUM built-in aggregate uses a simple state because its state contains only the partial result: the running total of the aggregate arguments. When the SUM aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum for these arguments. Therefore, the data type of its aggregate argument and aggregate state is the same.

However, the AVG built-in aggregate does not use a simple state. Because it must divide the total by the number of values processed, its state requires two values: the running total and the number of arguments processed. When the AVG aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum and an integer count for these arguments. Therefore, the data type of its aggregate argument (INTEGER) cannot be not the same as its aggregate state (two INTEGER values).

When a user-defined aggregate has a simple state, you can take the following steps in your aggregate support functions:

When a UDA does not include an INIT function, the database server takes the following state-management steps:

When a UDA does not include a FINAL function, the database server passes the final state as the aggregate result of the user-defined aggregate.

The implementation on the SQSUM1 aggregate includes an INIT support function that initializes the aggregate state (see Figure 13-10 on page 13-28). However, because SQSUM1 has a simple state, this INIT function is not required. Instead, an ITER function can check for a NULL-valued state and perform the state initialization. The ITER support function of the SQSUM2 aggregate (see Figure 13-21 on page 13-59) shows this type of implementation.

The SUMSQ user-defined aggregate (described on page 13-56) also has a simple state and therefore does not require an INIT support function for state management.

Handling a Non-Simple State

When the data type of the aggregate argument is not adequate for the state information, you must use a non-simple state for your UDA. A non-simple state is an aggregate state whose data type is not the same as the aggregate argument. Possible uses for a non-simple state include an aggregate state that contains:

When the aggregate-argument data type is not adequate for the state information, you must determine the appropriate data structure to hold the state information. This data structure depends upon the size of the aggregate state that you need to maintain, as the following table shows.

Non-Simple State Description For More Information
Single-valued state Consists of information that can be stored in an Informix built-in data type Managing a Single-Valued State
Opaque-type state Consists of several values, but these values do not exceed the maximum size of an opaque data type Managing an Opaque-Type State.
Pointer-valued state Consists of values whose size does exceed the maximum size of an opaque data type Managing a Pointer-Valued State.

If your user-defined aggregate uses a single-valued or opaque-type state, the database server can still perform the state management. However, it cannot provide all necessary state management for a pointer-valued state.

Managing a Single-Valued State

A single-valued state uses a built-in SQL data type to hold the aggregate state. Use a single-valued state for an aggregate state that can fit into a built-in data type but whose data type does not match that of the aggregate argument.

Tip: Built-in SQL data types are those data types that Informix provides. For more information, see the chapter on data types in the "Informix Guide to SQL: Reference."

To use a single-valued state for a UDA, follow these steps:

  1. Write the appropriate aggregate support functions so that they handle a single-valued state.
  2. Declare the state parameters and return values of the aggregate support functions to use the DataBlade API data type that corresponds to the built-in SQL data type that your state requires. For a list of these data type correspondences, see Figure 1-1 on page 1-11. Information on how to handle state management for a single-valued state is provided below.

  3. Register the aggregate support functions with the CREATE FUNCTION statement.
  4. Specify the built-in SQL data type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.

The database server can perform memory management for a single-valued state because it can determine the size of a built-in data type. Therefore, you do not need to allocate memory for a single-valued state in the INIT support function.

In the ITER function, you can initialize or update a single-valued state in either of the following ways:

For more information, see When to Allocate and Deallocate a State.

For a single-valued state, the FINAL support function does not need to perform any state-management tasks. However, it must convert to the data type of the final aggregate state to that of the aggregate result. For more information, see Returning an Aggregate Result Different from the Aggregate State.

Managing an Opaque-Type State

An opaque-type state uses an opaque data type to hold the aggregate state. A possible use for an opaque-type state include an aggregate state that contains more information than the aggregate-argument data type or a built-in data type can hold. The size of an aggregate state that can be implemented as an opaque-type state is limited by the maximum size of an opaque type.

Important: The maximum size of an opaque type is system dependent. On many systems, this limit is 32 kilobytes. However, consult your Machine Notes for the limit on your system. If your aggregate state might contain more data than the opaque-type limit, you must use a pointer-valued state instead. For more information, see Managing a Pointer-Valued State.

To use an opaque-type state for a UDA, write the appropriate aggregate support functions so that they handle an opaque-type state.

Declare the state parameters and return values of the aggregate support functions to use the internal format of the opaque type. This internal format is usually a C struct structure.For more information, see Determining the Internal Format.

Handling State Management for an Opaque-Type State

Register the opaque data type in the database with the CREATE OPAQUE TYPE statement.

Once you register the opaque data type, the database server can obtain information about the data type of the state value when the CREATE FUNCTION statement registers the function signatures of the aggregate support functions.

You need to write the opaque-type support functions only if you need such functionality in the aggregate support functions. For example, the input and output support functions might be useful when you debug your UDA. If you do write opaque-type support functions, you must compile and link them into a shared-object module, as Compiling a C UDR describes.

Register the aggregate support functions with the CREATE FUNCTION statement.

Specify the registered opaque type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.

The database server can perform memory management for an opaque-type state because it can determine the size of the opaque data type. The CREATE OPAQUE TYPE statement registers the opaque type, including its size, in the system catalog tables of the database. From the system catalog tables, the database server can determine the size of the aggregate state to allocate. Therefore, you do not need to allocate the opaque-type state in the INIT support function.

In the ITER function, you can initialize or update an opaque-type state in either of the following ways:

For more information, see When to Allocate and Deallocate a State. For an example of a user-defined aggregate that uses an opaque-type state, see the description of the PERCENT_GTR aggregate on page 13-64.

Managing a Pointer-Valued State

A pointer-valued state uses the POINTER data type as the aggregate state. The mi_pointer data type is the DataBlade API type that represents the SQL data type, POINTER. (For more information, see POINTER Data Type.) Use a pointer-valued state when an aggregate state might contains more information than can fit into the maximum opaque-type size.

Important: The maximum size of an opaque type is system dependent. On many systems, this limit is 32 kilobytes. However, consult your Machine Notes for the limit on your system. If your aggregate state contains less data than the opaque-type limit, you use an opaque-type state instead. For more information, see Managing an Opaque-Type State.

To use a pointer-valued state for a UDA, follow these steps:

  1. Write the appropriate aggregate support functions so that they handle a pointer-valued state.
  2. Declare the state parameters and return values of the aggregate support functions to use the mi_pointer data type. Information on how to handle state management of a pointer-valued state is provided below.

  3. Register the aggregate support functions with the CREATE FUNCTION statement.
  4. Specify the POINTER data type for the state parameters and return values in the function signatures of the aggregate support functions.

The database server cannot perform state management for a pointer-valued state because it cannot determine the size of the state. The DataBlade API data type mi_pointer is a typedef for the following C data type:

Because this data type is only a pointer, the database server cannot determine how large the aggregate state is. Therefore, it cannot allocate the PER_COMMAND system-allocated state buffer. In this case, the INIT and FINAL aggregate support functions are not optional. They must perform state management of the non-simple aggregate state, as follows:

When to Allocate and Deallocate a State

To each invocation of the ITER support function, the database server automatically passes a pointer to the state buffer. When you need to initialize or update the state information, the ITER function can handle the modification in either of the following ways.

Changing the State State Memory Duration Results
Merge the aggregate argument into the existing state in-place and return the existing state. The existing state has a PER_COMMAND memory duration:
  • For single-valued and opaque-type states, this state is the system-allocated state buffer.
  • For a pointer-valued state, this state is a user-allocated state buffer.
  • The new state value is at the address that the database server has passed into the ITER function. The ITER function then returns this address as the updated state. Because the state memory has a PER_COMMAND memory duration, the database server can re-use the same state for subsequent invocations of ITER.
    Allocate fresh memory for a new state, merge the existing state with the new aggregate argument into this state, and return this new state. The new state has a PER_ROUTINE memory duration:
  • For a single-valued state, this state can be either a declared local variable or user-allocated PER_ROUTINE memory.
  • For an opaque-type state, the new state must be user-allocated PER_ROUTINE memory.
  • For a pointer-valued state, this state is user-allocated memory with either a PER_ROUTINE or PER_COMMAND memory duration. However, for PER_COMMAND memory, you must also handle deallocation of the old state. For more information, see Managing a Pointer-Valued State.
  • The new state value is at the address of the new state. The ITER function then returns the address of the new state as the updated state. Because this memory has a PER_ROUTINE memory duration, the database server must copy the returned state back into the PER_COMMAND buffer.

    The new state method can be slower than the in-place method because the database server must perform the additional copy operation. Informix recommends that you design your ITER support function to use the in-place method whenever possible. When the database server can skip the copy operation, you can improve performance of your UDA.

    To determine which of these methods was used in the ITER support function, the database server compares the state value that ITER returns and the state value that was passed into ITER. If these two pointers identify the same memory location, the ITER function has modified the state in-place. Therefore, the database server does not need to perform the copy operation. If these two pointers identify different memory locations, the database server proceeds with the copy operation.

    Aggregate support functions have the following restrictions on the deallocation of an aggregate state:

    Implementing a Set-Up Argument

    You can define a UDA so that the end user can supply a set-up argument to the aggregate. The set-up argument can customize the aggregate for a particular invocation. For example, the PERCENT_GTR user-defined aggregate (see page 13-64) determines the percentage of numbers greater than a particular value. The UDA could have been implemented so that the value to compare against is hardcoded into the UDA. However, this would mean a separate user-defined aggregate that checks for values greater than 10, another that checks for values greater than 15, and so on.

    Instead, the PERCENT_GTR aggregate accepts the value to compare against as a set-up argument. In this way, the end user can determine what values are needed, as follows:

    The database server passes in the set-up argument as the second argument to the INIT function. Therefore, the INIT support function must handle the set-up argument. Usually, this handling involves performing any initial processing required for the value and then saving this value in the aggregate state. It might also check for a possible SQL NULL value as a set-up argument.

    This set-up argument is optional, in the sense that you can define a UDA with one or without one. However, if you define your UDA to include a set-up argument, the end user must provide a value for this argument. When the UDA is invoked with two arguments (aggregate argument and set-up argument), the database server looks for an INIT function with two arguments. If you omit the set-up argument when you invoke the UDA, the database server looks for an INIT function with just one argument.

    To indicate no set-up argument, the end user can provide the SQL NULL value as a set-up value. However, if you really want to make the set-up argument truly optional for the end user, you must create and register two INIT functions:

    As the writer of the UDA, you need to decide whether this feature is useful.

    Returning an Aggregate Result Different from the Aggregate State

    The aggregate result is the value that the user-defined aggregate returns to the calling SQL statement. If the user-defined aggregate does not include a FINAL support function exists, the database server returns the final aggregate state; that is, it returns the value of the aggregate state after the last aggregate iteration. However, if your UDA needs to return a value whose data type is different from the aggregate state, use a FINAL support function to convert the final aggregate state to the data type that you want to return from the aggregate.

    For example, the PERCENT_GTR user-defined aggregate (see page 13-64) returns the percentage of values greater than some value as a percentage; that is, as a fixed-point number in the range 0.00 to 100.00. To handle integer values, the user-defined aggregate would require an aggregate state that holds the following values:

    However, the aggregate result of PERCENT_GTR is a fixed-point number. Therefore, you would not want the aggregate to return the final state to the calling SQL statement. Instead, the FINAL support function needs to perform the following steps:

    For the complete example of the PERCENT_GTR user-defined aggregate, see page 13-64.

    Executing a User-Defined Aggregate in Parallel Queries

    The database server can break up the aggregation computation into several pieces and compute them in parallel. Each piece is computed sequentially as follows:

    1. The INIT support function initializes execution in the parallel thread.
    2. For each aggregate argument in the subset, the ITER support function merges the aggregate argument into a partial result.

    The database server then calls the COMBINE support function to merge the partial states, two at a time, into a final state. For example, for the AVG built-in aggregate, the COMBINE function would add the two partial sums and adds the two partial counts. Finally, the database server calls the FINAL support function on the final state to generate the aggregate result.

    Figure 13-18 shows the execution sequence of aggregate support functions for a user-defined aggregate that is executed in two parallel threads.

    Figure 13-18
    Parallel Execution of a UDA

    Figure 13-18 shows how the COMBINE function is used to execute a user-defined aggregate with two parallel threads. For more than two parallel threads, the database server calls COMBINE on two thread states to obtain one, combines this state with another thread state, and so on until it has processed all parallel threads. The database server makes the decision whether to parallelize execution of a user-defined aggregate and the degree of such parallelism. However, these decisions are invisible to the end user.

    Parallel aggregation must give the same results as an aggregate that is not computed in parallel. Therefore, you must write the COMBINE function so that the result of aggregating over the entire group of selected rows is the same as aggregating over two partitions of the group separately and then combining the results.

    For an example of COMBINE functions in user-defined aggregates, see "Sample User-Defined Aggregates."

    Sample User-Defined Aggregates

    This section provides the sample user-defined aggregates that the following table describes.

    User-Defined Aggregate Description
    SUMSQ Squares each value and calculates the sum of these squared values
    SQSUM2 Sums all values and calculates the square of this sum
    PERCENT_GTR Determines the percentage of values greater than a user-specified value
    X_PERCENTILE Determines the value in a group of values that is the x-percentile, where x is a percent that the end user specifies.

    Each description includes the aggregate support functions written C and the SQL statements to define the user-defined aggregate in the database.

    SUMSQ User-Defined Aggregate

    The SUMSQ user-defined aggregate squares each value and calculates the sum of these squared values. It has the following algorithm:

    where each xi is one column value; that is, one aggregate argument.

    To determine the aggregate state for SUMSQ, examine what information needs to be available for each iteration of the aggregate. To perform one iteration of SUMSQ, the ITER function must:

    1. Square the aggregate argument.
    2. The ITER function has access to the aggregate argument because the database server passes it in. Therefore, ITER does not require additional information to perform this step.

    3. Add the squared argument to the partial sum of previous squared values.
    4. To add in the squared argument, the aggregate must keep a partial sum of the previous squared values. For the ITER function to have access to the partial sum from the previous iterations, the aggregate state must contain it.

    The SUMSQ has a simple state because the data type of the partial sum is the same as that of the aggregate argument. For example, when the SUMSQ aggregate receives INTEGER values, this partial sum is also an INTEGER value. Therefore, SUMSQ can allow the database server to manage this state, which has the following effect on the design of the aggregate support functions:

    Before the iterations begin, the partial sum needs to be initialized to zero (0). However, because the INIT function is not required for state management, this aggregate initializes the state in the first invocation of its ITER function. The ITER function then calculates the square of a single aggregate argument, and adds this value to a partial sum. When the last iteration is reached, the final partial sum is the value that the SUMSQ aggregate returns. Therefore, the SUMSQ algorithm does not require a FINAL function for post-iteration tasks.

    Figure 13-19 shows the required aggregate support functions that handle an INTEGER argument for the SUMSQ user-defined aggregate.

    Figure 13-19
    Aggregate Support Functions for SUMSQ on INTEGER

    The following SQL statement registers the SUMSQ user-defined aggregate in the database:

    This CREATE AGGREGATE statement lists only the aggregate support functions that are required to implement SUMSQ: ITER and COMBINE.

    Suppose that the ITER and COMBINE aggregate support functions for the SUMSQ aggregate are compiled and linked into a shared-object module called sumsq.

    On a Solaris system, the executable code for the SUMSQ aggregate support functions would be in a shared library called sumsq.so.

    Figure 13-20 shows the CREATE FUNCTION statements that register the aggregate support functions for SUMSQ to handle INTEGER aggregate arguments.

    Figure 13-20
    Registering the SUMSQ
    Aggregate Support Functions for INTEGER

    For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SUMSQ aggregate function on the INTEGER column, col3:

    With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 2173. To be able to use SUMSQ on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type.

    SQSUM2 User-Defined Aggregate

    The SQSUM2 user-defined aggregate is another version of the SQSUM1 aggregate, which Figure 13-8 on page 13-24 describes. Its algorithm is the same as SQSUM1, that is:

    where each xi is one column value; that is, one aggregate argument.

    However, the SQSUM2 aggregate takes advantage of the fact that this aggregate has a simple state. Because the database server automatically handles state management, the SQSUM2 aggregate can safely omit the INIT function.

    Figure 13-21 shows the aggregate support functions that handle an INTEGER argument for the SQSUM2 user-defined aggregate.

    Figure 13-21
    Aggregate Support Functions for SQSUM2 on INTEGER

    In its first invocation, the ITER function performs the state initialization. It then takes a single aggregate argument and adds it to a partial sum. For aggregates with a simple state, the COMBINE can be the same as the ITER function. Therefore, this COMBINE function just calls iter_sumsq2() to perform the merge of two partial states.

    Tip: The ITER function in Figure 13-21 could use the binary operator plus() to perform the addition. This operator is already defined on the INTEGER data type and therefore would not need to be written or registered. To use plus() in ITER, you would need to ensure that it is defined for the data type on which the SQSUM2 aggregate is defined.

    The data type of the aggregate result is also the same as the aggregate state. Therefore, SQSUM2 is a simple binary operator and the FINAL support function is not needed to convert the data type of the final state. However, the SQSUM2 aggregate still does require a FINAL support function. The SQSUM2 algorithm involves a post-iteration calculation: it must square the final sum to obtain the aggregate return value. The FINAL function performs this final calculation and returns it as the aggregate result for the SQSUM2 aggregate.

    Suppose that the ITER, COMBINE, and FINAL aggregate support functions for the SQSUM2 aggregate are compiled and linked into a shared-object module called sqsum.

    On a Solaris system, the executable code for the SQSUM2 aggregate support functions would be in a shared library called sqsum.so.

    Once you have successfully compiled and linked the aggregate support functions, you can define the SQSUM2 aggregate in the database. Figure 13-22 shows the CREATE AGGREGATE statement that registers the SQSUM2 user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions.

    Figure 13-22
    Registering the SQSUM2 User-Defined Aggregate

    Figure 13-23 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for the aggregate argument of the INTEGER data type.

    Figure 13-23
    Registering the SQSUM2
    Aggregate Support Functions for INTEGER

    In Figure 13-23, the CREATE FUNCTION that registers the ITER support function requires the HANDLESNULLS routine modifier because the aggregate does not have an INIT support function.

    For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SQSUM2 aggregate function on the INTEGER column, col3:

    With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 10201, which is the same value that the SQSUM1 aggregate returned for these same rows.

    Now, suppose that you want to define the SQSUM2 user-defined aggregate on the complexnum_t named row type, which Figure 13-5 on page 13-17 defines. This version of SQSUM2 must have the same aggregate support functions as the version that handles INTEGER (see Figure 13-22 on page 13-60).

    Aggregate Support Function SQL Function Name C Function Name
    ITER iter_sqsum2() iter_sqsum2_complexnum()
    COMBINE combine_sqsum2() combine_sqsum2_complexnum()
    FINAL final_sqsum2() final_sqsum2_complexnum()

    The following code shows the aggregate support functions that handle a complexnum_t named row type as an argument for the SQSUM2 user-defined aggregate.

    Figure 13-24 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for an aggregate argument of the complexnum_t data type.

    Figure 13-24
    Registering the SQSUM2
    Aggregate Support Functions for the complexnum_t Named Row Type

    The following query uses the SQSUM2 aggregate function on the complexnum_t column, col2:

    With the rows that Figure 13-7 on page 13-23 has inserted, the preceding query yields a complexnum_t value of:

    PERCENT_GTR User-Defined Aggregate

    Suppose you want to create a user-defined aggregate that determines the percentage of values greater than some user-specified value and returns this percentage as a fixed-point number in the range 0 to 100. The implementation of this UDA uses the following aggregate features:

    The PERCENT_GTR user-defined aggregate needs the following state information:

    Therefore, it uses the following C structure, called percent_state_t, to hold the aggregate state:

    Because the size of the percent_state_t structure never exceeds the maximum opaque-type size, PERCENT_GTR can use an opaque-type state to hold its aggregate state. The following code shows the INIT aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

    This INIT function performs the following tasks:

    The following code implements the ITER aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

    The PERCENT_GTR aggregate is defined to handle NULL values (see Figure 13-25 on page 13-68). Therefore, this ITER function must check for a possible NULL aggregate argument. It converts any NULL value to a zero (0) so that the numeric comparison can occur.

    The following COMBINE aggregate support function handles an INTEGER argument for the PERCENT_GTR aggregate:

    Because PERCENT_GTR does not have a simple state, its COMBINE function must explicitly perform the merging of two parallel threads, as follows:

    The following code shows the FINAL aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

    The PERCENT_GTR aggregate returns a data type is different from aggregate state. Therefore, the FINAL function must convert the final state from the aggregate-state data type (percent_state_t) to the aggregate-result data type (DECIMAL).

    Once you have successfully compiled and linked the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. For a user-defined aggregate that uses an opaque-type state, this definition includes the following steps:

    The PERCENT_GTR aggregate uses a fixed-length opaque type, percent_state_t, to hold its opaque-type state. The following CREATE OPAQUE TYPE statement registers this opaque type:

    The INTERNALLENGTH modifier specifies the size of the fixed-length C data structure, percent_state_t, that holds the opaque-type state.

    Figure 13-25 shows the CREATE AGGREGATE statement that defines the PERCENT_GTR user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.

    Figure 13-25
    Registering the PERCENT_GTR User-Defined Aggregate

    Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the PERCENT_GTR aggregate are compiled and linked into a shared-object module called percent.

    On a Solaris system, the executable code for the PERCENT_GTR aggregate support functions would be in a shared library called percent.so.

    The following CREATE FUNCTION statements register the PERCENT_GTR aggregate support functions for an aggregate argument of the INTEGER data type:

    These CREATE FUNCTION statements register an SQL name for each of the aggregate support functions that you have written in C. They must all include the HANDLESNULLS routine modifier because the PERCENT_GTR aggregate handles NULL values.

    The following query uses the PERCENT_GTR aggregate function on the INTEGER column, col3, to determine the percentage of values greater than 25:

    With the rows that Figure 13-7 on page 13-23 has inserted, the preceding query yields a DECIMAL(5,2) value of 33.33 percent: 2 of the 6 values are greater than 20 (24 and 31).

    X_PERCENTILE User-Defined Aggregate

    Suppose you want to create a user-defined aggregate that calculates the x-percentile for a group of values. The x-percentile is the number within the group of values that separates x% of the values below and (100-x)% above. The median is a special case of the x-percentile. It represents the 50th-percentile:

    That is, the above aggregate returns the value within a sample of y values that has an equal number of values (50%) above and below it in the sample.

    The implementation of this UDA uses the following aggregate features:

    The X_PERCENTILe user-defined aggregate needs the following state information:

    Therefore, X_PERCENTILE uses a C structure called percentile_state_t to hold the aggregate state:

    Important: The percentile_state_t structure stores the values processes in an in-memory array within the state. You could also choose to store these values elsewhere, such as in an operating-system file or in a separate location in memory. Each of these locations has advantages and disadvantages. Choose the structure that best fits your application needs.

    The size of the percentile_state_t structure depends on the number of aggregate arguments stored in the value_array array; that is, values less then or equal to the MAX_N constant. On a system with 4-byte mi_integer values, the size of this structure is:

    If X_PERCENTILE used an opaque-type state, this structure must be less than the maximum opaque-type size. For systems that have a 32 kilobyte maximum opaque-type size, the X_PERCENTILE aggregate could use an opaque-type state as long as it is called in a query that finds 8190 or fewer rows. If the query finds more than 8190 rows, the state would not fit into an opaque type. To avoid this restriction, X_PERCENTILE implements the aggregate state as a pointer-valued state.

    The following code shows the INIT aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:

    This INIT support function performs the following tasks:

    The following code implements the ITER aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:

    The ITER support function updates the aggregate state in-place with the following information:

    The ITER function also handles a possible NULL-valued aggregate argument. Because the X_PERCENTILE aggregate is defined to handle NULL values (see Figure 13-26 on page 13-76), the database server calls ITER for NULL-valued aggregate arguments.

    The following COMBINE aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:

    This COMBINE support function merges two aggregate states, as follows:

    The following FINAL aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:

    This FINAL support function performs the following tasks:

    Once you have successfully compiled and linked the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. Figure 13-26 shows the CREATE AGGREGATE statement that defines the X_PERCENTILE user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.

    Figure 13-26
    Registering the X_PERCENTILE User-Defined Aggregate

    Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the X_PERCENTILE aggregate are compiled and linked into a shared-object module called percent.

    On a Solaris system, the executable code for the X_PERCENTILE aggregate support functions would be in a shared library called percent.so.

    The following CREATE FUNCTION statements register the X_PERCENTILE aggregate support functions for an aggregate argument of the INTEGER data type:

    These CREATE FUNCTION statements use the SQL data type, POINTER, to indicate that the aggregate support functions accept a generic C pointer and perform their own memory management. They must all include the HANDLESNULLS routine modifier because the X_PERCENTILE aggregate handles NULL values.

    The following query uses the X_PERCENTILE aggregate function on the INTEGER column, col3, to determine the quartile (the 25th-percentile) for the values in col3:

    For the tab1 rows that Figure 13-7 on page 13-23 has inserted, X_PERCENTILE creates the following sorted list for the col3 values:

    Because 25% of 6 values is 1.5, X_PERCENTILE obtains the item from the data set that has 2 values (1.5 rounded up to the nearest integer) below it. Therefore, the preceding query returns 13 as the quartile for col3.

    Now suppose you add the following row to the tab1 table:

    This INSERT statement adds a NULL value to the col3 column. Because X_PERCENTILE handles NULLs, the database server calls the X_PERCENTILE aggregate on this new row as well. After this seventh row is inserted, X_PERCENTILE would generate the following sorted list for col3:

    Twenty-five percent of 7 values is 1.75. Therefore, X_PERCENTILE obtains the item from the data set that has 2 (1.75 truncated to the nearest integer) values below it. Now the quartile for col3 would be 9. However, if X_PERCENTILE was not registered with the HANDLESNULLS modifier, the database server would not call X_PERCENTILE for this newest row and the quartile for col3 would have been 13 (the quartile for 6 rows, even though col3 actually has 7 rows).

    Notice also that if you called the X_PERCENTILE aggregate with an x-percentile that would return the first value in the list (the NULL value), the FINAL support function uses the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL. For example, suppose you execute the following query on the col3:

    This query asks for the 5%-tile for the seven values in col3. The first element Because 5% of 7 values is 0.35, X_PERCENTILE obtains the item from the data set that has zero values (0.35 truncated to the nearest integer) below it. Therefore, the preceding query returns NULL as the quartile for col3. The ITER function has stored NULL values as zeros in the sorted value_array. For the FINAL support function to determine when a value of zero indicates a NULL and when it indicates zero, it checks the value_is_null array. If the zero indicates a NULL value, FINAL calls the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL.


    Informix DataBlade API Programmer's Manual, Version 9.2
    Copyright © 1999, Informix Software, Inc. All rights reserved