Home | Previous Page | Next Page   Creating User-Defined Aggregates > Creating User-Defined Aggregates >

Support Functions

The CREATE AGGREGATE statement expects information about four support functions. The following table summarizes these support functions. You must provide support functions for each data type that will use the aggregate.

Function Type
Purpose
INIT
Initializes the data structures required for computing the aggregate
ITER
Merges a single (row) value with the previous partial result
COMBINE
Merges one partial result with another partial result, thus allowing parallel execution of the aggregate
FINAL
Converts the partial result into the final value

It can perform clean-up operations and release resources.

You can write the support functions in SPL, C, or Java. For information about SPL, refer to the IBM Informix: Guide to SQL Syntax. For information about writing functions in external languages, refer to the IBM Informix: DataBlade API Programmer's Guide or the IBM Informix: J/Foundation Developer's Guide.

The following CREATE AGGREGATE statement registers the SUMSQ aggregate with support functions named init_func, iter_func, combine_func, and final_func. You can register an aggregate even though you have not yet written the support functions.

CREATE AGGREGATE sumsq
    (INIT = init_func,
     ITER = iter_func,
     COMBINE = combine_func,
     FINAL = final_func);

When you create a user-defined aggregate, you must overload each support function to provide for each data type on which the aggregate will operate. That is, if you create a new aggregate, SUMSQ, whose iterator function is iter_func, you must overload the iter_func function for each applicable data type. Aggregate names are not case sensitive. When you create and use an aggregate, you can use either uppercase or lowercase.

INIT Function

The INIT function initializes the data structures required by the rest of the computation of the aggregate. For example, if you write a C function, the INIT function can set up large objects or temporary files for storing intermediate results. The INIT function returns the initial result of the aggregate, which is of the state type.

The INIT function can take one or two arguments. The first argument must be the same type as the column that is aggregated. The database server uses the type of the first argument to resolve overloaded INIT functions.

C and JAVA Language Support

The first argument of the INIT function is a dummy argument and always has a null value. Therefore, all functions that serve as INIT functions must be defined with the HANDLESNULLS modifier.

End of C and JAVA Language Support
Omitting the INIT Function

You can omit the INIT function for simple binary operators whose state type is the same as the type of the first argument of the aggregate. In that case, the database server uses null as the initial result value.

Using the Optional Second Argument

You can use the optional second argument of the INIT function as a setup argument to customize the aggregate computation. For example, you could prepare an aggregate that would exclude the N largest and N smallest values from its calculation of an average. In that case, the value of N would be the second argument of the aggregate expression.

The setup expression must come from the group-by columns because the value of the setup should remain the same throughout the computation of the aggregate.

C Language Support

The setup expression cannot be a lone host variable reference.

End of C Language Support

ITER Function

The iteration function, ITER, merges a single value with a partial result and returns a partial result. The ITER function does the main job of processing the information from each row that your query selects. For example, for the AVG aggregate, the ITER function adds the current value to the current sum and increments the row count by one.

The ITER function is required for all user-defined aggregates. If no INIT function is defined for a user-defined aggregate, the ITER function must explicitly handle nulls.

The ITER function obtains the state of the aggregate computation from its state argument.

SPL routines handle null arguments by default. In C and Java functions, you must explicitly handle null values in the ITER function and register the function with the HANDLESNULLS modifier.

C Language Support

The ITER function should not maintain additional states in its FPARAM structure because the FPARAM structure is not shared among support functions. However, you can use the FPARAM structure to cache information that does not affect the aggregate result.

End of C Language Support

FINAL Function

The FINAL function converts the internal result to the result type that it returns to the user. For example, for the AVG aggregate, the FINAL function returns the current sum divided by the current row count.

The FINAL function is not required for aggregates that are derived from simple binary operators whose result type is the same as the state type and the column type. If you do not define a FINAL function, the database server simply returns the final state.

C and JAVA Language Support

The FINAL function can perform cleanup work to release resources that the INIT function allocated. However, it must not free the state itself.

End of C and JAVA Language Support

COMBINE Function

The COMBINE function merges one partial result with another partial result and returns the updated partial result. For example, for the AVG aggregate, the COMBINE function adds the two partial results and adds the two partial counts.

If the aggregate is derived from a simple binary operator whose result type is the same as the state type and the column type, the COMBINE function can be the same as the ITER function. For example, for the AVG aggregate, the COMBINE function adds the current sum and the row count of one partial result to the same values for another partial result and returns the new values.

The database server uses the COMBINE function for parallel execution. When a query includes an aggregate, the database server uses parallel execution when the query includes only aggregates. However, the COMBINE function might also be used even when a query is not parallelized. For example, when a query contains both distinct and nondistinct aggregates, the database server can decompose the computation of the nondistinct aggregate into subaggregates based on the distinct column values. Therefore, you must provide a COMBINE function for each user-defined aggregate.

Parallel aggregation must give the same results as an aggregate that is not computed in parallel. You must write the COMBINE function so that the result of aggregating over a set of rows is the same as aggregating over two partitions of the set separately and then combining the results.

C and JAVA Language Support

The COMBINE function can perform clean-up work to release resources that the INIT function allocated. However, it must not free the state arguments.

End of C and JAVA Language Support
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]