Writing an Aggregate Function

Informix DataBlade API Programmer's Manual
Creating Special-Purpose UDRs

Writing an Aggregate Function

An aggregate is a function that returns one value for a group of queried rows. The aggregate function performs one iteration for each of the queried rows. For each row, an aggregate iteration receives one column value (called the aggregate argument). The value that the aggregate returns to the SQL statement is called the aggregate result. For example, the following query calls the built-in SUM aggregate to determine the total cost of item numbers in order 1002:

SELECT SUM(total_price) FROM items
WHERE order_num = 1002

For this invocation of the SUM aggregate, each value of the total_price column that is passed into SUM is one aggregate argument. The total sum of all total_price values, which SUM returns to the SELECT, is the aggregate result.

The database server supports extension of aggregates in the following ways:

Extensions of built-in aggregates

User-defined aggregates

You can write C user-defined functions to implement these aggregate extensions. For an overview of how to create aggregate functions and how to write them in an SPL routine, see the chapter on this topic in the Extending Informix Dynamic Server 2000 manual. The following sections provide information specific to the creation of aggregate functions as C user-defined functions.

Tip: The Informix BladeSmith development tool automatically generates C source code for a user-defined aggregate as well as the SQL statements to register the aggregate function. For more information, see the "DataBlade Developers Kit User's Guide."

Extending a Built-In Aggregate

To extend a built-in aggregate function with a C user-defined function, follow these steps:

Determine the appropriate operator function that you must overload to implement the desired built-in aggregate function.

For a list of built-in aggregate functions and the associated operator functions to overload, see the chapter on aggregate functions in the Extending Informix Dynamic Server 2000 manual.

Write the C UDR that implements the required operator function for the data type that you want the aggregate to handle.

To extend built-in aggregates so that they handle user-defined data types, write an operator function that accepts the user-defined data type as an argument. Compile the C UDR and link it into a shared-object file.

Use the newly extended aggregate on the data.

Suppose you want to be able to use the SUM aggregate on complex numbers, which are stored in the following user-defined data type: a named row type called complexnum_t. Figure 13-5 shows the CREATE ROW TYPE statement that registers the complexnum_t named row type.


	Figure 13-5 A Named Row Type to Hold a Complex Number

The following sections show how to extend the SUM aggregate on the complexnum_t named row type.

Choosing the Operator Function

The SUM built-in aggregate function uses the plus operator (+), which the plus() user-defined function implements. The database server provides implementations of the plus() function over the built-in data types. Therefore, the SUM aggregate function works over built-in data types. To have the SUM aggregate operate on the complexnum_t row type, you implement a plus() function that handles this named row type; that is, it adds the two parts of the complex number and returns a complex number with the summed parts.

The following C function, complex_plus(), defines such a plus() function:

MI_ROW *complex_plus(arg1, arg2)
	MI_ROW *arg1;
	MI_ROW *arg2;

Writing the Operator Function

The code segment shows the implementation of the complex_plus() function, which implements a plus() function for the complexnum_t data type:

MI_ROW *complex_plus(arg1, arg2, fparam)
	MI_ROW *arg1;
	MI_ROW *arg2;
	MI_FPARAM *fparam;

{
	mi_real real_zero, imag_zero = 0.0;
	mi_real *real_value1, *real_value2;
	mi_integer real_len1, real_len2;
	mi_real *imag_value1, *imag_value2;
	mi_integer imag_len1, imag_len2;

	mi_real sum_real, sum_imag;

	MI_CONNECTION *conn;
	MI_TYPEID *type_id;
	MI_ROW_DESC *row_desc;

	mi_integer i;
	MI_ROW *ret_row;
	MI_DATUM values[2];
	mi_boolean nulls[2] = {MI_FALSE, MI_FALSE};

	for ( i=0; i<=1; i++ )
		{
		if ( mi_fp_argisnull(fparam, i) == MI_TRUE )
			{
			/* Put initialized complex number into 'values'
			** array */
			values[0] = (MI_DATUM)&real_zero;
			values[1] = (MI_DATUM)&imag_zero;

			/* Generate initialized row type for arg1 */
			conn = mi_open(NULL, NULL, NULL);
			type_id = mi_typestring_to_id(conn,
				"complexnum_t");
			row_desc = mi_row_desc_create(type_id);

			ret_row = mi_row_create(conn, row_desc, values,
				nulls);

			if ( i == 0 )
				arg1 = ret_row;
			else
				arg2 = ret_row;
			}
		}

	/* Extract values from arg1 row type */
	mi_value_by_name(arg1, "real_part", 
		(MI_DATUM *)&real_value1, &real_len1);
	mi_value_by_name(arg1, "imaginary_part", 
		(MI_DATUM *)&imag_value1, &imag_len1);

	/* Extract values from arg2 row type */
	mi_value_by_name(arg2, "real_part", 
		(MI_DATUM *)&real_value2, &real_len2);
	mi_value_by_name(arg2, "imaginary_part", 
		(MI_DATUM *)&imag_value2, &imag_len2);

	/* Sum the complex numbers */
	sum_real = *real_value1 + *real_value2;
	sum_imag = *imag_value1 + *imag_value2;

	/* Put sum into 'values' array */
	values[0] = (MI_DATUM)&sum_real;
	values[1] = (MI_DATUM)&sum_imag;

	/* Generate return row type */
	conn = mi_open(NULL, NULL, NULL);
	type_id = mi_typestring_to_id(conn, "complexnum_t");
	row_desc = mi_row_desc_create(type_id);
	ret_row = mi_row_create(conn, row_desc, values, nulls);

	return (ret_row);
}

This version of the plus() function performs the following tasks:

Checks for a NULL-valued state (which indicates the first invocation of the ITER function) to initialize the state.

Checks for a NULL-valued aggregate argument to initialize a NULL the argument.

Accepts the two complexnum_t arguments as row-type pointers (MI_ROW *) and uses the mi_value_by_name() function to extract the individual fields of the row type from the arguments.

Calculates the sum of the complex numbers by adding the real values together and the imaginary values together.

Creates an MI_ROW type to hold the complexnum_t value with the final sum: the mi_row_desc_create() function creates a row descriptor for the complexnum_t data type and the mi_row_create() function populates the associated row structure with the final sum values.

Returns a pointer to the MI_ROW structure (because a row type must be returned by reference, not by value).

Once the complex_plus() function is written, you compile it and put it into a shared-object file. Suppose that complex_plus() is compiled and linked into a shared-object module called sqsum.

On a Solaris system, the executable code for the complex_plus() operator function would be in a shared library called sqsum.so.

For more information, see Compiling a C UDR.

To extend a built-in aggregate over a user-defined data type, you overload the appropriate operator function to handle the user-defined type. However, operator functions can also be used as part of an expression that does not involve aggregates. Therefore, aggregate support functions for built-in aggregates on user-defined data types (opaque types, distinct types, and named row types) must allocate a new state when they need to modify the state.

For example, the following SUM aggregate uses the overloaded plus() operator to calculate the sum of values in the col1 column:

SELECT SUM(col1) FROM tab2 WHERE ....

For each aggregate argument, the SUM aggregate invokes the plus() operator to add the aggregate argument (agg_arg) into the sum of the previous values in the aggregate state (agg_state), as follows:

plus(agg_state, agg_arg)

When you modify the aggregate state in-place, the value of the agg_state argument to plus() changes. When plus() exits, the agg_state argument holds the new sum of the aggregate arguments, which includes the agg_arg value.

However, the plus() function is also valid in expressions that do not involve aggregates, as in the following query:

SELECT col1 FROM tab2 WHERE col1 + 4 > 17

In this WHERE clause, the database server invokes the plus() operator to add 4 to the col1 value, as follows:

plus(col1, 4)

If the plus() operator modifies the aggregate state in-place, the value of its first argument changes to hold the sum of col1 and 4. It is not safe to modify arguments in place because the values of arguments (col1 and 4) must not change. Therefore, when you modify the aggregate state in an operator function of a built-in aggregate, you must be careful not to use the "in-place" modification method.

Registering the Overloaded Operator Function

With the operator function written, compiled, and linked into a shared-object file, you can register this function in the database with the CREATE FUNCTION statement. You must have the appropriate privileges for this registration to be successful. For more information, see the chapter on user-defined aggregates in the Extending Informix Dynamic Server 2000.

Figure 13-6 shows the CREATE FUNCTION statement that overloads the plus() function with a new version that handles the complexnum_t named row type.


	Figure 13-6 Registering the Overloaded plus() Function

Tip: Because SUM is a built-in aggregate, you do not have to use the CREATE AGGREGATE statement to define the SUM aggregate.

Using the Extended Aggregate

Once you execute the CREATE FUNCTION statement in Figure 13-6 on page 13-22, you can use the SUM aggregate on complexnum_t columns. For example, suppose you create the tab1 table as Figure 13-7 shows.


	Figure 13-7 A Table With a complexnum_t Column

The following query uses the SUM aggregate function on the complexnum_t column, col2:

SELECT SUM(col2) FROM tab1

With the rows that Figure 13-7 has inserted, the preceding query yields a complexnum_t value of:

ROW(28.6, 34.7)

As a side effect of the new plus() function, you can also add two complexnum_t columns in an SQL expression, as follows:

SELECT complex_num1 + complex_num2 FROM complex_nums
WHERE id > 6

Creating a User-Defined Aggregate

The built-in aggregates provide basic aggregations. However, if your data requires some special aggregation, you can create a custom aggregate function, called a user-defined aggregate. To implement your custom aggregation, you design an aggregate algorithm, which consists of the following parts:

The aggregate state contains the information that needs to be passed between iterations of the aggregate.

The aggregation tasks, which are implemented as special-purpose user-defined functions called aggregate support functions.

To implement a user-defined aggregate function with C user-defined functions, follow these steps:

Determine the content and data type of the aggregate state.

Write the C UDRs that implements the required aggregate support functions for the data type on which you want to implement the user-defined aggregate.

Define the user-defined aggregate in the database with the CREATE AGGREGATE and CREATE FUNCTION statements.

Once you have completed these steps, you can use the aggregate in an SQL statement.

The following sections describe each of these development steps in more detail and use the SQSUM1 user-defined aggregate (which Figure 13-8 describes) as an example.

User-Defined Aggregate	Description	Definition
SQSUM1	Sums all values and calculates the square of this sum	(x₁+ x₂ + x₃+ ... )²

Figure 13-8
A Sample User-Defined Aggregate

User-Defined Aggregate Description Definition

SQSUM1 Sums all values and calculates the square of this sum (x₁+ x₂ + x₃+ ... )²

Determining the Aggregate State

An aggregate is a series of iterations. Each iteration processes one aggregate argument (which contains one column value) and performs the necessary computations to merge it into a partial result. The partial result is a snapshot of the aggregate arguments that the aggregate has merged so far. Once the aggregate has received all column values, it returns a value to the calling statement, based on the final partial result.

Each iteration of the aggregate is a separate invocation of a user-defined function. If user-allocated memory has the default PER_ROUTINE memory duration, the database server automatically deallocates it after only one iteration of the iteration function. (For more information, see Determining Memory Duration.) Therefore, while an iteration executes, it can access only the following information:

Its own local variables, which are deallocated at the end of each iteration and therefore unavailable to other iterations

The aggregate argument, which contains a new column value for each iteration

The aggregate state, which contains any non-local information that the iteration needs to perform its merge, including the partial result and any other external information (such as an operating-system file) that an iteration might need to access

During its invocation, each aggregate iteration merges the aggregate argument into the aggregate state and returns the updated state. The database server preserves the updated aggregate state and passes it into the next iteration of the aggregate. When you create a user-defined aggregate, you must determine what non-local information each iteration needs and then define an aggregate state to contain this information. Otherwise, the aggregate iterations cannot obtain the information they need to perform their computations.

Important: Design the aggregate state so that each aggregate support function can obtain all the state information that it needs.

As a starting point, try using the data type of the aggregate argument for the aggregate state. Such a state is called a simple state. For more information, see Aggregate Support Functions for the Aggregate State.

For example, to determine the state for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes), assess what tasks need to be performed in each iteration of SQSUM1. For each aggregate argument, SQSUM1 needs to add together the following values:

The aggregate argument, which is passed to each iteration of the aggregate

The partial sum of previous argument values, which must exist in the aggregate state

The data type that you choose for the aggregate state affects how the state must be managed. When the SQSUM1 aggregate receives INTEGER values as its aggregate arguments, the sum of these values is also an INTEGER value. Therefore, the SQSUM1 aggregate has an integer state, which holds the partial sum.

Writing the Aggregate Support Functions

An aggregate support function is a special-purpose user-defined function that implements some task in the aggregate algorithm. You write aggregate support functions to initialize, calculate, and return the aggregate result to the calling code. An aggregate algorithm can include the following kinds of aggregate support functions.


Aggregate Support Function	Algorithm Step
INIT	Possible initialization tasks that must be performed before the iterations can begin
ITER	An iteration step, which is performed on each aggregate argument and merges this argument into a partial result
COMBINE	Merging one partial result with another partial result, thus allowing parallel execution of the user-defined aggregate
FINAL	Post-iteration tasks that must be performed after all aggregate arguments were merged into a partial result

The following sections summarize each of the aggregate support functions.

INIT Function

The INIT aggregate support function performs the initialization for the user-defined aggregate. Figure 13-9 summarizes possible initialization tasks.

Initialization Task	For More Information
Set-up for any additional resources outside the state that the aggregation might need	Aggregate Support Functions that the Algorithm Requires
Initial calculations that the user-defined aggregate might need	Aggregate Support Functions that the Algorithm Requires
Allocation and initialization of the aggregate state that the rest of the aggregation computation might need	Aggregate Support Functions for the Aggregate State
Handling of an optional set-up argument	Implementing a Set-Up Argument

Figure 13-9
Initializations Tasks for the INIT Aggregate Support Function

Initialization Task For More Information

Set-up for any additional resources outside the state that the aggregation might need Aggregate Support Functions that the Algorithm Requires

Initial calculations that the user-defined aggregate might need Aggregate Support Functions that the Algorithm Requires

Allocation and initialization of the aggregate state that the rest of the aggregation computation might need Aggregate Support Functions for the Aggregate State

Handling of an optional set-up argument Implementing a Set-Up Argument

The INIT support function is an optional aggregate support function. If your aggregate algorithm does not require any of the tasks in Figure 13-9, you do not need to define an INIT function. When you omit the INIT function from your user-defined aggregate, the database server performs the state management for the aggregate state. For more information, see Handling a Simple State.

To declare an INIT support function as a C function, use the following syntax:

agg_state init_func(dummy_arg, set_up_arg)
	agg_arg_type dummy_arg;
	set_up_type set_up_arg; /* optional */
 



agg_state

is the data type of the aggregate state.


dummy_arg

is a dummy parameter that has the same data type as the aggregate argument that this aggregate support function is to handle for the user-defined aggregate.


init_func

is the name of the INIT aggregate support function.


set_up_arg

is an optional parameter for the set-up argument. For more information, see Implementing a Set-Up Argument.

agg_state	is the data type of the aggregate state.
dummy_arg	is a dummy parameter that has the same data type as the aggregate argument that this aggregate support function is to handle for the user-defined aggregate.
init_func	is the name of the `INIT` aggregate support function.
set_up_arg	is an optional parameter for the set-up argument. For more information, see Implementing a Set-Up Argument.

In the execution of a UDA, the database server calls the INIT function before it begins the actual aggregation computation. It passes in any optional set-up argument (set_up_arg) from the user-defined aggregate and copies any initialized aggregate state that INIT returns into the state buffer. For more information on the state buffer, see Aggregate Support Functions for the Aggregate State.

The first argument of the INIT function serves only to identify the data type of the aggregate argument that the user-defined aggregate handles. The routine manager uses this argument in routine resolution to determine the correct version of the overloaded INIT function. At the time of invocation, the routine manager just passes in a NULL value for the first argument of the INIT function, as the following syntax shows:

agg_state init_func(NULL, optional set-up argument)

Tip: For more information on how the routine manager resolves the overloaded aggregate support functions, see the chapter on aggregates in the "Extending Informix Dynamic Server 2000."

Figure 13-10 shows the INIT aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes).


	Figure 13-10 INIT Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-11 on page 13-30, Figure 13-12 on page 13-32, and Figure 13-14 on page 13-34.

ITER Function

The ITER aggregate support function performs the sequential aggregation or iteration for the user-defined aggregation. It merges a single aggregate argument into the partial result, which the aggregate state contains. The ITER function is a required aggregate support function; that is, you must define an ITER function for every user-defined aggregate. If you do not define an ITER function for your user-defined aggregate, the database server generates an error.

Tip: If the UDA does not have an INIT support function, the ITER support function can initialize the aggregate state. For more information, see Handling a Simple State.

If the UDA was registered with the HANDLESNULLS modifier in the CREATE AGGREGATE statement, the database server calls the ITER support function once for each aggregate argument that passed to the user-defined aggregate. Each aggregate argument is one column value. If you omit the HANDLESNULLS modifier from CREATE AGGREGATE, the database server does not call ITER for any NULL-valued aggregate arguments. Therefore, NULL-valued aggregate arguments do not contribute to the aggregate result.

For example, suppose you execute the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes) in the following query:

SELECT SQSUM1(col3) FROM tab1

The tab1 table (which Figure 13-7 on page 13-23 defines) contains 6 rows. Therefore, the preceding query (which contains no WHERE clause) causes 6 invocations of the SQSUM1 ITER function. Each invocation of this ITER function processes one value from the col3 column.

To declare a ITER function as a C function, use the following syntax:

agg_state iter_func(current_state, agg_argument)
	agg_state current_state;
	agg_arg_type agg_argument;
 



agg_state

is the data type of the current and updated aggregate state. After the ITER function merges the agg_argument into the current_state state, it returns a pointer to the updated state. 


agg_arg_type

is the data type of the agg_argument, which is the data type that aggregate support function handles for the user-defined aggregate.


agg_argument

is a single aggregate argument, usually a column value, which the ITER function merges into the partial result in the current_state aggregate state.


current_state

is the current aggregate state, which previous calls to the ITER function and to the INIT function have generated.


iter_func

is the name of the ITER aggregate support function.










Important: Make sure that the ITER support function obtains all state information that it needs from its "current_state" argument. The INIT function cannot maintain additional state information as user data in its MI_FPARAM structure because MI_FPARAM is not shared among the other aggregate support functions. However, the ITER function can store user data in MI_FPARAM that is not part of the aggregate result.

agg_state	is the data type of the current and updated aggregate state. After the `ITER` function merges the agg_argument into the current_state state, it returns a pointer to the updated state.
agg_arg_type	is the data type of the agg_argument, which is the data type that aggregate support function handles for the user-defined aggregate.
agg_argument	is a single aggregate argument, usually a column value, which the `ITER` function merges into the partial result in the current_state aggregate state.
current_state	is the current aggregate state, which previous calls to the `ITER` function and to the `INIT` function have generated.
iter_func	is the name of the `ITER` aggregate support function.

Figure 13-11 shows the ITER aggregate support function that handles an INTEGER argument for the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes).


	Figure 13-11 ITER Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-12 on page 13-32, and Figure 13-14 on page 13-34.

COMBINE Function

The COMBINE aggregate support function allows your user-defined aggregate to execute in a parallel query. When a query that contains a user-defined aggregate is processed in parallel, each parallel thread operates on a one subset of selected rows. The COMBINE function merges the partial results from two such subsets. This aggregate support function ensures that the result of aggregating over a group of rows sequentially is the same as aggregating over two subsets of the rows in parallel and then combining the results.

The COMBINE function is required for parallel execution. When a query includes a user-defined aggregate, the database server uses parallel execution when the query includes only aggregates. However, the COMBINE function might be used even when a query is not parallelized. For example, when a query contains both distinct and non-distinct aggregates, the database server can decompose the computation of the non-distinct aggregate into sub-aggregates based on the distinct column values. Therefore, you must provide a COMBINE function for every user-defined aggregate.

If you do not define an COMBINE function for your user-defined aggregate, the database server generates an error. However, if your user-defined aggregate uses a simple state, the COMBINE function can be the same as the ITER function. For more information, see Handling a Simple State.

To declare a COMBINE function as a C function, use the following syntax:

agg_state combine_func(agg_state1, agg_state2)
	agg_state agg_state1, agg_state2;
 



agg_state

is the data type of the two partial aggregate states (agg_state1 and agg_state2) as well as the updated aggregate state, which the COMBINE function returns.


agg_state1

is the aggregate state from one parallel thread.


agg_state2

is the aggregate state from the second parallel thread.


combine_func

is the name of the COMBINE aggregate support function.

agg_state	is the data type of the two partial aggregate states (agg_state1 and agg_state2) as well as the updated aggregate state, which the `COMBINE` function returns.
agg_state1	is the aggregate state from one parallel thread.
agg_state2	is the aggregate state from the second parallel thread.
combine_func	is the name of the `COMBINE` aggregate support function.

In the execution of a UDA, the database server calls the COMBINE once for each pair of threads (agg_state1 and agg_state2) that execute a parallel query that contains the user-defined aggregate. When the COMBINE function combines two partial results, it might also need to release resources associated with one of the partial results.

Figure 13-12 shows the COMBINE aggregate support function that handles an INTEGER argument for the SQSUM user-defined aggregate (which Figure 13-8 on page 13-24 describes).


	Figure 13-12 COMBINE Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-11 on page 13-30, and Figure 13-14 on page 13-34. For more information on parallel execution of a UDA, see Executing a User-Defined Aggregate in Parallel Queries.

FINAL Function

The FINAL aggregate support function performs the post-iteration tasks for the user-defined aggregate. Figure 13-13 summarizes possible post-iteration tasks.

Post-Iteration Task	For More Information
Type conversion of the final state into the return type of the user-defined aggregate	Returning an Aggregate Result Different from the Aggregate State
Post-iteration calculations that the user-defined aggregate might need	Aggregate Support Functions that the Algorithm Requires
Deallocation of memory that the INIT aggregate support function has allocated	Managing a Pointer-Valued State

Figure 13-13
Post-Iteration Tasks for the FINAL Aggregate Support Function

Post-Iteration Task For More Information

Type conversion of the final state into the return type of the user-defined aggregate Returning an Aggregate Result Different from the Aggregate State

Post-iteration calculations that the user-defined aggregate might need Aggregate Support Functions that the Algorithm Requires

Deallocation of memory that the INIT aggregate support function has allocated Managing a Pointer-Valued State

The FINAL function is an optional aggregate support function. If your user-defined aggregate does not require one of the tasks in Figure 13-13, you do not need to define a FINAL function. When you omit the FINAL function from your user-defined aggregate, the database server returns the final aggregate state as the return value of the user-defined aggregate. If this state does not match the expected data type of the aggregate return value, the database server generates a data-type-mismatch error.

Important: In general, the FINAL support function must not deallocate the aggregate state. Only for a pointer-valued state (in which the aggregate support functions must handle all state management) does the FINAL support function need to deallocate the state. For more information, see Managing a Pointer-Valued State.

To declare a FINAL function as a C function, use the following syntax:

agg_type final_func(final_state)
	agg_state final_state;
 



agg_type

is the data type of the aggregate result, which is what the user-defined aggregate returns to the SQL statement in which it was invoked.


final_state

is the final aggregate state, as previous calls to the INIT and ITER functions have generated.


final_func

is the name of the FINAL aggregate support function.

agg_type	is the data type of the aggregate result, which is what the user-defined aggregate returns to the SQL statement in which it was invoked.
final_state	is the final aggregate state, as previous calls to the `INIT` and `ITER` functions have generated.
final_func	is the name of the `FINAL` aggregate support function.

In the execution of a UDA, the database server calls the FINAL function after all iterations of the ITER function are complete.

Figure 13-14 shows the aggregate support functions that handle an INTEGER argument for the SQSUM user-defined aggregate (which Figure 13-8 on page 13-24 describes).


	Figure 13-14 FINAL Aggregate Support Function for SQSUM1 on INTEGER

For other aggregate support functions of SQSUM1, see Figure 13-10 on page 13-28, Figure 13-11 on page 13-30, and Figure 13-12 on page 13-32.

Defining the User-Defined Aggregate

To define a user-defined aggregate in a database, you perform these SQL steps:

The development steps for a UDA list the definition of the UDA after the aggregate support functions are written. However, the CREATE AGGREGATE statement does not verify that the aggregate support functions it lists were registered nor does the CREATE FUNCTION statement verify that the executable C code exists. Therefore, you can define the UDA before you create the C implementation of the aggregate support functions. However, you must ensure that the names of the C functions match the names in the CREATE FUNCTION statements that register them.

Figure 13-15 shows the CREATE AGGREGATE statement that defines the SQSUM1 user-defined aggregate (which Figure 13-8 on page 13-24 describes) in the database.


	Figure 13-15 Registering the SQSUM1 User-Defined Aggregate

Suppose that the INIT, ITER, COMBINE and FINAL aggregate support functions for the SQSUM1 aggregate are compiled and linked into a shared-object module called sqsum.

On a Solaris system, the executable code for the SQSUM1 aggregate support functions would be in a shared library called sqsum.so.

Figure 13-16 shows the CREATE FUNCTION statements that register the aggregate support functions for the SQSUM1 user-defined aggregate to handle the INTEGER data type.


	Figure 13-16 Registering the Aggregate Support Functions for SQSUM1 to Handle INTEGER

Notice that the registered names of the aggregate support functions must match the names that the CREATE AGGREGATE statement lists. For example, in Figure 13-16, a CREATE FUNCTION statement registers the INIT support function as init_sqsum1, which is the same name that the INIT option lists in the CREATE AGGREGATE statement (see Figure 13-15 on page 13-35).

In addition, the CREATE FUNCTION for the INIT support function must include the HANDLESNULLS routine modifier so that the database server can pass the INIT function the dummy NULL-valued argument.

For more information on the use of the CREATE AGGREGATE and CREATE FUNCTION statements to define user-defined aggregates, see the chapter on aggregates in the Extending Informix Dynamic Server 2000. For the syntax of these statements, see the Informix Guide to SQL: Syntax.

Using the User-Defined Aggregate

Once you have completed the aggregate-development steps, the end user can use the user-defined aggregate on the defined data type in SQL statements. However, use of the user-defined aggregate does assume the appropriate privileges.

For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SQSUM1 aggregate function on the INTEGER column, col3:

SELECT SQSUM1(col3) FROM tab1

With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 10201.

To be able to use SQSUM1 on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type. For example, SQSUM2 User-Defined Aggregate shows the definition of a version of the SQSUM aggregate on both an INTEGER and a named row type.

Determining Required Aggregate Support Functions

Figure 13-17 shows the execution sequence of aggregate support functions for a user-defined aggregate that is not executed in a parallel query.


	Figure 13-17 Execution of a Non-Parallel UDA

Tip: For information about how to execute a user-defined aggregate in parallel queries, see Executing a User-Defined Aggregate in Parallel Queries.

Notice how these aggregate support functions use the aggregate state to pass shared information between themselves.

As you design your aggregate algorithm, you must determine which of the support functions the algorithm requires. As a minimum, the user-defined aggregate must have an ITER function. It is the ITER function that performs a single iteration of the aggregate on one aggregate argument. Although Figure 13-17 on page 13-37 shows the execution of both the INIT and FINAL support functions, these functions are optional for a user-defined aggregate. In addition, the COMBINE function, though required, often does not require separate code; it can simply call the ITER function.

Writing aggregate support functions that your user-defined aggregate does not require means unnecessary coding and execution time. Therefore, it is important to assess your aggregate for required functions. The following table shows the design decisions in the determination of required aggregate support functions.


Design Decision	Aggregate Support Functions Involved	For More Information
Does the algorithm require initialization or clean-up tasks?	INIT and FINAL	Aggregate Support Functions that the Algorithm Requires
Does the aggregate have a simple aggregate state?	INIT, COMBINE, and FINAL	YES: Handling a Simple State NO: Handling a Non-Simple State
Does the aggregate have a set-up argument?	INIT	Implementing a Set-Up Argument
Does the aggregate return a value whose data type is different from the aggregate state?	FINAL	Returning an Aggregate Result Different from the Aggregate State
Does the aggregate have special needs to run in a parallel query?	COMBINE	Executing a User-Defined Aggregate in Parallel Queries

You can overload the aggregate support functions to provide support for different data types. However, any overloaded version of the UDA cannot:

omit any of the aggregate support functions that the CREATE AGGREGATE statement has listed.

use any support function that CREATE AGGREGATE has not specified.

When the database server executes a UDA (regardless of the data type of the aggregate argument), the database expects to find all the aggregate support functions that the CREATE AGGREGATE statement has registered. Therefore, if you omit a support function for one of the reasons in the preceding table, all versions of the aggregate for all data types must be able to execute using only the aggregate support functions that CREATE AGGREGATE specifies.

Aggregate Support Functions that the Algorithm Requires

To implement a user-defined aggregate, you must develop an algorithm that calculates an aggregate return value based on the aggregate-argument values. You need to break this algorithm into the following steps.


Algorithm Step	Aggregate Support Function
Calculations or initializations that must be done before the iterations can begin	INIT
Calculations that must be done on each aggregate argument to merge this argument into a partial result	ITER
Post-iteration tasks must be performed after all aggregate arguments were merged into a partial result	FINAL

All aggregate algorithms must include an ITER function to handle each aggregate argument. However, the INIT and FINAL support functions are optional. To determine whether your algorithm requires an INIT or FINAL function, make the following design assessments:

Are there calculations or initializations that must be done before the iterations can begin?

If the algorithm requires additional resources to perform its task (such as operating-system files or smart large objects), use the INIT function to set up these resources. The INIT function can also initialize the partial result.

Are there post-iteration tasks that must be performed after all aggregate arguments were merged into a partial result?

If the INIT function has set up resources to perform the aggregation, the FINAL function can deallocate or close resources so that they are free for other users. In addition, if the aggregation requires calculations that must be performed on the final partial result, use the FINAL function to perform these calculations.

For example, the following algorithm defines the SQSUM1 aggregate (which Figure 13-8 on page 13-24 describes):

(x₁+ x₂ + ... )²

where each x_i is one column value; that is, one aggregate argument. The ITER function for SQSUM1 takes a single aggregate argument and adds it to a partial sum (see Figure 13-11 on page 13-30). The algorithm does not require initialization of additional resources. Therefore, no INIT function is required for this task. However, the INIT function can initialize the partial result (see Figure 13-10 on page 13-28).

The SQSUM1 user-defined aggregate does require post-iteration calculations. When the last iteration is reached, the partial sum needs to be squared to obtain the aggregate return value. This final calculation is performed in a FINAL function and returned as the return value for the SQSUM1 aggregate (see Figure 13-14 on page 13-34).

The SUMSQ user-defined aggregate (described on page 13-56) is an example of a user-defined aggregate that requires neither initialization nor post-iteration tasks. Therefore, it does not require the INIT and FINAL support functions.

Aggregate Support Functions for the Aggregate State

The aggregate support functions pass information about the aggregate among themselves in the aggregate state.

Tip: For an explanation of the aggregate state, see Determining the Aggregate State.

The database server invokes all aggregate support functions as regular user-defined routines. Therefore, each support function has a default memory duration of PER_ROUTINE, which means that the database server frees any memory that the support function allocates when that function terminates. However, the aggregate state must be valid across all invocations of all aggregate support functions, including possible multiple iterations of the ITER function. Therefore, a special state buffer (with a PER_COMMAND memory duration) must exist so that the aggregate state is available to all aggregate support functions. The database server then passes this state buffer to each invocation of an aggregate support function.

The purpose of the INIT support function is to return a pointer to the initialized aggregate state. To determine whether your aggregate state requires an INIT support function for state management, assess the data type of this state. The aggregate-state data type determines whether the INIT function must handle state management, as follows:

If the data type of the aggregate state is the same as the aggregate argument, the aggregate has a simple state.

If your user-defined aggregate uses a simple state, the database server can perform the state management. It can automatically allocate the state buffer for the aggregate state. Therefore, the INIT support function does not need to handle this allocation.

If the data type of the aggregate state is different from the aggregate argument, the aggregate has a non-simple state.

There are several types of non-simple states possible. The database server cannot perform state management for these non-simple states. Instead, the INIT support functions must perform the state-management tasks to handle non-simple states.

Once the user-defined aggregate has an aggregate state, the database server passes a pointer to this state buffer to each invocation of the ITER function and to the FINAL function.

Handling a Simple State

A simple state is an aggregate state whose data type is the same as the aggregate argument. At any point in the iteration, a simple state contains only the partial result of the aggregation. For example, the SUM built-in aggregate uses a simple state because its state contains only the partial result: the running total of the aggregate arguments. When the SUM aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum for these arguments. Therefore, the data type of its aggregate argument and aggregate state is the same.

However, the AVG built-in aggregate does not use a simple state. Because it must divide the total by the number of values processed, its state requires two values: the running total and the number of arguments processed. When the AVG aggregate operates on INTEGER aggregate arguments, it creates an integer partial sum and an integer count for these arguments. Therefore, the data type of its aggregate argument (INTEGER) cannot be not the same as its aggregate state (two INTEGER values).

When a user-defined aggregate has a simple state, you can take the following steps in your aggregate support functions:

The INIT aggregate support function does not need to allocate the aggregate state.

In this case, the database server automatically performs the state management. If a UDA with a simple state does not include any other tasks that require an INIT support function (see Figure 13-9 on page 13-27), you can omit the INIT function from the definition of the UDA. The only possible state-management task you might want to perform in the INIT function is state initialization. For more information, see When to Allocate and Deallocate a State.

The COMBINE aggregate support function can just call the ITER support function.

In this case, you do not have to create special code in the COMBINE function for the handling of parallel execution. Instead, the ITER function can perform the merge of two partial results. For more information, see Executing a User-Defined Aggregate in Parallel Queries.

The FINAL function is not required if the data type of the simple state is the same as the aggregate result.

In this case, the aggregate argument, aggregate state, and aggregate result have the same data type. Such a user-defined aggregate is called a simple binary operator. If a UDA is a simple binary operator and does not include any other tasks that require a FINAL support function (see Figure 13-13 on page 13-33), you can omit the FINAL function from the definition of the UDA. For more information, see Returning an Aggregate Result Different from the Aggregate State.

When a UDA does not include an INIT function, the database server takes the following state-management steps:

Allocate the PER_COMMAND state buffer to hold the aggregate state.

The database server can determine the size of the state buffer from the data type of the aggregate argument, which is passed into the user-defined aggregate.

Initialize this aggregate state to an SQL NULL value whose data type is the same as the aggregate argument.

Pass the NULL-valued aggregate state to the first invocation of the ITER support function.

The ITER support function can just use the system-allocated state buffer to hold the state information. If you have some minor initialization tasks that you need to perform, the ITER function can check for a NULL-valued aggregate state on its first iteration. If the state is NULL, ITER can initialize the state to its appropriate value. In this way, you can perform minor state initialization without the overhead of a separate invocation of the INIT function.

When a UDA does not include a FINAL function, the database server passes the final state as the aggregate result of the user-defined aggregate.

The implementation on the SQSUM1 aggregate includes an INIT support function that initializes the aggregate state (see Figure 13-10 on page 13-28). However, because SQSUM1 has a simple state, this INIT function is not required. Instead, an ITER function can check for a NULL-valued state and perform the state initialization. The ITER support function of the SQSUM2 aggregate (see Figure 13-21 on page 13-59) shows this type of implementation.

The SUMSQ user-defined aggregate (described on page 13-56) also has a simple state and therefore does not require an INIT support function for state management.

Handling a Non-Simple State

When the data type of the aggregate argument is not adequate for the state information, you must use a non-simple state for your UDA. A non-simple state is an aggregate state whose data type is not the same as the aggregate argument. Possible uses for a non-simple state include an aggregate state that contains:

more information than the aggregate-argument data type can hold.

information of a data type different than that of the aggregate argument.

When the aggregate-argument data type is not adequate for the state information, you must determine the appropriate data structure to hold the state information. This data structure depends upon the size of the aggregate state that you need to maintain, as the following table shows.


Non-Simple State	Description	For More Information
Single-valued state	Consists of information that can be stored in an Informix built-in data type	Managing a Single-Valued State
Opaque-type state	Consists of several values, but these values do not exceed the maximum size of an opaque data type	Managing an Opaque-Type State.
Pointer-valued state	Consists of values whose size does exceed the maximum size of an opaque data type	Managing a Pointer-Valued State.

If your user-defined aggregate uses a single-valued or opaque-type state, the database server can still perform the state management. However, it cannot provide all necessary state management for a pointer-valued state.

Managing a Single-Valued State

A single-valued state uses a built-in SQL data type to hold the aggregate state. Use a single-valued state for an aggregate state that can fit into a built-in data type but whose data type does not match that of the aggregate argument.

Tip: Built-in SQL data types are those data types that Informix provides. For more information, see the chapter on data types in the "Informix Guide to SQL: Reference."

To use a single-valued state for a UDA, follow these steps:

Write the appropriate aggregate support functions so that they handle a single-valued state.

Declare the state parameters and return values of the aggregate support functions to use the DataBlade API data type that corresponds to the built-in SQL data type that your state requires. For a list of these data type correspondences, see Figure 1-1 on page 1-11. Information on how to handle state management for a single-valued state is provided below.

Specify the built-in SQL data type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.

The database server can perform memory management for a single-valued state because it can determine the size of a built-in data type. Therefore, you do not need to allocate memory for a single-valued state in the INIT support function.

In the ITER function, you can initialize or update a single-valued state in either of the following ways:

In-place state update

To modify the state, change the value (or values) of the DataBlade API variable that the database server has passed into the ITER function as the state argument.

Allocate a new state

To modify the state, declare a local variable or allocate PER_ROUTINE memory for a new variable and put the new values into this variable.

For more information, see When to Allocate and Deallocate a State.

For a single-valued state, the FINAL support function does not need to perform any state-management tasks. However, it must convert to the data type of the final aggregate state to that of the aggregate result. For more information, see Returning an Aggregate Result Different from the Aggregate State.

Managing an Opaque-Type State

An opaque-type state uses an opaque data type to hold the aggregate state. A possible use for an opaque-type state include an aggregate state that contains more information than the aggregate-argument data type or a built-in data type can hold. The size of an aggregate state that can be implemented as an opaque-type state is limited by the maximum size of an opaque type.

Important: The maximum size of an opaque type is system dependent. On many systems, this limit is 32 kilobytes. However, consult your Machine Notes for the limit on your system. If your aggregate state might contain more data than the opaque-type limit, you must use a pointer-valued state instead. For more information, see Managing a Pointer-Valued State.

To use an opaque-type state for a UDA, write the appropriate aggregate support functions so that they handle an opaque-type state.

Declare the state parameters and return values of the aggregate support functions to use the internal format of the opaque type. This internal format is usually a C struct structure.For more information, see Determining the Internal Format.

Handling State Management for an Opaque-Type State

Once you register the opaque data type, the database server can obtain information about the data type of the state value when the CREATE FUNCTION statement registers the function signatures of the aggregate support functions.

You need to write the opaque-type support functions only if you need such functionality in the aggregate support functions. For example, the input and output support functions might be useful when you debug your UDA. If you do write opaque-type support functions, you must compile and link them into a shared-object module, as Compiling a C UDR describes.

Specify the registered opaque type as the data type of the state parameters and return values in the function signatures of the aggregate support functions.

The database server can perform memory management for an opaque-type state because it can determine the size of the opaque data type. The CREATE OPAQUE TYPE statement registers the opaque type, including its size, in the system catalog tables of the database. From the system catalog tables, the database server can determine the size of the aggregate state to allocate. Therefore, you do not need to allocate the opaque-type state in the INIT support function.

In the ITER function, you can initialize or update an opaque-type state in either of the following ways:

In-place state update

To modify the state, change the values in the internal opaque-type structure that the database server has passed into the ITER function as an argument.

Allocate a new state

To modify the state, allocate PER_ROUTINE memory for a new internal opaque-type structure and put the new values into this structure.
If you need to initialize the internal structure of the opaque type, the INIT or ITER function can allocate PER_ROUTINE memory for the structure perform the appropriate initializations. When the support function exits, the database server copies the contents of this PER_ROUTINE structure into the PER_COMMAND system-allocated state buffer.

For more information, see When to Allocate and Deallocate a State. For an example of a user-defined aggregate that uses an opaque-type state, see the description of the PERCENT_GTR aggregate on page 13-64.

Managing a Pointer-Valued State

A pointer-valued state uses the POINTER data type as the aggregate state. The mi_pointer data type is the DataBlade API type that represents the SQL data type, POINTER. (For more information, see POINTER Data Type.) Use a pointer-valued state when an aggregate state might contains more information than can fit into the maximum opaque-type size.

Important: The maximum size of an opaque type is system dependent. On many systems, this limit is 32 kilobytes. However, consult your Machine Notes for the limit on your system. If your aggregate state contains less data than the opaque-type limit, you use an opaque-type state instead. For more information, see Managing an Opaque-Type State.

To use a pointer-valued state for a UDA, follow these steps:

Write the appropriate aggregate support functions so that they handle a pointer-valued state.

Declare the state parameters and return values of the aggregate support functions to use the mi_pointer data type. Information on how to handle state management of a pointer-valued state is provided below.

Specify the POINTER data type for the state parameters and return values in the function signatures of the aggregate support functions.

The database server cannot perform state management for a pointer-valued state because it cannot determine the size of the state. The DataBlade API data type mi_pointer is a typedef for the following C data type:

void *

Because this data type is only a pointer, the database server cannot determine how large the aggregate state is. Therefore, it cannot allocate the PER_COMMAND system-allocated state buffer. In this case, the INIT and FINAL aggregate support functions are not optional. They must perform state management of the non-simple aggregate state, as follows:

The INIT function can allocate and initialize the aggregate state.

The INIT function must also allocate any related resources that the aggregate state might need. Keep in mind that the database server does not interpret the contents of the pointer-valued state. It cannot manage any objects that the state type might reference. Therefore, use states with embedded pointers with caution.

The ITER function must perform an in-place update to initialize or modify a pointer-valued state.

Once you allocate the pointer-valued state, the database server passes a pointer to this state to the other aggregate support functions. Initialize or update the pointer-valued state only with an in-place update. For more information, see When to Allocate and Deallocate a State.

The FINAL function can handle deallocation of resources that the INIT function has set up.

For a pointer-valued state, the FINAL function must always deallocate the aggregate state. If your INIT support function has allocated related resources that the aggregate state uses, make sure that the FINAL function deallocates these resources.

Important: Make sure that you use a memory duration that extends for the life of the user-defined aggregate. A PER_ROUTINE memory duration (the default) expires after one invocation of the ITER function completes. Therefore, you must use a memory duration of at least PER_COMMAND for memory associated with the state.

When to Allocate and Deallocate a State

To each invocation of the ITER support function, the database server automatically passes a pointer to the state buffer. When you need to initialize or update the state information, the ITER function can handle the modification in either of the following ways.


Changing the State	State Memory Duration	Results
Merge the aggregate argument into the existing state in-place and return the existing state.	The existing state has a `PER_COMMAND` memory duration: For single-valued and opaque-type states, this state is the system-allocated state buffer. For a pointer-valued state, this state is a user-allocated state buffer.	The new state value is at the address that the database server has passed into the `ITER` function. The `ITER` function then returns this address as the updated state. Because the state memory has a `PER_COMMAND` memory duration, the database server can re-use the same state for subsequent invocations of `ITER`.
Allocate fresh memory for a new state, merge the existing state with the new aggregate argument into this state, and return this new state.	The new state has a `PER_ROUTINE` memory duration: For a single-valued state, this state can be either a declared local variable or user-allocated `PER_ROUTINE` memory. For an opaque-type state, the new state must be user-allocated `PER_ROUTINE` memory. For a pointer-valued state, this state is user-allocated memory with either a `PER_ROUTINE` or `PER_COMMAND` memory duration. However, for `PER_COMMAND` memory, you must also handle deallocation of the old state. For more information, see Managing a Pointer-Valued State.	The new state value is at the address of the new state. The `ITER` function then returns the address of the new state as the updated state. Because this memory has a `PER_ROUTINE` memory duration, the database server must copy the returned state back into the `PER_COMMAND` buffer.

The new state method can be slower than the in-place method because the database server must perform the additional copy operation. Informix recommends that you design your ITER support function to use the in-place method whenever possible. When the database server can skip the copy operation, you can improve performance of your UDA.

To determine which of these methods was used in the ITER support function, the database server compares the state value that ITER returns and the state value that was passed into ITER. If these two pointers identify the same memory location, the ITER function has modified the state in-place. Therefore, the database server does not need to perform the copy operation. If these two pointers identify different memory locations, the database server proceeds with the copy operation.

Aggregate support functions have the following restrictions on the deallocation of an aggregate state:

For any state other than a pointer-valued state, no aggregate support function must deallocate the state memory.

No aggregate support function can return a NULL-valued pointer as the state.

Implementing a Set-Up Argument

You can define a UDA so that the end user can supply a set-up argument to the aggregate. The set-up argument can customize the aggregate for a particular invocation. For example, the PERCENT_GTR user-defined aggregate (see page 13-64) determines the percentage of numbers greater than a particular value. The UDA could have been implemented so that the value to compare against is hardcoded into the UDA. However, this would mean a separate user-defined aggregate that checks for values greater than 10, another that checks for values greater than 15, and so on.

Instead, the PERCENT_GTR aggregate accepts the value to compare against as a set-up argument. In this way, the end user can determine what values are needed, as follows:

SELECT PERCENT_GTR(col1, 10) FROM tab1; -- values > 10
SELECT PERCENT_GTR(col1, 15) FROM tab1; -- values > 15

The database server passes in the set-up argument as the second argument to the INIT function. Therefore, the INIT support function must handle the set-up argument. Usually, this handling involves performing any initial processing required for the value and then saving this value in the aggregate state. It might also check for a possible SQL NULL value as a set-up argument.

This set-up argument is optional, in the sense that you can define a UDA with one or without one. However, if you define your UDA to include a set-up argument, the end user must provide a value for this argument. When the UDA is invoked with two arguments (aggregate argument and set-up argument), the database server looks for an INIT function with two arguments. If you omit the set-up argument when you invoke the UDA, the database server looks for an INIT function with just one argument.

To indicate no set-up argument, the end user can provide the SQL NULL value as a set-up value. However, if you really want to make the set-up argument truly optional for the end user, you must create and register two INIT functions:

One that takes two arguments

One that takes only one argument

In this case, you could assign the set-up argument some known default value.

As the writer of the UDA, you need to decide whether this feature is useful.

Returning an Aggregate Result Different from the Aggregate State

The aggregate result is the value that the user-defined aggregate returns to the calling SQL statement. If the user-defined aggregate does not include a FINAL support function exists, the database server returns the final aggregate state; that is, it returns the value of the aggregate state after the last aggregate iteration. However, if your UDA needs to return a value whose data type is different from the aggregate state, use a FINAL support function to convert the final aggregate state to the data type that you want to return from the aggregate.

For example, the PERCENT_GTR user-defined aggregate (see page 13-64) returns the percentage of values greater than some value as a percentage; that is, as a fixed-point number in the range 0.00 to 100.00. To handle integer values, the user-defined aggregate would require an aggregate state that holds the following values:

The total number of aggregate arguments greater than 10

The total number of aggregate arguments

The value to compare against

However, the aggregate result of PERCENT_GTR is a fixed-point number. Therefore, you would not want the aggregate to return the final state to the calling SQL statement. Instead, the FINAL support function needs to perform the following steps:

Divide the total number of arguments by the total that are greater than 10.

Return the fixed-point quotient, which is the percentage of values greater than 10.

For the complete example of the PERCENT_GTR user-defined aggregate, see page 13-64.

Executing a User-Defined Aggregate in Parallel Queries

The database server can break up the aggregation computation into several pieces and compute them in parallel. Each piece is computed sequentially as follows:

The INIT support function initializes execution in the parallel thread.

For each aggregate argument in the subset, the ITER support function merges the aggregate argument into a partial result.

The database server then calls the COMBINE support function to merge the partial states, two at a time, into a final state. For example, for the AVG built-in aggregate, the COMBINE function would add the two partial sums and adds the two partial counts. Finally, the database server calls the FINAL support function on the final state to generate the aggregate result.

Figure 13-18 shows the execution sequence of aggregate support functions for a user-defined aggregate that is executed in two parallel threads.


	Figure 13-18 Parallel Execution of a UDA

Figure 13-18 shows how the COMBINE function is used to execute a user-defined aggregate with two parallel threads. For more than two parallel threads, the database server calls COMBINE on two thread states to obtain one, combines this state with another thread state, and so on until it has processed all parallel threads. The database server makes the decision whether to parallelize execution of a user-defined aggregate and the degree of such parallelism. However, these decisions are invisible to the end user.

Parallel aggregation must give the same results as an aggregate that is not computed in parallel. Therefore, you must write the COMBINE function so that the result of aggregating over the entire group of selected rows is the same as aggregating over two partitions of the group separately and then combining the results.

For an example of COMBINE functions in user-defined aggregates, see "Sample User-Defined Aggregates."

Sample User-Defined Aggregates

This section provides the sample user-defined aggregates that the following table describes.


User-Defined Aggregate	Description
SUMSQ	Squares each value and calculates the sum of these squared values
SQSUM2	Sums all values and calculates the square of this sum
PERCENT_GTR	Determines the percentage of values greater than a user-specified value
X_PERCENTILE	Determines the value in a group of values that is the x-percentile, where x is a percent that the end user specifies.

Each description includes the aggregate support functions written C and the SQL statements to define the user-defined aggregate in the database.

SUMSQ User-Defined Aggregate

The SUMSQ user-defined aggregate squares each value and calculates the sum of these squared values. It has the following algorithm:

x₁²+ x₂² + ...

where each x_i is one column value; that is, one aggregate argument.

To determine the aggregate state for SUMSQ, examine what information needs to be available for each iteration of the aggregate. To perform one iteration of SUMSQ, the ITER function must:

Square the aggregate argument.

The ITER function has access to the aggregate argument because the database server passes it in. Therefore, ITER does not require additional information to perform this step.

Add the squared argument to the partial sum of previous squared values.

To add in the squared argument, the aggregate must keep a partial sum of the previous squared values. For the ITER function to have access to the partial sum from the previous iterations, the aggregate state must contain it.

The SUMSQ has a simple state because the data type of the partial sum is the same as that of the aggregate argument. For example, when the SUMSQ aggregate receives INTEGER values, this partial sum is also an INTEGER value. Therefore, SUMSQ can allow the database server to manage this state, which has the following effect on the design of the aggregate support functions:

The INIT support function does not need to perform state management.

An aggregate with a simple state does not need to explicitly handle the allocation and deallocation of the aggregate state. Instead, the database server automatically allocates the aggregate state and initializes it to NULL. Therefore, the INIT function does not require other INIT-function tasks (see Figure 13-9 on page 13-27). Therefore, this support function can safely be omitted from the aggregate definition.

The COMBINE support function can be the same as its ITER function.

No special processing is required to merge two partial states. The ITER function can adequately perform this merge.

Before the iterations begin, the partial sum needs to be initialized to zero (0). However, because the INIT function is not required for state management, this aggregate initializes the state in the first invocation of its ITER function. The ITER function then calculates the square of a single aggregate argument, and adds this value to a partial sum. When the last iteration is reached, the final partial sum is the value that the SUMSQ aggregate returns. Therefore, the SUMSQ algorithm does not require a FINAL function for post-iteration tasks.

Figure 13-19 shows the required aggregate support functions that handle an INTEGER argument for the SUMSQ user-defined aggregate.


	Figure 13-19 Aggregate Support Functions for SUMSQ on INTEGER

The following SQL statement registers the SUMSQ user-defined aggregate in the database:

CREATE AGGREGATE sumsq
	WITH (ITER = iter_sumsq,
		  COMBINE = combine_sumsq)

This CREATE AGGREGATE statement lists only the aggregate support functions that are required to implement SUMSQ: ITER and COMBINE.

Suppose that the ITER and COMBINE aggregate support functions for the SUMSQ aggregate are compiled and linked into a shared-object module called sumsq.

On a Solaris system, the executable code for the SUMSQ aggregate support functions would be in a shared library called sumsq.so.

Figure 13-20 shows the CREATE FUNCTION statements that register the aggregate support functions for SUMSQ to handle INTEGER aggregate arguments.


	Figure 13-20 Registering the SUMSQ Aggregate Support Functions for INTEGER

For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SUMSQ aggregate function on the INTEGER column, col3:

SELECT SUMSQ(col3) FROM tab1

With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 2173. To be able to use SUMSQ on other data types, you need to ensure that the appropriate aggregate support functions exist for this data type.

SQSUM2 User-Defined Aggregate

The SQSUM2 user-defined aggregate is another version of the SQSUM1 aggregate, which Figure 13-8 on page 13-24 describes. Its algorithm is the same as SQSUM1, that is:

(x₁+ x₂ + ... )²

where each x_i is one column value; that is, one aggregate argument.

However, the SQSUM2 aggregate takes advantage of the fact that this aggregate has a simple state. Because the database server automatically handles state management, the SQSUM2 aggregate can safely omit the INIT function.

Figure 13-21 shows the aggregate support functions that handle an INTEGER argument for the SQSUM2 user-defined aggregate.


	Figure 13-21 Aggregate Support Functions for SQSUM2 on INTEGER

In its first invocation, the ITER function performs the state initialization. It then takes a single aggregate argument and adds it to a partial sum. For aggregates with a simple state, the COMBINE can be the same as the ITER function. Therefore, this COMBINE function just calls iter_sumsq2() to perform the merge of two partial states.

Tip: The ITER function in Figure 13-21 could use the binary operator plus() to perform the addition. This operator is already defined on the INTEGER data type and therefore would not need to be written or registered. To use plus() in ITER, you would need to ensure that it is defined for the data type on which the SQSUM2 aggregate is defined.

The data type of the aggregate result is also the same as the aggregate state. Therefore, SQSUM2 is a simple binary operator and the FINAL support function is not needed to convert the data type of the final state. However, the SQSUM2 aggregate still does require a FINAL support function. The SQSUM2 algorithm involves a post-iteration calculation: it must square the final sum to obtain the aggregate return value. The FINAL function performs this final calculation and returns it as the aggregate result for the SQSUM2 aggregate.

Suppose that the ITER, COMBINE, and FINAL aggregate support functions for the SQSUM2 aggregate are compiled and linked into a shared-object module called sqsum.

On a Solaris system, the executable code for the SQSUM2 aggregate support functions would be in a shared library called sqsum.so.

Once you have successfully compiled and linked the aggregate support functions, you can define the SQSUM2 aggregate in the database. Figure 13-22 shows the CREATE AGGREGATE statement that registers the SQSUM2 user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions.


	Figure 13-22 Registering the SQSUM2 User-Defined Aggregate

Figure 13-23 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for the aggregate argument of the INTEGER data type.


	Figure 13-23 Registering the SQSUM2 Aggregate Support Functions for INTEGER

In Figure 13-23, the CREATE FUNCTION that registers the ITER support function requires the HANDLESNULLS routine modifier because the aggregate does not have an INIT support function.

For the tab1 table, which Figure 13-7 on page 13-23 defines, the following query uses the new SQSUM2 aggregate function on the INTEGER column, col3:

SELECT SQSUM2(col3) FROM tab1

With the rows that Figure 13-7 has inserted, the preceding query yields an INTEGER value of 10201, which is the same value that the SQSUM1 aggregate returned for these same rows.

Now, suppose that you want to define the SQSUM2 user-defined aggregate on the complexnum_t named row type, which Figure 13-5 on page 13-17 defines. This version of SQSUM2 must have the same aggregate support functions as the version that handles INTEGER (see Figure 13-22 on page 13-60).


Aggregate Support Function	SQL Function Name	C Function Name
ITER	iter_sqsum2()	iter_sqsum2_complexnum()
COMBINE	combine_sqsum2()	combine_sqsum2_complexnum()
FINAL	final_sqsum2()	final_sqsum2_complexnum()

The following code shows the aggregate support functions that handle a complexnum_t named row type as an argument for the SQSUM2 user-defined aggregate.

/* SQSUM2 ITER support function for complexnum_t */
MI_ROW *iter_sqsum2_complexnum(state, value, fparam)
	MI_ROW *state;
	MI_ROW *value;
	MI_FPARAM *fparam;
{
	/* Compute the new partial sum using the complex_plus() 
	** function. Put the sum in a new MI_ROW, which 
	** complex_plus() allocates (and returns a pointer to) */
	return (complex_plus(state, value, fparam));
}

/* SQSUM2 COMBINE support function for complexnum_t */
MI_ROW *combine_sqsum2_complexnum(state1, state2, fparam)
	MI_ROW *state1, *state2;
	MI_FPARAM *fparam;
{
	MI_ROW *ret_state;

	ret_state = 
		iter2_sqsum2_complexnum(state1, state2, fparam);

	mi_free(state1);
	mi_free(state2);

	return (ret_state);
}

/* SQSUM2 FINAL support function for complexnum_t */
MI_ROW *final_sqsum2_complexnum(state)
	MI_ROW *state;
{
	MI_CONNECTION *conn;
	MI_TYPEID *type_id;
	MI_ROW_DESC *row_desc;

	MI_ROW *ret_row;
	MI_DATUM values[2];
	mi_boolean nulls[2] = {MI_FALSE, MI_FALSE};

	mi_real *real_value, *imag_value;
	mi_integer real_len, imag_len;
	mi_real sqsum_real, sqsum_imag;

	/* Extract complex values from state row structure */
	mi_value_by_name(state, "real_part", 
		(MI_DATUM *)&real_value, &real_len);
	mi_value_by_name(state, "imaginary_part", 
		(MI_DATUM *)&imag_value, &imag_len);

	/* Calculate square of sum */
	sqsum_real = (*real_value) * (*real_value);
	sqsum_imag = (*imag_value) * (*imag_value);

	/* Put final result into 'values' array */
	values[0] = (MI_DATUM)&sqsum_real;
	values[1] = (MI_DATUM)&sqsum_imag;

	/* Generate return row type */
	conn = mi_open(NULL, NULL, NULL);
	type_id = mi_typestring_to_id(conn, "complexnum_t");
	row_desc = mi_row_desc_create(type_id);
	ret_row = mi_row_create(conn, row_desc, values, nulls);

	return (ret_row);
}

Figure 13-24 shows the CREATE FUNCTION statements that register the SQSUM2 aggregate support functions for an aggregate argument of the complexnum_t data type.


	Figure 13-24 Registering the SQSUM2 Aggregate Support Functions for the complexnum_t Named Row Type

The following query uses the SQSUM2 aggregate function on the complexnum_t column, col2:

SELECT SQSUM2(col2) FROM tab1

With the rows that Figure 13-7 on page 13-23 has inserted, the preceding query yields a complexnum_t value of:

ROW(817.96, 1204.09)

PERCENT_GTR User-Defined Aggregate

Suppose you want to create a user-defined aggregate that determines the percentage of values greater than some user-specified value and returns this percentage as a fixed-point number in the range 0 to 100. The implementation of this UDA uses the following aggregate features:

Use a set-up argument to allow the end user to specify the value to compare against.

Use an opaque-type state to hold the state information and initialize the state in the INIT support function.

Use a COMBINE function that must do more than just call the ITER support function.

Return an aggregate result whose data type is different from that of the aggregate argument.

Handle NULL values as aggregate arguments.

The PERCENT_GTR user-defined aggregate needs the following state information:

The user-specified set-up argument

The current number of values greater than the set-up argument

The current number of values processed

Therefore, it uses the following C structure, called percent_state_t, to hold the aggregate state:

typedef struct percent_state
	{
	mi_integer gtr_than;
	mi_integer total_gtr;
	mi_integer total;
	} percent_state_t;

Because the size of the percent_state_t structure never exceeds the maximum opaque-type size, PERCENT_GTR can use an opaque-type state to hold its aggregate state. The following code shows the INIT aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

/* PERCENT_GTR INIT support function for INTEGER */
percent_state_t *init_percentgtr(dummy_arg, gtr_than,
		fparam)
	mi_integer dummy_arg;
	mi_integer gtr_than;
	MI_FPARAM *fparam;
{
	percent_state_t *state;

	/* Allocate PER_ROUTINE memory for state and initialize
	** it */
	state = mi_alloc(sizeof(percent_state_t));

	/* Check for a NULL-valued set-up argument */
	if ( mi_fp_argisnull(fparam, 1) )
		state->gtr_than = 0;
	else
		state->gtr_than = gtr_than;
	state->total_gtr = 0;
	state->total = 0;

	return (state);
}

This INIT function performs the following tasks:

Handles a set-up argument.

This set-up argument is the value that the end user specifies so that the aggregate knows which value to compare the aggregate arguments against. If the end user provides a NULL value for the set-up argument, PERCENT_GTR checks for values greater than zero (0).

Allocates PER_ROUTINE memory for the opaque-type state.

The INIT function does not need to allocate memory for an opaque-type state because the database server can perform the state management. However, because PERCENT_GTR already requires an INIT function to handle the set-up argument, INIT allocates a PER_ROUTINE percent_state_t structure so that it can initialize the opaque-type state.

The following code implements the ITER aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

/* PERCENT_GTR ITER support function for INTEGER */
percent_state_t *iter_percentgtr(curr_state, agg_arg,
		fparam)
	percent_state_t *curr_state;
	mi_integer agg_arg;
	MI_FPARAM *fparam;
{
	if ( mi_fp_argisnull(fparam, 1) == MI_TRUE )
		agg_arg = 0;

	if ( agg_arg > curr_state->gtr_than )
		curr_state->total_gtr += 1;

	curr_state->total += 1;

	return (curr_state);
}

The PERCENT_GTR aggregate is defined to handle NULL values (see Figure 13-25 on page 13-68). Therefore, this ITER function must check for a possible NULL aggregate argument. It converts any NULL value to a zero (0) so that the numeric comparison can occur.

The following COMBINE aggregate support function handles an INTEGER argument for the PERCENT_GTR aggregate:

/* PERCENT_GTR COMBINE support function for INTEGER */
percent_state_t *combine_percentgtr(state1, state2)
	percent_state_t *state1;
	percent_state_t *state2;
{
	state1->total += state2->total;
	state1->total_gtr += state2->total_gtr;

	mi_free(state2);

	return(state1);
}

Because PERCENT_GTR does not have a simple state, its COMBINE function must explicitly perform the merging of two parallel threads, as follows:

It adds the two partial sums (total and total_gtr).

It deallocates the PER_COMMAND memory for the second parallel thread (merging of the two states was done "in-place" in state1).

The following code shows the FINAL aggregate support function that handles an INTEGER argument for the PERCENT_GTR aggregate:

/* PERCENT_GTR FINAL support function for INTEGER */
mi_decimal *final_percentgtr(final_state)
	percent_state_t *final_state;
{
	mi_double_precision quotient;
	mi_decimal return_val;
	mi_integer ret;

	quotient = 
		((mi_double_precision)(final_state->total_gtr)) / 
		((mi_double_precision)(final_state->total)) * 100;

	if ( (ret = deccvdbl(quotient, &return_val)) < 0 )
		ret = deccvasc("0.00", 4, &return_val);
		
	return (&return_val);
}

The PERCENT_GTR aggregate returns a data type is different from aggregate state. Therefore, the FINAL function must convert the final state from the aggregate-state data type (percent_state_t) to the aggregate-result data type (DECIMAL).

Once you have successfully compiled and linked the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. For a user-defined aggregate that uses an opaque-type state, this definition includes the following steps:

Use CREATE OPAQUE TYPE to register the opaque type that holds the opaque-type state.

Use CREATE AGGREGATE to register the aggregate.

Use CREATE FUNCTION to register the aggregate support functions.

The PERCENT_GTR aggregate uses a fixed-length opaque type, percent_state_t, to hold its opaque-type state. The following CREATE OPAQUE TYPE statement registers this opaque type:

CREATE OPAQUE TYPE percent_state_t (INTERNALLENGTH = 12);

The INTERNALLENGTH modifier specifies the size of the fixed-length C data structure, percent_state_t, that holds the opaque-type state.

Figure 13-25 shows the CREATE AGGREGATE statement that defines the PERCENT_GTR user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.


	Figure 13-25 Registering the PERCENT_GTR User-Defined Aggregate

Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the PERCENT_GTR aggregate are compiled and linked into a shared-object module called percent.

On a Solaris system, the executable code for the PERCENT_GTR aggregate support functions would be in a shared library called percent.so.

The following CREATE FUNCTION statements register the PERCENT_GTR aggregate support functions for an aggregate argument of the INTEGER data type:

CREATE FUNCTION init_percent_gtr(dummy INTEGER, 
	gtr_val INTEGER)
RETURNING percent_state_t
WITH (HANDLESNULLS)
EXTERNAL NAME 
'/usr/udrs/aggs/percent/percent.so(init_percentgtr)'
LANGUAGE C;

CREATE FUNCTION iter_percent_gtr(state percent_state_t, 
one_value INTEGER)
RETURNS percent_state_t
WITH (HANDLESNULLS)
EXTERNAL NAME 
'/usr/udrs/aggs/percent/percent.so(iter_percentgtr)'
LANGUAGE C;

CREATE FUNCTION combine_percent_gtr(state1 percent_state_t, 
	state2 percent_state_t)
RETURNS percent_state_t
WITH (HANDLESNULLS)
EXTERNAL NAME 
'/usr/udrs/aggs/percent/percent.so(combine_percentgtr)'
LANGUAGE C;

CREATE FUNCTION final_percent_gtr (state percent_state_t)
RETURNS DECIMAL(5,2)
WITH (HANDLESNULLS)
EXTERNAL NAME 
'/usr/udrs/aggs/percent/percent.so(final_percentgtr)'
LANGUAGE C;

These CREATE FUNCTION statements register an SQL name for each of the aggregate support functions that you have written in C. They must all include the HANDLESNULLS routine modifier because the PERCENT_GTR aggregate handles NULL values.

The following query uses the PERCENT_GTR aggregate function on the INTEGER column, col3, to determine the percentage of values greater than 25:

SELECT PERCENT_GTR(col3, 20) FROM tab1

With the rows that Figure 13-7 on page 13-23 has inserted, the preceding query yields a DECIMAL(5,2) value of 33.33 percent: 2 of the 6 values are greater than 20 (24 and 31).

X_PERCENTILE User-Defined Aggregate

Suppose you want to create a user-defined aggregate that calculates the x-percentile for a group of values. The x-percentile is the number within the group of values that separates x% of the values below and (100-x)% above. The median is a special case of the x-percentile. It represents the 50th-percentile:

X_PERCENTILE(y, 50)

That is, the above aggregate returns the value within a sample of y values that has an equal number of values (50%) above and below it in the sample.

The implementation of this UDA uses the following aggregate features:

Use a set-up argument to allow the end user to specify the x-percentile to obtain.

Use a pointer-valued state to hold the state information. Allocate and initialize the state in the INIT support function.

Use a COMBINE function that must do more than just call the ITER support function.

Handle NULL values as aggregate arguments, including returning an SQL NULL value if the aggregate argument to return was NULL.

The X_PERCENTILe user-defined aggregate needs the following state information:

The user-specified set-up argument

The current number of values processed

The current list of values processed

The current list of whether the values processed are NULL.

Therefore, X_PERCENTILE uses a C structure called percentile_state_t to hold the aggregate state:

#define MAX_N 1000

typedef struct percentile_state
	{
	mi_integer percentile;
	mi_integer count;
	mi_integer value_array[MAX_N];
	mi_integer value_is_null[MAX_N];
	} percentile_state_t;


Important: The percentile_state_t structure stores the values processes in an in-memory array within the state. You could also choose to store these values elsewhere, such as in an operating-system file or in a separate location in memory. Each of these locations has advantages and disadvantages. Choose the structure that best fits your application needs.

The size of the percentile_state_t structure depends on the number of aggregate arguments stored in the value_array array; that is, values less then or equal to the MAX_N constant. On a system with 4-byte mi_integer values, the size of this structure is:

8 + 4(MAX_N)

If X_PERCENTILE used an opaque-type state, this structure must be less than the maximum opaque-type size. For systems that have a 32 kilobyte maximum opaque-type size, the X_PERCENTILE aggregate could use an opaque-type state as long as it is called in a query that finds 8190 or fewer rows. If the query finds more than 8190 rows, the state would not fit into an opaque type. To avoid this restriction, X_PERCENTILE implements the aggregate state as a pointer-valued state.

The following code shows the INIT aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:

/* X_PERCENTILE INIT support function on INTEGER */
mi_pointer init_xprcnt(dummy, prcntile, fparam)
	mi_integer dummy;
	mi_integer prcntile;
	MI_FPARAM *fparam;
{
	percentile_state_t *state;

	/* Allocate memory for the state from the PER_COMMAND
	** pool */
	state = (percentile_state_t *)
		mi_dalloc(sizeof(percentile_state_t), PER_COMMAND);

	/* Initialize the aggregate state */
	if ( mi_fp_argisnull(fparam, 1) )
		state->percentile = 50; /* median */
	else
		state->percentile = prcntile;
	state->count = 0;

	return ((mi_pointer)state);
}

This INIT support function performs the following tasks:

handles a set-up argument.

This set-up argument is the value that the end user specifies so that the aggregate can determine the value that has x% values below and (100-x)% above. If the end user provides an SQL NULL for the set-up argument, X_PERCENTILE assumes a value of 50 and therefore calculates the median.

allocates PER_COMMAND memory for the pointer-valued state.

The database server does not perform state management for pointer-valued states. Therefore, the INIT function must allocate the memory for the state. It also assigns the appropriate values to the percentile_state_t structure to initialize the state.

The following code implements the ITER aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:

/* X_PERCENTILE ITER support function on INTEGER */
mi_pointer iter_xprcnt(state_ptr, value, fparam)
	mi_pointer state_ptr;
	mi_integer value;
	MI_FPARAM *fparam;
{   
	mi_integer i, j;
	mi_integer is_null = 0;
	percentile_state_t *state = 
		(percentile_state_t *)state_ptr;

	/* Check for NULL-valued 'value' */
	if ( mi_fp_argisnull(fparam, 1) )
		{
		value = 0;
		is_null = 1;
		}

	/* Find position of 'value' in ordered 'value_array' */
	for ( i=0; i < state->count; i++ )
		{
		if ( state->value_array[i] > value )
			break;
		}

	/* Increment number of values (count) */
	++state->count;

	/* Put value into ordered list of existing values */
	for (j=state->count - 1; j > i; j--)
		{
		state->value_array[j] = state->value_array[j-1];
		state->value_is_null[j] = state->value_is_null[j-1];
		}
	state->value_array[i] = value;
	state->value_is_null[i] = is_null;

return ((mi_pointer)state);
}

The ITER support function updates the aggregate state in-place with the following information:

Increment the number of aggregate arguments processed (count)

Store the new aggregate argument in increasing sorted order in the value_array array.

Store the is-NULL flag that corresponds to each aggregate argument in its corresponding position in the value_is_null array.

The ITER function also handles a possible NULL-valued aggregate argument. Because the X_PERCENTILE aggregate is defined to handle NULL values (see Figure 13-26 on page 13-76), the database server calls ITER for NULL-valued aggregate arguments.

The following COMBINE aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:

/* X_PERCENTILE COMBINE support function on INTEGER */
mi_pointer combine_xprcnt(state1_ptr, state2_ptr)
   mi_pointer state1_ptr, state2_ptr;
{
	mi_integer i;
	percentile_state_t *state1 = 
		(percentile_state_t *)state1_ptr;
	percentile_state_t *state2 = 
		(percentile_state_t *)state2_ptr;

	/* Merge the two ordered value arrays */
	for ( i=0; i < state2->count; i++ )
		(void) iter_xprcnt(state1_ptr, 
			state2->value_array[i]);

	/* Free the PER_COMMAND memory allocated to the state of
	** the 2nd parallel thread (state2). The memory
	** allocated to the state of the 1st parallel thread
	** (state1) holds the updated state. It is in the FINAL
	** support function. */
	mi_free(state2);

	return (state1_ptr);
}

This COMBINE support function merges two aggregate states, as follows:

Two ordered lists are merged into a single ordered list.

Two counts are added together.

Memory for one of the partial states is freed.

A pointer to the merged aggregate state is returned.

The following FINAL aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:

/* X_PERCENTILE FINAL support function on INTEGER */
mi_integer final_xprcnt(state_ptr, fparam)
	mi_pointer state_ptr;
	MI_FPARAM *fparam;
{
	mi_integer index, trunc_int;
	mi_integer x_prcntile;
	percentile_state_t *state = 
		(percentile_state_t *)state_ptr;

/* Obtain index position of x-percentile value */
	trunc_int = (state->count) * (state->percentile);
	index = trunc_int/100;
	if ( (trunc_int % 100) >= 50 )
		index++;

/* Obtain x-percentile value from sorted 'value_array' */ 
	x_prcntile = state->value_array[index];

/* Check for NULL value so it can be returned as such */
	if ( state->value_is_null[index] )
		mi_fp_setreturnisnull(fparam, 0, MI_TRUE);

/* Free the PER_COMMAND memory allocated to the state */
   mi_free(state);

/* Return retrieved x-percentile value */
	return (x_prcntile);
}

This FINAL support function performs the following tasks:

Calculates the x-percentile for the values in the sorted array.

The FINAL function must obtain the index position for the value that represents the x-percentile, where x is the percentage that the end user has passed in as a set-up argument.

Deallocates PER_COMMAND memory for the pointer-valued state.

The database server does not perform any state management for pointer-valued states. Therefore, the FINAL function must deallocate the PER_COMMAND state memory that the INIT function has allocated.

Once you have successfully compiled and linked the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. Figure 13-26 shows the CREATE AGGREGATE statement that defines the X_PERCENTILE user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.


	Figure 13-26 Registering the X_PERCENTILE User-Defined Aggregate

Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the X_PERCENTILE aggregate are compiled and linked into a shared-object module called percent.

On a Solaris system, the executable code for the X_PERCENTILE aggregate support functions would be in a shared library called percent.so.

The following CREATE FUNCTION statements register the X_PERCENTILE aggregate support functions for an aggregate argument of the INTEGER data type:

CREATE FUNCTION init_x_prcntile(dummy INTEGER, 
		x_percent INTEGER)
	RETURNING POINTER
	WITH (HANDLESNULLS)
	EXTERNAL NAME 
		'/usr/udrs/aggs/percent/percent.so(init_xprcnt)'
	LANGUAGE C;

CREATE FUNCTION iter_x_prcntile(agg_state POINTER, 
		one_value INTEGER)
	RETURNS POINTER
	WITH (HANDLESNULLS)
	EXTERNAL NAME 
		'/usr/udrs/aggs/percent/percent.so(iter_xprcnt)'
	LANGUAGE C;

CREATE FUNCTION combine_x_prcntile(agg_state1 POINTER, 
		agg_state2 POINTER)
	RETURNS POINTER
	WITH (HANDLESNULLS)
	EXTERNAL NAME 
	'/usr/udrs/aggs/percent/percent.so(combine_xprcnt)'
	LANGUAGE C;

CREATE FUNCTION final_x_prcntile(agg_state POINTER)
	RETURNS INTEGER
	WITH (HANDLESNULLS)
	EXTERNAL NAME 
		'/usr/udrs/aggs/percent/percent.so(final_xprcnt)'
	LANGUAGE C;

These CREATE FUNCTION statements use the SQL data type, POINTER, to indicate that the aggregate support functions accept a generic C pointer and perform their own memory management. They must all include the HANDLESNULLS routine modifier because the X_PERCENTILE aggregate handles NULL values.

The following query uses the X_PERCENTILE aggregate function on the INTEGER column, col3, to determine the quartile (the 25th-percentile) for the values in col3:

SELECT X_PERCENTILE(col3, 25) FROM tab1

For the tab1 rows that Figure 13-7 on page 13-23 has inserted, X_PERCENTILE creates the following sorted list for the col3 values:

5, 9, 13, 19, 24, 31

Because 25% of 6 values is 1.5, X_PERCENTILE obtains the item from the data set that has 2 values (1.5 rounded up to the nearest integer) below it. Therefore, the preceding query returns 13 as the quartile for col3.

Now suppose you add the following row to the tab1 table:

INSERT INTO tab1 (7, NULL:complexnum_t, NULL)

This INSERT statement adds a NULL value to the col3 column. Because X_PERCENTILE handles NULLs, the database server calls the X_PERCENTILE aggregate on this new row as well. After this seventh row is inserted, X_PERCENTILE would generate the following sorted list for col3:

(NULL), 5, 9, 13, 19, 24, 31

Twenty-five percent of 7 values is 1.75. Therefore, X_PERCENTILE obtains the item from the data set that has 2 (1.75 truncated to the nearest integer) values below it. Now the quartile for col3 would be 9. However, if X_PERCENTILE was not registered with the HANDLESNULLS modifier, the database server would not call X_PERCENTILE for this newest row and the quartile for col3 would have been 13 (the quartile for 6 rows, even though col3 actually has 7 rows).

Notice also that if you called the X_PERCENTILE aggregate with an x-percentile that would return the first value in the list (the NULL value), the FINAL support function uses the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL. For example, suppose you execute the following query on the col3:

SELECT X_PERCENTILE(col3, 5) FROM tab1

This query asks for the 5%-tile for the seven values in col3. The first element Because 5% of 7 values is 0.35, X_PERCENTILE obtains the item from the data set that has zero values (0.35 truncated to the nearest integer) below it. Therefore, the preceding query returns NULL as the quartile for col3. The ITER function has stored NULL values as zeros in the sorted value_array. For the FINAL support function to determine when a value of zero indicates a NULL and when it indicates zero, it checks the value_is_null array. If the zero indicates a NULL value, FINAL calls the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL.

Informix DataBlade API Programmer's ManualCreating Special-Purpose UDRs