Extending a Built-In Aggregate

Home | Previous Page | Next Page Creating User-Defined Routines > Creating Special-Purpose UDRs > Writing an Aggregate Function >

Extending a Built-In Aggregate

This section explains how to extend a built-in aggregate by overloading an operator function.

To extend a built-in aggregate function with a C user-defined function

Determine the appropriate operator function that you must overload to implement the desired built-in aggregate function.
For a list of built-in aggregate functions and the associated operator functions to overload, see the chapter on aggregate functions in the IBM Informix: User-Defined Routines and Data Types Developer's Guide.
Write the C UDR that implements the required operator function for the data type that you want the aggregate to handle.
To extend built-in aggregates so that they handle user-defined data types, write an operator function that accepts the user-defined data type as an argument. Compile the C UDR and link it into a shared-object file.
Register the overloaded operator function with the CREATE FUNCTION statement.
Use the newly extended aggregate on the data.

Suppose you want to use the SUM aggregate on complex numbers, which are stored in the following user-defined data type: a named row type named complexnum_t. Figure 93 shows the CREATE ROW TYPE statement that registers the complexnum_t named row type.

Figure 93. A Named Row Type to Hold a Complex Number

CREATE ROW TYPE complexnum_t
   (real_part SMALLFLOAT,
   imaginary_part SMALLFLOAT);

The following sections show how to extend the SUM aggregate on the complexnum_t named row type.

Choosing the Operator Function

The SUM built-in aggregate function uses the plus operator (+), which the plus( ) user-defined function implements. The database server provides implementations of the plus( ) function over the built-in data types. Therefore, the SUM aggregate function works over built-in data types. To have the SUM aggregate operate on the complexnum_t row type, you implement a plus( ) function that handles this named row type; that is, it adds the two parts of the complex number and returns a complex number with the summed parts.

The following C function, complex_plus( ), defines such a plus( ) function:

MI_ROW *complex_plus(arg1, arg2)
   MI_ROW *arg1;
   MI_ROW *arg2;

Writing the Operator Function

The code segment shows the implementation of the complex_plus( ) function, which implements a plus( ) function for the complexnum_t data type:

MI_ROW *complex_plus(arg1, arg2, fparam)
   MI_ROW *arg1;
   MI_ROW *arg2;
   MI_FPARAM *fparam;

{
   mi_real real_zero, imag_zero = 0.0;
   mi_real *real_value1, *real_value2;
   mi_integer real_len1, real_len2;
   mi_real *imag_value1, *imag_value2;
   mi_integer imag_len1, imag_len2;

   mi_real sum_real, sum_imag;

   MI_CONNECTION *conn;
   MI_TYPEID *type_id;
   MI_ROW_DESC *row_desc;

   mi_integer i;
   MI_ROW *ret_row;
   MI_DATUM values[2];
   mi_boolean nulls[2] = {MI_FALSE, MI_FALSE};

   for ( i=0; i<=1; i++ )
      {
      if ( mi_fp_argisnull(fparam, i) == MI_TRUE )
         {
         /* Put initialized complex number into 'values'
          * array
          */
         values[0] = (MI_DATUM)&real_zero;
         values[1] = (MI_DATUM)&imag_zero;

         /* Generate initialized row type for arg1 */
         conn = mi_open(NULL, NULL, NULL);
         type_id = mi_typestring_to_id(conn,
            "complexnum_t");
         row_desc = mi_row_desc_create(type_id);

         ret_row = mi_row_create(conn, row_desc, values,
            nulls);

         if ( i == 0 )
            arg1 = ret_row;
         else
            arg2 = ret_row;
         }
      }

   /* Extract values from arg1 row type */
   mi_value_by_name(arg1, "real_part", 
      (MI_DATUM *)&real_value1, &real_len1);
   mi_value_by_name(arg1, "imaginary_part", 
      (MI_DATUM *)&imag_value1, &imag_len1);

   /* Extract values from arg2 row type */
   mi_value_by_name(arg2, "real_part", 
      (MI_DATUM *)&real_value2, &real_len2);
   mi_value_by_name(arg2, "imaginary_part", 
      (MI_DATUM *)&imag_value2, &imag_len2);

   /* Sum the complex numbers */
   sum_real = *real_value1 + *real_value2;
   sum_imag = *imag_value1 + *imag_value2;

   /* Put sum into 'values' array */
   values[0] = (MI_DATUM)&sum_real;
   values[1] = (MI_DATUM)&sum_imag;

   /* Generate return row type */
   conn = mi_open(NULL, NULL, NULL);
   type_id = mi_typestring_to_id(conn, "complexnum_t");
   row_desc = mi_row_desc_create(type_id);
   ret_row = mi_row_create(conn, row_desc, values, nulls);

   return (ret_row);
}

This version of the plus( ) function performs the following tasks:

Checks for a NULL-valued state (which indicates the first invocation of the ITER function) to initialize the state
Checks for a NULL-valued aggregate argument to initialize a NULL the argument
Accepts the two complexnum_t arguments as row-type pointers (MI_ROW *) and uses the mi_value_by_name( ) function to extract the individual fields of the row type from the arguments
Calculates the sum of the complex numbers by adding the real values together and the imaginary values together
Creates an MI_ROW type to hold the complexnum_t value with the final sum: the mi_row_desc_create( ) function creates a row descriptor for the complexnum_t data type and the mi_row_create( ) function populates the associated row structure with the final sum values
Returns a pointer to the MI_ROW structure (because a row type must be returned by reference, not by value)

Once the complex_plus( ) function is written, you compile it and put it into a shared-object file. Suppose that complex_plus( ) is compiled and linked into a shared-object module named sqsum.

UNIX/Linux Only

On UNIX or Linux, the executable code for the complex_plus( ) operator function would be in a shared library named sqsum.so.

End of UNIX/Linux Only

For more information, see Compiling a C UDR.

To extend a built-in aggregate over a user-defined data type, you overload the appropriate operator function to handle the user-defined type. However, operator functions can also be used as part of an expression that does not involve aggregates. Therefore, aggregate support functions for built-in aggregates on user-defined data types (opaque types, distinct types, and named row types) must allocate a new state when they need to modify the state.

For example, the following SUM aggregate uses the overloaded plus( ) operator to calculate the sum of values in the col1 column:

SELECT SUM(col1) FROM tab2 WHERE ....;

For each aggregate argument, the SUM aggregate invokes the plus( ) operator to add the aggregate argument (agg_arg) into the sum of the previous values in the aggregate state (agg_state), as follows:

plus(agg_state, agg_arg)

When you modify the aggregate state in-place, the value of the agg_state argument to plus( ) changes. When plus( ) exits, the agg_state argument holds the new sum of the aggregate arguments, which includes the agg_arg value.

However, the plus( ) function is also valid in expressions that do not involve aggregates, as in the following query:

SELECT col1 FROM tab2 WHERE col1 + 4 > 17;

In this WHERE clause, the database server invokes the plus( ) operator to add 4 to the col1 value, as follows:

plus(col1, 4)

If the plus( ) operator modifies the aggregate state in-place, the value of its first argument changes to hold the sum of col1 and 4. It is not safe to modify arguments in place because the values of arguments (col1 and 4) must not change. Therefore, when you modify the aggregate state in an operator function of a built-in aggregate, you must be careful not to use the "in-place" modification method.

Registering the Overloaded Operator Function

With the operator function written, compiled, and linked into a shared-object file, you can register this function in the database with the CREATE FUNCTION statement. You must have the appropriate privileges for this registration to be successful. For more information, see the chapter on user-defined aggregates in the IBM Informix: User-Defined Routines and Data Types Developer's Guide.

Figure 94 shows the CREATE FUNCTION statement that overloads the plus( ) function with a new version that handles the complexnum_t named row type.

Figure 94. Registering the Overloaded plus( ) Function

CREATE FUNCTION plus(arg1 complexnum_t, arg2 complexnum_t)
RETURNS complexnum_t
EXTERNAL NAME '/usr/udrs/aggs/sums/sqsum.so(complex_plus)'
LANGUAGE C;

Tip:

Because SUM is a built-in aggregate, you do not have to use the CREATE AGGREGATE statement to define the SUM aggregate.

Using the Extended Aggregate

Once you execute the CREATE FUNCTION statement in Figure 94, you can use the SUM aggregate on complexnum_t columns. For example, suppose you create the tab1 table as Figure 95 shows.

Figure 95. A Table with a complexnum_t Column

CREATE TABLE tab1
   (col1 INTEGER,
   col2 complexnum_t,
   col3 INTEGER);

INSERT INTO tab1 
   VALUES (1, row(1.5, 3.7)::complexnum_t, 24);
INSERT INTO tab1 
   VALUES (2, row(6.9, 2.3)::complexnum_t, 13);
INSERT INTO tab1 
   VALUES (3, row(4.2, 9.4)::complexnum_t, 9);
INSERT INTO tab1 
   VALUES (4, row(7.0, 8.5)::complexnum_t, 5);
INSERT INTO tab1 
   VALUES (5, row(5.1, 6.2)::complexnum_t, 31);
INSERT INTO tab1 
   VALUES (6, row(3.9, 4.6)::complexnum_t, 19);

The following query uses the SUM aggregate function on the complexnum_t column, col2:

SELECT SUM(col2) FROM tab1;

With the rows that Figure 95 has inserted, the preceding query yields a complexnum_t value of:

ROW(28.6, 34.7)

As a side effect of the new plus( ) function, you can also add two complexnum_t columns in an SQL expression, as follows:

SELECT complex_num1 + complex_num2 FROM complex_nums
WHERE id > 6;