Executing a User-Defined Routine

Informix DataBlade API Programmer's Manual
Developing a User-Defined Routine

Once you register a user-defined routine as an external routine in the database, you can call it in either of the following ways:

In a client application, through an SQL statement that invokes the routine:

In the select list of a SELECT statement

In the WHERE clause of a SELECT, UPDATE, or DELETE statement

With the EXECUTE PROCEDURE or EXECUTE FUNCTION statement

In a C UDR, through an SQL statement that one of the following DataBlade API statement-execution functions sends to the database server:

mi_exec()

mi_exec_prepared_statement()

mi_open_prepared_statement()

Tip: For more information on how to use the DataBlade API statement-execution functions, see Chapter 8, Executing SQL Statements. For information on how to call a UDR directly from within another UDR, see Calling UDRs with the Fastpath Interface.

Each occurrence of a user-defined routine, implicit or explicit, in an SQL or SPL statement is a routine instance. One routine instance might involve several routine invocations. A routine invocation is one execution of the UDR. For example, if the following query selects five matching rows, the query has one routine instance of the a_func() user-defined function and five routine invocations for this function:

SELECT a_func(x) FROM table1
WHERE y > 7

Similarly, an iterator function might involve many invocations in a single routine instance.

To execute a UDR instance in an SQL statement, the database server takes the following steps:

The query parser breaks the SQL statement into its syntactic parts and performs any routine resolution required.

The query optimizer develops a query plan, which efficiently organizes the execution of the SQL-statement parts.

The query executor calls the routine manager, which handles execution of the UDR instance and any invocations.

The following sections provide information about how the steps of UDR execution can affect the way that you write the UDR. For more general information, see the chapter on how a UDR runs in Extending Informix Dynamic Server 2000.

Routine Resolution

If more than one registered user-defined routine has the same routine name, the routine is overloaded. Routine overloading allows several routines to share a name but to each handle arguments of different data types. When an SQL statement includes a call to an overloaded routine, the query parser uses routine resolution to determine which of the overloaded routines best handles the data type of the arguments in the routine call of the SQL statement.

To perform routine resolution, the query parser looks up information in the system catalogs based on the routine signature. The routine signature contains the following information:

The routine name

The number and data types of the arguments

Whether the routine is a function or a procedure

The database server combines this information to create a routine identifier that uniquely identifies the UDR. This routine identifier is in the procid column of the sysprocedures system catalog.
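The lookup by signature can be pictured with a small, self-contained C sketch. It is not the server's algorithm: the in-memory catalog, the routine_entry layout, the identifiers 101 to 103, and the resolve_routine() function are all hypothetical, and the sketch accepts only exact type matches, whereas real routine resolution also considers casts and type hierarchies.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4

/* Hypothetical stand-in for one row of the sysprocedures catalog. */
typedef struct {
    int         procid;              /* routine identifier */
    const char *name;                /* routine name */
    int         is_function;         /* 1 = function, 0 = procedure */
    int         nargs;               /* number of arguments */
    const char *argtypes[MAX_ARGS];  /* argument data type names */
} routine_entry;

/* Three overloaded routines that share the name a_func. */
static const routine_entry catalog[] = {
    { 101, "a_func", 1, 1, { "integer" } },
    { 102, "a_func", 1, 1, { "lvarchar" } },
    { 103, "a_func", 1, 2, { "integer", "integer" } },
};

/* Return the identifier of the routine whose signature matches exactly,
 * or -1 when no registered routine matches. */
int resolve_routine(const char *name, int is_function,
                    int nargs, const char *const *argtypes)
{
    size_t n = sizeof catalog / sizeof catalog[0];

    for (size_t i = 0; i < n; i++) {
        const routine_entry *r = &catalog[i];
        int match = 1;

        if (strcmp(r->name, name) != 0 ||
            r->is_function != is_function ||
            r->nargs != nargs)
            continue;

        for (int a = 0; a < nargs; a++) {
            if (strcmp(r->argtypes[a], argtypes[a]) != 0) {
                match = 0;
                break;
            }
        }
        if (match)
            return r->procid;
    }
    return -1;
}
```

With this toy catalog, a call to a_func with one lvarchar argument resolves to entry 102, and a call with a float argument resolves to nothing because the sketch does not attempt any casts.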

Tip: The DataBlade API provides the mi_funcid data type to hold routine identifiers. The mi_funcid data type has the same structure as the mi_integer data type. For backward compatibility, some DataBlade API functions (such as mi_routine_id_get()) continue to store an mi_integer for a routine identifier.

For a more detailed description of the steps involved in routine resolution, see Extending Informix Dynamic Server 2000.

The Routine Manager

Once the query parser has used routine resolution to determine which UDR to invoke, the query executor calls the routine manager to handle the UDR execution. The routine manager performs the following steps to execute the C UDR:

For each UDR instance:

Load the shared-object file that contains the object code for the UDR into shared memory.

Allocate and initialize the routine sequence for the UDR.

For each invocation of the UDR:

Push the UDR argument values onto the thread stack.

Dispatch the UDR to the appropriate virtual-processor class for execution.

Save the return value of a user-defined function on the thread stack.

At the end of the UDR instance, release the routine sequence.
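The per-instance and per-invocation split in the steps above can be modeled in a few lines of plain C. This is a simulation only: manager_state, udr_fn, run_instance(), and add_one() are hypothetical names, the "load" is just a flag, and an ordinary function call stands in for the dispatch to a virtual processor.

```c
/* Hypothetical bookkeeping for the sketch; none of this is DataBlade API. */
typedef struct {
    int module_loaded;    /* 1 once the shared object is "in memory" */
    int loads;            /* number of loads actually performed */
    int sequences_live;   /* routine sequences currently allocated */
    int invocations;      /* total UDR invocations so far */
} manager_state;

typedef int (*udr_fn)(int arg);

/* Run one routine instance over n argument values. */
void run_instance(manager_state *m, udr_fn udr,
                  const int *args, int n, int *results)
{
    /* Per instance: load the module if needed, then set up the sequence. */
    if (!m->module_loaded) {
        m->module_loaded = 1;
        m->loads++;
    }
    m->sequences_live++;

    /* Per invocation: pass the argument, run the UDR, save the result. */
    for (int i = 0; i < n; i++) {
        results[i] = udr(args[i]);
        m->invocations++;
    }

    /* End of instance: release the routine sequence. */
    m->sequences_live--;
}

/* Toy UDR used by the example. */
int add_one(int x) { return x + 1; }
```

Running run_instance() twice against the same manager_state performs the load only once, while the invocation count grows with every row processed.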

The following sections briefly describe each of these steps. For a general discussion of the routine manager, see Extending Informix Dynamic Server 2000.

Loading a Shared-Object File

When you compile a C UDR, you store its object code in a shared-object file. (For more information, see Compiling a C UDR.) For a UDR to execute, its object code must reside in memory so that a virtual processor (VP) can execute it. The database server uses virtual processors to service client-application SQL requests. Each virtual processor is a separate operating-system process. A thread is a database server task that a VP schedules for processing.

Tip: For a more detailed discussion of virtual processors and threads, see Choosing a Virtual Processor.

When the routine manager reaches the first occurrence of the UDR in the SQL statement, the routine manager determines whether its shared-object file is loaded. If the file is not yet loaded, the routine manager dynamically loads its code and data sections into the permanent storage for all virtual processors. The routine manager obtains the pathname of the shared-object file from the externalname column of the sysprocedures system catalog. This loading occurs for both explicit UDR calls and implicit calls (such as operator functions and opaque-type support functions).

Figure 11-2 shows a schematic representation of what VPs look like after the routine manager loads a shared-object file.


Figure 11-2 Loading a Shared-Object File

In Figure 11-2, assume that the func1(), func2() and func3() functions are registered as user-defined functions with the CREATE FUNCTION statement and linked into the source1.so shared-object file. The client application calls the func1() user-defined function within a SELECT statement. The routine manager loads the source1.so file into memory, if this file is not yet loaded. For subsequent references to these UDRs, the routine manager can skip the shared-object load.
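The "load once, then skip" behavior can be sketched as a table keyed by the shared-object pathname. The module_table structure and the ensure_loaded() function are hypothetical, and recording the path stands in for the real work of mapping code and data into the virtual processors.

```c
#include <string.h>

#define MAX_MODULES 8

/* Hypothetical registry of shared-object paths that are already loaded.
 * The table stores the caller's pointers, so the strings must outlive it. */
typedef struct {
    const char *paths[MAX_MODULES];
    int         count;
} module_table;

/* Return 1 if a load was performed, 0 if the module was already loaded,
 * and -1 if the table is full. */
int ensure_loaded(module_table *t, const char *path)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->paths[i], path) == 0)
            return 0;               /* already loaded: skip the load */
    }
    if (t->count == MAX_MODULES)
        return -1;

    t->paths[t->count++] = path;    /* stand-in for the actual load */
    return 1;
}
```

Because func1(), func2(), and func3() all live in the same file, the first reference to any of them triggers the load and later references to the other two find the path already in the table.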

The routine manager sends an entry to the message-log file about the status of the shared-object load, as follows:

when it successfully loads the shared-object file

when it unloads a shared-object file

when it is not able to load the shared-object file:

if the routine manager cannot find the shared-object file

if the shared-object file does not have read permission

if one of the symbols in the shared-object file cannot be resolved

For example, when the routine manager loads the source1.so shared-object file, the message log file would contain messages of the form:

12:28:45 Loading Module </usr/udrs/source1.so>
12:28:45 The C Language Module </usr/udrs/source1.so> loaded

Check the message log file for these messages to ensure that the correct shared-object file is loaded into the virtual processors.
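A check of this kind can be automated with a few lines of C. The sketch below assumes that the success message has exactly the form shown in the example above; the wording might differ between server versions, and line_confirms_load() is a hypothetical helper, not part of any Informix tool.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a message-log line reports a successful load of the
 * shared-object file at the given path, 0 otherwise. */
int line_confirms_load(const char *line, const char *path)
{
    char expected[512];
    int  n = snprintf(expected, sizeof expected,
                      "The C Language Module <%s> loaded", path);

    if (n < 0 || (size_t)n >= sizeof expected)
        return 0;                   /* path too long for this sketch */

    return strstr(line, expected) != NULL;
}
```

The function ignores the timestamp prefix, so it can be applied to each line of the log as it is read.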

You can monitor the loaded shared-object files with the -g dll option of onstat. This option lists the shared-object files that are currently loaded into the database server.

For information on when the shared-object file is unloaded, see Unloading a Shared-Object File. For information on how to create a shared-object file, see Creating a Shared-Object File. For general information about loading a shared-object file, see Extending Informix Dynamic Server 2000.

Creating the Routine Sequence

A routine sequence is the context in which the user-defined routine executes. Generally, each routine instance (whether implicit or explicit) creates a single, independent routine sequence. For example, suppose you have the following query:

SELECT a_func(x)
FROM table1
WHERE a_func(y) > 7;

When this query executes serially, it contains two routine instances of a_func(): one in the select list and the second in the WHERE clause. Therefore, this query has two routine sequences.

However, when a query with a parallelizable UDR (one that is registered with the PARALLELIZABLE routine modifier) executes in parallel, each routine instance might have more than one routine sequence. For more information, see Executing the Parallelizable UDR.

For each routine sequence, the routine manager creates a routine-state space, called an MI_FPARAM structure, which contains routine-state information from the routine sequence, including the following:

The routine identifier

The number of arguments passed to the UDR

Information about the UDR arguments

The user state (optional)

The MI_FPARAM structure does not contain the actual argument values.

The routine manager allocates an MI_FPARAM structure when it initializes the routine sequence. This structure persists across all routine invocations in that routine sequence because the MI_FPARAM structure has a memory duration of PER_COMMAND. The routine manager passes an MI_FPARAM structure as the last argument to a UDR. (For more information, see For the MI_FPARAM Argument.) To obtain routine-state information, a C UDR invocation can access its MI_FPARAM structure. (For more information, see Accessing the Routine State with MI_FPARAM.)
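The way routine state survives from one invocation to the next, while staying private to each routine sequence, can be simulated in standalone C. The routine_state structure below is a hypothetical stand-in and does not reflect the real MI_FPARAM layout. In an actual C UDR the user state is reached through accessor functions such as mi_fp_funcstate() and mi_fp_setfuncstate(), and the memory would normally come from a DataBlade API allocator with a PER_COMMAND duration rather than from calloc().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the routine-state space of one sequence. */
typedef struct {
    int   routine_id;   /* routine identifier */
    int   nargs;        /* number of arguments */
    void *user_state;   /* optional state owned by the UDR */
} routine_state;

/* Toy UDR: returns how many times it has run in this routine sequence,
 * or -1 if the state could not be allocated. */
int counting_udr(int arg, routine_state *state)
{
    int *count = state->user_state;

    (void)arg;                      /* the argument is not used here */

    if (count == NULL) {            /* first invocation in the sequence */
        count = calloc(1, sizeof *count);
        if (count == NULL)
            return -1;
        state->user_state = count;
    }
    return ++*count;
}

/* What happens when the routine sequence is released. */
void release_sequence(routine_state *state)
{
    free(state->user_state);
    state->user_state = NULL;
}
```

Two routine_state values model the two routine sequences of the earlier a_func() query: each keeps its own counter, so invocations in the select list do not disturb the count kept for the WHERE clause.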

Pushing Arguments Onto the Stack

When the routine manager pushes arguments onto the thread stack, it pushes them as MI_DATUM values. Therefore, it takes the following factors into account:

Whether the argument is passed by value or by reference

Whether the argument needs to be promoted

Passing Mechanism for MI_DATUM Values

The routine manager pushes the MI_DATUM values onto the thread stack before it invokes the routine. Each MI_DATUM value holds the data in its internal database format. The size of the argument data type, relative to the size of an MI_DATUM, determines whether the routine manager passes a particular argument by value or by reference, as follows:

The routine manager passes most argument values by reference; that is, it passes a pointer to the actual argument value.

If the argument value has a data type whose size is greater than the size of an MI_DATUM, the routine manager passes the argument by reference because it cannot fit the actual value onto the stack. Instead, the MI_DATUM that the routine manager pushes onto the stack contains a pointer to the value. The routine manager allocates the memory for these pass-by-reference arguments with a PER_ROUTINE duration.

The routine manager passes a few special types of argument by value; that is, the MI_DATUM contains the actual argument value.

If the argument value is a data type whose size is less than or equal to the size of an MI_DATUM, the routine manager passes the argument by value because it can fit the actual value onto the stack.

Figure 2-18 on page 2-48 lists the data types that the routine manager passes by value. All arguments whose data type is listed in this figure are passed by value unless the argument is an OUT parameter. OUT parameters are never passed by value; they are always passed by reference. The routine manager passes by reference any argument whose data type is not listed in Figure 2-18 on page 2-48.
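The two passing mechanisms can be illustrated with a pointer-sized slot in plain C. The datum typedef below is a hypothetical stand-in for MI_DATUM, and the helper names are invented for this sketch. A 32-bit integer is stored directly in the slot, while a double is passed by reference purely for illustration; the figure cited above remains the authority on which SQL types are actually passed by value.

```c
#include <stdint.h>

/* Hypothetical stand-in for MI_DATUM: one pointer-sized slot. */
typedef void *datum;

/* Pass by value: the slot holds the integer itself. */
datum datum_from_int(int32_t v)
{
    return (datum)(intptr_t)v;
}

int32_t int_from_datum(datum d)
{
    return (int32_t)(intptr_t)d;
}

/* Pass by reference: the slot holds the address of the value. */
datum datum_from_double(double *p)
{
    return (datum)p;
}

double double_from_datum(datum d)
{
    return *(double *)d;
}
```

The receiving side must know which mechanism applies: it casts the slot back to an integer in the first case and dereferences it in the second.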

Tip: For a particular argument data type, you can determine from its type descriptor whether it is passed by reference or passed by value with the mi_type_byvalue() function.

For information on how to code routine parameters, see Defining Routine Parameters. For information on how the routine manager passes return values out of a user-defined routine, see Returning the Value.

Argument Promotion

C compilers that accept Kernighan-&-Ritchie (K&R) syntax promote all arguments to the int data type when they are passed to a routine. The size of this int data type is native for the computer architecture. ANSI C compilers permit arguments to be shorter than the native computer architecture size of an int. However, the routine manager uses K&R calling conventions when it pushes an MI_DATUM onto the thread stack.

Tip: Many ANSI C compilers can use K&R calling conventions, so that code works correctly across all platforms.

The routine manager cast-promotes arguments with passed-by-value data types that are smaller than the size of MI_DATUM to the size of MI_DATUM. When you obtain the smaller passed-by-value data type from the MI_DATUM, you should reverse the cast promotion to ensure that your value is correct. For more information, see MI_DATUM in a C UDR.
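Reversing the promotion amounts to casting the full-width slot back to the small type rather than reading the slot's memory through a pointer to the small type. The following standalone sketch uses a hypothetical datum typedef in place of MI_DATUM and an 8-bit integer as the small value; promote_small() and demote_small() are invented names.

```c
#include <stdint.h>

/* Hypothetical stand-in for MI_DATUM: one pointer-sized slot. */
typedef void *datum;

/* Caller side: the 1-byte value is widened to fill the whole slot. */
datum promote_small(int8_t v)
{
    return (datum)(intptr_t)v;
}

/* UDR side: reverse the promotion with a cast on the slot's value.
 * Reading the first byte of the slot through an int8_t pointer instead
 * would give a byte-order-dependent result. */
int8_t demote_small(datum d)
{
    return (int8_t)(intptr_t)d;
}
```

The round trip preserves both positive and negative values because the widening and the narrowing go through the same signed integer type.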

Tip: To avoid this cast-promotion situation, the BladeSmith product generates C source code for BOOLEAN arguments as mi_integer instead of mi_boolean.

If a UDR takes arguments that are smaller than an MI_DATUM, Informix recommends that you pass these small "by-value" SQL types as mi_integer.

Managing UDR Execution

Once the routine manager creates a routine sequence and pushes the arguments onto the stack, it invokes the UDR. It then manages the execution of the UDR associated with this routine sequence. The number of times that the UDR is invoked depends on the following factors:

Does the UDR handle SQL NULL values?

If an argument to the UDR is the SQL NULL value and the UDR does not handle NULL values (it was not registered with the HANDLESNULLS routine modifier), the routine manager does not execute the UDR.

Is the UDR an iterator function?

An iterator function has several iterations. It executes once to initialize the iterations, once for each iteration, and once to release iteration resources. For more information, see Writing an Iterator Function.

Where is the UDR invoked within the SQL statement?

If the UDR is in the select list, it executes once per row that the WHERE clause qualifies.

If the UDR is in the WHERE clause, the exact number of times that it executes cannot be predicted. It might be less than or equal to the number of rows or it might not be executed at all. The query optimizer makes this determination.

If the UDR is in an EXECUTE FUNCTION or EXECUTE PROCEDURE statement, it executes once (unless it is an iterator function).
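The effect of NULL handling on the invocation count can be shown with a short simulation. The sql_int structure, the int_udr type, and the functions below are hypothetical; the handles_nulls flag plays the role of the HANDLESNULLS routine modifier.

```c
#include <stddef.h>

/* Hypothetical SQL integer with an explicit NULL indicator. */
typedef struct {
    int value;
    int is_null;
} sql_int;

typedef int (*int_udr)(sql_int arg);

/* Count how many times the UDR runs over a set of argument values. */
size_t count_invocations(int_udr udr, const sql_int *rows,
                         size_t nrows, int handles_nulls)
{
    size_t calls = 0;

    for (size_t i = 0; i < nrows; i++) {
        if (rows[i].is_null && !handles_nulls)
            continue;               /* NULL argument: UDR is not executed */
        (void)udr(rows[i]);
        calls++;
    }
    return calls;
}

/* Toy UDR that tolerates NULL input. */
int null_safe_double(sql_int a)
{
    return a.is_null ? 0 : a.value * 2;
}
```

For three rows of which one is NULL, the routine runs twice when it does not handle NULL values and three times when it does.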

Where the UDR executes is determined by the presence of the CLASS routine modifier, as follows:

If the UDR registration has no CLASS routine modifier, or its CLASS routine modifier specifies the CPU VP, the routine manager switches to the CPU VP for execution of the UDR.

If the UDR registration has a CLASS routine modifier that specifies a user-defined VP class, the routine manager switches to the specified VP class for execution of the UDR.

Returning the Value

For execution of a user-defined function, the routine manager returns any resulting value to the query executor when execution is complete. When the routine manager returns the value from a user-defined function, it passes this value as an MI_DATUM value. As with routine arguments, the passing mechanism that the routine manager uses depends on the size of the return-value data type, as follows:

The routine manager passes most return values by reference; that is, it passes a pointer to the actual return value.

If the return value has a data type whose size is greater than the size of an MI_DATUM, the routine manager passes the return value by reference because it cannot fit the actual value onto the stack. The routine manager allocates the memory for these pass-by-reference return values with a PER_ROUTINE duration.

The routine manager passes a few special types of return values by value; that is, the MI_DATUM contains the actual return value.

If the return value is a data type whose size is less than or equal to the size of an MI_DATUM, the routine manager passes the return value back by value because it can fit the actual value onto the stack. Figure 2-18 on page 2-48 lists the data types that the routine manager passes by value.

Tip: For a particular return-value data type, you can determine from its type descriptor whether it is passed by reference or passed by value with the mi_type_byvalue() function.

The routine manager determines information about the return value (such as whether it is an SQL NULL value) from the MI_FPARAM structure of the UDR. For information on how to code routine return values, see Defining a Return Value.
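A by-reference return value paired with a NULL indicator kept in the routine state can be sketched as follows. The routine_state structure and safe_ratio() are hypothetical. In an actual C UDR the NULL indicator is set through the MI_FPARAM structure with a function such as mi_fp_setreturnisnull(), and the result memory would come from a DataBlade API allocator such as mi_alloc() rather than from malloc().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the part of the routine state that
 * describes the return value. */
typedef struct {
    int return_is_null;
} routine_state;

/* Toy function with a by-reference return value: the result is a
 * pointer to newly allocated storage, or NULL with the NULL indicator
 * set when the quotient is undefined. In this sketch the caller frees
 * the storage; in the server, memory durations take care of that. */
double *safe_ratio(int num, int den, routine_state *st)
{
    double *out;

    if (den == 0) {
        st->return_is_null = 1;
        return NULL;
    }

    out = malloc(sizeof *out);
    if (out == NULL) {
        st->return_is_null = 1;
        return NULL;
    }

    *out = (double)num / den;
    st->return_is_null = 0;
    return out;
}
```

The caller inspects the indicator in the routine state, not the pointer alone, to decide whether the result is an SQL NULL value.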

Releasing the Routine Sequence

At the end of the routine instance, the routine manager releases the associated routine sequence. At this time, it also deallocates the MI_FPARAM structure.
