Creating Parallelizable UDRs


The Parallel Database Query (PDQ) feature allows the database server to run a single SQL statement in parallel. When you send a query to the database server, it breaks your request into a set of discrete subqueries, each of which can be assigned to a different CPU virtual processor. A parallelizable query is a query that can be executed in parallel. PDQ is especially effective when your tables are fragmented and your server computer has more than one CPU.

A parallelizable UDR is a C UDR that can be executed in parallel when it is invoked within a parallelizable query. That is, the C UDR can execute on subsets of table data just as the query itself can. A query that invokes a nonparallelizable UDR can still run in parallel; however, the calls to the UDR do not run in parallel. Similarly, prepared queries that invoke a parallelizable UDR do not run the UDR in parallel.

To create a parallelizable C UDR:

Write the C UDR so that it does not call any DataBlade API functions that are non-PDQ-threadsafe.
Register the C UDR with the PARALLELIZABLE routine modifier.
Execute the parallelized C UDR, once in each scan thread of the parallelized query.
Debug the parallelized C UDR.

The following subsections describe these steps in detail.

Writing the Parallelizable UDR

To write a parallelizable C UDR, you must ensure that the UDR does not include any calls to the non-PDQ-threadsafe DataBlade API functions that Table 110 lists.

Table 110. Non-PDQ-Threadsafe DataBlade API Functions
Category of Non-PDQ-Threadsafe Function		DataBlade API Function
Statement processing:
	Statement execution. A parallelizable UDR cannot parse an SQL statement.	mi_exec( ), mi_prepare( )
	Current-statement processing. No current statement exists in a parallelizable UDR, so these functions are not useful.	mi_binary_query( ), mi_command_is_finished( ), mi_get_result( ), mi_get_row_desc_without_row( ), mi_next_row( ), mi_query_finish( ), mi_query_interrupt( ), mi_result_command_name( ), mi_result_row_count( ), mi_value( ), mi_value_by_name( )
	Prepared statements. No prepared statement exists because you cannot prepare one in a parallelizable UDR, so these functions are not useful.	mi_close_statement( ), mi_drop_prepared_statement( ), mi_exec_prepared_statement( ), mi_fetch_statement( ), mi_get_statement_row_desc( ), mi_open_prepared_statement( ), mi_statement_command_name( ); all input-parameter accessor functions: mi_parameter_* (see Table 57)
	Transfer of data. Even though these type-transfer functions are PDQ-threadsafe, they are usually called within the send and receive support functions of an opaque type and are likely to be called during statement processing.	All type-transfer functions: mi_get_, mi_put_ (see Table 114)
	Other	mi_current_command_name( )
Save-set handling		All save-set functions: mi_save_set_* (see Table 60)
Complex-type (collections and row types) handling:
	Collection processing	All collection functions: mi_collection_* (see Collections)
	Row-type processing	mi_get_row_desc( ), mi_get_row_desc_from_type_desc( ), mi_get_row_desc_without_row( ), mi_get_statement_row_desc( ), mi_row_create( ), mi_row_free( ), mi_row_desc_create( ), mi_row_desc_free( )
	Complex-type processing	Type-descriptor accessor functions if they access a complex type: mi_type_* (see Table 9); column functions if they access a complex type: mi_column_* (see Table 22)
Operating-system file access		All file-access functions: mi_file_* (see Table 91)
Tracing:
	Even though the functions listed here are not PDQ-threadsafe, you can include most statements that generate trace output in a parallelizable UDR.	mi_tracefile_set( ), mi_tracelevel_set( ), GL_DPRINTF
Miscellaneous		mi_get_connection_option( ), mi_get_database_info( ), mi_get_session_connection( ), mi_get_type_source_type( )

A parallelizable C UDR cannot call (either explicitly or implicitly) any of the DataBlade API functions in Table 110. If you attempt to run in parallel a UDR that calls a non-PDQ-threadsafe function, the database server generates an error. If your UDR must call one of the functions in Table 110, it cannot be parallelizable.
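As a rough aid when reviewing a UDR against Table 110, the following sketch checks a function name against a partial copy of the table. The helper name is_non_pdq_threadsafe( ) and both arrays are hypothetical and are not part of the DataBlade API. The arrays cover only some of the entries in Table 110, so a false result does not prove that a function is PDQ-threadsafe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A few exact names copied from Table 110 (deliberately not the full list). */
static const char *const exact_names[] = {
    "mi_exec", "mi_prepare", "mi_next_row", "mi_value",
    "mi_get_result", "mi_query_finish", "mi_row_create",
    "mi_tracefile_set", "mi_tracelevel_set",
    "mi_get_session_connection"
};

/* Function families that Table 110 lists as a whole, matched by prefix. */
static const char *const prefixes[] = {
    "mi_parameter_", "mi_save_set_", "mi_collection_", "mi_file_"
};

/* Hypothetical helper: returns true if name matches the partial lists above. */
bool is_non_pdq_threadsafe(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof exact_names / sizeof exact_names[0]; i++)
        if (strcmp(name, exact_names[i]) == 0)
            return true;

    for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
        if (strncmp(name, prefixes[i], strlen(prefixes[i])) == 0)
            return true;

    return false;
}
```

For example, is_non_pdq_threadsafe("mi_exec") and is_non_pdq_threadsafe("mi_file_open") both return true. A check of this kind sees only explicit calls by name; implicit calls still require manual review.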

Keep in mind the following considerations when you write a UDR to be parallelizable:

For a UDR that operates on an opaque type to be parallelizable, all support functions of the opaque type must be parallelizable.
A UDR that operates on complex data types cannot be parallelizable.
A UDR can be parallelizable whether it runs in the CPU VP or a user-defined VP.
A UDR that acts as a functional index cannot be parallelizable.
A UDR that is parallelizable cannot call a UDR that is not parallelizable (either explicitly or with the Fastpath interface).

Registering the Parallelizable UDR

When you register a UDR with the PARALLELIZABLE routine modifier, you tell the database server that the UDR was written according to the guidelines in Writing the Parallelizable UDR. That is, the UDR does not call any DataBlade API functions that are non-PDQ-threadsafe. However, registering the UDR with the PARALLELIZABLE modifier does not guarantee that every invocation of the UDR executes in parallel. The decision whether to parallelize a query and any accompanying UDRs is made when the query is parsed and optimized.

Executing the Parallelizable UDR

When a query with a parallelizable UDR executes in parallel, each routine instance might have more than one routine sequence. For a parallelized UDR, the routine manager creates a routine sequence for each scan thread of the query.

For example, suppose you have the following query:

SELECT a_func(x)
FROM table1
WHERE a_func(y) > 7;

Suppose also that the table1 table in the preceding query is fragmented into four fragments and the a_func( ) user-defined function was registered with the PARALLELIZABLE routine modifier. When this query executes in serial, it contains two routine instances (one in the select list and one in the WHERE clause) and two routine sequences. However, when this query executes in parallel over table1, it still contains two routine instances but it now has six routine sequences:

One routine sequence for the primary thread to execute a_func( ) in the select list.
Five routine sequences for a_func( ) in the WHERE clause:
- One routine sequence for the primary thread
- Four routine sequences for secondary PDQ threads, one for each fragment in the table
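The count in this example can be restated as a small calculation. The following sketch is only an illustration of the arithmetic: the function routine_sequences( ) is hypothetical, and it assumes, as the example does, that the select-list instance runs only in the primary thread and that the WHERE-clause instance gets one secondary PDQ thread per fragment.

```c
/* Counts routine sequences for the example query.
 * fragments: number of fragments in table1
 * parallel:  nonzero if the query executes in parallel */
int routine_sequences(int fragments, int parallel)
{
    int select_list = 1;   /* a_func(x): primary thread only */
    int where_clause = 1;  /* a_func(y): primary thread */

    if (parallel)
        where_clause += fragments;  /* one per secondary PDQ thread */

    return select_list + where_clause;
}
```

With four fragments, routine_sequences(4, 1) returns 6 and routine_sequences(4, 0) returns 2, matching the parallel and serial cases described above.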

The MI_FPARAM structure holds the information of the routine sequence. Therefore, the routine manager allocates an MI_FPARAM structure for each scan thread of the parallelized query. All invocations of the UDR that execute in the same scan thread can share the information in that MI_FPARAM structure. However, invocations in different scan threads cannot share MI_FPARAM information.
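The following sketch models this sharing rule in plain C so that it can be compiled without the DataBlade API headers. FakeFparam, AFuncState, and a_func_invoke( ) are hypothetical stand-ins: FakeFparam models only a user-state pointer. In a real UDR this role is played by the user state of the MI_FPARAM structure, which is typically set with mi_fp_setfuncstate( ) and read with mi_fp_funcstate( ).

```c
#include <stdlib.h>

/* Stand-in for MI_FPARAM: only the user-state pointer is modeled. */
typedef struct {
    void *user_state;
} FakeFparam;

/* State that the hypothetical UDR keeps for one routine sequence. */
typedef struct {
    long calls;
} AFuncState;

/* Models one invocation of a_func( ) within a routine sequence.
 * The first invocation allocates the state; later invocations that
 * receive the same FakeFparam reuse it. Returns the invocation count
 * for this routine sequence, or -1 if allocation fails. */
long a_func_invoke(FakeFparam *fp)
{
    AFuncState *st = fp->user_state;

    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st == NULL)
            return -1;
        fp->user_state = st;
    }
    return ++st->calls;
}
```

If two scan threads each have their own FakeFparam, two invocations in the first thread return 1 and then 2, while the first invocation in the second thread returns 1: the count is shared within a scan thread but not across scan threads.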

Tip:

The DataBlade API also supports memory locking for a parallelizable UDR that shares data with other UDRs or with multiple instances of the same routine. Memory locking allows the UDR to implement concurrency control on its data; however, the memory-locking feature is an advanced feature of the DataBlade API. For more information on the memory-locking feature, see Handling Concurrency Issues.

For more information about how the routine manager creates a routine sequence, see Creating the Routine Sequence.

Debugging the Parallelizable UDR

You can use the SQL statement SET EXPLAIN to determine whether a parallelizable query is actually being executed in parallel. The SET EXPLAIN statement executes when the database server optimizes a statement. It creates a file that contains:

A copy of the SQL statement
The plan of execution that the optimizer has chosen
An estimate of the amount of work

For more information on SET EXPLAIN, see its description in the IBM Informix: Guide to SQL Syntax and your IBM Informix: Performance Guide.

The following onstat options are useful to track execution of parallel activities:

The -g ath option shows the session thread and any additional scan threads for each fragment that is scanned for a statement that is running in parallel. You can use the -g ses option to help find the relationship between the threads.
The -g stk option dumps the stack of a specified thread. This option can be helpful in tracing exactly what the thread is doing.

For more information on the onstat utility, see the IBM Informix: Administrator's Reference.