Informix DataBlade API Programmer's Manual
Writing a User-Defined Routine

Choosing a Virtual Processor

To service multiple client-application SQL requests, the database server uses virtual processors (VPs). The database server breaks the SQL request into distinct tasks, based on the resource that the task requires. Different VP types, called virtual-processor classes (VP classes), service the different kinds of tasks. The following table lists some of the types of VP classes that the database server supports.

Virtual-Processor Class  Description
CPU                      Central processing (the primary VP class, which controls client-application requests)
AIO                      Asynchronous disk I/O
SHM                      Shared-memory network communication
User-defined             Special VP class for additional types of processing

The database server preserves the state of each request in a thread. It then assigns the thread to the VP class that manages the task or resource that the request requires. The VPs in the VP class service multiple requests for their resource by scheduling the threads on the resource.

The CPU virtual processor (CPU VP) is the main VP for the database server. It acts as the central processor for client-application SQL requests. When a client application establishes a connection, the CPU VP creates the session thread for that client application. A CPU VP runs multiple session threads to service multiple SQL client applications.

Tip: This section describes VPs in the context of C UDRs. For a general description of VPs and user-defined routines, see the description on VPs in "Extending Informix Dynamic Server 2000." For a description of virtual processors in general, see the chapter on the database server architecture in your "Administrator's Guide."

Because a session thread is the primary thread for the processing of SQL requests, any C UDRs in an SQL request normally execute in the CPU VP. However, the tasks that your C UDR needs to perform might limit its ability to execute in the CPU VP: a well-behaved UDR can execute safely in the CPU VP, but an ill-behaved UDR must execute in a user-defined VP class.

The VP class in which a C UDR executes affects its performance characteristics and what operating-system interactions the UDR can safely use. Generally, the more coding restrictions that apply to the VP class, the more efficiently the VP class can execute the UDR.

Important: The success of your C UDRs and your DataBlade API project depends to a large degree on how well you implement the features related to the safety and interoperability of your C UDR.

Creating a Well-Behaved Routine

Because the CPU VP is such a critical resource, it is important that the code it executes be well-behaved; that is, all code must preserve concurrency, be thread safe, and avoid unsafe operating-system calls.

Informix ensures that code it provides to execute within SQL statements (such as built-in SQL functions) is well-behaved. However, Informix does not have control over the code you write in your C UDR. A C UDR must be well-behaved to execute in the CPU VP. As a UDR developer, you must ensure that your C UDR adheres to the safe-code requirements in Figure 12-5.

Figure 12-5
Safe-Code Requirements for a Well-Behaved UDR

Safe-Code Requirement: Preserve concurrency.
  • Coding rule: Yield the CPU VP on a regular basis.
    Possible workarounds: To execute in the CPU VP, use mi_yield() to explicitly yield the CPU VP during resource-intensive processing. Otherwise, execute in a user-defined VP class.
  • Coding rule: Do not use blocking I/O calls.
    Possible workarounds: Execute in a yielding user-defined VP class.
Safe-Code Requirement: Be thread safe.
  • Coding rule: No heap-memory allocation.
    Possible workarounds: To execute in the CPU VP, use the DataBlade API memory-management functions.
  • Coding rule: No modification of global or static data.
    Possible workarounds: To execute in the CPU VP, use the MI_FPARAM structure if you need to preserve state information. If necessary, global or static data can be read, as long as it is not updated. Otherwise, execute in a nonyielding user-defined VP class or a user-defined VP class that has only one VP defined.
  • Coding rule: No modification of the global state of the virtual processor.
    Possible workarounds: A C UDR that modifies the global VP state cannot execute safely in the CPU VP. If modification of this data is essential to the application, execute the C UDR in a nonyielding user-defined VP class or a user-defined VP class that has only one VP defined.
Safe-Code Requirement: Avoid unsafe operating-system calls.
  • Coding rule: Do not use any system calls that might impair concurrency or allocate local resources.
    Possible workarounds: If use of such system calls is essential to the application, execute the C UDR in a nonyielding user-defined VP class or a user-defined VP class that has only one VP defined.

If a UDR does not follow the safe-code requirements in Figure 12-5, it is called an ill-behaved routine. An ill-behaved routine cannot safely execute in the CPU VP.

Warning: Execution of an ill-behaved routine in the CPU VP can cause serious interference with the operation of the database server. In addition, the UDR itself might not produce correct results.

If your C UDR has one of the ill-behaved traits in Figure 12-5, follow the suggestions in the Possible Workarounds column. The following sections describe more fully the safe-code requirements for a well-behaved C UDR. For more information on how to execute your C UDR in a user-defined VP, see Handling an Ill-Behaved Routine.

Preserving Concurrency

A well-behaved C UDR must preserve the concurrency of the CPU virtual processor (CPU VP). Concurrency means that two or more threads can be in the middle of their execution at the same time. The CPU virtual processor appears to execute multiple threads simultaneously because it switches between threads. The database server tries to keep a thread running on the same CPU VP that begins the thread execution. However, if the current thread is waiting for some other type of resource to be accessed or some other task to be performed, the CPU virtual processor is needlessly held up. To avoid this situation, the database server can migrate the current thread to a VP class that schedules the resource for which the thread is waiting.

For example, a query request starts as a session thread in the CPU VP. Suppose this query contains a C UDR that accesses a smart large object. While the thread waits for the smart-large-object data to be fetched from disk, the database server migrates the thread to an AIO VP, releasing control of the CPU VP so that other threads can execute.

At a given time, a VP can run only one thread. To maintain concurrency for session threads, the CPU VP swaps out one thread to allow another to execute. This process of swapping threads is sometimes called thread migration. This continual thread migration keeps the CPU VP available to process many threads. The speed at which CPU-VP processing occurs produces the appearance that the database server processes multiple tasks simultaneously.

Unlike an operating system, which assigns time slices to processes for their CPU access, the database server does not preempt a running thread when a fixed amount of time expires. Instead, it runs a thread until the thread yields the CPU VP. Thread yielding occurs either when the thread explicitly yields the VP (for example, with a call to mi_yield()) or when the thread must wait for a resource, such as disk I/O, that another VP class services.

When a thread yields, the VP switches to the next thread that is ready to run. The VP continues execution and migration of threads until it eventually returns to the original thread.

For a C UDR to preserve concurrency of the CPU VP, it must ensure that it does not monopolize the CPU VP. When a C UDR keeps exclusive control of the CPU VP, it blocks other threads from accessing this VP. A C UDR can impair concurrency of the CPU VP if it performs resource-intensive processing without yielding or if it calls blocking I/O functions.

Denying other threads access to the CPU VP can affect every user on the system, not just the users whose queries contain the same C UDR. If you cannot code a C UDR to explicitly yield during resource-intensive processing and to avoid blocking-I/O functions, the UDR is an ill-behaved routine and must execute in a user-defined VP class.

Yielding the CPU

To preserve concurrency, a well-behaved C UDR must ensure that it regularly yields the CPU VP. During execution, a C UDR keeps exclusive control of the CPU VP until it completes execution or it yields the CPU VP. A C UDR might yield when it calls a DataBlade API function because DataBlade API functions automatically yield the VP when appropriate. For example, when a C UDR accesses a smart large object, its thread migrates to the AIO VP while the smart-large-object data is fetched from disk.

Therefore, you can assume that thread migration might occur during execution of any DataBlade API function.

However, if your C UDR performs resource-intensive tasks that do not involve calls to DataBlade API functions, such as lengthy loops or expensive computations, your UDR does not automatically yield the VP.

For such a C UDR to be well-behaved, it must explicitly yield the CPU VP with the DataBlade API function mi_yield(). The mi_yield() function causes the thread that is executing the UDR to voluntarily yield the CPU VP so that other threads get a chance to execute in the VP. When the original thread is ready to continue execution, execution resumes at the point immediately after the call to the mi_yield() function.

Tip: You can also use the mi_yield() function to yield a user-defined VP. For more information, see The Yielding User-Defined VP.

Write your C UDR so that it yields the VP at strategic points in its processing. Possible points include the beginning or end of lengthy loops and before and after expensive computations. Use of mi_yield() generally leads to an improved response time overall.
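The following sketch illustrates the idea for a CPU-bound loop. The routine sum_of_squares() and the interval of 1,000 iterations between yields are hypothetical choices. The stand-in declarations at the top exist only so that the sketch compiles outside the database server; in a real C UDR, mi_integer and mi_yield() are declared by the DataBlade API header mi.h, and the stand-ins must be removed.

```c
/* ---- Stand-ins so the sketch compiles outside the database server. ----
 * In a real C UDR, include "mi.h" instead and delete this block. This stub
 * mi_yield() only counts calls; the real one yields the VP to other threads. */
typedef int mi_integer;
static int stub_yield_calls = 0;
static void mi_yield(void) { stub_yield_calls++; }
/* ---- End of stand-ins. ---- */

/* Hypothetical tuning value: yield once every YIELD_INTERVAL iterations. */
#define YIELD_INTERVAL 1000

/* Hypothetical CPU-bound routine: returns the sum of i*i for 0 <= i < n.
 * The loop calls no DataBlade API function, so without mi_yield() the
 * thread would keep the CPU VP until the loop finished. */
long long sum_of_squares(mi_integer n)
{
    long long total = 0;
    mi_integer i;

    for (i = 0; i < n; i++) {
        total += (long long)i * i;

        /* Strategic yield point inside the lengthy loop. */
        if ((i + 1) % YIELD_INTERVAL == 0)
            mi_yield();
    }
    return total;
}
```

The interval is a tuning choice: yielding on every iteration probably adds needless scheduling overhead, while never yielding monopolizes the CPU VP.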

If you cannot code the C UDR to explicitly yield during resource-intensive passages of code, the UDR is an ill-behaved routine and must not execute in the CPU VP. To isolate a resource-intensive UDR from the CPU VP, you can assign the routine to a user-defined VP class. To determine which kind of user-defined VP to define, you must also consider whether you need to preserve concurrency of the user-defined VP. Keep in mind that all VPs of a class share a thread queue. If there are multiple users of your UDR, multiple threads can accumulate in the same thread queue. If your UDR does not yield, it blocks other UDRs that execute in the same VP class. Therefore, the VP might not be shared effectively among users: one user might have to wait while the UDR in the query of some other user completes.

You can use either type of user-defined VP, yielding or nonyielding, to execute a resource-intensive routine.

For more information, see Handling an Ill-Behaved Routine.

Avoiding Blocking I/O Calls

To preserve concurrency, a well-behaved C UDR must avoid system calls that perform blocking input and output (I/O) operations. These operating-system calls include the following:

accept(), bind(), fopen(), getmsg(), msgget(), open(), pause(), poll(), putmsg(), read(), select(), semop(), wait(), write()

When a C UDR executes any of these system calls, the CPU VP must wait for the I/O to complete. In the meantime, the CPU VP cannot process any other requests. The database server can appear to stall because the concurrency of the CPU VP is impaired.

If your C UDR needs to perform file I/O, do not use operating-system calls to perform this task. Instead, use the DataBlade API file-access functions. These file-access functions allow the CPU VP to process the I/O asynchronously. Therefore, they do not block the CPU VP. For more information, see Accessing Operating-System Files.
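As an illustration, the following sketch counts the bytes in a file with mi_file_open(), mi_file_read(), and mi_file_close() instead of open(), read(), and close(). The helper name count_file_bytes() and the 512-byte buffer are hypothetical. The stand-in block implements the three DataBlade API calls with POSIX I/O only so that the sketch runs outside the server; a real UDR must include mi.h and delete the stand-ins, because calling POSIX I/O directly is exactly the blocking behavior this section warns against. Check the DataBlade API function reference for the open flags that mi_file_open() accepts.

```c
#include <fcntl.h>      /* O_RDONLY */
#include <sys/types.h>  /* ssize_t, used by the stand-ins */
#include <unistd.h>     /* read(), close(), used by the stand-ins */

/* ---- Stand-ins so the sketch runs outside the database server. ----
 * In a real C UDR, include "mi.h" and delete this block. These stubs call
 * POSIX I/O directly, which a UDR in the CPU VP must not do; the server's
 * own file-access functions do not block the CPU VP. */
typedef int mi_integer;
#define MI_ERROR (-1)
static mi_integer mi_file_open(char *name, mi_integer flags, mi_integer mode)
{
    int fd = open(name, flags, mode);
    return fd < 0 ? MI_ERROR : fd;
}
static mi_integer mi_file_read(mi_integer fd, char *buf, mi_integer nbytes)
{
    ssize_t r = read(fd, buf, (size_t)nbytes);
    return r < 0 ? MI_ERROR : (mi_integer)r;
}
static mi_integer mi_file_close(mi_integer fd)
{
    return close(fd) < 0 ? MI_ERROR : 0;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: returns the number of bytes in the file at path,
 * or -1 if the file cannot be opened or read. */
long count_file_bytes(char *path)
{
    char buf[512];
    long total = 0;
    mi_integer fd, n;

    fd = mi_file_open(path, O_RDONLY, 0);
    if (fd == MI_ERROR)
        return -1;

    /* A return of 0 means end of file; MI_ERROR means a read failure. */
    while ((n = mi_file_read(fd, buf, (mi_integer)sizeof buf)) > 0)
        total += n;

    mi_file_close(fd);
    return n == MI_ERROR ? -1 : total;
}
```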

If your UDR must issue blocking I/O calls, assign the routine to execute in a user-defined VP class. When a UDR blocks a user-defined VP, only those UDRs that are assigned to that VP are affected. Your UDR must also handle any problems that could occur if the thread yielded. For example, operating-system file descriptors do not migrate with a thread if it moves to a different VP. For more information on how to assign a UDR to a user-defined VP, see Handling an Ill-Behaved Routine.

Writing Thread-Safe Code

A well-behaved C UDR must be thread safe. During execution, an SQL request might travel around the different VP classes. For example, a query starts in the CPU VP, but it might migrate to a user-defined VP to execute a UDR that was registered for that VP class. In turn, the UDR might fetch a smart large object, which would cause the thread to migrate to the AIO VP.

Migrating a thread to a different VP means that the database server must preserve the state of the thread before it migrates the thread. When a client application connects to the database server, the database server creates a thread-control block (TCB) to store the thread-state information that it needs when the thread switches VPs.

When a thread migrates from one VP to another, it releases its original VP so this VP can execute other threads. The benefit of releasing the CPU VP outweighs the overhead involved in saving the thread state. Therefore, a C UDR must be able to continue execution without loss of information when it migrates to a different VP.

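For a C UDR to successfully migrate among VPs, its code must be thread safe; that is, it must not allocate memory with system memory-management calls, must not modify global or static data, and must not modify the global process state of the VP.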

Restricting Dynamic Memory Allocation

To be thread safe, a well-behaved C UDR must not use system memory-management routines to dynamically allocate memory. These operating-system calls include the following:

calloc(), free(), malloc(), mmap(), realloc(), shmat(), valloc()

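These operating-system calls allocate memory from the program heap space. On UNIX, where each VP is a separate process, this heap space belongs to the address space of an individual VP rather than to database server shared memory, so memory that a UDR allocates in one VP is not accessible after its thread migrates to another VP.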

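For a C UDR to be well-behaved, it must handle dynamic memory allocation with the DataBlade API memory-management functions. These DataBlade API functions allocate user memory in database server shared memory, which remains accessible to all VPs when a thread migrates, and they control the lifetime of that memory through memory durations.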

If you are porting legacy code to a C UDR, you might want to write simple wrapper functions that implement the system memory-management calls with DataBlade API functions. The following code fragment shows a simple implementation of malloc() and free() functions:
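A minimal sketch of such wrappers follows, assuming that the legacy code can be redirected to them (for example, with #define malloc udr_malloc). The names udr_malloc() and udr_free() are hypothetical, and the stand-in definitions of mi_alloc() and mi_free() exist only so that the sketch compiles outside the server; in a real UDR they come from mi.h.

```c
#include <limits.h>  /* INT_MAX */
#include <stddef.h>  /* size_t, NULL */
#include <stdlib.h>  /* malloc(), free(), used only by the stand-ins */

/* ---- Stand-ins so the sketch compiles outside the database server. ----
 * In a real C UDR, include "mi.h" and delete this block. */
typedef int mi_integer;
static void *mi_alloc(mi_integer size) { return malloc((size_t)size); }
static void mi_free(void *ptr) { free(ptr); }
/* ---- End of stand-ins. ---- */

/* Hypothetical replacement for malloc(). mi_alloc() takes an mi_integer
 * (an int in the stand-in), so sizes that do not fit are rejected rather
 * than silently truncated. */
void *udr_malloc(size_t size)
{
    if (size > (size_t)INT_MAX)
        return NULL;
    return mi_alloc((mi_integer)size);
}

/* Hypothetical replacement for free(). Like free(), it accepts NULL. */
void udr_free(void *ptr)
{
    if (ptr != NULL)
        mi_free(ptr);
}
```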

This code fragment uses mi_alloc(), which allocates user memory in the current memory duration. Therefore, the fragment allocates the memory with the default memory duration of PER_ROUTINE. For more information, see Current Memory Duration.

If you cannot avoid using system memory-management functions, your C UDR is ill-behaved. You can use system memory-management functions in your UDR only if you can guarantee that the thread will not migrate. However, a thread could migrate during any DataBlade API call or any routine that is not CPU safe. Therefore, the only way to guarantee that the thread never migrates is to allocate and free the memory inside a code block that does not execute any DataBlade API functions.

This restriction means that if you must use a system memory-management function, you must segment the UDR into sections that use DataBlade API functions and sections that are not CPU safe. All files must be closed and memory deallocated before you leave the non-CPU-safe sections. For more information, see External-Library Routines.
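The following sketch shows one way to segment a UDR in this manner. The routines legacy_upcase() (standing in for an external-library routine that calls malloc()) and upcase_for_udr(), and the MAX_TEXT limit, are hypothetical. The malloc()-based section contains no DataBlade API calls, so the thread cannot migrate while that heap memory is live; the result is copied to the stack and the heap memory is freed before the UDR calls mi_alloc(). The stand-in mi_alloc() and mi_free() exist only so that the sketch compiles outside the server.

```c
#include <ctype.h>   /* toupper() */
#include <stdlib.h>  /* malloc(), free() */
#include <string.h>  /* strlen(), memcpy() */

/* ---- Stand-ins so the sketch compiles outside the database server. ----
 * In a real C UDR, include "mi.h" and delete this block. */
typedef int mi_integer;
static void *mi_alloc(mi_integer size) { return malloc((size_t)size); }
static void mi_free(void *ptr) { free(ptr); }
/* ---- End of stand-ins. ---- */

#define MAX_TEXT 256  /* hypothetical limit on the result length */

/* Hypothetical external-library routine that is not CPU safe: it returns
 * an upper-cased copy of s in memory obtained from malloc(). */
static char *legacy_upcase(const char *s)
{
    size_t i, n = strlen(s);
    char *out = malloc(n + 1);

    if (out == NULL)
        return NULL;
    for (i = 0; i <= n; i++)            /* <= n also copies the '\0' */
        out[i] = (char)toupper((unsigned char)s[i]);
    return out;
}

/* Returns the upper-cased text in user memory, or NULL on failure. */
char *upcase_for_udr(const char *input)
{
    char local[MAX_TEXT];   /* stack memory travels with the thread */
    char *result;
    size_t len;

    /* Non-CPU-safe section: no DataBlade API calls, so no migration. */
    {
        char *tmp = legacy_upcase(input);

        if (tmp == NULL)
            return NULL;
        len = strlen(tmp);
        if (len >= MAX_TEXT) {
            free(tmp);
            return NULL;
        }
        memcpy(local, tmp, len + 1);
        free(tmp);          /* heap memory released before the section ends */
    }

    /* CPU-safe section: DataBlade API calls, which might yield, are allowed. */
    result = mi_alloc((mi_integer)(len + 1));
    if (result != NULL)
        memcpy(result, local, len + 1);
    return result;
}
```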

Avoiding Modification of Global and Static Variables

To be thread safe, a well-behaved C UDR must avoid use of global and static variables. Global and static variables are stored in the address space of a virtual processor, in the data segment of a shared-object file. Therefore, they belong to the address space of the VP, not of the thread itself.

When an SQL statement contains a C UDR, the routine manager loads the shared-object file that contains the UDR object code into each VP. Therefore, each VP receives its own copy of the data and text segments of a shared-object file and all VPs have the same initial data in their shared-object data segments. Figure 12-6 shows a schematic representation of a virtual processor and indicates the location of global and static variables.

Figure 12-6
Location of Global and Static Variables in a VP

As Figure 12-6 shows, global and static variables are not stored in database server shared memory, but in the data and text segments of a VP. These segments in one VP are not visible after a thread migrates to another VP. Therefore, if a C UDR modifies global or static data in the data segment of one VP, the same data is not available if the thread migrates.

Figure 12-7 shows an implementation of a C UDR called bad_rowcount(), which creates an incremented row count for the results of a query.


Figure 12-7
Incorrect Use of Static Variable in a C UDR
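The following is a minimal sketch of what a routine like bad_rowcount() might look like, assuming that the UDR receives only an MI_FPARAM pointer. The stand-in typedefs replace mi.h so that the sketch compiles outside the server.

```c
#include <stddef.h>  /* NULL */

/* ---- Stand-ins so the sketch compiles outside the database server. ----
 * In a real C UDR, include "mi.h" and delete this block. */
typedef int mi_integer;
typedef struct MI_FPARAM MI_FPARAM;   /* opaque in this sketch */
/* ---- End of stand-ins. ---- */

/* ILL-BEHAVED: do not use this pattern in a UDR that runs in the CPU VP. */
mi_integer bad_rowcount(MI_FPARAM *fparam)
{
    /* One copy of bad_count exists in the data segment of each VP that
     * loads the shared object, and every invocation in that VP shares it. */
    static mi_integer bad_count = 0;

    (void)fparam;       /* the routine state in fparam is ignored */
    bad_count++;
    return bad_count;
}
```

Within one process the counter simply keeps growing across every invocation, no matter which query made the call; across VPs, each VP keeps its own separate copy.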

Suppose a SELECT statement that calls bad_rowcount() executes.

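The CPU VP that is processing this query (for example, CPU-VP 1) executes the bad_rowcount() function. The bad_rowcount() function is not well-behaved because it uses a static variable to hold the row count. Because each VP has its own copy of the static bad_count variable, the count is lost if the thread migrates to another CPU VP (for example, CPU-VP 2), where the function increments a different copy. In addition, other invocations of bad_rowcount() that execute in CPU-VP 1 update the same bad_count, so the count is not private to a single query.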

A well-behaved C UDR can avoid use of global and static data with the following workarounds.

Workaround: Use only local (stack) variables and user memory (which the DataBlade API memory-management functions allocate).
  Both of these types of memory remain accessible when a thread migrates to another VP.
  • Because the stack is maintained as part of the thread, reads and writes of local variables do not overlap when the thread migrates among VPs. Therefore, write reentrant code that keeps variables on the stack.
  • User memory resides in database server shared memory and therefore is accessible by all VPs. For more information, see Managing User Memory.
Workaround: Use a function-parameter structure, called MI_FPARAM, to track private state information for a C UDR.
  The MI_FPARAM structure is available to all invocations of a UDR within a routine sequence. Figure 9-7 shows the implementation of the rowcount() function, which uses the MI_FPARAM structure to correctly implement the row counter that bad_rowcount() attempts to implement. For more information, see Saving a User State.

If necessary, you can use read-only static or global variables because the values of these variables remain the same in each CPU VP. Keep in mind that the addresses of global and static variables are not stable when the UDR migrates across VPs.
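The MI_FPARAM workaround can be sketched as follows. The routine name rowcount_sketch() is hypothetical, and this is not the rowcount() code of Figure 9-7. The sketch assumes that the counter is allocated with mi_dalloc() in the PER_COMMAND memory duration so that it survives from one invocation to the next within the routine sequence. The stand-in definitions of MI_FPARAM, mi_fp_funcstate(), mi_fp_setfuncstate(), and mi_dalloc() exist only so that it compiles outside the server.

```c
#include <stdlib.h>  /* malloc(), used only by the stand-ins */

/* ---- Stand-ins so the sketch compiles outside the database server. ----
 * In a real C UDR, include "mi.h" and delete this block. The real
 * MI_FPARAM is opaque; this toy version holds only the state pointer, and
 * this toy mi_dalloc() ignores the memory duration. */
typedef int mi_integer;
typedef enum { PER_ROUTINE, PER_COMMAND } MI_MEMORY_DURATION;
typedef struct MI_FPARAM { void *funcstate; } MI_FPARAM;
static void *mi_fp_funcstate(MI_FPARAM *fp) { return fp->funcstate; }
static void mi_fp_setfuncstate(MI_FPARAM *fp, void *state) { fp->funcstate = state; }
static void *mi_dalloc(mi_integer size, MI_MEMORY_DURATION duration)
{
    (void)duration;
    return malloc((size_t)size);
}
/* ---- End of stand-ins. ---- */

/* Hypothetical well-behaved row counter: the count lives in user memory
 * reached through MI_FPARAM, not in a static variable, so it stays
 * correct if the thread migrates to another VP. */
mi_integer rowcount_sketch(MI_FPARAM *fparam)
{
    mi_integer *count = (mi_integer *)mi_fp_funcstate(fparam);

    if (count == NULL) {
        /* First invocation in this routine sequence: allocate the counter.
         * PER_COMMAND is assumed so the memory outlives one invocation. */
        count = (mi_integer *)mi_dalloc((mi_integer)sizeof(mi_integer), PER_COMMAND);
        if (count == NULL)
            return -1;  /* hypothetical error value for this sketch */
        *count = 0;
        mi_fp_setfuncstate(fparam, count);
    }

    (*count)++;
    return *count;
}
```

Each routine sequence (each MI_FPARAM) gets its own counter, so two queries that call the routine no longer share one count.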

If your C UDR cannot avoid using global or static variables, it is an ill-behaved routine. You can execute the ill-behaved routine in a nonyielding user-defined VP class but not in the CPU VP. A nonyielding user-defined VP prevents the UDR from yielding, thereby preventing it from migrating to another VP. Because the nonyielding VP executes the UDR to completion, the global (or static) value is valid for the duration of a single invocation of the UDR. The nonyielding VP prevents other invocations of the same UDR from migrating into the VP and updating the global or static variables. However, there is no way to guarantee that the UDR will return to the same VP for the next invocation.

For the global (or static) value to be valid across a single UDR instance (all invocations of the UDR), define a single-instance user-defined VP. This VP class contains one nonyielding VP. It ensures that all invocations of the same UDR execute on the same VP and update the same global variables. A single-instance user-defined VP is useful if your UDR must access a global or static variable by its address.

For more information, see Choosing the User-Defined VP Class.

Modifying the Global Process State

To be thread safe, a well-behaved C UDR must avoid modification of the global process state. All virtual processors that belong to the same VP class share access to both data and processing queues in memory. However, the global process state is not shared. The database server assumes that the global process state of each VP is the same. This consistency ensures that VPs can exchange work on threads.

For a C UDR to be well-behaved, it must avoid any programming tasks that modify the global process state of the virtual processor. Updating global and static data (see Avoiding Modification of Global and Static Variables) is one such modification. Various operating-system calls can also alter the process state and therefore are restricted. The following table shows the categories of operating-system calls that modify the global state of the CPU VP.

Type of Operating-System Call                                   Restricted Operating-System Calls
Calls that change the current working directory                 chdir()
Calls that change the global state of the process               gethostent(), gethostbyaddr(), gethostbyname(), sethostent(), endhostent(), umask()
Calls that modify the data-segment size of the process          brk(), sbrk()
Calls that modify the shared-memory segments                    shmat()
Calls that initiate threads that the operating system provides  pthread_create(), thr_create()

The calls in the preceding table can interfere with thread migration because the global process state does not migrate with the thread. In addition, you need to be careful with tasks such as opening file descriptors and using operating-system threads.
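For example, rather than calling chdir() and then opening a relative path, a UDR can build an absolute path and leave the working directory of the VP untouched. The helper build_abs_path() below is hypothetical and assumes UNIX-style path separators; the resulting path could then be passed to a DataBlade API file-access function.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf() */
#include <string.h>  /* strlen() */

/* Hypothetical helper: writes "<dir>/<name>" into out. Returns 0 on success,
 * or -1 if dir is not absolute or the result does not fit in out_size bytes.
 * Using full paths avoids chdir(), whose effect on the working directory is
 * process-wide and does not migrate with the thread. */
int build_abs_path(char *out, size_t out_size, const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    const char *sep;
    int n;

    if (dlen == 0 || dir[0] != '/')
        return -1;                              /* require an absolute directory */

    sep = (dir[dlen - 1] == '/') ? "" : "/";    /* avoid a doubled slash */
    n = snprintf(out, out_size, "%s%s%s", dir, sep, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```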

If a C UDR modifies the global process state, it is ill-behaved. You can execute the ill-behaved routine in a single-instance user-defined VP class but not in the CPU VP. A single-instance user-defined VP class contains one VP. Therefore, it ensures that all invocations of a UDR instance execute on the same VP and update the same global process state. For more information, see The Single-Instance User-Defined VP.

Avoiding Unsafe Function Calls

A well-behaved C UDR must avoid unsafe system calls, which can impair concurrency or allocate local resources.

Informix cannot provide a definitive list of unsafe system calls because the system calls that are unsafe vary among versions and types of operating systems. Additionally, the implementation of the VPs differs between UNIX and Windows NT.

On UNIX systems, the VPs are implemented as separate processes.

On Windows NT, each VP is an NT thread of a common process.

The difference in VP implementation means that some system calls are acceptable when the C UDR runs on Windows NT but not when this same UDR runs on UNIX. There are also differences between how UNIX handles shared libraries and how Windows NT handles dynamic link libraries (DLLs) that can affect the platforms on which operating-system calls are valid.

On any platform, however, a well-behaved C UDR must use system calls with discretion. The following sections describe unsafe operating-system calls and the use of external-library routines in well-behaved UDRs.

Unsafe Operating-System Calls

A well-behaved C UDR must not include any of the categories of system calls in Figure 12-8. The system calls in the Sample Operating-System Calls column are only possible examples. The operating-system calls that are unsafe in your C UDR can depend on your operating system. Consult your operating-system documentation for information on system calls that perform the categories of tasks in Figure 12-8.

Figure 12-8
Unsafe Operating-System Calls

Type of Operating-System Call                                    Sample Operating-System Calls
Calls that manipulate signals to processes                       signal(), alarm(), sleep()
Calls that modify the system security                            setuid(), seteuid(), setruid(), setgid(), setegid(), setrgid()
Calls that initiate or halt system processes                     fork(), exec(), exit(), system(), popen()
Calls that modify the shared-memory segments                     shmat()
Calls that modify the runtime environment of the dynamic linker  dlopen(), dlsym(), dlerror(), dlclose(); on Windows NT systems: LoadLibrary()

Warning: The database server reserves all operating-system signals for its own use. The virtual processors use signals to communicate with one another. If a UDR were to use signals, these signals would conflict with those that the virtual processors use. Therefore, do not raise, handle, or mask signals within a C UDR.

You can use system utilities to check whether undesired system calls are included in your shared-object file.

On UNIX systems, you can use the nm and ldd commands to obtain this information. The nm command lists the symbols in an object file, including undefined references such as calls to system functions; the ldd command lists the dynamic dependencies of a shared object.

On Windows NT, you can use the DUMPBIN command with its /IMPORTS option to obtain this information.

Tip: Given a DataBlade build (.bld) file, list the unresolved references in the file and in all its dependencies. You can then check this list for system calls that violate the rules of the VP class that you have chosen to execute your C UDR.

For a list of operating-system calls that are generally safe in a C UDR, see Safe System Calls.

External-Library Routines

Informix recommends that a well-behaved C UDR not use routines from existing external libraries. These external routines often make indirect calls to unsafe operating-system functions.

If your C UDR must use an external routine, it is an ill-behaved routine. If the routine is essential to your application, you can take special measures to include calls to non-CPU-safe external-library routines, such as routines that open files, allocate memory with malloc(), or use static-memory data.

For a non-CPU-safe routine to execute safely in a UDR, call it only from a section of the UDR that calls no DataBlade API functions, and close any files and free any memory that it allocated before the UDR leaves that section.

For a non-CPU-safe call to execute safely, the thread that executes the UDR must not migrate out of the VP as long as the UDR uses the unsafe resources (open files, memory allocated with malloc(), or static-memory data). However, DataBlade API functions might automatically yield the VP when they execute. This yielding causes the thread to migrate to another VP.

Therefore, you ca