Multirep, variable-length user-defined types

For a complete example of a DataBlade that implements a multirep, variable-length user-defined type, download this file. The following explanation is designed to accompany the example code.

Introduction

If a DataBlade employs a variable-length user-defined type (UDT), it is usually necessary to handle various representations of the data that are contained in the UDT. This is because Informix Universal Server imposes some size limitations on UDTs. The limitations are:

  1. Row buffer size (32k)

    For efficiency inside the server, entire rows (i.e., tuples) are passed around in 32 kbyte row buffers. If the UDT is to be stored in a multi-column table, the sum of the sizes of all the columns must not exceed 32k. This 32k limit does not affect the UDT size limit directly, but it does influence the choice of max length (item #2, next).

  2. UDT max length (default 2k)

    When you define a variable-length opaque type, you can specify a maximum allowable length with a MAXLEN modifier. If you don't specify anything, the default value is 2 kbytes.

  3. Rtree leaf page space (approx 1960 bytes)

    If you intend to make your UDT indexable with an Rtree, a copy of the UDT data will be stored in the leaf pages of the index. The Rtree index page size is 2 kbytes, but after allowing for data structure overhead there are approximately 1960 bytes available. Furthermore, the Rtree index will not work if there are not at least two UDTs on each page, so you should restrict the UDT size to less than or equal to 1/2 of this amount. In the example code we have chosen 960 bytes.

Because of these size limitations, it is necessary to convert UDTs which are 'too big' into large objects. This means that the variable-length portion of the UDT data will get stored in Smart Blob Space (i.e., sbspace), and a handle to the large object (i.e., lohandle) will get stored in the actual database table (i.e., in-row). This is the concept of multi-representation: the data comprising a UDT can mean different things, depending on how the actual user data is stored.

The DataBlade API contains a certain amount of support for dealing with multirep data, but the structures and library routines are somewhat difficult to use. Hence this example of how to implement a multirep UDT.

Example blade overview

The example blade creates a new UDT called MyUdt. This is a variable-length opaque type which internally comprises an array of double-precision floating point values.

A bare minimum of SQL functions are provided to show how to deal with multirep data:

WARNING! There is absolutely no error reporting, and virtually no error handling, in the example. Also, the code used in the text input and output routines is not thread-safe, does not use GLS library calls, and is therefore not internationalizable. If you plan on using this code as the basis for a production DataBlade, be sure to flesh out these sections.

Key design concepts

The source code is fairly well commented and should be self-explanatory, but there are a few concepts which, once understood, will make the code easier to follow.

How to run the example

Prerequisites: The makefile is set up for a Solaris/Sparc platform, but with minor modifications it can be made to work on other unix platforms. All the platform-dependent flags are located at the beginning of the makefile. Use the makefile to compile and install the blade in your $INFORMIXDIR/extend directory tree. Then use dbaccess or blademgr to register the blade (via the register.sql script). Finally, you can execute one or more of the testn.sql scripts to see how the blade behaves.

What the test scripts do

test1.sql
This is an example of creating and inserting a small UDT into a table. Since the array of values is small enough to fit in-row, the text input routine (MyUdtIn) will return an lvarchar containing the actual data,and the Assign routine will not have to do anything.

test2.sql
This is an example of creating and insertinga large UDT into a table. The array of values is too big to fit in-row, so the text input routine will store the array in a buffer. When the server calls the Assign routine to insert the UDT into the table, it must convert the buffer to a large object and store the lohandle in the UDT.

test3.sql
This example creates a UDT which is large, but never gets stored in a table. Hence the Assign routine will never be called, and the NumElements routine will operate on a UDT whose data is stored in a per-command memory buffer.

test4.sql
This is an example of nested routine calls. The text input routine gets called to convert the text string to a UDT; then the NumElements routine will get called to return the number of array elements; finally the Element routine gets called to return one array element. The Assign routine will never get called because the UDT is not stored in a table.

test5.sql
This example illustrates what happens when you increase the size of a UDT to the point where it must be converted to a large object. The update operation will call AppendElement, which creates a new UDT, with the array data stored in a per-command memory buffer. The Assign routine will convert this buffer to a large object.

test6.sql
This is another example of how to modify a UDT via the ReplaceElement routine. Even though the size of the UDT is not being changed, you need to make a copy of the original UDT and let the Assign routine convert the new UDT back into a large object if necessary. This is because if the original UDT was a large object, you must pin the old data to bring it into memory, modify the data, and create a new large object. A more sophisticated implementation of this routine would modify the large object data in-place, via mi_lo_writewithseek. This exercise is left to the reader.