Informix Guide to GLS Functionality
Database Server Features

Locale Support For C User-Defined Routines

Dynamic Server allows you to create user-defined routines (UDRs) that are written in the C programming language. These C UDRs use the DataBlade API to communicate with the database server. For a complete description of the DataBlade API, see the DataBlade API Programmer's Manual. This section describes how to internationalize a C UDR.

Internationalization is the process of creating a user-defined routine (UDR) that can support different languages, territories, and code sets without changing or recompiling its code. For a complete discussion of internationalization, see the Informix GLS Programmer's Manual. An internationalized C UDR must handle the following GLS considerations:

Where can the UDR use non-ASCII characters in source code?

What considerations must the C UDR take when copying character data?

How can the UDR access GLS locales?

How does the UDR handle code-set conversion?

How does the UDR handle locale-specific end-user formats?

How can the UDR access internationalized exception messages?

How can the UDR access internationalized tracing messages?

How do opaque-type support functions handle locale-sensitive data?

Current Processing Locale for UDRs

To access a database, a client application first requests a connection to the database server. The database server must verify that it can access the specified database and establish the connection between the client and this database. In the process, the database server establishes the server-processing locale to use for the duration of the connection. When the client application executes a UDR, this UDR executes on the server computer in the context of the server-processing locale. This locale is often called the current processing locale.

Many user-defined routines handle non-ASCII data correctly even if they were originally written for ASCII data. However, some routines might perform abnormally. To internationalize your C UDR, you must ensure that your UDR handles the server-processing locale in any GLS-related operations. If the UDR does not properly support the server-processing locale, the routine might return an error message.

Non-ASCII Characters in Source Code

Non-ASCII characters might appear in the following places within a C-language UDR source file:

In C-language statements, such as variable names and if statements

In SQL statements, which are sent to the database server through the mi_exec() or mi_exec_prepared_statement() functions

In C-Language Statements

The C compiler must recognize the code set that you use in your C-language statements. The capabilities of your C compiler might limit your ability to use non-ASCII characters within the C-language statements in a UDR source file. For example, some C-language compilers support multibyte characters in literals or comments only.

If the C compiler does not fully support non-ASCII characters, it might not successfully compile a UDR that contains these characters. In particular, the following situations might affect compilation of your UDR:

Multibyte characters might contain C-language tokens.

A component of a multibyte character might be indistinguishable from certain single-byte characters such as percent (%), comma, backslash (\), and double quote ("). If such characters exist in a quoted string, the C compiler might interpret them as C-language tokens, which can result in compilation errors or even lost characters.

The C compiler might not be 8-bit clean.

If a code set contains non-ASCII characters (with code values that are greater than 127), the C compiler must be 8-bit clean to interpret the characters. To be 8-bit clean, a compiler must read the eighth bit as part of the code value; it must not ignore or put its own interpretation on the meaning of this eighth bit.

Tip: The C compiler must also recognize the ASCII code set to be able to interpret the names of the DataBlade API functions within your C UDR.
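The first situation above can be checked mechanically before you hand a source file to the compiler. The following is a minimal sketch, not part of the DataBlade API: it assumes the Shift-JIS code set purely as an illustration (lead bytes 0x81-0x9F and 0xE0-0xFC), and the function names are hypothetical. It reports whether any two-byte character ends in a byte that looks like one of the single-byte characters listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: Shift-JIS is used only as an example code set.
 * Its lead bytes lie in 0x81-0x9F and 0xE0-0xFC. */
static int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: returns 1 if any two-byte character in s has a
 * trail byte equal to percent, comma, backslash, or double quote;
 * returns 0 otherwise. */
int has_token_like_trail_byte(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != '\0') {
        if (is_sjis_lead_byte(*p) && p[1] != '\0') {
            /* p[1] is the trail byte of a two-byte character. */
            if (strchr("%,\\\"", p[1]) != NULL)
                return 1;
            p += 2;
        } else {
            p++;    /* single-byte character */
        }
    }
    return 0;
}
```

In Shift-JIS, trail bytes start at 0x40, so of the four characters only the backslash (0x5C) can actually occur as a trail byte; the other three are checked so that the helper mirrors the list in the text. A backslash that stands alone as a single-byte character is not reported.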

In SQL Statements

In C UDRs, SQL statements occur as literal strings that are passed to the mi_exec() and mi_prepare() functions. The C compiler does not parse these literal strings. Therefore, it does not need to recognize the code set of the characters in these SQL statements.

Within a C source file, you can use non-ASCII characters in SQL statements for the following objects:

Names of SQL identifiers such as databases, tables, columns, views, constraints, prepared statements, and cursors

For more information, see Naming Database Objects.

Literal strings

For example, in a UDR, the following use of multibyte characters is valid:

mi_exec(conn, "insert into tbl1 (nchr1) values ('A¹A²B¹B²')", 0);

Filenames and pathnames, as long as your operating system supports multibyte characters in filenames and pathnames

Important: To use non-ASCII characters in your SQL statements, your server-processing locale must include either a code set that supports these characters or a code set that is compatible with the character code set. For information on how to perform code-set conversion, see Character Strings in UDRs.

Copying Character Data

When you copy data, you must ensure that the buffers are an adequate size to hold the data. If the destination buffer is not large enough for the multibyte data in the source buffer, the data might be truncated during the copy. For example, the following C code fragment copies the multibyte data A¹A²A³B¹B²B³ from buf1 to buf2:

char buf1[20], buf2[5];
...
stcopy("A¹A²A³B¹B²B³", buf1);
...
stcopy(buf1, buf2);

Because buf2 is not large enough to hold the multibyte string, the copy truncates the string to A¹A²A³B¹B², which splits the second character and loses its final byte, B³. To prevent this situation, ensure that the multibyte string fits into a buffer before the DataBlade API module performs the copy.
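One way to avoid a split character is to copy only whole characters and stop when the next one does not fit. The following is a minimal sketch under stated assumptions: UTF-8 stands in for the multibyte code set so that the example is self-contained, and both function names are hypothetical. In a real UDR, the length of each character would more likely come from the Informix GLS library (for example, a function such as ifx_gl_mblen()) so that the code follows the current processing locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: UTF-8 stands in for the multibyte code set.
 * Returns the byte length of the character that starts with lead. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;   /* invalid lead byte: treat it as a single byte */
}

/* Hypothetical helper: copies as many whole characters from src as
 * fit in dst, whose size dst_size includes room for the terminator.
 * Returns the number of bytes copied, excluding the terminator. */
size_t copy_whole_chars(char *dst, size_t dst_size, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL && dst_size > 0);
    while (src[used] != '\0') {
        size_t want = utf8_char_len((unsigned char)src[used]);
        size_t len = 1;

        /* Do not read past the terminator of a malformed source. */
        while (len < want && src[used + len] != '\0')
            len++;
        if (used + len + 1 > dst_size)
            break;              /* the next character would not fit */
        memcpy(dst + used, src + used, len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

With a five-byte destination and a source of two three-byte characters, this helper copies the first character only and terminates the string, instead of leaving a partial second character in the buffer.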

The Informix GLS Library

The Informix GLS library is an application programming interface (API) that lets developers of user-defined routines and DataBlade modules create internationalized applications.

Character Processing with Informix GLS

The macros and functions of Informix GLS provide access within a DataBlade API module to GLS locales, which contain culture-specific information. The Informix GLS library contains functions that provide the following capabilities:

Process single-byte and multibyte characters

Format date, time, and numeric data to locale-specific formats

For more information on the Informix GLS library and how to use it in a DataBlade API module, see the Informix GLS Programmer's Manual.

Compatibility of Wide-Character Data Types

Wide-character data types are an alternative form for the processing of multibyte characters. The wide-character form of a code set normalizes the size of each multibyte character so that every character occupies the same number of bytes. A legacy DataBlade API module might use any of the following data types to hold wide characters.


Wide-Character Data Type	Description	Drawback
mi_wchar	A legacy DataBlade API data type currently defined as unsigned short on all systems	The DataBlade API does not provide wide-character functions that operate on mi_wchar values.
wchar_t	An operating-system data type that is platform-specific	The operating system provides wide-character functions that operate on wchar_t values. Use of these functions is platform specific.

The Informix GLS library provides the gl_wchar_t data type for support of wide characters. Informix GLS also provides its own set of wide-character functions that operate on gl_wchar_t. Use of the Informix GLS wide-character functions removes platform dependency from your application and provides access within your DataBlade API module to Informix GLS locales.

The Informix GLS library does not provide any functions for conversion between gl_wchar_t and mi_wchar or between gl_wchar_t and wchar_t. If a DataBlade API module continues to use either mi_wchar or wchar_t and also needs to use the Informix GLS wide-character processing, you must write code to perform any necessary conversions.

Code-Set Conversion and the DataBlade API

Within a UDR, the DataBlade API does not perform any code-set conversion automatically. Your C UDR might need to perform code-set conversion in the following situations:

In strings that contain SQL statements

In an opaque-type support function for an opaque type that contains character data

Character Strings in UDRs

When your C UDR contains character strings that are sent to the database server, it must perform any required code-set conversion on these strings. This code-set conversion must handle any differences between the code set of this character string and the code set of the server-processing locale in which the UDR executes.

For example, the DataBlade API does not perform code-set conversion on the multibyte table name, A¹A²A³B¹B², in the following SELECT statement:

mi_exec(conn, "SELECT * from A¹A²A³B¹B²", 0);

If your UDR might execute in a server-processing locale that does not include a code set that supports characters in your SQL statements, the UDR can explicitly perform code-set conversion between the code sets of the server-processing locale and a specified locale. The DataBlade API provides the following functions to assist in this code-set conversion.


Code-Set Conversion on a String	DataBlade API Function
Perform code-set conversion on a specified string from a specified locale to the server-processing locale	mi_convert_from_codeset()
Perform code-set conversion on a specified string from the server-processing locale to a specified locale	mi_convert_to_codeset()

For more information on the syntax of these DataBlade API functions, see the function reference of the DataBlade API Programmer's Manual.

Character Strings in Opaque-Type Support Functions

The client application performs code-set conversion of non-opaque-type data that is transferred to and from the client. However, the database server does not know about the internal format of an opaque data type. Therefore, for opaque data types, the support functions are responsible for explicitly converting any string that is not in the code set of the server-processing locale.

You might need to perform code-set conversion in the following opaque-type support functions:

In the input and output support functions: to convert the external format of the opaque type between the code sets of the client locale and the server-processing locale

In the receive and send support functions: to convert any character fields in the internal structure of the opaque type

Tip: The code that the DataBlade Developers Kit (DBDK) generates for opaque-type input and output support functions handles external formats from nondefault locales.

The DataBlade API provides the following functions for code-set conversion in the support functions of an opaque data type.


Code-Set Conversion on an Opaque Type	DataBlade API Function
Perform code-set conversion on a string argument from the code set of the server-processing locale to that of the client locale	mi_put_string()
Perform code-set conversion on a string from the code set of the client locale to that of the server-processing locale	mi_get_string()

For more information on the syntax of these DataBlade API functions, see the function reference in the DataBlade API Programmer's Manual.

Locale-Specific Data Formatting

When a C UDR handles strings that contain end-user formats for date, time, numeric, or monetary data, you must write the UDR so that it handles any locale-specific variants of these end-user formats. The DataBlade API provides functions that convert between the internal representations of several data types and their end-user formats.

The following DataBlade API functions convert an internal database value to a string that uses the locale-specific end-user format.


DataBlade API Function	Description
mi_date_to_string()	Uses the locale-specific end-user date format to convert an internal `DATE` value to its string equivalent.
mi_money_to_string()	Uses the locale-specific end-user monetary format to convert an internal `MONEY` value to its string equivalent.
mi_decimal_to_string()	Uses the locale-specific end-user numeric format to convert an internal `DECIMAL` value to its string equivalent.

Important: The mi_datetime_to_string() and mi_interval_to_string() functions do not format the string in the date and time formats of the current processing locale. Instead, they create a date/time or interval string in a fixed ANSI SQL format.

The following DataBlade API functions interpret a string in its locale-specific end-user format and convert it to its internal database value.


DataBlade API Function	Description
mi_string_to_date()	Converts a string in its locale-specific date end-user format to its internal `DATE` format.
mi_string_to_money()	Converts a string in its locale-specific currency end-user format to its internal `MONEY` format.
mi_string_to_decimal()	Converts a string in its locale-specific numeric end-user format to its internal `DECIMAL` format.

Important: The mi_string_to_datetime() and mi_string_to_interval() functions do not interpret the string in the date and time formats of the current processing locale. Instead, they interpret the date/time or interval string in a fixed ANSI SQL format.

Internationalized Exception Messages

The DataBlade API function mi_db_error_raise() sends an exception message to an exception callback. This message can be either of the following:

A literal message, which you provide as the third argument to mi_db_error_raise()

A custom message that is associated with a value of SQLSTATE, which you provide as the third argument to mi_db_error_raise()

The mi_db_error_raise() function can raise exceptions with custom messages, which DataBlade modules and user-defined routines can store in the syserrors system catalog table. The syserrors table maps these messages to five-character SQLSTATE values. In syserrors, you can associate a locale with the text of a custom message.

For general information on how to specify a literal message in mi_db_error_raise() and how to specify a custom message for mi_db_error_raise(), see the chapter on how to handle exceptions and events in the DataBlade API Programmer's Manual.

This section discusses the following tasks about how to raise locale-specific exception messages:

How to add a locale-specific exception message to the syserrors system catalog table

How the choice of locale in a custom message affects the way that mi_db_error_raise() searches for a custom message

How to specify parameter markers that contain non-ASCII characters

Inserting Custom Exception Messages

You can store custom status codes and their associated messages in the syserrors system catalog table. To create a custom exception message, insert a row directly in the syserrors table. The syserrors table provides the following columns for an internationalized exception message.


Column Name	Description
sqlstate	The `SQLSTATE` value that is associated with the exception. You can use the following query to determine the current list of `SQLSTATE` message strings in syserrors: `SELECT sqlstate, locale, message FROM syserrors ORDER BY sqlstate, locale`. For more information on how to determine `SQLSTATE` values, see the DataBlade API Programmer's Manual.
message	The text of the exception message, with characters in the code set of the target locale. By convention, do not include any newline characters in the message.
locale	The locale with which the exception message is to be used. The locale column identifies the language and code set used for the internationalization of error and warning messages. This name is the name of the target locale of the message text.

Tip: For more information on the columns of the syserrors system catalog table, see the chapter on the system catalog tables in the "Informix Guide to SQL: Reference."

Do not allow any code-set conversion to take place when you insert the message text in syserrors. If the code sets of the client and database locales differ, temporarily set both the CLIENT_LOCALE and DB_LOCALE environment variables in the client environment to the name of the database locale. This workaround prevents the client application from performing code-set conversion.

If you specify any parameters in the message text, include only ASCII characters in the parameter names. Following this convention means that the parameter name can be the same for all locales. Most code sets include the ASCII characters.

For example, the following INSERT statements insert new messages in syserrors whose SQLSTATE value is "03I01":

INSERT INTO syserrors
VALUES ("03I01", "en_us.8859-1", 0, 1,
	"Operation Interrupted.")

INSERT INTO syserrors
VALUES ("03I01", "fr_ca.8859-1", 0, 1,
	"Traitement Interrompu.")

The '03I01' SQLSTATE value now has two locale-specific messages. The database server chooses the appropriate message based on the server-processing locale of the UDR when it executes. For more information on how mi_db_error_raise() locates an exception message, see Searching for Custom Messages.

For a complete description of how to add custom messages to the syserrors system catalog table, see the DataBlade API Programmer's Manual.

Searching for Custom Messages

When the mi_db_error_raise() function initiates a search of the syserrors system catalog table, it requests the message in which all components of the locale (language, territory, code set, and optional modifier) are the same in the current processing locale and the locale column of syserrors.

For C UDRs that use the default locale, the current processing locale is U.S. English (en_us). When the current processing locale is U.S. English, mi_db_error_raise() looks only for messages that use the U.S. English locale. However, for C UDRs that use nondefault locales, the current processing locale is the server-processing locale.

For a description of how mi_db_error_raise() searches for messages in the syserrors system catalog table, see the chapter on exceptions in the DataBlade API Programmer's Manual.

Specifying Parameter Markers

The custom message in the syserrors system catalog table can contain parameter markers. These parameter markers are sequences of characters enclosed by a single percent sign on each end (for example, %TOKEN%). A parameter marker is treated as a variable for which the mi_db_error_raise() function can supply a value. The mi_db_error_raise() function assumes that any message text or message parameter strings that you supply are in the server-processing locale.

For a complete description of how to specify parameter markers for a custom message, see the DataBlade API Programmer's Manual.
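To make the marker convention concrete, the following minimal sketch replaces every occurrence of one %NAME% marker in a message with a value. It is an illustration only and not how the database server implements mi_db_error_raise(); the function name, its arguments, and the 64-byte limit on the marker are assumptions made for this example. Because it compares bytes, it also relies on the convention that parameter names contain only ASCII characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes msg to out with every %name% marker
 * replaced by value. Returns 0 on success, or -1 if the name is
 * empty or too long, or if out is too small. */
int expand_marker(const char *msg, const char *name, const char *value,
                  char *out, size_t out_size)
{
    char marker[64];
    size_t mlen, vlen, used = 0;
    const char *hit;

    assert(msg != NULL && name != NULL && value != NULL && out != NULL);
    if (name[0] == '\0' || strlen(name) + 3 > sizeof marker)
        return -1;
    sprintf(marker, "%%%s%%", name);    /* builds "%NAME%" */
    mlen = strlen(marker);
    vlen = strlen(value);

    while ((hit = strstr(msg, marker)) != NULL) {
        size_t prefix = (size_t)(hit - msg);

        if (used + prefix + vlen + 1 > out_size)
            return -1;
        memcpy(out + used, msg, prefix);    /* text before the marker */
        used += prefix;
        memcpy(out + used, value, vlen);    /* substituted value */
        used += vlen;
        msg = hit + mlen;                   /* continue after the marker */
    }
    if (used + strlen(msg) + 1 > out_size)
        return -1;
    strcpy(out + used, msg);                /* remaining text */
    return 0;
}
```

Markers with other names are left untouched, so a message with several parameters can be expanded one marker at a time.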

Internationalized Tracing Messages

The DataBlade API supports trace messages that correspond to a particular locale. The current database locale determines which code set the trace message uses. Based on the current database locale, a given tracepoint can produce an internationalized trace message. Internationalized tracing enables you to develop and test the same code in many different locales.

To provide internationalized tracing support, the DataBlade API provides the following capabilities:

The systracemsgs system catalog table stores internationalized trace messages.

Two internationalized tracing interfaces, the GL_DPRINTF macro and the gl_tprintf() function, format internationalized trace messages.

Inserting Messages in the systracemsgs System Catalog Table

The systracemsgs system catalog table stores internationalized trace messages that you can use to debug your C UDRs. To create an internationalized trace message, insert a row directly into the systracemsgs table. The systracemsgs table provides the following information about an internationalized trace message.


Column Name	Description
name	The name of the trace message
locale	The locale with which the trace message is to be used
message	The text of the trace message

The combination of message name and locale must be unique within the table. Once you insert a new trace message into systracemsgs, the database server assigns it a unique identifier, called a trace-message identifier. It stores the trace-message identifier in the msgid column of systracemsgs. Once a trace message exists in the systracemsgs table, you can specify the message either by name or by trace-message identifier to DataBlade API tracing functions.

The trace-message text can be a string of text in the appropriate language and code set for the locale, and it can contain tokens to indicate where to substitute a piece of text. Token names are set off by a single percent (%) symbol on each end.

The following INSERT statement puts a new message called qp1_exit in the systracemsgs table:

INSERT INTO informix.systracemsgs(name, locale, message)
VALUES ('qp1_exit', 'en_us.8859-1',
	'Exiting msg number was %ident%; the input is still %i%')

This message text is in English and therefore the systracemsgs row specifies the default locale of U.S. English.

This second message is the French version of the qp1_exit message and therefore the systracemsgs row specifies the French locale on a UNIX system (fr_fr.8859-1):

INSERT INTO informix.systracemsgs(name, locale, message)
VALUES ('qp1_exit', 'fr_fr.8859-1',
	'Le numéro de message en sortie était %ident%; \
	 l''entrée est toujours %i%')

Enter message text in the language of the server locale, with any characters available in the server code set. To insert a variable, enclose the variable name with a single percent sign on each end (for example, %a%). When the database server prepares the trace message for output, it replaces each variable with its actual value.

Putting Internationalized Trace Messages into Code

The DataBlade API provides the following tracing functions to insert internationalized tracepoints into UDR code:

The GL_DPRINTF macro formats an internationalized trace message and specifies the threshold for the tracepoint.

The syntax for GL_DPRINTF is as follows:

GL_DPRINTF(trace_class, threshold, (message_name [,toktype ,val]..., MI_LIST_END));

The gl_tprintf() function formats an internationalized trace message but does not specify a tracepoint threshold.

The gl_tprintf() function is for use within a trace block, which uses the tf() function to compare a specified threshold with the current trace level. The syntax for gl_tprintf() is as follows:

gl_tprintf(message_name [,toktype ,val]..., MI_LIST_END);

Syntax elements for both GL_DPRINTF and gl_tprintf() have the following values:


trace_class	is either a trace-class name or the trace-class identifier integer value expressed as a character string.
threshold	is a nonnegative integer that sets the tracepoint threshold for execution.
message_name	is the identifier for an internationalized message stored in the systracemsgs system catalog table of the database.
toktype	is a string made up of a token name followed by a single percent (%) symbol followed by a single character output specifier as used in printf formats.
val	is a value expression to be output that must match the type of the output specifier in the preceding token.
`MI_LIST_END`	is a macro constant that ends the variable-length list.

Important: The MI_LIST_END constant marks the end of the variable-length list. If you do not include MI_LIST_END, the user-defined routine might fail.

The following example shows an internationalized trace statement that uses the GL_DPRINTF macro:

i = 6;
/* If the current trace level of the funcEntry class is
 * greater than or equal to 20, find the version of the
 * qp1_exit message whose locale matches the current database
 * locale
 */
GL_DPRINTF("funcEntry", 20,
	("qp1_exit",
	"ident%s", "one",
	"i%d", i,
	MI_LIST_END));

If the current locale is the default locale of U.S. English and the current trace level of the funcEntry class is greater than or equal to 20, this tracepoint generates the following trace message:

13:21:51    Exiting msg number was one; the input is still 6

The following example shows an internationalized trace block that uses the gl_tprintf() function:

i = 6;
/* Compare the current trace level of the "funcEnd" class
 * with a tracepoint threshold of 25. Continue execution of
 * the trace block if:
 *            trace level >= 25
 */
if ( tf("funcEnd", 25) )
{
	i = doSomething();

	/* Generate an internationalized trace message (based
	 * on current database locale) */
	gl_tprintf("qp1_exit", "ident%s", "deux", "i%d", i,
		MI_LIST_END);
}

If the current locale is French and the current trace level of the funcEnd class is greater than or equal to 25, this tracepoint generates the following trace message:

13:21:53 Le numéro de message en sortie était deux; l'entrée est toujours 6

The database server writes the trace messages in the trace-output file in the code set of the locale associated with the message. If the trace message originated from the systracemsgs system catalog table, its characters are in the code set of the locale specified in the locale column of its systracemsgs entry. The database server might have performed code-set conversion on these trace messages if the code set in the UDR source is different from (but compatible with) the code set of the server-processing locale.

Searching for Trace Messages

To write an internationalized trace message to your trace-output file, the database server must locate a row in the systracemsgs system catalog table whose locale column matches (or is compatible with) the server-processing locale for your UDR. Therefore, to see a particular trace message in the trace-output file, your locale environment variables (CLIENT_LOCALE, DB_LOCALE, and SERVER_LOCALE) must be set so that the database server generates a server-processing locale that matches an entry in the systracemsgs table.

The database server searches the systracemsgs table for an entry with the same name as the tracepoint and a locale in which all components of the locale (language, territory, and code set) are the same in the current processing locale and the locale column of systracemsgs. If only the language and territory match, the database server converts the code set. If no message has matching language and territory, it uses the first available message with the correct language. If there is no message in the appropriate language, it uses the message for the default language, en_us.
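The search order in the preceding paragraph can be sketched as a series of passes over the candidate rows. The following is an illustration under stated assumptions and not the database server's implementation: the function names are hypothetical, locale names are assumed to have the form language_territory.codeset, and optional modifiers are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of s up to (not including) the first stop character,
 * or the whole length if stop does not occur. */
static size_t prefix_len(const char *s, char stop)
{
    const char *p = strchr(s, stop);
    return p ? (size_t)(p - s) : strlen(s);
}

/* True if a and b are identical up to the first stop character. */
static int same_prefix(const char *a, const char *b, char stop)
{
    size_t la = prefix_len(a, stop), lb = prefix_len(b, stop);
    return la == lb && strncmp(a, b, la) == 0;
}

/* Hypothetical helper: given the current processing locale and the
 * locale values of the rows that share one message name, returns the
 * index of the row to use, or -1 if no row qualifies. */
int pick_message_locale(const char *current,
                        const char *const *rows, size_t n)
{
    size_t i;

    assert(current != NULL && rows != NULL);
    for (i = 0; i < n; i++)     /* 1. language, territory, code set */
        if (strcmp(rows[i], current) == 0)
            return (int)i;
    for (i = 0; i < n; i++)     /* 2. language and territory only */
        if (same_prefix(rows[i], current, '.'))
            return (int)i;
    for (i = 0; i < n; i++)     /* 3. first row with the same language */
        if (same_prefix(rows[i], current, '_'))
            return (int)i;
    for (i = 0; i < n; i++)     /* 4. default language, en_us */
        if (same_prefix(rows[i], "en_us", '.'))
            return (int)i;
    return -1;
}
```

In the second case the server would still need to convert the code set of the message; this sketch only selects the row.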

Locale-Sensitive Data in an Opaque Data Type

When you create an opaque data type, you must write the support functions and SQL functions of the opaque type so that they handle locale-sensitive data. An opaque data type is fully encapsulated; its internal structure is not known to the database server. Therefore, the database server cannot automatically perform the locale-specific tasks such as code-set conversion on character data or locale-specific formatting of date, numeric, or monetary data.

In particular, consider how to handle any locale-sensitive data when you write the following support functions:

The input and output support functions

The receive and send support functions

The DataBlade API and Informix GLS provide GLS support for opaque-type support functions written in C. The following sections summarize GLS considerations for these support functions. For general information on the support functions of an opaque data type, see Extending Informix Dynamic Server 2000.

Internationalized Input and Output Support Functions

The internal representation of an opaque data type is the C structure that stores the opaque-type information. Each opaque type also has a character-based format, known as its external representation. This external representation is received by the database server as an LVARCHAR value. The LVARCHAR data type can hold single-byte (ASCII and non-ASCII) and multibyte character data, depending on the locale of the client application.

Client applications perform code-set conversion on LVARCHAR data. However, the ability to transfer the data between a client application and database server is not sufficient to support locale-sensitive data in opaque data types. It does not ensure that the data is correctly manipulated at its destination. The input and output support functions convert the opaque data type from its internal to an external representation, and vice versa, as follows:

The input function converts the external representation of the data type to the internal representation.

The output function converts the internal representation of the data type to the external representation.

When you write these opaque-type support functions as C UDRs, you must ensure that these functions correctly handle any locale-sensitive data, including the following tasks.


Locale-Sensitive Task	For More Information
Any code-set conversion on character data	Code-Set Conversion and the DataBlade API
Any handling of multibyte or wide characters in character data	The Informix GLS Library
Any formatting of locale-specific date, numeric, or monetary data	Locale-Specific Data Formatting

Internationalized Send and Receive Support Functions

The send and receive functions support binary transfer of opaque data types. That is, they convert the opaque data type between its internal representation on the client computer and its internal representation on the server computer (where it is stored), as follows:

The receive function converts the internal representation of the data type on the client computer to its internal representation on the server computer.

The send function converts the internal representation of the data type on the server computer to its internal representation on the client computer.

If the internal representation of an opaque type contains character data, the client application cannot perform any locale-specific translations, including the following ones.


Locale-Sensitive Task	For More Information
Any code-set conversion on character data	Character Strings in Opaque-Type Support Functions
Any handling of multibyte or wide characters in character data	The Informix GLS Library

Therefore, when you write the receive and send support functions as C UDRs, you must ensure that these functions handle these locale-sensitive tasks correctly.
