INFORMIX
Informix Guide to GLS Functionality
Chapter 4: INFORMIX-Universal Server Features

Introducing the Informix GLS API

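Informix provides the Global Language Support Application-Programming Interface (GLS API) to enable you to develop internationalized applications with a C-language interface. The GLS API relies on GLS locales, which contain culture-specific information. The GLS API provides macros and functions for the locale-sensitive tasks that this chapter describes, including string traversal and processing, code-set conversion, character classification, case conversion, comparison and sorting, and the conversion and formatting of date, time, numeric, and money values.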

Compliance with Industry Standards

The GLS API for Universal Server, Version 9.1, was derived from the X/Open XPG4 specifications.

How to Use the GLS API

You can use the GLS API in either a client application or the database server. All functions access the current processing locale. The current processing locale, based on the GLS environment variables and data stored in the database, must be established on both the server computer and the client computer before any locale-sensitive processing occurs. The database server establishes the current processing locale when a client application opens a database (or when a session corresponding to a client connection is established). A client application, by contrast, must set the processing locale explicitly with a function call.

Compiling and Linking the GLS API

To use the GLS API in your C-language program, you must include the following header file in your source file:

Both ESQL/C users and client-side users of the DataBlade API need to ensure that the correct processing locale is established by calling the ifx_gl_init() function prior to calling any of the ifx_gl_* functions.

To compile and link ESQL/C programs that use the GLS API, issue the following ESQL command:

To compile and link DataBlade module programs, use the following command:

Additionally, DataBlade module programmers need to distinguish whether a DataBlade module runs on the server computer or on a client computer when they compile user-defined routines. To make this distinction, use the compiler flag -DMI_SERVBUILD when you compile user-defined routines.

Tip: When you use the DataBlade Developers Kit (DBDK) to compile user-defined routines, you do not have to explicitly specify the location of the header files.

To build a shared object that contains the C code for a user-defined routine, use the following example:

(For more information on how to write user-defined routines, see the DataBlade Developers Kit.)

The following directories must be available to use the GLS API:

    This directory includes the subdirectories for locale and code-set conversion files.

    This directory contains the static and shared GLS libraries.

    This directory contains two header files: gls.h and ifxgls.h

How to Internationalize Programs with the GLS API

The ultimate goal of internationalizing an application program is to create (or modify) an application so that the only change necessary to support a different language, territory, and code set is to point the application to the correct GLS locale.

This section discusses the areas you need to consider in your application to perform internationalization. These application areas are listed in order of importance. Each area references a macro or function of the GLS API and includes a brief description. The GLS API is documented on-line as HTML pages with Universal Server. For a list of the macros and functions of the GLS API, refer to the GLS API Programmer's Manual.

The following GLS API macros and functions are for multibyte character processing only. Wide-character macros and functions are listed in "Improving the Performance of Your Programs".

String Traversal

String traversal of a multibyte character string can be forward or backward. An example is provided for both directions.

Traversing Multibyte Character Strings Forward

Traversing Multibyte Character Strings Backward

The null terminator in a multibyte character string is assumed to occupy a single byte. To process a multibyte character, you cannot pass the entire character to a function. You must pass a pointer to the beginning of the character so that the called function can access the remaining bytes of the character.

The GLS API provides the following functions for multibyte string traversal and indexing:

    This function determines the number of bytes in the multibyte character, mb.

    This function returns a pointer to the next multibyte character after mb.

    This function returns a pointer to the multibyte character before mb, where mb0 is a pointer to the beginning of the multibyte string.
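The forward and backward traversal patterns described above can be sketched in plain C. This is a minimal illustration, not the GLS API: it assumes the string is valid UTF-8, and the helpers sketch_mblen(), sketch_mbsnext(), and sketch_mbsprev() are hypothetical stand-ins that only mirror the roles of the three functions listed above (bytes in a character, next character, previous character relative to the start pointer mb0).

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the character at mb.
 * Assumes valid UTF-8; falls back to 1 for an unexpected lead byte. */
size_t sketch_mblen(const unsigned char *mb)
{
    if (*mb < 0x80)           return 1;   /* single-byte character */
    if ((*mb & 0xE0) == 0xC0) return 2;
    if ((*mb & 0xF0) == 0xE0) return 3;
    if ((*mb & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: pointer to the character after mb. */
const unsigned char *sketch_mbsnext(const unsigned char *mb)
{
    return mb + sketch_mblen(mb);
}

/* Hypothetical helper: pointer to the character before mb, never moving
 * in front of mb0. Returns NULL when mb is already at the start. */
const unsigned char *sketch_mbsprev(const unsigned char *mb0,
                                    const unsigned char *mb)
{
    if (mb <= mb0)
        return NULL;
    do {
        --mb;                              /* skip UTF-8 continuation bytes */
    } while (mb > mb0 && (*mb & 0xC0) == 0x80);
    return mb;
}

/* Forward traversal: count characters up to the single-byte terminator. */
size_t sketch_count_forward(const char *mbs)
{
    const unsigned char *p = (const unsigned char *)mbs;
    size_t count = 0;

    for (; *p != '\0'; p = sketch_mbsnext(p))
        count++;
    return count;
}

/* Backward traversal: start just past the last byte and step back. */
size_t sketch_count_backward(const char *mbs, size_t nbytes)
{
    const unsigned char *mb0 = (const unsigned char *)mbs;
    const unsigned char *p = mb0 + nbytes;
    size_t count = 0;

    while ((p = sketch_mbsprev(mb0, p)) != NULL)
        count++;
    return count;
}
```

Note that both loops only ever hold a pointer to the first byte of a character, which is the calling convention the preceding paragraph requires. The backward step needs mb0 because it cannot know where to stop otherwise.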

String Processing

String processing involves concatenating character strings, searching for characters in strings, copying strings (and portions of strings), and so on. The following GLS API functions provide multibyte string processing capabilities:

    This function appends a copy of the character string mbs2 to the end of the character string mbs1. If mbs1 and mbs2 overlap, the result of this function is undefined.

    This function locates the first occurrence of mb in the multibyte string mbs.

    This function copies the multibyte string mbs2 to the location pointed to by mbs1. If mbs1 and mbs2 overlap, the result of this function is undefined.

    This function returns the number of characters in the maximum initial substring of mbs1, which consists entirely of multibyte characters not in the string mbs2.

    This function returns the number of characters (not bytes) in the string mbs, not including any terminating null characters.

    This function searches for the first occurrence of the multibyte string mbs1 in the multibyte string mbs2.

    This function appends mbs2 to the end of mbs1. No more than char_limit characters are read from mbs2 and written to mbs1. If mbs1 and mbs2 overlap, the result of this function is undefined.

    This function copies mbs2 to the location to which mbs1 points. No more than char_limit characters are read from mbs2 and written to mbs1. If mbs1 and mbs2 overlap, the result of this function is undefined.

    This function returns the number of characters in mbs, not including the trailing space characters. The characters that are not included in the count are the space characters defined in the current locale.

    This function returns the number of bytes in mbs, not including the trailing space characters. The characters that are not included in the count are the ASCII space characters and any multibyte characters that are equivalent to the ASCII space character.

    This function searches for the first occurrence in the multibyte string mbs1 of any multibyte character from the string mbs2.

    This function locates the last occurrence of mb in the multibyte string mbs.

    This function returns the number of characters in the maximum initial substring of mbs1, which consists entirely of multibyte characters in the string mbs2.
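Several of the descriptions above distinguish a count of characters from a count of bytes. The following sketch shows why the distinction matters. It is not the GLS API: it assumes UTF-8 input, all sketch_* names are hypothetical, and the trailing-space helper only recognizes the ASCII space (a locale-aware version would also recognize multibyte equivalents).

```c
#include <stddef.h>
#include <string.h>

/* Bytes in the UTF-8 character at p (assumes valid input). */
size_t sketch_u8_len(const unsigned char *p)
{
    if (*p < 0x80)           return 1;
    if ((*p & 0xE0) == 0xC0) return 2;
    if ((*p & 0xF0) == 0xE0) return 3;
    if ((*p & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Number of characters (not bytes) in mbs, excluding the terminator. */
size_t sketch_mbslen(const char *mbs)
{
    const unsigned char *p = (const unsigned char *)mbs;
    size_t count = 0;

    while (*p != '\0') {
        p += sketch_u8_len(p);
        count++;
    }
    return count;
}

/* Copy at most char_limit whole characters from src into dst, which has
 * room for dst_bytes bytes. Always null-terminates when dst_bytes > 0 and
 * never splits a character. Returns the number of bytes copied. */
size_t sketch_mbsncpy(char *dst, size_t dst_bytes,
                      const char *src, size_t char_limit)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t used = 0;

    if (dst_bytes == 0)
        return 0;
    while (char_limit > 0 && s[used] != '\0') {
        size_t n = sketch_u8_len(s + used);
        if (used + n >= dst_bytes)         /* keep room for the terminator */
            break;
        used += n;
        char_limit--;
    }
    memcpy(dst, src, used);
    dst[used] = '\0';
    return used;
}

/* Byte length of mbs without trailing ASCII spaces. */
size_t sketch_bytes_no_trailing_spaces(const char *mbs)
{
    size_t n = strlen(mbs);

    while (n > 0 && mbs[n - 1] == ' ')
        n--;
    return n;
}
```

A limit expressed in characters cannot be used directly as a buffer size, which is why the copy helper takes both a character limit and a byte capacity.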

Memory Allocation

No GLS library function allocates memory that remains after the function returns. If a function allocates memory, this memory is allocated only temporarily and is freed before the function returns. Therefore, the caller of each function must allocate memory needed by the function. The following GLS API macro and function help you determine how much memory a multibyte character requires:

    This macro indicates the maximum number of bytes that any multibyte character in any locale can occupy. This macro is usually used to allocate space in static buffers that are intended to contain one or more multibyte characters.

    This function returns the maximum number of bytes that any multibyte character can occupy.
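Because the caller owns every buffer, sizing is usually a matter of multiplying a character count by a worst-case character width. The sketch below uses the standard C analogues MB_LEN_MAX (a compile-time maximum for any locale) and MB_CUR_MAX (the maximum for the current locale) in place of the GLS macro and function described above. The names sketch_name_buf, sketch_name_capacity(), and sketch_alloc_mbs() are hypothetical.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* MB_CUR_MAX, malloc */

#define SKETCH_NAME_CHARS 32

/* Static buffer: sized with the compile-time maximum so that it is large
 * enough in every locale, plus one byte for the terminator. */
static char sketch_name_buf[SKETCH_NAME_CHARS * MB_LEN_MAX + 1];

size_t sketch_name_capacity(void)
{
    return sizeof sketch_name_buf;
}

/* Dynamic buffer: sized with the run-time maximum for the current locale.
 * Returns NULL on overflow or allocation failure; the caller frees it. */
char *sketch_alloc_mbs(size_t max_chars)
{
    size_t per_char = (size_t)MB_CUR_MAX;

    if (max_chars > (SIZE_MAX - 1) / per_char)
        return NULL;
    return malloc(max_chars * per_char + 1);
}
```

The static form trades some wasted space for a size that is known at compile time; the dynamic form allocates only what the current locale can actually need.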

Code-Set Conversion

Because a character might be encoded differently on different operating systems, the appropriate communication layer must be prepared to convert between the two encodings. The GLS API provides the following functions to support code-set conversion:

    This function determines whether characters encoded in srccodeset require conversion to dstcodeset by using ifx_gl_cv_mconv(). Use this function, rather than a comparison of code-set names, to determine whether code-set conversion is needed, because several names can refer to the same code set. For example, 8859-1, 819, and Latin-1 all refer to the same code set.

    This function converts the string of characters in *src to the same characters encoded in another code set, as defined in the code-set conversion files, $INFORMIXDIR/gls/cv*. This function stores the result in the buffer to which *dst points.

    This function calculates either exactly the number of bytes that are required by a destination buffer of code-set-converted multibyte characters or a close over-approximation of the number. The argument srcbytes is the number of bytes in the buffer of multibyte characters to be code-set converted.

    If the code-set conversion, from srccodeset to dstcodeset, converts from a single-byte code set to another single-byte code set where there are no substitution conversions, this function sets array to an array of 256 unsigned characters that represent the conversion. If the code-set conversion is not of this form, then this function sets array to NULL.
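The 256-entry array described in the last item lends itself to a simple lookup loop. The sketch below illustrates that idea only. It does not call the GLS API, the helper names are hypothetical, and the table contents in the usage note are invented for illustration rather than taken from any real code-set conversion file.

```c
#include <stddef.h>

/* Fill table with the identity mapping as a starting point. */
void sketch_identity_table(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Convert n bytes from src to dst with one table lookup per byte.
 * Because both code sets are single-byte, the output is exactly n bytes,
 * and dst may be the same buffer as src. */
void sketch_convert_sb(const unsigned char table[256],
                       const unsigned char *src,
                       unsigned char *dst, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = table[src[i]];
}
```

For example, a caller might start from the identity table, remap the few code points that differ between two single-byte code sets, and then convert whole buffers in place. When the array is NULL, no such shortcut applies and the general conversion function is needed.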

Character Classification

The following GLS API functions test whether a multibyte character belongs to the respective character class, according to the rules of the current locale:

    This function returns true if the character is either in the alpha or digit class.

    This function returns true if the character is an alphabetic character. All uppercase and lowercase characters are also in this class.

    This function returns true if the character is a horizontal space character. The single-byte space and tab characters plus any multibyte version of these characters are in this class.

    This function returns true if the character is a control character.

    This function returns true if the character is a digit. Only the 10 ASCII digits are in this class.

    This function returns true if the character is a graphical character. All characters which have a visual representation are in this class.

    This function returns true if the character is a lowercase alphabetic character.

    This function returns true if the character is a printable character.

    This function returns true if the character is a punctuation character. The single-byte ASCII punctuation characters plus any non-ASCII punctuation characters are in this class.

    This function returns true if the character is a horizontal or vertical space character. This class includes all characters from the blank class, plus the single-byte and multibyte versions of newline, vertical tab, form feed, and carriage return.

    This function returns true if the character is an uppercase alphabetic character. The ASCII characters A through Z and all other single-byte and multibyte uppercase characters for Latin-based languages are in this class.

    This function returns true if the character is a hexadecimal digit character. Only the 10 ASCII digit characters, A through F, and a through f are in this class. Multibyte versions or alternative representations (for example, Hindi or Kanji digits) are in the alpha class.
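The relationships between these classes (alnum combines alpha and digit, blank is a subset of space, the decimal digits are a subset of the hexadecimal digits) can be observed with the standard C classification functions in <ctype.h>, which follow the same XPG4-style class model. This sketch is only an analogue for single-byte characters in the default "C" locale; it does not use the GLS API, and the SK_* flags and sketch_classes() are hypothetical names.

```c
#include <ctype.h>

enum {
    SK_ALPHA  = 1 << 0,
    SK_DIGIT  = 1 << 1,
    SK_ALNUM  = 1 << 2,
    SK_XDIGIT = 1 << 3,
    SK_BLANK  = 1 << 4,
    SK_SPACE  = 1 << 5
};

/* Return a bit mask of the classes that the single-byte character c
 * belongs to in the current locale. */
unsigned sketch_classes(unsigned char c)
{
    unsigned mask = 0;

    if (isalpha(c))  mask |= SK_ALPHA;
    if (isdigit(c))  mask |= SK_DIGIT;
    if (isalnum(c))  mask |= SK_ALNUM;
    if (isxdigit(c)) mask |= SK_XDIGIT;
    if (isblank(c))  mask |= SK_BLANK;   /* space and tab only */
    if (isspace(c))  mask |= SK_SPACE;   /* blank plus newline, etc. */
    return mask;
}
```

In the "C" locale, the character F is reported as alpha, alnum, and xdigit but not digit, and a newline is reported as space but not blank, which matches the class descriptions above.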

Case Conversion

All alphabetic case conversions must use the conversions that the locale specifies. The GLS API provides the following functions to support multibyte case conversion:

    This function calculates either exactly the number of bytes that are required by a buffer of case-equivalent multibyte characters or a close over-approximation of the number.

    These functions return the alphabetic case equivalent of the source character or return the source character if the source character does not have a case equivalent.

Character/String Comparison and Sorting

All sorting and comparison of characters and character strings must adhere to the comparison order that the locale specifies. The ifx_gl_mbscoll() function compares the multibyte character strings mbs1 and mbs2 according to the rules of the current locale.
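As a rough illustration of locale-aware sorting, the sketch below sorts an array of strings with the standard C function strcoll(), which plays a role similar to the one described for ifx_gl_mbscoll(). It is an analogue rather than a GLS API example. The names sketch_collate_cmp() and sketch_sort_strings() are hypothetical, and the sketch assumes that the caller has already selected a locale (in standard C, with setlocale()).

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparison callback: compare two strings in the collation
 * order of the current locale rather than in code-set order. */
int sketch_collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of string pointers in locale collation order. */
void sketch_sort_strings(const char **strings, size_t count)
{
    qsort(strings, count, sizeof *strings, sketch_collate_cmp);
}
```

Replacing strcoll() with strcmp() in the callback would sort by byte value instead, which gives the same result only when the locale collates in code-set order.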

Date/Time Conversion and Formatting Functions

The following GLS API functions provide conversion functions for an internal representation of date and time, as well as formatting functions for an external representation of date and time:

    This function converts the date string stored in datestr to an internal representation. The internal representation is stored in the mi_date structure to which date points.

    This function converts the datetime string stored in datetimestr to an internal representation. The internal representation is stored in the mi_datetime structure to which datetime points.

    This function uses the format specified by format to create a string from the mi_date structure to which date points. The resulting string is stored in the buffer to which datestr points.

    This function uses the format specified by format to create a string from the mi_datetime structure to which datetime points. The resulting string is stored in the buffer to which datetimestr points.
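The formatting functions above follow a common pattern: an internal date value, a format string, and a caller-supplied output buffer. The sketch below shows that pattern with the standard C function strftime() and struct tm standing in for the GLS formatting function and the mi_date structure. The function sketch_format_date() and its parameters are hypothetical, and the GLS format directives are not necessarily the same as those of strftime().

```c
#include <stddef.h>
#include <time.h>

/* Build an internal date value and format it into datestr, which has room
 * for datestr_bytes bytes. Returns the number of bytes written (without
 * the terminator), or 0 if the buffer is too small. */
size_t sketch_format_date(char *datestr, size_t datestr_bytes,
                          const char *format,
                          int year, int month, int day)
{
    struct tm date = {0};

    date.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    date.tm_mon  = month - 1;     /* and months from 0 */
    date.tm_mday = day;
    return strftime(datestr, datestr_bytes, format, &date);
}
```

With the format "%Y-%m-%d", the date 15 March 1998 is written as 1998-03-15. A return value of 0 signals that the caller must supply a larger buffer.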

Numeric Conversion and Formatting

The following GLS API functions provide a conversion function for an internal representation of a number string and a formatting function for an external representation of a number string:

    This function converts the number string stored in decstr to an internal representation, interpreting the string according to the format specified by format. The internal representation is stored in the mi_decimal structure to which number points.

    This function uses the format specified by format to create a string from the value represented by the mi_decimal structure to which number points. The resulting string is stored in the buffer to which numstr points.

Money Conversion and Formatting

The following GLS API functions provide a conversion function for an internal representation of a money string and a formatting function for an external representation of a money string:

    This function converts the money string stored in monstr to an internal representation. The internal representation is stored in the mi_money structure to which money points.

    This function uses the format specified by format to create a string from the mi_money structure to which money points. The resulting string is stored in the buffer to which monstr points.

Stream Input and Output

Character data that contains Asian characters must be correctly processed in all graphical user interface (GUI) I/O, clipboard I/O, character terminal I/O, file I/O and network I/O. The following GLS API functions process input and output multibyte character streams:

    This function calls the user-defined function funcp to obtain bytes that are used to form one multibyte character. This multibyte character is then written to mb. The pointer v is passed to funcp each time that it is called.

    This function calls the user-defined function funcp with each byte of the multibyte character, mb. The pointer v is passed to funcp each time that it is called.
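The callback arrangement described above, in which the library asks a user-defined function for one byte at a time and passes an opaque pointer v through to it, can be sketched as follows. This is an illustration of the pattern only, not the GLS API: it assumes UTF-8 input, the reader structure and all sketch_* names are hypothetical, and the convention that the callback returns -1 at end of input is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical byte source: an in-memory buffer with a read position. */
struct sketch_reader {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Plays the role of funcp: return the next byte, or -1 at end of input.
 * The state arrives through the opaque pointer v. */
int sketch_read_byte(void *v)
{
    struct sketch_reader *r = v;

    return r->pos < r->len ? r->data[r->pos++] : -1;
}

/* Call funcp until one complete UTF-8 character has been stored in mb
 * (which must hold at least 4 bytes). Returns the number of bytes stored,
 * 0 at end of input, or -1 if input ends in the middle of a character. */
int sketch_getmb(unsigned char *mb, int (*funcp)(void *), void *v)
{
    int c = funcp(v);
    int need, i;

    if (c < 0)
        return 0;
    mb[0] = (unsigned char)c;
    need = c < 0x80 ? 1
         : (c & 0xE0) == 0xC0 ? 2
         : (c & 0xF0) == 0xE0 ? 3
         : (c & 0xF8) == 0xF0 ? 4 : 1;
    for (i = 1; i < need; i++) {
        c = funcp(v);
        if (c < 0)
            return -1;
        mb[i] = (unsigned char)c;
    }
    return need;
}
```

Because the byte source is hidden behind the callback, the same character-assembly code can serve a file, a socket, or a terminal. The output direction is symmetric: the library hands each byte of a character to the callback in turn.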

Initialization and Error Handling

The GLS API provides the following functions for initialization and error handling:

    ESQL/C programs need to call this function at the beginning of main() to ensure that the locale, based on the environment variables, has been established before calling any other ifx_gl_* functions. This requirement also applies to other client-side programs that use the ifx_gl_* functions. However, it is not necessary to call ifx_gl_init() from functions that run on the server.

    GLS library functions use this value, of type int, to provide more information about an error that has occurred. This value is set only if an error has occurred, unless it is documented otherwise.
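An error value that is set only when an error occurs should be consulted only after a call has indicated failure. The standard C errno variable behaves in a similar way, so the sketch below uses it as an analogy. It does not show the GLS error value itself, and sketch_parse_long() and its return codes are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal integer. Returns 0 on success, -1 if text is not a
 * number, and -2 if the value is out of range. */
int sketch_parse_long(const char *text, long *result)
{
    char *end;
    long value;

    errno = 0;                    /* successful calls do not reset it */
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;                /* failure that errno does not describe */
    if ((value == LONG_MAX || value == LONG_MIN) && errno == ERANGE)
        return -2;                /* return value signals failure first,
                                     then the error value explains why */
    *result = value;
    return 0;
}
```

The same discipline applies to any API of this kind: check the function's return value first, and read the error value only when that return value reports a problem.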

Accessing Messages

Each string that is presented to a user (for example, an error message, informational message, menu item, or button label) should not appear as a literal in a program, but rather as a reference to a message file. In addition, each literal string a user might enter (for example, yes/no responses) should be a reference to a message file.

A literal string does not include program keywords such as SELECT, WHERE, IF, and WHILE.
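The idea of replacing literals with references to a message file can be sketched with a small lookup table keyed by message identifier. Everything in this sketch is hypothetical: the structure, the identifiers, and the in-memory catalogs stand in for message files that a real program would load at run time, and the sketch does not show any Informix message facility.

```c
#include <stddef.h>

/* Message identifiers that the program uses instead of literal text. */
enum { SKETCH_MSG_YES = 1, SKETCH_MSG_NO = 2, SKETCH_MSG_SAVED = 3 };

struct sketch_msg {
    int id;
    const char *text;
};

/* Illustrative catalogs; real ones would come from message files. */
const struct sketch_msg sketch_catalog_en[] = {
    { SKETCH_MSG_YES, "yes" }, { SKETCH_MSG_NO, "no" },
    { SKETCH_MSG_SAVED, "File saved." }
};
const struct sketch_msg sketch_catalog_fr[] = {
    { SKETCH_MSG_YES, "oui" }, { SKETCH_MSG_NO, "non" }
};

/* Look up a message by identifier; return fallback if it is missing. */
const char *sketch_get_message(const struct sketch_msg *catalog,
                               size_t count, int id, const char *fallback)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (catalog[i].id == id)
            return catalog[i].text;
    return fallback;
}
```

The program logic refers only to identifiers such as SKETCH_MSG_YES, so supporting another language means supplying another catalog rather than editing the code. The same lookup serves both output strings and expected user input such as yes/no responses.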

Improving the Performance of Your Programs

The GLS API is designed to help the performance of your programs: it evaluates the requested GLS API function, determines whether that function is appropriate for the data that you are trying to process, and then executes the appropriate code.

For example, if you use the ifx_gl_mbsnext() function to traverse data that is encoded in a single-byte code set, the function is reduced to a macro that advances through the string byte by byte instead of executing code that parses multibyte sequences. Additionally, when collation is based on code-set order rather than locale-defined order, a binary compare (such as strcmp()) is used instead of executing algorithms that examine collation weights.

You can also choose how to structure your data to improve performance. Wide-character processing functions are typically faster than multibyte functions. However, wide characters take more space, and as a result, data is generally stored as multibyte strings. If you choose to use wide-character processing, you will have to convert the multibyte strings into wide-character strings, process the strings, and then convert them back to multibyte strings.

This technique is cost-effective if the data you are processing is traversed more than once. The following section describes the GLS API wide-character functions.
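The convert, process, and convert-back sequence can be sketched with the standard C functions mbstowcs() and wcstombs(), with wchar_t standing in for the GLS wide-character type. This is an analogue under those assumptions, not a GLS API example, and sketch_upper_via_wide() is a hypothetical name. The processing step here is a single uppercase pass, chosen only to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Uppercase the multibyte string in mbs (capacity mbs_bytes) by way of a
 * temporary wide-character copy. Returns 0 on success, -1 on failure. */
int sketch_upper_via_wide(char *mbs, size_t mbs_bytes)
{
    size_t max_chars = strlen(mbs);   /* characters never outnumber bytes */
    wchar_t *wcs = malloc((max_chars + 1) * sizeof *wcs);
    size_t nchars, i;
    int status = -1;

    if (wcs == NULL)
        return -1;

    /* 1. Convert the multibyte string to wide characters. */
    nchars = mbstowcs(wcs, mbs, max_chars + 1);

    /* Case conversion can change byte lengths, so check the worst case
     * before overwriting the caller's buffer. */
    if (nchars != (size_t)-1 && nchars * (size_t)MB_CUR_MAX < mbs_bytes) {
        /* 2. Process: every character is one array element. */
        for (i = 0; i < nchars; i++)
            wcs[i] = (wchar_t)towupper((wint_t)wcs[i]);

        /* 3. Convert back to a multibyte string. */
        if (wcstombs(mbs, wcs, mbs_bytes) != (size_t)-1)
            status = 0;
    }
    free(wcs);
    return status;
}
```

With only one pass over the data, the two conversions cost more than they save. The approach starts to pay off when step 2 visits the characters repeatedly, for example in searching or sorting.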

Wide-Character String Traversal

A program can traverse a wide-character string in the forward and backward direction. An example is provided for both methods.

Traversing Wide-Character Strings Forward

Traversing Wide-Character Strings Backward

You can compare or assign a single-byte ASCII character or character constant to a single wide character, as the following example shows:
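A minimal sketch of wide-character traversal in both directions, including comparison against a single-byte ASCII constant, might look like the following. It uses the standard C type wchar_t in place of the GLS wide-character type, assumes a wide-character encoding in which ASCII characters keep their ASCII values, and uses hypothetical function names.

```c
#include <stddef.h>
#include <wchar.h>

/* Forward traversal: count how often the ASCII character ch occurs.
 * Every character is one element, so the pointer advances by one. */
size_t sketch_wcs_count(const wchar_t *wcs, char ch)
{
    const wchar_t *p;
    size_t count = 0;

    for (p = wcs; *p != L'\0'; p++)
        if (*p == (wchar_t)ch)
            count++;
    return count;
}

/* Backward traversal: step back from the end while the previous
 * character equals the ASCII space constant. Returns the length of the
 * string without its trailing spaces. */
size_t sketch_wcs_rtrim_len(const wchar_t *wcs)
{
    const wchar_t *p = wcs + wcslen(wcs);

    while (p > wcs && p[-1] == ' ')
        p--;
    return (size_t)(p - wcs);
}
```

Unlike the multibyte case, no helper is needed to find the next or previous character, because each wide character occupies exactly one array element.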

Wide-Character String Processing

The following GLS API functions provide wide-character string processing capabilities:

    This function appends a copy of wcs2 to the end of wcs1. If wcs1 and wcs2 overlap, the result of this function is undefined.

    This function locates the first occurrence of wc in the wide character string wcs.

    This function copies the wide-character string wcs2 to the location pointed to by wcs1. If wcs1 and wcs2 overlap, the result of this function is undefined.

    This function returns the number of characters in the maximum initial substring of wcs1, which consists entirely of wide-characters not in the string wcs2.

    This function computes the number of wide-character codes in the wide-character string to which wcs points, not including the terminating null wide-character code.

    This function appends wcs2 to the end of wcs1. No more than char_limit characters are read from wcs2 and written to wcs1. If wcs1 and wcs2 overlap, the result of this function is undefined.

    This function copies wcs2 to the location pointed to by wcs1. No more than char_limit characters are read from wcs2 and written to wcs1. If wcs1 and wcs2 overlap, the result of this function is undefined.

    This function returns the number of characters in wcs, not including the trailing space characters. The characters that are not included in the count are the space characters defined in the current locale.

    This function searches for the first occurrence in the wide-character string wcs1 of any wide character from the string wcs2.

    This function locates the last occurrence of wc in the wide-character string wcs.

    This function computes the number of characters in the maximum initial substring of wcs1, which consists entirely of wide characters from wcs2.

    This function searches for the first occurrence of the wide-character string wcs1 in the wide-character string wcs2.

Wide-Character Classification

The following GLS API functions test whether a wide character belongs to the respective character class, according to the rules of the current locale:

    This function returns true if the character is in either the alpha or digit class.

    This function returns true if the character is an alphabetic character. All uppercase and lowercase characters are also in this class.

    This function returns true if the character is a horizontal space character. The single-byte space and tab characters plus any multibyte version of these characters are in this class.

    This function returns true if the character is a control character.

    This function returns true if the character is a digit. Only the 10 ASCII digits are in this class.

    This function returns true if the character is a graphical character. All characters that have a visual representation are in this class.

    This function returns true if the character is a lowercase alphabetic character.

    This function returns true if the character is a printable character.

    This function returns true if the character is a punctuation character. The single-byte ASCII punctuation characters plus any non-ASCII punctuation characters are in this class.

    This function returns true if the character is a horizontal or vertical space character. This class includes all characters from the blank class, plus the single-byte and multibyte versions of newline, vertical tab, form feed, and carriage return.

    This function returns true if the character is an uppercase alphabetic character. The ASCII characters A through Z and all other single-byte and multibyte uppercase characters for Latin-based languages are in this class.

    This function returns true if the character is a hexadecimal digit character. Only the 10 ASCII digit characters, A through F, and a through f are in this class. Multibyte versions or alternative representations (for example, Hindi or Kanji digits) are in the alpha class.

Wide-Character Case Conversion

All alphabetic case conversions must use the conversions specified in the locale. The following functions return the alphabetic case equivalent of the source character, or return the source character if it does not have a case equivalent:

    This function converts wide characters to lowercase.

    This function converts wide characters to uppercase.

Wide-Character/String Comparison and Sorting

All sorting and comparison of wide characters and wide-character strings must adhere to the comparison order specified in the locale. The ifx_gl_wcscoll() function compares wide-character strings wcs1 and wcs2 according to the rules of the current locale.




Informix Guide to GLS Functionality, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.