INFORMIX
INFORMIX-GLS Programmer's Guide
Chapter 2: Character Processing

Other Operations

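In addition to the operations on characters and strings, the INFORMIX-GLS library provides support for the following operations:

    String and character termination

    Managing memory for strings and characters

    Keeping multibyte strings consistent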

String and Character Termination

You can use the INFORMIX-GLS library with many different application programming interfaces (APIs), which might handle strings in different ways. To provide flexible support for APIs, the INFORMIX-GLS library allows you to indicate how to handle the following:

    Is the string argument a null-terminated string?

    Is the length of a multibyte character known?

Character-String Termination

The API that you use with the INFORMIX-GLS library might handle string termination in either of the following ways:

    The null character indicates the end of the string. Such strings are called null-terminated strings. The null terminator of a multibyte string consists of one byte whose value is zero. The null terminator of a wide-character string consists of one gl_wchar_t character whose value is zero.

    Character strings that are not null-terminated are called length-terminated strings. Length-terminated strings can contain null characters, but these null characters do not indicate the end of the string.

The INFORMIX-GLS functions that take a string argument allow you to pass this string as either a null-terminated string or a length-terminated string. To provide this flexibility, many INFORMIX-GLS functions that accept a multibyte or wide-character string expect the string to be represented with two arguments: a pointer to the string and an integer string length.

The value that you provide for the string length tells the INFORMIX-GLS function how to handle the associated string, as the following table shows.
String-Length Value       Meaning
IFX_GL_NULL               The INFORMIX-GLS function assumes that the string is a null-terminated string.
>= 0                      The INFORMIX-GLS function assumes that this length indicates the number of bytes in the length-terminated string.
< 0 and != IFX_GL_NULL    The INFORMIX-GLS function sets the error number to the IFX_GL_PARAMERR error.
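
The string-length convention can be sketched in plain C without the INFORMIX-GLS headers. The following helper is only an illustration of how a function might interpret its string-length argument. The names demo_string_bytes, DEMO_GL_NULL, and DEMO_GL_PARAMERR are hypothetical stand-ins, and their numeric values are assumptions; the real IFX_GL_NULL and IFX_GL_PARAMERR values come from the INFORMIX-GLS header files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins. The real IFX_GL_NULL and IFX_GL_PARAMERR values
 * are defined by the INFORMIX-GLS headers and are not shown in this guide. */
#define DEMO_GL_NULL     (-1)
#define DEMO_GL_PARAMERR (-2)

/*
 * Resolve a (string, string-length) argument pair to a byte count.
 * Returns the number of bytes in the string, or DEMO_GL_PARAMERR if the
 * length is negative and is not the DEMO_GL_NULL sentinel.
 */
long demo_string_bytes(const char *s, int length)
{
    assert(s != NULL);

    if (length == DEMO_GL_NULL)
        return (long)strlen(s);  /* null-terminated: scan for the zero byte */
    if (length >= 0)
        return (long)length;     /* length-terminated: embedded nulls are data */
    return DEMO_GL_PARAMERR;     /* any other negative length is an error */
}
```

With this helper, demo_string_bytes("abc", DEMO_GL_NULL) scans to the terminator, while a call that passes an explicit length of 5 for a five-byte buffer with an embedded null byte treats all five bytes as data.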

Multibyte-Character Termination

Many INFORMIX-GLS library functions operate on just one multibyte character. Each INFORMIX-GLS function that accepts a multibyte character expects the character to be represented by two arguments: a pointer to the multibyte character and an integer character length.

The value that you provide for the character length tells the INFORMIX-GLS function how to handle the associated character, as the following table shows.
Character-Length Value        Meaning
IFX_GL_NO_LIMIT               The INFORMIX-GLS function reads as many bytes as necessary from the multibyte character to form a complete character.
>= 0                          The INFORMIX-GLS function does not read more than this number of bytes from the multibyte character when it tries to form a complete character.
< 0 and != IFX_GL_NO_LIMIT    The INFORMIX-GLS function sets the error number to the IFX_GL_EINVAL error.

If the multibyte character is in a null-terminated multibyte string, the character length must be IFX_GL_NO_LIMIT. For example, if mbs points to a null-terminated string of multibyte characters, the following code fragment must specify IFX_GL_NO_LIMIT as the character length:

If a multibyte character, mb, is in a length-terminated multibyte string or is a character in a buffer by itself, the character length must equal the number of bytes between where mb points and the end of the buffer that holds the string or character. For example, if mbs points to a length-terminated string of multibyte characters and mbs_bytes is the number of bytes in that string, the following call to ifx_gl_mblen() must specify the length of the multibyte string:

Similarly, if mb points to one multibyte character and mb_bytes is the number of bytes in the buffer that holds the character, the following call to ifx_gl_mblen() must specify the length of the multibyte character:
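
As a minimal sketch of both character-length conventions, the following plain C code uses UTF-8 as an example multibyte code set. The function demo_mblen() is a hypothetical stand-in that mimics the behavior described in the table above; it is not the INFORMIX-GLS ifx_gl_mblen() function. The names DEMO_NO_LIMIT, DEMO_EINVAL, and DEMO_EILSEQ and their values are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IFX_GL_NO_LIMIT, IFX_GL_EINVAL, IFX_GL_EILSEQ. */
#define DEMO_NO_LIMIT (-1)
#define DEMO_EINVAL   (-2)  /* character cannot be completed within the limit */
#define DEMO_EILSEQ   (-3)  /* bytes do not form a valid character */

/* Number of bytes that a UTF-8 lead byte announces, or 0 if it is invalid. */
int demo_utf8_width(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Byte length of the character at mb, reading at most byte_limit bytes. */
int demo_mblen(const unsigned char *mb, int byte_limit)
{
    int width, i;

    assert(mb != NULL);
    if (byte_limit < 0 && byte_limit != DEMO_NO_LIMIT)
        return DEMO_EINVAL;
    if (byte_limit == 0)
        return DEMO_EINVAL;              /* no byte may be read at all */

    width = demo_utf8_width(mb[0]);
    if (width == 0)
        return DEMO_EILSEQ;
    if (byte_limit != DEMO_NO_LIMIT && width > byte_limit)
        return DEMO_EINVAL;              /* character runs past the limit */
    for (i = 1; i < width; i++)
        if ((mb[i] & 0xC0) != 0x80)      /* also stops at a null terminator */
            return DEMO_EILSEQ;
    return width;
}

/* Count characters in a null-terminated string: no limit is needed. */
int demo_count_nts(const unsigned char *mbs)
{
    int n = 0, len;

    while (*mbs != 0) {
        len = demo_mblen(mbs, DEMO_NO_LIMIT);
        if (len < 0)
            return len;
        mbs += len;
        n++;
    }
    return n;
}

/* Count characters in a length-terminated string of mbs_bytes bytes.
 * The limit is the number of bytes from mb to the end of the buffer. */
int demo_count_lts(const unsigned char *mbs, int mbs_bytes)
{
    const unsigned char *mb = mbs;
    int n = 0, len;

    while (mb < mbs + mbs_bytes) {
        len = demo_mblen(mb, (int)(mbs + mbs_bytes - mb));
        if (len < 0)
            return len;
        mb += len;
        n++;
    }
    return n;
}
```

In the length-terminated loop, the limit shrinks as the pointer advances, so the stand-in never reads past the end of the buffer even when the last character is incomplete.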

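If the INFORMIX-GLS function cannot determine whether bytes in a buffer make up a valid multibyte character, it sets the error number to IFX_GL_EINVAL. This can happen, for example, when the character length that you specify ends before the character is complete, as in a string that has been truncated or fragmented in the middle of a multibyte character.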

Tip: Wide characters are fixed length. Therefore, INFORMIX-GLS functions that operate on wide characters do not require the character length.

Managing Memory for Strings and Characters

You must make buffers large enough to hold text in any of the languages that your application will handle. If your application will handle many languages, you must ensure that allocated buffers are large enough to hold translated versions of the text. If your application will handle Asian (multibyte) languages, you need to replace single-byte buffers with multibyte- or wide-character buffers.

Important: Any memory that INFORMIX-GLS functions allocate remains allocated only for the duration of the function. It does not remain after the function returns. Therefore, you must manage memory for multibyte-character and wide-character strings.

Multibyte-Character-String Allocation

Multibyte characters have varying lengths. When you represent a multibyte-character string in an array, the number of array elements does not equal the number of multibyte characters in the string. Therefore, you cannot use the same allocation method for multibyte strings as for single-byte strings.

Instead, you can use the following INFORMIX-GLS macro and functions to help you determine how much memory a multibyte character requires.
Macro or Function               Purpose
IFX_GL_MB_MAX                   Indicates the maximum number of bytes that any multibyte character in any locale can occupy. This macro is usually used to allocate space in static buffers that are intended to contain one or more multibyte characters.
ifx_gl_mb_loc_max()             Returns the maximum number of bytes that any character in the current locale can occupy.
ifx_gl_cv_outbuflen(),          Calculate the size of the destination buffer that a code-set conversion or a case conversion, respectively, requires.
ifx_gl_case_conv_outbuflen()

For example, the following declaration statically allocates 20 multibyte characters for the mbs string:

The following declarations dynamically allocate 20 multibyte characters for the mb1 and mb2 strings:

The declaration for mb1 uses the maximum multibyte-character size. The declaration for mb2 uses the ifx_gl_mb_loc_max() function to obtain a more precise estimate for the size of 20 multibyte characters. The ifx_gl_mb_loc_max() function returns the maximum size among all characters in the current processing locale.

If your multibyte-character string is null terminated, allocate one additional byte for the null terminator. The following declarations allocate three null-terminated multibyte-character strings:
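
The sizing rules above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the declarations from the INFORMIX-GLS library: DEMO_MB_MAX stands in for IFX_GL_MB_MAX with an assumed value of 4, and demo_mb_loc_max() stands in for ifx_gl_mb_loc_max() with an assumed return value of 2. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for IFX_GL_MB_MAX; the value 4 is an assumption. */
#define DEMO_MB_MAX 4

/* Hypothetical stand-in for ifx_gl_mb_loc_max(): pretend that the widest
 * character in the current locale occupies 2 bytes. */
int demo_mb_loc_max(void)
{
    return 2;
}

/* Static allocation: room for 20 multibyte characters of any locale,
 * plus one byte for the null terminator. */
unsigned char demo_static_mbs[20 * DEMO_MB_MAX + 1];

/* Bytes needed for nchars multibyte characters of at most max_bytes each,
 * plus one extra byte if the string is null terminated. */
size_t demo_mbs_bytes(size_t nchars, size_t max_bytes, int null_terminated)
{
    return nchars * max_bytes + (null_terminated ? 1 : 0);
}

/* Dynamic allocation: returns a zero-filled buffer that the caller must
 * free, or NULL if the allocation fails. */
unsigned char *demo_mbs_alloc(size_t nchars, size_t max_bytes,
                              int null_terminated)
{
    size_t bytes = demo_mbs_bytes(nchars, max_bytes, null_terminated);

    assert(bytes > 0);
    return calloc(bytes, 1);
}
```

Passing DEMO_MB_MAX as max_bytes gives the locale-independent upper bound, while passing the result of demo_mb_loc_max() gives the tighter, locale-specific bound. In both cases the caller owns the buffer and must release it with free().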

Wide-Character String Allocation

When you represent a wide-character string in an array, the number of array elements does equal the number of wide characters in the string. Therefore, you can use the same allocation method for wide-character strings as for single-byte strings. For example, the following declaration statically allocates 20 wide characters for the wcs string:

The following declaration dynamically allocates 20 wide characters for the wc1 string:

If your wide-character string is null terminated, you must allocate one additional character for the null terminator. The null terminator requires the same space as an entire wide character. The following declaration allocates three null-terminated wide-character strings:
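
A minimal sketch of wide-character allocation follows. It assumes the standard C wchar_t type as a stand-in for gl_wchar_t, whose actual definition comes from the INFORMIX-GLS headers; the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for gl_wchar_t; the real type comes from the INFORMIX-GLS headers. */
typedef wchar_t demo_wchar_t;

/* Static allocation: 20 wide characters plus one element for the
 * wide-character null terminator. */
demo_wchar_t demo_static_wcs[20 + 1];

/* Bytes needed for nchars wide characters, plus one whole wide character
 * if the string is null terminated. */
size_t demo_wcs_bytes(size_t nchars, int null_terminated)
{
    return (nchars + (null_terminated ? 1 : 0)) * sizeof(demo_wchar_t);
}

/* Dynamic allocation: returns a zero-filled buffer that the caller must
 * free, or NULL if the allocation fails. */
demo_wchar_t *demo_wcs_alloc(size_t nchars, int null_terminated)
{
    size_t count = nchars + (null_terminated ? 1 : 0);

    assert(count > 0);
    return calloc(count, sizeof(demo_wchar_t));
}
```

Because every element holds exactly one character, the element count matches the character count, and the terminator costs one full element rather than one byte.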

String Deallocation

The INFORMIX-GLS library does not automatically deallocate memory that you dynamically allocate. Once you no longer need the string buffer, you must ensure that you deallocate any memory that your application has dynamically allocated for multibyte-character and wide-character strings.

DataBlade API: The DataBlade API does provide some automatic garbage collection for memory that you allocate dynamically. When this memory is deallocated depends on the memory duration with which it was allocated. However, it is good programming practice to handle memory deallocation explicitly whenever possible. For more information on memory management with the DataBlade API, see the DataBlade API Programmer's Manual.

Keeping Multibyte Strings Consistent

You must take special measures to perform the following operations on multibyte strings so that you do not split a multibyte character:

    Truncating a multibyte string

    Fragmenting a multibyte string

Truncating Multibyte Strings

Sometimes you need to truncate a long character string so that it fits into a smaller buffer. When you truncate a character string that contains just single-byte characters, you can truncate at an arbitrary byte location in the string. Because each character is one byte long, the truncated result still contains only complete characters.

However, to truncate a string that might contain even one multibyte character, you must take special measures. If you truncate at an arbitrary byte location in a multibyte-character string, you might truncate at a byte that is part of a multibyte character. In this case, the truncated string might end with only the first 1, 2 or 3 bytes of a multibyte character without the remaining bytes of the character. For such a string, subsequent traversal could result in an attempt to read beyond the end of the buffer.

Therefore, all INFORMIX-GLS functions that traverse one multibyte character or a length-terminated multibyte-character string set the error number to IFX_GL_EINVAL if they detect that an otherwise valid character has been truncated.

If you know that no truncation has occurred to the string, you can consider the IFX_GL_EINVAL error the same as IFX_GL_EILSEQ. However, if truncation might have occurred, IFX_GL_EINVAL indicates that you need to further truncate the string so that the last character in the string is complete. Depending on your application, you might do one of the following:

Important: Even though the INFORMIX-GLS library functions can detect invalid characters after truncation has occurred, it is much better to avoid the situation.
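
One way to avoid the situation is to choose the truncation point by walking the string one character at a time and stopping before the first character that would not fit. The following sketch shows the idea in plain C, using UTF-8 as an example multibyte code set. The names demo_char_width and demo_truncate_len are hypothetical; an application that uses the INFORMIX-GLS library would obtain character lengths from the library instead of from demo_char_width().

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes that a UTF-8 lead byte announces, or 0 if it is invalid.
 * This stands in for a library call that reports the character length. */
size_t demo_char_width(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/*
 * Return the length in bytes of the longest prefix of mbs that fits in
 * max_bytes and ends on a character boundary. The walk also stops at an
 * invalid lead byte or at a character that is cut off by mbs_bytes.
 */
size_t demo_truncate_len(const unsigned char *mbs, size_t mbs_bytes,
                         size_t max_bytes)
{
    size_t pos = 0;

    assert(mbs != NULL);
    while (pos < mbs_bytes) {
        size_t width = demo_char_width(mbs[pos]);

        if (width == 0 || pos + width > mbs_bytes || pos + width > max_bytes)
            break;                /* next character is invalid or does not fit */
        pos += width;
    }
    return pos;
}
```

The result can be shorter than max_bytes by up to one character width minus one byte, which is the price of keeping the last character whole.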

Fragmenting Multibyte Strings

Sometimes you need to fragment a long character string into two or more nonadjacent buffers to meet the memory-management requirements of your application. When you fragment a character string that contains just single-byte characters, you can fragment at an arbitrary byte location in the string. Because each character is one byte long, the fragmented results are still each a complete character string.

However, to fragment a string that might contain even one multibyte character, you must take special measures. If you fragment at an arbitrary byte location in a multibyte-character string, you might fragment at a byte that is part of a multibyte character. In this case, one fragment might end with the first 1, 2 or 3 bytes of a character, while the next fragment starts with the remaining byte or bytes.

If the only thing that you ever do with these fragments is to concatenate them back together to form one string, you do not need to perform any special processing. However, if you need to traverse the fragments as multibyte strings, a split character might cause an attempt to read beyond the end of one fragment, or might appear as an illegal character at the beginning of the next fragment.

Therefore, all INFORMIX-GLS functions that traverse one multibyte character or a length-terminated multibyte-character string set the error number to IFX_GL_EINVAL if they detect that an otherwise valid character has been cut off at the end of a fragment.

Important: The INFORMIX-GLS functions cannot detect that the beginning of a fragment contains the remaining bytes of the last character in some previous fragment because they cannot look at the previous fragment first. Therefore, they might interpret the last 1, 2 or 3 bytes of a multibyte character as a valid character.

If you know that no fragmentation has occurred on the string, you can consider the IFX_GL_EINVAL error the same as IFX_GL_EILSEQ. However, if fragmentation might have occurred, IFX_GL_EINVAL indicates that you need to fragment the string so that each fragment is a complete string. Depending upon your application, you might do one of the following:

Important: Even though the INFORMIX-GLS library functions can detect invalid characters after fragmentation has occurred, it is much better to avoid the situation.
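
One way to avoid the situation is to choose every fragment boundary on a character boundary. The following sketch, again in plain C with UTF-8 as an example multibyte code set, computes fragment lengths of at most frag_max bytes so that no character is split. The names demo_lead_width, demo_whole_prefix, and demo_fragment are hypothetical; an application that uses the INFORMIX-GLS library would obtain character lengths from the library instead.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes that a UTF-8 lead byte announces, or 0 if it is invalid. */
size_t demo_lead_width(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Length of the longest prefix of at most max_bytes that contains only
 * whole characters. */
size_t demo_whole_prefix(const unsigned char *mbs, size_t mbs_bytes,
                         size_t max_bytes)
{
    size_t pos = 0;

    while (pos < mbs_bytes) {
        size_t width = demo_lead_width(mbs[pos]);

        if (width == 0 || pos + width > mbs_bytes || pos + width > max_bytes)
            break;
        pos += width;
    }
    return pos;
}

/*
 * Split mbs into fragments of at most frag_max bytes, each ending on a
 * character boundary. Stores the fragment lengths in frag_lens and returns
 * the number of fragments. Returns 0 if the string is empty, if a character
 * is invalid or wider than frag_max, or if more than max_frags fragments
 * would be needed.
 */
size_t demo_fragment(const unsigned char *mbs, size_t mbs_bytes,
                     size_t frag_max, size_t *frag_lens, size_t max_frags)
{
    size_t pos = 0, count = 0;

    assert(mbs != NULL && frag_lens != NULL);
    while (pos < mbs_bytes) {
        size_t len = demo_whole_prefix(mbs + pos, mbs_bytes - pos, frag_max);

        if (len == 0 || count == max_frags)
            return 0;
        frag_lens[count++] = len;
        pos += len;
    }
    return count;
}
```

Each fragment produced this way is a complete multibyte string in its own right, so it can be traversed independently of the fragments before and after it.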




INFORMIX-GLS Programmer's Guide, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.