Informix Guide to GLS Functionality

Chapter 1: GLS Fundamentals


What Is the GLS Feature?

In a database application, most of the tasks that the database server and the client application perform do not depend on the type of data that they handle. However, some of these tasks do depend on the data. For example, the database server must sort U.S. English data differently from Korean character data. The client application must display French currency differently from English currency.

If the Informix database server or the SQL API product included the code to perform these data-dependent tasks, each would need to be written specially to handle a different set of culture-specific data. In fact, past Informix products have done that with the Asian Language Support (ALS) products for support of Asian (multibyte) data and the Native Language Support (NLS) feature for support of single-byte, non-English data.

With Global Language Support (GLS), Informix products no longer need to specify how to process this culture-specific information directly. Culture-specific information resides in a GLS locale. When an Informix product needs culture-specific information, it makes a call to the GLS library. The GLS library, in turn, accesses the GLS locale and returns the information to the Informix product. A GLS locale is a set of Informix files that bring together the information about data that is specific to a particular culture, language, or territory. In particular, a GLS locale provides the following information:

The name of the code set that the application data uses
The collation order to use for character data
The format for different types of data to appear to end users

For more information on a GLS locale, see page 1-7.

In addition, the GLS feature is a more portable way to support culture-specific information. Many operating systems provide support for non-English data. However, this support is usually in a form that is specific to the operating system. Not many standards yet exist for the format of culture-specific information. This lack of conformity means that if you move an application from one operating-system environment to another, you might need to change the way in which the application requests language support from the operating system. You might even find that the new operating-system environment does not provide the same aspect of language support that the initial environment provided.

Informix products support the GLS feature. Therefore, they can access culture-specific information regardless of the operating system under which they run. They can locate the locale information on any platform to which they are ported.

DB-Access User Manual

The SQL APIs allow host and indicator variable names as well as names of user-specifiable database objects such as tables, columns, views, statements, cursors, and stored procedures to include non-ASCII characters.

For more information, refer to Chapter 7, "General SQL API Features."

INFORMIX-Universal Server provides a new application programming interface, GLS API, to help you internationalize your programs. For more information on the GLS API, see Chapter 4, "INFORMIX-Universal Server Features."
Database server utilities, such as dbexport or onmode, allow many command-line arguments to include non-ASCII characters.

For more information, refer to the appropriate database server chapter:

Chapter 4, "INFORMIX-Universal Server Features"
Chapter 5, "INFORMIX-OnLine Dynamic Server Features"
Chapter 6, "INFORMIX-SE Features"

Additional GLS Support

Informix products include the GLS locale files to support the default locale, U.S. English, and most non-Asian locales. (For more information on the default locale, see page 1-18.) If you do not find a locale to support your language and territory, you must install a Language Supplement for a particular language in addition to your Informix product.

A Language Supplement provides the locale files and error messages to support a particular language or set of languages. The International Language Supplement provides support for all non-Asian languages that Informix products support. You can obtain support for an Asian language (such as Korean, Japanese, or Chinese) with special add-on supplements.

For more information about the Language Supplements, contact your Informix sales representative. For more information about how to create customized message files, see "Locating Message Files".

A GLS Locale

For data to be useful, it must be available in the format that the end user of the database application understands. This format depends on factors such as the language that the user speaks, and the culture or territory in which the user resides.

The GLS locale is a set of Informix files that bring together the information about data that is specific to a particular culture, language, or territory. In particular, a GLS locale provides the following information:

The name of the code set that the application data uses
The collation order to use for character data
The format for different types of data to appear to end users

This section describes each of these topics in more detail.

Tip: For information about the format of these GLS files, see Appendix A.

A Code Set

A character set is one or more natural-language alphabets together with additional symbols for digits, punctuation, and diacritical marks. Each character set has at least one code set, which maps its characters to unique bit patterns. These bit patterns are called code points. ASCII, ISO8859-1, Microsoft 1252, and EBCDIC are examples of code sets that support the English language.

The number of unique characters in the language determines the amount of storage that each character requires in a code set. Because a single byte can store values in the range 0 to 255, it can uniquely identify 256 characters. Most Western languages have fewer than 256 characters and therefore have code sets made up of single-byte characters. When an application handles data in such code sets, it can assume that 1 byte stores 1 character.

The ASCII code set contains 128 characters. Therefore, the code point for each character requires 7 bits of a byte. These single-byte characters with code points in the range 0 to 127 are sometimes called ASCII or 7-bit characters. The ASCII code set is a single-byte code set and is a subset of all code sets that Informix products support.
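The 7-bit property can be sketched in a few lines of Python. Python is not an Informix product and is used here only for illustration; the function name is_seven_bit is hypothetical.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is an ASCII code point (0 to 127)."""
    return all(byte < 128 for byte in data)


# 'A' has the code point 65, which fits in 7 bits.
print(ord("A"), is_seven_bit(b"Informix"))

# The byte 0xE9 (233) sets the eighth bit, so it is not an ASCII character.
print(is_seven_bit(bytes([0xE9])))
```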

If a code set contains more than 128 characters, some of its characters have code points that must set the eighth bit of the byte. These non-ASCII characters might be either of the following types of characters:

8-bit characters

The 8-bit characters are single-byte characters whose code points are between 128 and 255. Examples from the ISO8859-1 or Microsoft 1252 code set include the non-English é, ñ, and ö characters. Only if the software is 8-bit clean can it interpret these characters correctly. For more information on 8-bit characters and 8-bit clean software, see the description of the GLS8BITFSYS environment variable on page 2-23.

Multibyte characters

SJIS

Tip: In this manual, the term non-ASCII characters applies to all characters with a code point greater than 127. Non-ASCII characters include both 8-bit and multibyte characters.
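As a minimal sketch of the distinction between ASCII, 8-bit, and multibyte characters, the following Python function encodes one character in a given code set and inspects the result. The function name classify is hypothetical, and the codec names ("latin-1" for ISO8859-1, "shift_jis" for SJIS) are Python's own names, not GLS locale names.

```python
def classify(char: str, code_set: str) -> str:
    """Classify one character by how the given code set encodes it."""
    encoded = char.encode(code_set)
    if len(encoded) > 1:
        return "multibyte"
    # A single byte: the eighth bit separates ASCII from 8-bit characters.
    return "ASCII" if encoded[0] < 128 else "8-bit"


print(classify("e", "latin-1"))         # ASCII
print(classify("\u00e9", "latin-1"))    # 8-bit (e with acute accent)
print(classify("\u65e5", "shift_jis"))  # multibyte (a Japanese ideograph)
```

In this sketch, both the "8-bit" and "multibyte" results correspond to what this manual calls non-ASCII characters.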

Informix products can support single-byte or multibyte code sets. For some examples of GLS locales that support non-ASCII characters, see "Supporting Non-ASCII Characters".

Tip: Throughout this manual, examples show how single-byte and multibyte characters appear. Because multibyte characters are usually ideographic (such as Japanese or Chinese characters), this manual does not use the actual multibyte characters. Instead, it uses ASCII characters to represent both single-byte and multibyte characters. For more information about how this manual represents multibyte and single-byte characters abstractly, see "Character-Representation Conventions" of the Introduction.

The Collation Order

Collation involves the sorting of character data that is either stored in a database or manipulated in a client application. The collation order affects the following tasks when you select from the database with the SQL SELECT statement:

Logical predicates in the WHERE clause:
SELECT * FROM tab1 WHERE col1 > 'bob'
SELECT * FROM tab1 WHERE site BETWEEN 'abc' AND 'xyz'

Sorted data that the ORDER BY clause creates:
SELECT * FROM tab1 ORDER BY col1

Comparisons in MATCHES and LIKE clauses:
SELECT * FROM tab1 WHERE col1 MATCHES 'a1*'
SELECT * FROM tab1 WHERE col1 LIKE 'dog'
SELECT * FROM tab1 WHERE col1 MATCHES 'abc[a-z]'

For more information on how choice of a locale affects the SELECT statement, see "Collation Order in SELECT Statements".

Informix database servers support the following two methods of collation of character data:

Code-set order
Localized order

Code-Set Order

Code-set order refers to the bit-pattern order of characters within a code set. The order of the code points in the code set determines the sort order. For example, in the ASCII code set, A=65 and B=66. The character A always sorts before B because a code point of 65 is less than one of 66. However, because a=97 and M=77, the string abc sorts after Me, which is not always the preferred result.

The database server sorts data in CHAR, VARCHAR, and TEXT columns in code-set order. All code sets that Informix products support include the ASCII characters as the first 128 characters (code points 0 to 127). Therefore, other characters in the code set have code points of 128 and greater. When the database server sorts data in a CHAR, VARCHAR, or TEXT column, it puts character strings that begin with ASCII characters before character strings that begin with non-ASCII characters in the sorted results.

For an example of a data set in code-set order, see Figure 3-2.
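Code-set order can be imitated outside the database server by comparing the encoded bytes of each string. The following Python sketch is only an illustration under that assumption; the function name code_set_sort is hypothetical, and "latin-1" is Python's name for the ISO8859-1 code set.

```python
def code_set_sort(strings, code_set="latin-1"):
    """Sort strings by the byte values of their encoded form."""
    return sorted(strings, key=lambda s: s.encode(code_set))


# A=65, B=66, M=77, a=97, and A-grave is 192 in ISO8859-1.
print(code_set_sort(["abc", "Me", "\u00c0B", "B", "A"]))
# ['A', 'B', 'Me', 'abc', 'ÀB']
```

The string abc lands after Me, and the string that begins with a non-ASCII character comes last, as the preceding paragraphs describe.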

Localized Order

Localized order refers to an order of the characters that relates to a real language. The locale defines the order of the characters in the localized order. For example, even though the character À might have a code point of 133, the localized order could list this character after A and before B (A=65, À=133, B=66). In this case, the string ÀB sorts after AC but before BD.

Tip: The COLLATION category of the locale file determines the localized order. For more information on the COLLATION category, see page A-6.
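The À example can be sketched in Python with a small, hypothetical ordering table that stands in for the COLLATION category of a locale file. The names LOCALIZED_ORDER and localized_key are assumptions for illustration and do not reflect the actual locale-file format.

```python
# Hypothetical localized order: A-grave is placed after A and before B.
LOCALIZED_ORDER = "A\u00c0BCD"
RANK = {char: position for position, char in enumerate(LOCALIZED_ORDER)}


def localized_key(text):
    """Map each character to its position in the localized order."""
    return [RANK[char] for char in text]


data = ["BD", "\u00c0B", "AC"]
print(sorted(data, key=localized_key))  # ['AC', 'ÀB', 'BD']
print(sorted(data))                     # ['AC', 'BD', 'ÀB'] (code-point order)
```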

The localized order can include equivalent characters, those characters that the database server is to consider as equivalent when it collates them. For example, if the locale defines uppercase and lowercase versions of a character as equivalent characters in the localized order, the database server considers the strings Arizona, ARIZONA, and arizona as equivalent and collates them together.
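One way to picture equivalent characters is a sort key that maps every equivalent form to the same value. The Python sketch below assumes that uppercase and lowercase letters are equivalent and uses str.casefold() as the key; this is an illustration only, not how the database server implements equivalence.

```python
def equivalence_key(text):
    """Treat uppercase and lowercase versions of a character as equivalent."""
    return text.casefold()


names = ["arizona", "Arkansas", "ARIZONA", "Alabama", "Arizona"]

# The three spellings of Arizona share one key, so they collate together.
print(sorted(names, key=equivalence_key))
# ['Alabama', 'arizona', 'ARIZONA', 'Arizona', 'Arkansas']
```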

A localized order can also specify a certain type of collation order. It can define a telephone-book sorting order or a dictionary sort order. For example, a telephone book might require the following sort order:

Mabin

McDonald

MacDonald

Madden

A dictionary, however, might require the following sort order for these same names:

Mabin

Madden

MacDonald

McDonald

If the GLS locale defines a localized order, the database server sorts data in NCHAR and NVARCHAR columns in this localized order. For an example of a data set in localized order, see Figure 3-3.

Collation Support

The collation order that Informix database servers use depends on the data type of the database column. The following table summarizes these collation orders.


Data Types	Collation Order
`CHAR`, `VARCHAR`, `TEXT`	code-set order
`NCHAR`, `NVARCHAR`	localized order

The difference in collation order is the only distinction between the CHAR and NCHAR data types and the VARCHAR and NVARCHAR data types. For more information, see "Using Character Data Types".

NLS

Informix Native Language Support (NLS) database servers (before Version 7.2) use the same collation orders as Version 7.2 and later database servers: code-set order for CHAR and VARCHAR data and localized order for NCHAR and NVARCHAR data.

ALS

Informix Asian Language Support (ALS) database servers use code-set order for CHAR and VARCHAR data; they do not support NCHAR and NVARCHAR data.

If a locale does not define a localized order, the database server collates NCHAR and NVARCHAR data in code-set order.
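The rules in this section can be summarized as a small decision function. The following Python sketch is a restatement of the table and the fallback rule above; the function and parameter names are hypothetical.

```python
CODE_SET_TYPES = {"CHAR", "VARCHAR", "TEXT"}
LOCALIZED_TYPES = {"NCHAR", "NVARCHAR"}


def collation_for(data_type, locale_has_localized_order):
    """Return the collation order that applies to a character data type."""
    name = data_type.upper()
    if name in LOCALIZED_TYPES and locale_has_localized_order:
        return "localized order"
    if name in CODE_SET_TYPES or name in LOCALIZED_TYPES:
        # NCHAR and NVARCHAR fall back to code-set order when the
        # locale does not define a localized order.
        return "code-set order"
    raise ValueError(f"not a character data type: {data_type}")


print(collation_for("NCHAR", True))     # localized order
print(collation_for("NVARCHAR", False)) # code-set order
print(collation_for("VARCHAR", True))   # code-set order
```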

End-User Formats

The end-user format is the format in which data appears within a client application when it is in literal strings or character variables. End-user formats are useful for data types whose format in the database is different from the format to which users are accustomed. In a database, the database server stores data for DATE, DATETIME, MONEY, and numeric data types in compact internal formats. For example, the database server stores a DATE value as an integer number of days since the date of December 31, 1899, so the date 03/19/96 is 35142. This internal format is not very intuitive.

Informix products support end-user formats so that a client application can use this more intuitive form instead of the internal format. Literal strings or character variables can appear in SQL statements as column values or as arguments of SQL API library functions.
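The relationship between the internal DATE value and an end-user date string can be checked with a short Python sketch. It assumes the U.S. English mm/dd/yy format and the December 31, 1899 base date given above; the function names scan_date and print_date are hypothetical and are not Informix library functions.

```python
from datetime import date, datetime, timedelta

BASE_DATE = date(1899, 12, 31)  # day 0 of the internal DATE format


def scan_date(text, end_user_format="%m/%d/%y"):
    """Convert an end-user date string to a day count since the base date."""
    parsed = datetime.strptime(text, end_user_format).date()
    return (parsed - BASE_DATE).days


def print_date(days, end_user_format="%m/%d/%y"):
    """Convert a day count since the base date to an end-user date string."""
    return (BASE_DATE + timedelta(days=days)).strftime(end_user_format)


print(scan_date("03/19/96"))  # 35142
print(print_date(35142))      # 03/19/96
```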

An Informix product uses an end-user format when it encounters a string (a literal string or the value in a character variable) in the following contexts:

When an Informix product scans a string, it uses an end-user format to determine how to interpret the string so that it can convert it to a numeric value.

For example, consider the following INSERT statement:

INSERT INTO mytab ( date1 ) VALUES ( '03/19/96' )

When the database server receives the data from the client application, the database server uses the end-user format to interpret this literal date so that it can convert it to the appropriate internal format (35142).

When an Informix product prints a string, it uses an end-user format to determine how to format the numeric value as a string.

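For example, consider the following call to the rdatestr() library function, which writes a date string to the datestr variable: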

err = rdatestr(jdate, datestr);

The rdatestr() function uses the end-user format to determine how to format the internal format (35142) as a date string before it puts the value in the datestr variable. For more information about the effect of the GLS feature on SQL API library functions, see "Enhanced ESQL Library Functions".

A GLS locale defines end-user formats for the following types of data:

Representation of currency notation and numeric format

U.S

Representation of dates and times

U.S

The following sections describe each of these types of data in more detail.

Numeric and Monetary Formats

Numeric data is data from columns with the following data types: DECIMAL, INTEGER, SMALLINT, FLOAT, and SMALLFLOAT. Monetary data is data from a MONEY column. When an Informix product scans a string that contains monetary data, it uses the monetary end-user format to determine how to convert this string to the internal integer value for a MONEY column. When an Informix product prints a string that contains monetary data, it uses the monetary end-user format to determine how to format the internal integer value for a MONEY column as a string. In the same way, Informix products use the numeric end-user format to scan and print strings for the internal values of the numeric data types.

Important: The end-user formats of the numeric and monetary data do not affect the internal format of the numeric or MONEY data types in the database. They only affect how the client application views the data.

The end-user formats for numeric and monetary data specify the following characters and symbols:

The decimal-separator symbol, often called the radix character, that separates the integral part of the numeric value from the fractional part

In the default locale, the period is the decimal separator (3.01); in a French or other European locale, the comma is the decimal separator (3,01).

The thousands-separator symbol that appears between groups of digits in the integral part of the numeric value

In the default locale, the comma is the thousands separator (3,255); in a French locale, the space is the thousands separator (3 255).

The number of digits to group between each appearance of a non-monetary thousands separator

For example, this information might specify that numbers always omit the separator at the millions position, which produces the following output: 1234,345.

The characters that indicate positive and negative numbers

In addition to this numeric notation, monetary data also uses a currency symbol to identify the currency unit. A locale can define this symbol to appear at the front ($100) or back (100FF) of the monetary value. In this manual, the combination of currency symbol, decimal separator, and thousands separator is called currency notation.

Tip: To customize the end-user format that the locale defines for monetary values, you can use the DBMONEY environment variable. For more information, see "Customizing Monetary Values".

The NUMERIC category of the locale file defines the end-user formats for numeric data. The MONETARY category of the locale file defines the end-user formats for monetary data. For more information on the NUMERIC and MONETARY categories, see page A-7.
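To make the separator and currency-symbol rules concrete, the following Python sketch formats a value from a handful of parameters that stand in for the NUMERIC and MONETARY categories. The function name, its parameters, and the sample values are assumptions for illustration; the sketch does not read a real locale file and does not handle negative values or custom digit grouping.

```python
def format_value(value, places, decimal_sep=".", thousands_sep=",",
                 symbol="", symbol_first=True):
    """Format a non-negative number with locale-style separators."""
    # Python's own grouping always uses ',' and '.', so split on '.'
    # and then substitute the separators that the locale calls for.
    integral, _, fraction = f"{value:,.{places}f}".partition(".")
    text = integral.replace(",", thousands_sep)
    if fraction:
        text += decimal_sep + fraction
    return symbol + text if symbol_first else text + symbol


# Default (U.S. English) conventions
print(format_value(3.01, 2))                # 3.01
print(format_value(3255, 0))                # 3,255
print(format_value(100, 0, symbol="$"))     # $100

# French conventions from the examples above
print(format_value(3.01, 2, decimal_sep=",", thousands_sep=" "))  # 3,01
print(format_value(3255, 0, decimal_sep=",", thousands_sep=" "))  # 3 255
print(format_value(100, 0, symbol="FF", symbol_first=False))      # 100FF
```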

Date and Time Formats

Date data is for DATE or DATETIME columns. Time data is for a DATETIME column. When an Informix product scans a string that contains time data, it uses the time end-user format to determine how to convert this string to the internal integer value for a DATETIME column. When an Informix product prints a string that contains time data, it uses the time end-user format to determine how to format the internal integer value for a DATETIME column as a string. In the same way, Informix products use the date end-user format to scan and print strings for the internal values of the date data types.

Important: The end-user formats of the date and time data do not affect the internal format of the DATE or DATETIME data types in the database. They only affect how the client application views the data.

The end-user formats for date and time involve characters and symbols that format date and time values. This information includes the names and abbreviations for days of the week and months of the year. It also includes the commonly used representations for dates, time (12-hour and 24-hour), and DATETIME values.

The end-user formats can include the names of eras (as in the Japanese Imperial date system) and non-Gregorian calendars (such as the Arabic lunar calendar). For example, the Taiwan culture uses the Ming Guo year format in addition to the Julian calendar year. Ming Guo 0001 is equivalent to January 1, 1912 on the Julian calendar. For dates before 1912, Ming Guo years are negative. The Ming Guo year 0000 is undefined; any attempt to use it generates an error.

The following table shows some era-based dates.

Julian Year	Ming Guo Year	Remarks
1993	82	1993 - 1911 = 82
1912	01	1912 - 1911 = 01
1911	-01	1911 - 1912 = -01
1910	-02	1910 - 1912 = -02
1900	-12	1900 - 1912 = -12
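The arithmetic in the Remarks column can be written as a pair of conversion functions. The following Python sketch follows the table and the rule that the Ming Guo year 0000 is undefined; the function names are hypothetical.

```python
def to_ming_guo(year):
    """Convert a calendar year to a Ming Guo year (there is no year 0)."""
    return year - 1911 if year >= 1912 else year - 1912


def from_ming_guo(ming_guo_year):
    """Convert a Ming Guo year back to a calendar year."""
    if ming_guo_year == 0:
        raise ValueError("Ming Guo year 0000 is undefined")
    return ming_guo_year + 1911 if ming_guo_year > 0 else ming_guo_year + 1912


for year in (1993, 1912, 1911, 1910, 1900):
    print(year, to_ming_guo(year))
```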

Japanese Imperial-era dates are tied to the reign of the Japanese emperors. The following table shows Julian and Japanese era dates. It shows the Japanese era format in full, with abstract multibyte characters for the Japanese characters, and in an abbreviated form that uses romanized characters (gengo). The abbreviated form of the era uses the first letter of the English name for the Japanese era. For example, H represents the Heisei era.

Julian Date	Abstract Japanese Era (in full)	Japanese Era (gengo)
1868/09/08	A1A2B1B201/09/08	M01/09/08
1912/07/30	A1A2B1B245/07/30	M45/07/30
1912/07/31	A1A2B1B201/07/31	T01/07/31
1926/12/25	A1A2B1B215/12/25	T15/12/25
1926/12/26	A1A2B1B201/12/26	S01/12/26
1989/01/07	A1A2B1B264/01/07	S64/01/07
1989/01/08	A1A2B1B201/01/08	H01/01/08
...
1995/01/01	A1A2B1B207/01/01	H07/01/01

Tip: In the preceding table, A1A2 and B1B2 indicate multibyte Japanese characters.
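The abbreviated (gengo) column of the preceding table can be reproduced with a short lookup. The following Python sketch takes its era start dates directly from that table and covers only the four eras shown there; the names ERAS and to_gengo are hypothetical, and the sketch does not use the TIME category of a real locale file.

```python
from datetime import date

# Era start dates as they appear in the preceding table, newest first.
ERAS = [
    (date(1989, 1, 8), "H"),    # Heisei
    (date(1926, 12, 26), "S"),
    (date(1912, 7, 31), "T"),
    (date(1868, 9, 8), "M"),
]


def to_gengo(day):
    """Format a date in the abbreviated era form, for example H07/01/01."""
    for start, letter in ERAS:
        if day >= start:
            era_year = day.year - start.year + 1  # the first era year is 01
            return f"{letter}{era_year:02d}/{day.month:02d}/{day.day:02d}"
    raise ValueError("date precedes the first era in the table")


print(to_gengo(date(1989, 1, 7)))  # S64/01/07
print(to_gengo(date(1989, 1, 8)))  # H01/01/08
print(to_gengo(date(1995, 1, 1)))  # H07/01/01
```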

The TIME category of the locale file defines the end-user formats for date and time data. For more information on the TIME category, see page A-8.

Tip: To customize the end-user formats that the locale defines for date and time values, you can use the GL_DATE and GL_DATETIME environment variables. For more information, see "Customizing Date and Time End-User Formats".
