INFORMIX
Informix Guide to GLS Functionality
Chapter 1: GLS Fundamentals

What Is the GLS Feature?

In a database application, most of the tasks that the database server and the client application perform do not depend on the type of data that they handle. However, some portion of the tasks that the database server and client application perform are dependent on the data. For example, the database server must sort U.S. English data differently from Korean character data. The client application must display French currency differently than English currency.

If the Informix database server or the SQL API product included the code to perform these data-dependent tasks, each would need to be written specially to handle a different set of culture-specific data. In fact, past Informix products have done that with the Asian Language Support (ALS) products for support of Asian (multibyte) data and the Native Language Support (NLS) feature for support of single-byte, non-English data.

With Global Language Support (GLS), Informix products no longer need to specify how to process this culture-specific information directly. Culture-specific information resides in a GLS locale. When an Informix product needs culture-specific information, it makes a call to the GLS library. The GLS library, in turn, accesses the GLS locale and returns the information to the Informix product. A GLS locale is a set of Informix files that bring together the information about data that is specific to a particular culture, language, or territory. In particular, a GLS locale provides the code set, the collation order, and the end-user formats for that culture, language, or territory.

For more information on a GLS locale, see page 1-7.

In addition, the GLS feature is a more portable way to support culture-specific information. Many operating systems provide support for non-English data. However, this support is usually in a form that is specific to the operating system. Few standards yet exist for the format of culture-specific information. This lack of conformity means that if you move an application from one operating-system environment to another, you might need to change the way in which the application requests language support from the operating system. You might even find that the new operating-system environment does not provide the same aspect of language support that the initial environment provided.

Informix products support the GLS feature. Therefore, they can access culture-specific information regardless of the operating system under which they run. They can locate the locale information on any platform to which they are ported.

GLS Support by Informix Products

The Informix Version 9.1 release provides GLS support in the following types of products and utilities:

The following sections outline the features that GLS support provides for each of these types of Informix products and utilities. For information about the GLS API, see Chapter 4, "INFORMIX-Universal Server Features."

Informix Database Servers

With the GLS feature, INFORMIX-Universal Server, INFORMIX-OnLine Dynamic Server, and the INFORMIX-SE database server provide support for the following culture-specific features:

    You can use non-ASCII characters to name user-specifiable database objects, such as tables, columns, views, statements, cursors, and stored procedures, and you can use a collation order that suits the local customs.

    You can use non-ASCII characters in regular expression comparisons that involve NCHAR and NVARCHAR data.

    You can use end-user formats that are particular to a country or culture outside the U.S. to specify date, time, numeric, and monetary values when they appear in literal strings. The database server can translate these formats to the appropriate internal database format.

Tip: For more information on how the database server provides support for the GLS feature, refer to Chapter 3, "SQL Features," Chapter 4, "INFORMIX-Universal Server Features," Chapter 5, "INFORMIX-OnLine Dynamic Server Features," and Chapter 6, "INFORMIX-SE Features."

Informix Client Applications

To the GLS feature, a client application can be either an Informix SQL API product, such as INFORMIX-ESQL/C or INFORMIX-ESQL/COBOL, or an Informix database server utility, such as DB-Access, dbexport, or onmode. The following Informix client applications provide support for the GLS feature:

Additional GLS Support

Informix products include the GLS locale files to support the default locale, U.S. English, and most non-Asian locales. (For more information on the default locale, see page 1-18.) If you do not find a locale to support your language and territory, you must install a Language Supplement for a particular language in addition to your Informix product.

A Language Supplement provides the locale files and error messages to support a particular language or set of languages. The International Language Supplement provides support for all non-Asian languages that Informix products support. You can obtain support for an Asian language (such as Korean, Japanese, or Chinese) with special add-on supplements.

For more information about the Language Supplements, contact your Informix sales representative. For more information about how to create customized message files, see "Locating Message Files".

A GLS Locale

For data to be useful, it must be available in the format that the end user of the database application understands. This format depends on factors such as the language that the user speaks, and the culture or territory in which the user resides.

The GLS locale is a set of Informix files that bring together the information about data that is specific to a particular culture, language, or territory. In particular, a GLS locale provides the following information:

    The code set

    The collation order

    The end-user formats

This section describes each of these topics in more detail.

Tip: For information about the format of these GLS files, see Appendix A.

A Code Set

A character set is one or more natural-language alphabets together with additional symbols for digits, punctuation, and diacritical marks. Each character set has at least one code set, which maps its characters to unique bit patterns. These bit patterns are called code points. ASCII, ISO8859-1, Microsoft 1252, and EBCDIC are examples of code sets that support the English language.

The number of unique characters in the language determines the amount of storage that each character requires in a code set. Because a single byte can store values in the range 0 to 255, it can uniquely identify 256 characters. Most Western languages have fewer than 256 characters and therefore have code sets made up of single-byte characters. When an application handles data in such code sets, it can assume that 1 byte stores 1 character.

The ASCII code set contains 128 characters. Therefore, the code point for each character requires 7 bits of a byte. These single-byte characters with code points in the range 0 to 127 are sometimes called ASCII or 7-bit characters. The ASCII code set is a single-byte code set and is a subset of all code sets that Informix products support.

If a code set contains more than 128 characters, some of its characters have code points that must set the eighth bit of the byte. These non-ASCII characters might be either of the following types of characters:

    8-bit characters. If a code set contains more than 128 but no more than 256 characters, each non-ASCII character still fits in a single byte, with a code point in the range 128 to 255.

    Multibyte characters. If a character set contains more than 256 characters, the code set must contain multibyte characters. A multibyte character might require from 2 to 4 bytes of storage. Many Asian languages contain 3,000 to 8,000 ideographic characters. Such languages have code sets made up of both single-byte and multibyte characters. These code sets are called multibyte code sets. Some characters in the Japanese SJIS code set are multibyte characters of 2 or 3 bytes. Applications that handle data in multibyte code sets cannot assume that 1 character takes only 1 byte of storage.

Tip: In this manual, the term non-ASCII characters applies to all characters with a code point greater than 127. Non-ASCII characters include both 8-bit and multibyte characters.

Informix products can support single-byte or multibyte code sets. For some examples of GLS locales that support non-ASCII characters, see "Supporting Non-ASCII Characters".
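The distinction among ASCII, 8-bit, and multibyte characters can be illustrated with a minimal Python sketch. The function name classify_character is hypothetical, and Python's built-in latin-1 (ISO8859-1) and shift_jis codecs are assumed here as stand-ins for the code sets of a GLS locale; they are not part of any Informix product.

```python
def classify_character(ch: str, code_set: str) -> str:
    """Classify one character by how the given code set encodes it.

    code_set is the name of a Python codec (an assumption of this sketch),
    for example "latin-1" for ISO8859-1 or "shift_jis" for SJIS.
    """
    encoded = ch.encode(code_set)
    if len(encoded) > 1:
        # More than one byte of storage: a multibyte character.
        return "multibyte"
    # One byte of storage: the eighth bit separates ASCII from 8-bit characters.
    return "ASCII" if encoded[0] <= 127 else "8-bit"
```

For example, classify_character("A", "latin-1") reports an ASCII character, while the same call with "é" reports an 8-bit character, and an ideographic character encoded with "shift_jis" reports a multibyte character.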

Tip: Throughout this manual, examples show how single-byte and multibyte characters appear. Because multibyte characters are usually ideographic (such as Japanese or Chinese characters), this manual does not use the actual multibyte characters. Instead, it uses ASCII characters to represent both single-byte and multibyte characters. For more information about how this manual represents multibyte and single-byte characters abstractly, see "Character-Representation Conventions" of the Introduction.

The Collation Order

Collation involves the sorting of character data that is either stored in a database or manipulated in a client application. The collation order affects the following tasks when you select from the database with the SQL SELECT statement:

For more information on how choice of a locale affects the SELECT statement, see "Collation Order in SELECT Statements".

Informix database servers support two methods of collation of character data: code-set order and localized order.

Code-Set Order
Code-set order refers to the bit-pattern order of characters within a code set. The order of the code points in the code set determines the sort order. For example, in the ASCII code set, A=65 and B=66. The character A always sorts before B because a code point of 65 is less than one of 66. However, because a=97 and M=77, the string abc sorts after Me, which is not always the preferred result.

The database server sorts data in CHAR, VARCHAR, and TEXT columns in code-set order. All code sets that Informix products support include the ASCII characters as the first 128 characters. Therefore, other characters in the code set have the code points 128 and greater. When the database server sorts data in a CHAR, VARCHAR, or TEXT column, it puts character strings that begin with ASCII characters before character strings that begin with non-ASCII characters in the sorted results.

For an example of a data set in code-set order, see Figure 3-2.
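Code-set order can be sketched in a few lines of Python. This is an illustration only, not the database server's implementation: the function name code_set_sort is hypothetical, and Python's Unicode code points are assumed as the code set (they agree with ASCII for the first 128 characters).

```python
def code_set_sort(strings):
    """Sort strings by comparing the code points of their characters."""
    return sorted(strings, key=lambda s: [ord(ch) for ch in s])


# "Me" sorts before "abc" because M=77 is less than a=97.
ascii_result = code_set_sort(["abc", "Me"])

# A string that begins with a non-ASCII character sorts after ASCII strings.
mixed_result = code_set_sort(["Été", "abc", "Me"])
```

Here ascii_result is ["Me", "abc"] and mixed_result is ["Me", "abc", "Été"], which matches the behavior that the text describes for CHAR, VARCHAR, and TEXT columns.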

Localized Order
Localized order refers to an order of the characters that relates to a real language. The locale defines the order of the characters in the localized order. For example, even though the character À might have a code point of 133, the localized order could list this character after A and before B (A=65, À=133, B=66). In this case, the string ÀB sorts after AC but before BD.

Tip: The COLLATION category of the locale file determines the localized order. For more information on the COLLATION category, see page A-6.

The localized order can include equivalent characters, those characters that the database server is to consider as equivalent when it collates them. For example, if the locale defines uppercase and lowercase versions of a character as equivalent characters in the localized order, the database server considers the strings Arizona, ARIZONA, and arizona as equivalent and collates them together.
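A localized order with equivalent characters can be sketched as a sort key. Everything in this Python example is an assumption made for illustration: the order table LOCALIZED_ORDER, the choice to treat uppercase and lowercase as equivalent, and the function name localized_key. A real locale defines its order in the COLLATION category of the locale file.

```python
# Hypothetical localized order: À is listed after A and before B.
LOCALIZED_ORDER = "AÀBCDEFGHIJKLMNOPQRSTUVWXYZ"
RANK = {ch: position for position, ch in enumerate(LOCALIZED_ORDER)}


def localized_key(text: str) -> list:
    """Build a sort key from the hypothetical localized order.

    Uppercase and lowercase letters are folded together, so they behave
    as equivalent characters. Characters that the table does not list
    sort after all listed characters, in code-point order.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text.upper()]


localized = sorted(["BD", "ÀB", "AC"], key=localized_key)
code_set = sorted(["BD", "ÀB", "AC"])
```

With this table, localized is ["AC", "ÀB", "BD"], whereas code_set places ÀB last because the code point of À is greater than that of B. The keys for Arizona, ARIZONA, and arizona are identical, so the three strings collate together.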

A localized order can also specify a certain type of collation order. It can define a telephone-book sorting order or a dictionary sort order. For example, a telephone book might require the following sort order:

A dictionary, however, might require the following sort order for these same names:

If the GLS locale defines a localized order, the database server sorts data in NCHAR and NVARCHAR columns in this localized order. For an example of a data set in localized order, see Figure 3-3.

Collation Support
The collation order that Informix database servers use depends on the data type of the database column. The following table summarizes these collation orders.
Data Types              Collation Order
CHAR, VARCHAR, TEXT     code-set order
NCHAR, NVARCHAR         localized order

The difference in collation order is the only distinction between the CHAR and NCHAR data types and the VARCHAR and NVARCHAR data types. For more information, see "Using Character Data Types".

NLS
Informix Native Language Support (NLS) database servers (before Version 7.2) use the same collation orders as Version 7.2 and later database servers: code-set order for CHAR and VARCHAR data and localized order for NCHAR and NVARCHAR data.

ALS
Informix Asian Language Support (ALS) database servers use code-set order for CHAR and VARCHAR data; they do not support NCHAR and NVARCHAR data.

If a locale does not define a localized order, the database server collates NCHAR and NVARCHAR data in code-set order.

End-User Formats

The end-user format is the format in which data appears within a client application when it is in literal strings or character variables. End-user formats are useful for data types whose format in the database is different from the format to which users are accustomed. In a database, the database server stores data for DATE, DATETIME, MONEY, and numeric data types in compact internal formats. For example, the database server stores a DATE value as an integer number of days since the date of December 31, 1899, so the date 03/19/96 is 35142. This internal format is not very intuitive.
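The arithmetic behind the internal DATE value can be checked with a short Python sketch. The names DATE_EPOCH and to_internal_date are hypothetical; the sketch assumes only what the text states, that the value counts days since December 31, 1899.

```python
from datetime import date

# Day 0 of the internal DATE value, as stated in the text.
DATE_EPOCH = date(1899, 12, 31)


def to_internal_date(value: date) -> int:
    """Return the number of days between DATE_EPOCH and the given date."""
    return (value - DATE_EPOCH).days


internal = to_internal_date(date(1996, 3, 19))
```

For March 19, 1996, internal is 35142, the value that the text gives for the date 03/19/96.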

Informix products support end-user formats so that a client application can use this more intuitive form instead of the internal format. Literal strings or character variables can appear in SQL statements as column values or as arguments of SQL API library functions.

An Informix product uses an end-user format when it encounters a string (a literal string or the value in a character variable) in the following contexts:

    As a column value in an SQL statement

    For example, suppose DB-Access has the default locale (U.S. English) as its client locale. The literal date in the following INSERT statement uses the end-user format for dates that the default locale defines:

INSERT INTO mytab ( date1 ) VALUES ( '03/19/96' )

    As an argument of an SQL API library function

    For example, suppose an ESQL/C client application has a French locale as its client locale, and this locale defines a date end-user format that formats dates as dd/mm/yy. The following call to the rdatestr() function uses the end-user format for dates to produce the value in the datestr character variable:

err = rdatestr(jdate, datestr);
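The effect of the client locale on scanning and printing date strings can be sketched in Python. This is not how rdatestr() or DB-Access is implemented. The locale labels en_us and fr_fr, the DATE_FORMATS table, and the functions scan_date and print_date are all hypothetical, and strftime-style patterns are assumed in place of the formats that a GLS locale defines.

```python
from datetime import date, datetime

# Hypothetical end-user date formats, keyed by a client-locale label.
DATE_FORMATS = {
    "en_us": "%m/%d/%y",  # default locale: mm/dd/yy
    "fr_fr": "%d/%m/%y",  # French locale: dd/mm/yy
}


def scan_date(text: str, client_locale: str) -> date:
    """Convert an end-user date string to a date value."""
    return datetime.strptime(text, DATE_FORMATS[client_locale]).date()


def print_date(value: date, client_locale: str) -> str:
    """Convert a date value to an end-user date string."""
    return value.strftime(DATE_FORMATS[client_locale])
```

Under these assumptions, scan_date("03/19/96", "en_us") and scan_date("19/03/96", "fr_fr") yield the same date, and print_date() turns that date back into whichever string the client locale expects.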

A GLS locale defines end-user formats for the following types of data:

    Numeric and monetary data. You can use an end-user format that is particular to a country or culture outside the U.S. to specify numeric and monetary values.

    Date and time data. You can specify date and time values in an end-user format that is particular to a country or culture outside the U.S.

The following sections describe each of these types of data in more detail.

Numeric and Monetary Formats
Numeric data is data from columns with the following data types: DECIMAL, INTEGER, SMALLINT, FLOAT, and SMALLFLOAT. Monetary data is data from a MONEY column. When an Informix product scans a string that contains monetary data, it uses the monetary end-user format to determine how to convert this string to the internal integer value for a MONEY column. When an Informix product prints a string that contains monetary data, it uses the monetary end-user format to determine how to format the internal integer value for a MONEY column as a string. In the same way, Informix products use the numeric end-user format to scan and print strings for the internal values of the numeric data types.

Important: The end-user formats of the numeric and monetary data do not affect the internal format of the numeric or MONEY data types in the database. They only affect how the client application views the data.

The end-user formats for numeric and monetary data specify the following characters and symbols:

    The decimal separator. In the default locale, the period is the decimal separator (3.01); in a French or other European locale, the comma is the decimal separator (3,01).

    The thousands separator. In the default locale, the comma is the thousands separator (3,255); in a French locale, the space is the thousands separator (3 255).

    The grouping of digits between thousands separators. For example, this information might specify that numbers always omit the separator at the millions position, which produces the following output: 1234,345.

In addition to this numeric notation, monetary data also uses a currency symbol to identify the currency unit. A locale can define this symbol to appear at the front ($100) or back (100FF) of the monetary value. In this manual, the combination of currency symbol, decimal separator, and thousands separator is called currency notation.
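The following Python sketch shows how a decimal separator, a thousands separator, and a currency symbol combine into currency notation. It is an illustration under stated assumptions: the functions format_number and format_money and their parameters are hypothetical, and a real Informix product takes these settings from the NUMERIC and MONETARY categories of the locale rather than from function arguments.

```python
def format_number(value, decimal_sep=".", thousands_sep=",", places=0):
    """Format a number with the given decimal and thousands separators."""
    # Python's own formatting always emits "," for grouping and "." for decimals.
    text = f"{value:,.{places}f}"
    # Swap the separators through a placeholder so that they cannot collide.
    return (
        text.replace(",", "\0")
        .replace(".", decimal_sep)
        .replace("\0", thousands_sep)
    )


def format_money(value, symbol, symbol_first=True, **separators):
    """Attach a currency symbol at the front or back of the formatted value."""
    number = format_number(value, **separators)
    return symbol + number if symbol_first else number + symbol
```

With the default arguments, format_number(3255) gives 3,255, and with a comma as the decimal separator and a space as the thousands separator it gives 3 255. The calls format_money(100, "$") and format_money(100, "FF", symbol_first=False) reproduce the $100 and 100FF examples.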

Tip: To customize the end-user format that the locale defines for monetary values, you can use the DBMONEY environment variable. For more information, see "Customizing Monetary Values".

The NUMERIC category of the locale file defines the end-user formats for numeric data. The MONETARY category of the locale file defines the end-user formats for monetary data. For more information on the NUMERIC and MONETARY categories, see page A-7.

Date and Time Formats
Date data is for DATE or DATETIME columns. Time data is for a DATETIME column. When an Informix product scans a string that contains time data, it uses the time end-user format to determine how to convert this string to the internal integer value for a DATETIME column. When an Informix product prints a string that contains time data, it uses the time end-user format to determine how to format the internal integer value for a DATETIME column as a string. In the same way, Informix products use the date end-user format to scan and print strings for the internal values of the date data types.

Important: The end-user formats of the date and time data do not affect the internal format of the DATE or DATETIME data types in the database. They only affect how the client application views the data.

The end-user formats for date and time involve characters and symbols that format date and time values. This information includes the names and abbreviations for days of the week and months of the year. It also includes the commonly used representations for dates, time (12-hour and 24-hour), and DATETIME values.

The end-user formats can include the names of eras (as in the Japanese Imperial date system) and non-Gregorian calendars (such as the Arabic lunar calendar). For example, the Taiwan culture uses the Ming Guo year format in addition to the Gregorian calendar year. The Ming Guo year 0001 is equivalent to the year 1912 on the Gregorian calendar. For dates before 1912, Ming Guo years are negative. The Ming Guo year 0000 is undefined; any attempt to use it generates an error.

The following table shows some era-based dates.
Gregorian Year    Ming Guo Year    Remarks
1993              82               1993 - 1911 = 82
1912              01               1912 - 1911 = 01
1911              -01              1911 - 1912 = -01
1910              -02              1910 - 1912 = -02
1900              -12              1900 - 1912 = -12
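The arithmetic in the Remarks column can be written as a pair of conversion functions. This Python sketch assumes only the rules that the text and table state (subtract 1911 for 1912 and later, subtract 1912 for earlier years, and reject the year 0000); the function names are hypothetical.

```python
def to_ming_guo_year(calendar_year: int) -> int:
    """Convert a calendar year (first column of the table) to a Ming Guo year."""
    if calendar_year >= 1912:
        return calendar_year - 1911
    # Years before 1912 skip over the undefined Ming Guo year 0000.
    return calendar_year - 1912


def from_ming_guo_year(ming_guo_year: int) -> int:
    """Convert a Ming Guo year back to a calendar year."""
    if ming_guo_year == 0:
        raise ValueError("Ming Guo year 0000 is undefined")
    if ming_guo_year > 0:
        return ming_guo_year + 1911
    return ming_guo_year + 1912
```

The two branches exist because no year 0000 separates Ming Guo 0001 from Ming Guo -0001, so the offset changes by one at that boundary.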

Japanese Imperial-era dates are tied to the reign of the Japanese emperors. The following table shows Gregorian and Japanese era dates. It shows the Japanese era format in full, with abstract multibyte characters for the Japanese characters, and in an abbreviated form that uses romanized characters (gengo). The abbreviated form of the era uses the first letter of the English name for the Japanese era. For example, H represents the Heisei era.

Gregorian Date    Abstract Japanese Era (in full)    Japanese Era (gengo)
1868/09/08        A1A2B1B201/09/08                   M01/09/08
1912/07/30        A1A2B1B245/07/30                   M45/07/30
1912/07/31        A1A2B1B201/07/31                   T01/07/31
1926/12/25        A1A2B1B215/12/25                   T15/12/25
1926/12/26        A1A2B1B201/12/26                   S01/12/26
1989/01/07        A1A2B1B264/01/07                   S64/01/07
1989/01/08        A1A2B1B201/01/08                   H01/01/08
...               ...                                ...
1995/01/01        A1A2B1B207/01/01                   H07/01/01

Tip: In the preceding table, A1A2 and B1B2 indicate multibyte Japanese characters.

The TIME category of the locale file defines the end-user formats for date and time data. For more information on the TIME category, see page A-8.
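The abbreviated (gengo) column of the preceding era table can be reproduced with a small lookup. This Python sketch is an illustration, not the way a locale file encodes eras: the ERAS list and the function to_gengo are hypothetical, and the era start dates are taken from the rows of that table, which ends with the Heisei era.

```python
from datetime import date

# Era letters and start dates as they appear in the preceding table,
# listed from the most recent era to the earliest.
ERAS = [
    ("H", date(1989, 1, 8)),
    ("S", date(1926, 12, 26)),
    ("T", date(1912, 7, 31)),
    ("M", date(1868, 9, 8)),
]


def to_gengo(value: date) -> str:
    """Format a date in the abbreviated era form, for example H07/01/01."""
    for letter, start in ERAS:
        if value >= start:
            # The year in which an era starts counts as year 01 of that era.
            era_year = value.year - start.year + 1
            return f"{letter}{era_year:02d}/{value.month:02d}/{value.day:02d}"
    raise ValueError("date precedes the first era in the table")
```

Because an era year changes both at the start of a calendar year and on the day a new era begins, the lookup compares the whole date against each start date rather than the year alone.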

Tip: To customize the end-user formats that the locale defines for date and time values, you can use the GL_DATE and GL_DATETIME environment variables. For more information, see "Customizing Date and Time End-User Formats".




Informix Guide to GLS Functionality, version 9.1
Copyright © 1998, Informix Software, Inc. All rights reserved.