Home | Previous Page | Next Page   GLS Fundamentals > Understanding a GLS Locale >

Collation Order for Character Data

Collation is the process of sorting character strings according to some order. The database server or the client application can perform collation.

The collating order affects the following tasks in SQL SELECT statements:

For more information on how the database locale can affect the SELECT statement, see Collation Order in SELECT Statements.

Informix database servers support two collation methods:

Code-Set Order

Code-set order refers to the order of characters within a code set. The order of the code points in the code set determines the collating order. For example, in the ASCII code set, A=65 and B=66. The character A always sorts before B because a code point of 65 is less than one of 66. But because a=97 and M=77, the string abc sorts after Me, which is not always the preferred result.

The database server uses code-set order to sort columns of these data types:

All code sets that IBM Informix products support include the ASCII characters as the first 127 characters. Therefore, other characters in the code set have the code points 128 and greater. When the database server sorts values of these data types, it puts character strings that begin with ASCII characters before characters strings that begin with non-ASCII characters in the sorted results.

For an example of data sorted in code-set order, see Table 5.

Localized Order

Localized order refers to an order of the characters that relates to a natural language. The locale defines the order of the characters in the localized order. For example, even though the character À might have a code point of 133, the localized order could list this character after A and before B (A=65, À=133, B=66). In this case, the string ÀB sorts after AC but before BD.

The database server uses localized order to sort columns of these data types:

The localized order can include equivalent characters, those characters that the database server is to consider as equivalent when it collates them. For example, if the locale defines uppercase and lowercase versions of a character as equivalent in the localized order, then the strings Arizona, ARIZONA, and arizona are collated together, as if all three strings were the same string.

Tip:
The COLLATION category of the locale file specifies the localized order, if one exists. For more information, see The COLLATION Category.

A localized order can also specify a specific type of collation. For example, a telephone book might require the following sort order:

Mabin
McDonald
MacDonald
Madden

A dictionary, however, might use this collating order for the same names:

Mabin
Madden
MacDonald
McDonald

If the GLS locale defines a localized order, the database server sorts data from NCHAR and NVARCHAR columns in this localized order. For an example of data sorted in a localized order, see Table 6.

Dynamic Server

Dynamic Server supports the SET COLLATION statement, which can specify a localized collation different from the DB_LOCALE setting. The scope of the non-default collating order is the current session, but database objects that perform collation, such as indexes or triggers, use the collating order from the time of their creation when they sort NCHAR or NVARCHAR values.

End of Dynamic Server

Unicode Collation

The open-source International Components for Unicode (ICU) implementation of the Unicode code set (UTF-8) is available in GLS locales for many languages and territories. For example, the en_us.utf8 locale supports the Unicode code set. For more information about ICU, see the ICU website at http://oss.software.ibm.com/icu.

GLS locales that use the Unicode code set (UIF-8) support Unicode collation of NCHAR and NVARCHAR data by the ICU Unicode Collation Algorithm. For more information about this algorithm, see the Unicode website at http://www.unicode.org/unicode/reports/tr10.

Warning:
If Unicode produces index keys that are too long to fit in the default dbspace page size, use a larger, nondefault page size.

Collation Support

Collation by Informix database servers depends on the data type of the database column. The following table summarizes the collation rules.

Column Data Types Collating Order
CHAR, VARCHAR, TEXT Code-set order
LVARCHAR (IDS only) Code-set order
NCHAR, NVARCHAR Localized order

The difference in collation is the only distinction between the CHAR and NCHAR data types and between the VARCHAR and NVARCHAR data types. For more information about collation, see Using Character Data Types. If a locale does not define a localized order, the database server collates NCHAR and NVARCHAR data values in code-set order.

Important:
There is an exception to the general rule that CHAR, LVARCHAR and VCHAR values are always sorted in the code-set order. The MATCHES operator always uses the localized order, if one is defined, to evaluate range expressions for character values, regardless of the data type. See MATCHES Condition.
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]