Collation is the process of sorting character strings according to some order. The database server or the client application can perform collation.
The collating order affects the following tasks in SQL SELECT statements:
SELECT * FROM tab1 WHERE col1 > 'bob' SELECT * FROM tab1 WHERE site BETWEEN 'abc' AND 'xyz'
SELECT * FROM tab1 ORDER BY col1
SELECT * FROM tab1 WHERE col1 MATCHES 'a1*' SELECT * FROM tab1 WHERE col1 LIKE 'dog' SELECT * FROM tab1 WHERE col1 MATCHES 'abc[a-z]'
For more information on how the database locale can affect the SELECT statement, see Collation Order in SELECT Statements.
Informix database servers support two collation methods:
Code-set order refers to the order of characters within a code set. The order of the code points in the code set determines the collating order. For example, in the ASCII code set, A=65 and B=66. The character A always sorts before B because a code point of 65 is less than one of 66. But because a=97 and M=77, the string abc sorts after Me, which is not always the preferred result.
The database server uses code-set order to sort columns of these data types:
All code sets that IBM Informix products support include the ASCII characters as the first 127 characters. Therefore, other characters in the code set have the code points 128 and greater. When the database server sorts values of these data types, it puts character strings that begin with ASCII characters before characters strings that begin with non-ASCII characters in the sorted results.
For an example of data sorted in code-set order, see Table 5.
Localized order refers to an order of the characters that relates to a natural language. The locale defines the order of the characters in the localized order. For example, even though the character À might have a code point of 133, the localized order could list this character after A and before B (A=65, À=133, B=66). In this case, the string ÀB sorts after AC but before BD.
The database server uses localized order to sort columns of these data types:
The localized order can include equivalent characters, those characters that the database server is to consider as equivalent when it collates them. For example, if the locale defines uppercase and lowercase versions of a character as equivalent in the localized order, then the strings Arizona, ARIZONA, and arizona are collated together, as if all three strings were the same string.
A localized order can also specify a specific type of collation. For example, a telephone book might require the following sort order:
Mabin McDonald MacDonald Madden
A dictionary, however, might use this collating order for the same names:
Mabin Madden MacDonald McDonald
If the GLS locale defines a localized order, the database server sorts data from NCHAR and NVARCHAR columns in this localized order. For an example of data sorted in a localized order, see Table 6.
Dynamic Server supports the SET COLLATION statement, which can specify a localized collation different from the DB_LOCALE setting. The scope of the non-default collating order is the current session, but database objects that perform collation, such as indexes or triggers, use the collating order from the time of their creation when they sort NCHAR or NVARCHAR values.
The open-source International Components for Unicode (ICU) implementation of the Unicode code set (UTF-8) is available in GLS locales for many languages and territories. For example, the en_us.utf8 locale supports the Unicode code set. For more information about ICU, see the ICU website at http://oss.software.ibm.com/icu.
GLS locales that use the Unicode code set (UIF-8) support Unicode collation of NCHAR and NVARCHAR data by the ICU Unicode Collation Algorithm. For more information about this algorithm, see the Unicode website at http://www.unicode.org/unicode/reports/tr10.
Collation by Informix database servers depends on the data type of the database column. The following table summarizes the collation rules.
Column Data Types | Collating Order |
---|---|
CHAR, VARCHAR, TEXT | Code-set order |
LVARCHAR (IDS only) | Code-set order |
NCHAR, NVARCHAR | Localized order |
The difference in collation is the only distinction between the CHAR and NCHAR data types and between the VARCHAR and NVARCHAR data types. For more information about collation, see Using Character Data Types. If a locale does not define a localized order, the database server collates NCHAR and NVARCHAR data values in code-set order.