Original Message:
Sent: Fri March 07, 2025 03:27 AM
From: Hrvoje Zokovic
Subject: Chinese characters
Hi Walter,
Sebastian wrote this fine.
But, there are NCHAR/NVARCHAR data types - those types are very similar to CHAR/VARCHAR except:
The main difference between CHAR and NCHAR data types is the collating order.
The collation order of the CHAR data type follows the code-set order, but the collating order of the NCHAR data type can be a localized order, if DB_LOCALE (or SET COLLATION) specifies a locale that defines a localized order for collation.
see https://ibm.co/4i8wJ1Q
HTH
Hrvoje
------------------------------
Hrvoje Zokovic
Technical Support Director
ALFATEC Group
Zagreb
Original Message:
Sent: Fri March 07, 2025 03:03 AM
From: Sebastien FLAESCH
Subject: Chinese characters
Hello Walter!
If you need to store Chinese ideograms into your Informix DB, you probably need also to store non-ASCII characters from other languages.
So I would use the UTF-8 encoding to support UNICODE characters.
On client and server machines, you must set CLIENT_LOCALE, DB_LOCALE and SERVER_LOCALE env variables to a locale with UTF-8 encoding:
CLIENT_LOCALE=en_us.utf8export CLIENT_LOCALEDB_LOCALE=$CLIENT_LOCALEexport DB_LOCALESERVER_LOCALE=$CLIENT_LOCALEexport SERVER_LOCALE
The locale of the database is defined by DB_LOCALE.
This DB locale is frozen once you have created or imported your DB.
If your current DB is for ex using for ISO-8859-15 , you need to export your existing DB with CLIENT_LOCALE / DB_LOCALE set to your current encoding, then you can create/import the DB by setting CLIENT_LOCALE to en_us.8859-15, and DB_LOCALE to en_us.utf8 (Informix will do the conversions)
Something you need to consider carefully is the size of CHAR / VARCHAR columns:
UTF-8 is a multi-byte, variable character set (MBCS):
- An ASCII-7 char takes 1 byte.
- A Latin acute accent char like é takes 2 bytes
- Chinese ideogram takes 3 bytes.
Some can even be 4, 5, even 6 bytes.
Consequently, a VARCHAR(10) can contain 10 ASCII-7 chars, 5 latin acute chars, and 3 Chinese ideograms.
Informix does not have the concept of "char length semantics", like other DB engines do.
When you define a CHAR(10) with Informix, the unit of the size is by default 10 bytes.
There is this server config param SQL_LOGICAL_CHAR that can be used to apply a ratio when creating SQL tables/columns, but basically it's just a multiplier for the CHAR/VARCHAR size.
When using SQL functions like LENGTH(), it returns a number of bytes, not a number of characters.
So you have to deal with this.
Regarding char string storage, I do not recommend to change the SQL_LOGICAL_CHAR parameter:
Normally, fixed-size CHAR(N) columns are short and store ASCII-7 chars for codes, identifiers, such as "ABC-734"
VARCHAR(N) columns are typically used to store variable-string data, such as customer names, addresses, etc.
To me, existing sizes of CHAR/VARCHAR columns expressed in BYTES is sufficient, even to store Chinese ideograms:
A Chinese ideogram usually corresponds to a word in our languages, so there will be less chars to express the same thing.
Take care for ex with subscripts colname[ start, end ] : you need to specify BYTE positions, not CHAR positions!
What programming language do you use for your applications?
You have the same issue in the application code.
Usually programming languages use CHAR length semantics.
Seb
------------------------------
Sebastien FLAESCH
Original Message:
Sent: Tue March 04, 2025 09:56 AM
From: Walter Lowich
Subject: Chinese characters
I have been asked a question regarding inserting Chinese characters int a database table. I have never done this before, so I wanted to ask if anyone has ever done this type of activity and, if so, what steps did you take to ensure that the procedure was done properly? Please let me know if you have any processes to make this possible.
------------------------------
Walter Lowich
------------------------------