Informix

Informix

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only
  • 1.  Chinese characters

    Posted Thu March 06, 2025 10:11 AM

    I have been asked a question regarding inserting Chinese characters int a database table.   I have never done this before, so I wanted to ask if anyone has ever done this type of activity and, if so, what steps did you take to ensure that the procedure was done properly?   Please let me know if you have any processes to make this possible.



    ------------------------------
    Walter Lowich
    ------------------------------


  • 2.  RE: Chinese characters

    Posted Fri March 07, 2025 03:04 AM

    Hello Walter!

    If you need to store Chinese ideograms into your Informix DB, you probably need also to store non-ASCII characters from other languages.

    So I would use the UTF-8 encoding to support UNICODE characters.

    On client and server machines, you must set CLIENT_LOCALE, DB_LOCALE and SERVER_LOCALE env variables to a locale with UTF-8 encoding:

    CLIENT_LOCALE=en_us.utf8
    export CLIENT_LOCALE
    DB_LOCALE=$CLIENT_LOCALE
    export DB_LOCALE
    SERVER_LOCALE=$CLIENT_LOCALE
    export SERVER_LOCALE


    The locale of the database is defined by DB_LOCALE.
    This DB locale is frozen once you have created or imported your DB. 

    If your current DB is for ex using for ISO-8859-15 , you need to export your existing DB with CLIENT_LOCALE / DB_LOCALE set to your current encoding, then you can create/import the DB by setting CLIENT_LOCALE to en_us.8859-15, and DB_LOCALE to en_us.utf8 (Informix will do the conversions)

    Something you need to consider carefully is the size of CHAR / VARCHAR columns:

    UTF-8 is a multi-byte, variable character set (MBCS):
    - An ASCII-7 char takes 1 byte.
    - A Latin acute accent char like é takes 2 bytes
    - Chinese ideogram takes 3 bytes.
    Some can even be 4, 5, even 6 bytes.

    Consequently, a VARCHAR(10) can contain 10 ASCII-7 chars, 5 latin acute chars, and 3 Chinese ideograms.

    Informix does not have the concept of "char length semantics", like other DB engines do.

    When you define a CHAR(10) with Informix, the unit of the size is by default 10 bytes.

    There is this server config param SQL_LOGICAL_CHAR that can be used to apply a ratio when creating SQL tables/columns, but basically it's just a multiplier for the CHAR/VARCHAR size.

    When using SQL functions like LENGTH(), it returns a number of bytes, not a number of characters.

    So you have to deal with this.

    Regarding char string storage, I do not recommend to change the SQL_LOGICAL_CHAR parameter:

    Normally, fixed-size CHAR(N) columns are short and store ASCII-7 chars for codes, identifiers, such as "ABC-734"

    VARCHAR(N) columns are typically used to store variable-string data, such as customer names, addresses, etc.

    To me, existing sizes of CHAR/VARCHAR columns expressed in BYTES is sufficient, even to store Chinese ideograms:
    A Chinese ideogram usually corresponds to a word in our languages, so there will be less chars to express the same thing.

    Take care for ex with subscripts   colname[ start, end ]  : you need to specify BYTE positions, not CHAR positions!

    What programming language do you use for your applications?
    You have the same issue in the application code.
    Usually programming languages use CHAR length semantics.

    Seb



    ------------------------------
    Sebastien FLAESCH
    ------------------------------



  • 3.  RE: Chinese characters

    Posted Fri March 07, 2025 03:28 AM

    Hi Walter,

    Sebastian wrote this fine.
    But, there are NCHAR/NVARCHAR data types - those types are very similar to CHAR/VARCHAR except:

    The main difference between CHAR and NCHAR data types is the collating order.

    The collation order of the CHAR data type follows the code-set order, but the collating order of the NCHAR data type can be a localized order, if DB_LOCALE (or SET COLLATION) specifies a locale that defines a localized order for collation.
    see https://ibm.co/4i8wJ1Q
    HTH
    Hrvoje



    ------------------------------
    Hrvoje Zokovic
    Technical Support Director
    ALFATEC Group
    Zagreb
    ------------------------------



  • 4.  RE: Chinese characters

    Posted Fri March 07, 2025 04:09 AM

    Yep Hrvoje, good point!
    I have missed to mention NCHAR/NVARCHAR types.
    I am not much using these type.
    With other DB engines you can even use a different codeset for columns defined with these types - can be a mess IMO.
    Seb



    ------------------------------
    Sebastien FLAESCH
    ------------------------------