ICU (QICU library) on the system is too obsolete despite being a public user facing API

  • 1.  ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Mon May 20, 2024 05:46 AM

    Hello...

    I need a library to properly handle Unicode string processing for a B2B system, so I wanted to use, from RPG, the API listed here

    https://www.ibm.com/docs/en/i/7.5?topic=interfaces-api-finder

    as "International Components for Unicode APIs"

    ICU is a library present on many systems for proper Unicode processing (e.g. normalization, decomposition, etc.).

    But, to my surprise, on our up to date V7R4, the SRVPGMs present in the QICU library are more than 10 years old (!); the latest one I can see implements ICU 4.0.

    Despite being a public API of the system, it lacks 15 years of features and improvements and, as you can imagine, of security fixes too: the ICU4C code has had nasty (overruns, overflows...) and public security issues (CVEs), as can be seen here: https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html

    Although I insist that a public API, exposed in the documentation, should be kept up to date by the vendor through PTFs, as a last resort one can in theory compile the library oneself, using the pointers here

    https://unicode-org.github.io/icu/userguide/icu4c/build.html#how-to-build-and-install-on-the-ibm-i-family-ibm-i-i5os-os400

    BUT

    the tools used in those instructions live in an IFS folder (probably containing the icc compiler...) called

    /QIBM/ProdData/DeveloperTools/

    which is now obsolete and has been replaced by a product that is itself already obsolete.

    How can one properly obtain or compile an up-to-date ICU on IBM i (a task that on other systems and OSes would probably take 5 minutes...) using a supported workflow, and resolve this Kafkaesque situation?

    As a customer I've already

    • contacted support (who cannot fix or update the libraries, but apparently at least contacted the security team)
    • contacted our VAR about how to obtain the compiler needed to build the public ICU project (no answer)
    • logged an "idea" on the "ideas" site (yes, apparently for IBM security fixes to a public API are also an "idea", maybe in their ideal world ; ) )

    For an OS geared mainly toward business processing (B2C/B2B, EDI, etc.), it is astonishing not to have a proper library for Unicode tasks.



    ------------------------------
    --ft
    ------------------------------


  • 2.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue May 21, 2024 05:17 AM
    Edited by Hideyuki Yahagi Tue May 21, 2024 05:18 AM

    > But, to my surprise, on our up to date V7R4, the SRVPGMs present in the QICU library are more than 10 years old (!) 

    The real reason is unknown, but it may be that the "IBM Tools for Developers for i5/OS" (5799-PTL), on which the build is based, is no longer supported.

    This PRPQ is already withdrawn.

    https://www.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_rp/7/ENUSP84487/index.html&request_locale=en

    > For an OS geared mainly toward business processing (B2C/B2B, EDI, etc.), it is astonishing not to have a proper library for Unicode tasks.

    Aside from security issues, the ICU project is also considering discontinuing support for the EBCDIC platform.

    https://unicode-org.atlassian.net/browse/ICU-21672

    Personally, I would like IBM to contribute to the ICU project in an organized way.



    ------------------------------
    Hideyuki Yahagi
    ------------------------------



  • 3.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue May 21, 2024 12:32 PM

    Yes, we are not speaking about a "minor" library here; it is a fundamental library, the de facto standard for handling Unicode tasks, integrated into many systems.

    IBM should respond to this and put customers in a position to use a publicly documented API in a full ILE environment (i.e. as a bindable SRVPGM) without too much hassle, on par with other systems. It is not acceptable to neither keep such a library up to date nor provide an alternative solution.



    ------------------------------
    --ft
    ------------------------------



  • 4.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Fri February 21, 2025 05:42 AM

    I still have no answer about any intent or direction for updating the QICU code on IBM i.

    It would make sense for IBM to have a sustainable path for integrating and updating the ICU components (QICU) along the normal OS lifecycle and PTFs, following the ICU4C project, to avoid the ever-increasing technical debt this component is accumulating. It is apparently still shipped with new OS releases frozen at 4.x (15 years old?).

    ICU is essential for proper Unicode processing and related facilities, and it is a public API of the system.

    It is not clear whether the problem lies with the ILE C compilers being unable to compile new versions of the ICU4C project.



    ------------------------------
    --ft
    ------------------------------



  • 5.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Sat May 31, 2025 08:14 AM

    Hi,

    Despite the fact that it is outdated... did anyone get the ICU APIs working (I'm especially looking to do transliteration)?

    I can get the u_isdigit example working (the sample is not even correct) but that's about it at the moment.

    PS. ICU is a requirement for using regular expressions in the SQL functions so it is a bit strange that their version is outdated.



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 6.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 04:47 AM

    Hi,

    the QICU version on the system is indeed ancient. The fact that IBM doesn't give a detailed answer leads to speculation, and my speculation is that the ILE C/C++ compilers are ancient too and need to be revamped, so IBM cannot get to a sustainable update path. ICU is a core part of correct Unicode processing and should be PTF'ed by IBM along with the rest of the OS.

    But you can use it; you need to look into the ICU4C (C interface) documentation.

    I'm assuming you want to use it from RPG, so here is an example copied brutally from some of my utilities to get you started.

    Assuming that you want to apply this transform  " NFD;[:Nonspacing Mark:]Remove;NFC " (this will remove accents etc. from a string), the core of the thing is

           h = utrans_openU(id : %LEN(id) :
                            dir : *NULL : -1 : *NULL : errorCode);

           utrans_transUChars(h :
                              text  : textLength : textCapacity :
                              start : limit : errorCode);

           dst = text;

           utrans_close(h);

    Full source below; YMMV, use at your own discretion, etc.


           CTL-OPT DFTACTGRP(*NO) ACTGRP(*CALLER) OPTION(*SRCSTMT);
           CTL-OPT BNDDIR('QICU/QXICUAPIBD');

           DCL-C UTRANS_FORWARD 0;
           DCL-C UTRANS_REVERSE 1;
           DCL-C U_ZERO_ERROR 0;
           DCL-C U_STRING_NOT_TERMINATED_WARNING -124;
           DCL-C U_BUFFER_OVERFLOW_ERROR           15;
           DCL-C U_ILLEGAL_ARGUMENT_ERROR           1;

           DCL-S UErrorCode INT(10) TEMPLATE;
           DCL-S UChar32 INT(10) TEMPLATE;
           DCL-S UChar UNS(5) TEMPLATE;
           DCL-S UTransliterator POINTER TEMPLATE;
           DCL-DS UParseError ALIGN(*FULL) TEMPLATE INZ;
             line INT(10);
             offset INT(10);
             preContext UCS2(16);
             postContext UCS2(16);
           END-DS;

           DCL-PR utrans_countAvailableIDs INT(10)
                  EXTPROC('utrans_countAvailableIDs_4_0');
           END-PR;

           DCL-PR utrans_openU POINTER EXTPROC('utrans_openU_4_0');
             id_ UCS2(100) CCSID(*UTF16) OPTIONS(*VARSIZE);
             idLen_ INT(10) VALUE;
             utransdir_ INT(10) VALUE;
             rules_ POINTER VALUE;
             rulesLen_ INT(10) VALUE;
             parseerror_ POINTER VALUE;
             uerrorcode_ INT(10);
           END-PR;

           DCL-PR utrans_transUChars EXTPROC('utrans_transUChars_4_0');
             trans POINTER VALUE;
             text UCS2(1000) CCSID(*UTF16) OPTIONS(*VARSIZE);
             textLen INT(10);
             textCapacity INT(10) VALUE;
             start INT(10) VALUE;
             limit INT(10);
             errorcode INT(10);
           END-PR;

           DCL-PR utrans_close EXTPROC('utrans_close_4_0');
             UTransliterator_ POINTER VALUE;
           END-PR;

           //ICU-END

           DCL-C CAPACITY 4000;
           DCL-C VARLIMIT 1500;

           DCL-PI *N;
             src UCS2(VARLIMIT) CCSID(*UTF16) CONST;
             dst UCS2(VARLIMIT) CCSID(*UTF16);
             transformsIn UCS2(500) CCSID(*UTF16) CONST OPTIONS(*OMIT : *NOPASS);
           END-PI;

           DCL-S nIDs INT(10);
           DCL-S h POINTER;

           DCL-S id UCS2(100) CCSID(*UTF16);
           DCL-S rules UCS2(100) CCSID(*UTF16);
           DCL-S dir INT(10) INZ(UTRANS_FORWARD);
           DCL-S errorCode INT(10);

           DCL-S text UCS2(CAPACITY) CCSID(*UTF16);

           DCL-S textLength INT(10);
           DCL-S textCapacity INT(10);
           DCL-S limit INT(10);
           DCL-S start INT(10) INZ(0);

           DCL-DS parseError LIKEDS(UParseError) INZ;

           id = 'NFD;[:Nonspacing Mark:]Remove;NFC';
           IF %PASSED(transformsIn);
             id = transformsIn;
           ENDIF;

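           // The working buffer is larger than the input so expansions have room.
           textCapacity = CAPACITY;

           // Transform the whole blank-padded input.
           textLength = VARLIMIT;
           start = 0;
           limit = VARLIMIT;
           text = src;
           // ICU calls return at once if the status already holds an error.
           errorCode = U_ZERO_ERROR;

           h = utrans_openU(id : %LEN(id) :
                            dir : *NULL : -1 : *NULL : errorCode);

           // Values above U_ZERO_ERROR are failures; negatives are warnings.
           IF h = *NULL OR errorCode > U_ZERO_ERROR;
             dst = src;
             *INLR = *ON;
             RETURN;
           ENDIF;

           utrans_transUChars(h :
                              text  : textLength : textCapacity :
                              start : limit : errorCode);

           // On failure, hand back the input unchanged.
           IF errorCode > U_ZERO_ERROR;
             dst = src;
           ELSE;
             dst = text;
           ENDIF;

           utrans_close(h);

           *INLR = *ON;
           RETURN;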
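
    For readers who want to see what the three steps of the "NFD;[:Nonspacing Mark:]Remove;NFC" transform do without access to QICU, here is a minimal sketch in Python. It uses only the standard unicodedata module, not ICU. The function name strip_nonspacing_marks is made up for illustration, and Python ships a much newer Unicode database than ICU 4.0, so results may differ for recently added characters.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Emulate the compound ID 'NFD; [:Nonspacing Mark:] Remove; NFC'."""
    # Step 1 (NFD): split precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2 ([:Nonspacing Mark:] Remove): drop code points of category Mn.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 3 (NFC): recompose whatever can still be recomposed.
    return unicodedata.normalize("NFC", kept)


# Escapes are used so the sample does not depend on the editor's encoding:
# U+00E0 a-grave, U+00E9 e-acute, U+0161 s-caron, U+00EB e-diaeresis.
print(strip_nonspacing_marks("\u00e0\u00e9\u0161 Zo\u00eb"))
```

    Characters without a canonical decomposition, such as æ, pass through unchanged, which is why a separate transform (for example "Any-Latin" style IDs) is needed for them.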

     



    ------------------------------
    --ft
    ------------------------------



  • 7.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 04:58 AM

    Hi,

    Thanks for sharing your code... but I had figured it out in the meantime as well (however, I'm going to compare a few things).

    BTW, do you get a value from utrans_countAvailableIDs? I always get zero.

    Kind regards,
    Paul

    PS. I'm trying to push IBM via a case as well to bring this old version to their attention.



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 8.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 05:43 AM

    Hi, 

    yes, being an old version it lacks some transforms, but the count API does return many... like "Any-Latin" to transliterate to Latin...

    You should get something less than 200.

    If needed, share your code...



    ------------------------------
    --ft
    ------------------------------



  • 9.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 05:56 AM

    Hi,

    I probably have an issue with a prototype somewhere... I also changed my utrans_open to utrans_openU and now the ones with [] work fine as well.

    My main goal is however to transliterate Unicode to a specific IBM i code page (e.g. 37, 500, ...), so it should convert unknown characters to their equivalent but leave the known ones as is (for example a simple é should not be stripped of its accent, as the character exists in CCSID 37).  On the other hand, for a š the accent should be stripped, leaving a normal s, as š doesn't exist in CCSID 37.

    I don't know if this is possible with the transliteration API.

    Kind regards,
    Paul



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 10.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 06:27 AM

    You can indeed get fancy with Unicode, because each character "has properties", as in a database, so you can fine-tune the transform ("decompose and remove only the caron accents", etc.).

    From what you say, I assume you don't handle scripts other than Latin-style ones (e.g. within the EU there are two countries using Cyrillic or Greek, without going too far east...).

    But to be practical and get some sort of solution that really comes down to your desired local CCSID, maybe try this:

    - have the source string mapped via the ICU transform engine to a string D1 (a "clean" string)

    - have the same source string mapped via RPG using straight assignment to D2 (marked with your specific CCSID).

    RPG should in theory do a decent job of preserving information in the conversion between Unicode and EBCDIC when "à" exists in the destination CCSID, and put x'3F' (the substitution character) when it cannot.

    Then replace each x'3F' with the character at the corresponding position in D1 (better than nothing, and it preserves some information...).

    my 2c...



    ------------------------------
    --ft
    ------------------------------



  • 11.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 08:56 AM

    Hi,

    It seems we're thinking alike, as I had a similar idea, apart from the fact that I would do it one character at a time.

    With the whole string at once there's a risk that some characters like æ (single character) get transliterated to ae (two characters), which would break the corresponding-position logic.

    It would however be nice if the ICU library could do this by itself, but I doubt it based on current info.
    Anyway, thanks for sharing your ideas.
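
    The one-character-at-a-time idea can be sketched as follows. This is a hedged illustration in Python, not RPG and not ICU: it assumes Python's cp037 codec is an acceptable stand-in for CCSID 37, uses "?" as a placeholder substitute, and applies NFKD so that one-to-many expansions are visible. The names strip_marks, fits and to_codepage_text are made up for the example.

```python
import unicodedata


def strip_marks(ch: str) -> str:
    """Decompose one character, drop nonspacing marks (Mn), recompose."""
    decomposed = unicodedata.normalize("NFKD", ch)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)


def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the target code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def to_codepage_text(text: str, codec: str = "cp037", sub: str = "?") -> str:
    """Keep characters the code page knows; strip marks only from the others."""
    out = []
    for ch in text:  # Python iterates per code point
        if fits(ch, codec):
            out.append(ch)  # known character: keep as is
            continue
        fallback = strip_marks(ch)
        # Use the stripped form if it fits, else the placeholder substitute.
        out.append(fallback if fits(fallback, codec) else sub)
    return "".join(out)


# U+00E9 (e-acute) exists in cp037 and is kept; U+0161 (s-caron) does not.
print(to_codepage_text("caf\u00e9 \u0161koda"))
```

    Because every input character is handled on its own, a replacement that expands to two characters (the ligature U+FB01 becoming "fi", for instance) does not disturb the handling of its neighbours.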

    Kind regards,
    Paul



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 12.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 10:40 AM

    No problem... Sure, with RPG *NATURAL one should be able to iterate and work per code point, leveraging both ICU and the stock internal conversion tables between CCSIDs.

    Or you can fashion something using the (pretty rich) ICU transform language, e.g. with a negative filter to exclude "whitelisted" characters or certain classes.

    I doubt that ICU could solve such a specific problem with a stock built-in ID, because that would assume that ICU (which is used on almost every platform) knows an "agreed upon by all" mapping between Unicode and whichever local single-byte EBCDIC code page (and there are plenty). It would also assume that mapping à to à but š to s, and not, say, to "sh" (to emulate the sound), is acceptable. That is a local implementor's decision.

    Additionally, some transform IDs documented on the ICU web site are unavailable in the stock IBM i QICU... too old, stuck at the 4.x level I think!



    ------------------------------
    --ft
    ------------------------------



  • 13.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Wed July 23, 2025 09:09 AM

    Hi, did you try implicit code page conversion in RPG?

    Something like:
    **free                        
    dcl-s a char(48) ccsid(*utf8);
    dcl-s b char(48) ccsid(1140); 
    a = 'àáâä?æãå';               
    b = a;                        
    b = 'àáâä?æãå';               
    a = b;                        
    dsply a;                      
    dsply b;                      
    *inlr = *on;                  
    return;                       

    In debug you will see the converted data:
    EVAL a:x                                                             
       00000     C3A0C3A1 C3A2C3A4 3FC3A6C3 A3C3A520   - CµC~CsCu.CwCtCv.
       00010     20202020 20202020 20202020 20202020   - ................
       00020     20202020 20202020 20202020 20202020   - ................
    EVAL b:x                                                             
       00000     44454243 6F9C4647 40404040 40404040   - àáâä?æãå        
       00010     40404040 40404040 40404040 40404040   -                 
       00020     40404040 40404040 40404040 40404040   -                 
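
    The hex values in the dump above can be cross-checked off the system. A small sketch in Python, assuming its utf-8 and cp1140 codecs correspond to the CCSIDs used in the RPG sample (the variable names are only illustrative):

```python
# Same eight characters as the RPG literal, written as escapes so the
# sample does not depend on the encoding of this file:
# a-grave, a-acute, a-circumflex, a-diaeresis, ?, ae, a-tilde, a-ring.
sample = "\u00e0\u00e1\u00e2\u00e4?\u00e6\u00e3\u00e5"

utf8_hex = sample.encode("utf-8").hex().upper()
ebcdic_hex = sample.encode("cp1140").hex().upper()

print(utf8_hex)    # compare with the first 14 bytes of EVAL a:x
print(ebcdic_hex)  # compare with the first 8 bytes of EVAL b:x

# A character missing from the code page raises instead of converting.
try:
    "\u0161".encode("cp1140")
    s_caron_fits = True
except UnicodeEncodeError:
    s_caron_fits = False
print(s_caron_fits)
```

    Note that Python raises an error for a character outside the code page, whereas the RPG assignment substitutes a character, so the two behave differently for input such as š.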



    ------------------------------
    Ruurd Noppen
    ------------------------------



  • 14.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted 26 days ago
    Edited by ac 26 days ago

    For those interested, I'm sharing (more than a year after my support request) the response from IBM:

    "the IBM i option 39 International Components for Unicode (ICU) implementation is frozen at release 4.0.1, including on IBM i next. "

    So basically IBM's conclusion is: in native ILE, you are stuck at 4.0.1. And if you want to compile ICU4C to ILE (to a SRVPGM), you cannot, because apparently the compilers are not available, or not recent enough, to build the online ICU4C project on your own.

    As for the CVEs, they published this bulletin:

    https://www.ibm.com/support/pages/node/7241126



    ------------------------------
    --ft
    ------------------------------



  • 15.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted 21 days ago
    Edited by Hideyuki Yahagi 21 days ago

    Hi,

    > the IBM i option 39 International Components for Unicode (ICU) implementation is frozen at release 4.0.1, including on IBM i next.

    I believe accurate information exchange is essential for globalization and hybrid cloud. Personally, I feel IBM has abandoned this based on the points below.
     
    • IBM i ICU has not been updated

    IBM i License Program Option 39 (International Components for Unicode) is currently built from ICU4C version 4.0 (*1), released on January 15, 2009 (Ref: ICU - International Components for Unicode - ICU 4.0 Archive). At the time of writing, the latest version of ICU is 77.1, released on March 14, 2025 (https://github.com/unicode-org/icu/releases/tag/release-77-1). Approximately 16 years separate versions 4.0 and 77.1. In addition to the CVE issue (Security Bulletin: IBM i is affected by multiple vulnerabilities in International Components for Unicode (ICU) option 39 [CVE-2017-14952 CVE-2011-4599 CVE-2017-17484]), the functional enhancements listed in the Downloading ICU table ( https://unicode-org.github.io/icu/download/ ) have not been implemented. Even if you absolutely need the latest ICU, updating it yourself is practically impossible (*2).
    • Reduction in IBM-provided globalization information

    IBM's globalization site was closed around 2018 and can now only be viewed via the Web Archive (https://web.archive.org/web/20160324160940/http:/www.ibm.com/software/globalization/topics/), and I am unaware of any IBM site that compiles the latest detailed specifications for CDRA. Remnants survive at https://public.dhe.ibm.com/software/globalization/gcoc/attachments/ and https://ccsids.net/, among other places.
    • The inherent incompleteness of CDRA itself

    Some CCSIDs, including Unicode, have character sets defined as "Growing". CDRA has the concept of a growing CCSID: one where the code page is not full and new characters are added over time as needed. In character encoding standards, which characters are included is the most fundamental and crucial information. I have never seen a standard where the character set is defined as "undefined." Even when exchanging UTF-8 data without conversion, if the other party uses the latest Unicode standard, it is possible that IBM i cannot process it correctly.
     
    Personally, I don't mind if the EBCDIC CCSIDs remain fixed, but for Unicode I would like a clear statement of which Unicode level is supported by a given version of the OS or feature.
     
    *1 In previous releases (up to ICU 4.8), the first two version fields combined indicate the ICU release. Starting with ICU 49, the first version field alone contains the ICU release number (e.g., 49) (Ref: ICU - International Components for Unicode - ICU 4.0 Archive).
    *2 The ICU build uses "IBM tools for Developers for IBM i" (https://wiki.midrange.com/index.php/5799-PTL), which was withdrawn at the end of 2016. The build procedure on the ICU site (https://unicode-org.github.io/icu/userguide/icu4c/build.html#how-to-build-and-install-on-the-ibm-i-family-ibm-i-i5os-os400) depends on it and is therefore no longer practical.
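
    The numbering rule in footnote *1 can be written down as a tiny helper, which makes it easier to compare version strings across the two schemes. This is only a sketch in Python; the function name icu_release is made up for illustration.

```python
def icu_release(version: str) -> str:
    """Return the ICU release label for a dotted version string.

    Up to ICU 4.8 the first two fields together name the release;
    from ICU 49 onward the first field alone does.
    """
    fields = version.split(".")
    major = int(fields[0])
    if major < 49 and len(fields) > 1:
        return f"{fields[0]}.{fields[1]}"
    return fields[0]


for v in ("4.0.1", "4.8.1.1", "49.1.2", "77.1"):
    print(v, "->", icu_release(v))
```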
    ------------------------------
    Hideyuki Yahagi
    ------------------------------




  • 16.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted 21 days ago

    Hi,

    Issues are beginning to surface as some of our customers have been informed about the CVE and are now asking how we plan to respond.

    The reality is that IBM has no intention of addressing the vulnerability in ICU, a component that has been outdated for over 15 years. They've made it clear that no fix is planned.

    While we do make use of ICU, it's only indirectly, through the SQL regular expression functions. As such, we are not responsible for implementing the suggested mitigations. Instead, IBM should address this within their own use of ICU, specifically in their SQL regex routines.

    Unfortunately, the communication from IBM has only led to more questions and confusion.

    It's time for IBM to provide clarity and take ownership of the issue.

    Kind regards,
    Paul



    ------------------------------
    Paul Nicolay
    ------------------------------