IBM i Global

ICU (QICU library) on the system is too obsolete despite being a public user facing API

  • 1.  ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Mon May 20, 2024 05:46 AM

    Hello...

    I need a library to properly handle Unicode string processing in a project for a B2B system, so I wanted to use, from RPG, the API listed here

    https://www.ibm.com/docs/en/i/7.5?topic=interfaces-api-finder

    under "International Components for Unicode APIs".

    ICU is a library present on many systems to handle Unicode processing properly (e.g. normalization, decomposition, etc.).

    But, to my surprise, on our up-to-date V7R4 the SRVPGMs in the QICU library are more than 10 years old (!); the most recent one I can see implements ICU 4.0.

    Despite being a public API of the system, it lacks 15 years of features and quality improvements. As you can imagine, security problems have also surfaced over time: the ICU4C code has had nasty bugs (overruns, overflows...) and public security issues (CVEs), as can be seen here: https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html

    I maintain that a public API, openly described in the documentation, should be kept up to date by the vendor through PTFs. Still, as a last resort, one could in theory compile the library oneself, following the pointers here

    https://unicode-org.github.io/icu/userguide/icu4c/build.html#how-to-build-and-install-on-the-ibm-i-family-ibm-i-i5os-os400

    BUT

    the tools used in those instructions refer to an IFS folder (probably containing the icc compiler...) called

    /QIBM/ProdData/DeveloperTools/

    which is now obsolete, having been replaced by a product that is itself already obsolete.

    How can one properly obtain or compile an up-to-date ICU on IBM i (a task that on other systems and OSes might take 5 minutes...) using a supported workflow, and resolve this Kafkaesque situation?

    As a customer I have already

    • contacted support (who cannot fix or update the libraries, but apparently at least contacted the security team)
    • contacted our VAR about how to obtain a compiler, if there is one, to build the public ICU project (no answer)
    • logged an "idea" on the "ideas" site (yes, apparently for IBM security fixes to a public API are also an "idea", maybe in their ideal world ; ) )

    For an OS geared mainly toward business processing (B2C/B2B, EDI, etc.), it is astonishing not to have a proper library for Unicode tasks.



    ------------------------------
    --ft
    ------------------------------


  • 2.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue May 21, 2024 05:17 AM
    Edited by Hideyuki Yahagi Tue May 21, 2024 05:18 AM

    > But, to my surprise, on our up to date V7R4, the SRVPGMs present in the QICU library are more than 10 years old (!) 

    The real reason is unknown, but it may be that "IBM Tools for Developers for i5/OS" (5799-PTL), on which the build is based, is no longer supported.

    This PRPQ has already been withdrawn.

    https://www.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_rp/7/ENUSP84487/index.html&request_locale=en

    > For a OS involved geared mainly in business processing, B2C/B2B, EDI, etc. is astonishing not having a proper library for unicode tasks.

    Aside from security issues, the ICU project is also considering discontinuing support for the EBCDIC platform.

    https://unicode-org.atlassian.net/browse/ICU-21672

    Personally, I would like IBM to contribute to the ICU project in an organized way.



    ------------------------------
    Hideyuki Yahagi
    ------------------------------



  • 3.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue May 21, 2024 12:32 PM

    Yes, we are not talking about a "minor" library here; it is a fundamental library, the de facto standard for handling Unicode tasks, integrated into many systems.

    IBM should respond to this and put the customer in a position to use a publicly documented API in a full ILE environment (i.e. a bindable SRVPGM) without too much hassle, on par with other systems. It is not acceptable to neither keep such a library up to date nor provide an alternative solution.



    ------------------------------
    --ft
    ------------------------------



  • 4.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Fri February 21, 2025 05:42 AM

    I still have no answer about any intent or direction regarding updating the QICU code on IBM i.

    It would be sensible for IBM to have a sustainable path for integrating and updating the ICU components (QICU) along the normal OS lifecycle and PTFs, following the ICU4C project. That would avoid the ever-increasing technical debt this ICU component is accumulating: it is still distributed with new OS releases, apparently frozen at 4.x (15 years old?).

    ICU is essential for proper Unicode processing and related facilities, and it is a public API of the system.

    Is the problem that the ILE C compilers are unable to compile new versions of the ICU C project? That is not clear.



    ------------------------------
    --ft
    ------------------------------



  • 5.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Sat May 31, 2025 08:14 AM

    Hi,

    Despite the fact that it is outdated... did anyone get the ICU APIs working? (I'm especially looking to do transliteration.)

    I can get the u_isdigit example working (the sample is not even correct), but that's about it at the moment.

    PS. ICU is a requirement for using regular expressions in the SQL functions, so it is a bit strange that their version is outdated.



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 6.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 04:47 AM

    Hi,

    the QICU version on the system is indeed ancient. The fact that IBM doesn't give a detailed answer leads to speculation, and my speculation is that the ILE C/C++ compilers are ancient too and need to be revamped, so IBM cannot get ICU to a sustainable point. ICU is a core part of correct Unicode processing and should be PTF'ed by IBM along with the rest of the OS.

    But you can use it; you need to look into the ICU4C (C interface) documentation.

    I'm assuming you want to use it from RPG. Here is an example, copied roughly from some of my utilities, to get you started.

    Assuming that you want to apply the transform "NFD;[:Nonspacing Mark:]Remove;NFC" (this removes accents etc. from a string), the core of the thing is

           h = utrans_openU(id : %LEN(id) :
                            dir : *NULL : -1 : *NULL : errorCode);

           utrans_transUChars(h :
                              text  : textLength : textCapacity :
                              start : limit : errorCode);

           dst = text;

           utrans_close(h);

    Full source below; YMMV, use at your own discretion, etc.


           CTL-OPT DFTACTGRP(*NO) ACTGRP(*CALLER) OPTION(*SRCSTMT);
           CTL-OPT BNDDIR('QICU/QXICUAPIBD');

           DCL-C UTRANS_FORWARD 0;
           DCL-C UTRANS_REVERSE 1;
           DCL-C U_ZERO_ERROR 0;
           DCL-C U_STRING_NOT_TERMINATED_WARNING -124;
           DCL-C U_BUFFER_OVERFLOW_ERROR           15;
           DCL-C U_ILLEGAL_ARGUMENT_ERROR           1;

           DCL-S UErrorCode INT(10) TEMPLATE;
           DCL-S UChar32 INT(10) TEMPLATE;
           DCL-S UChar UNS(5) TEMPLATE;
           DCL-S UTransliterator POINTER TEMPLATE;
           DCL-DS UParseError ALIGN(*FULL) TEMPLATE INZ;
             line INT(10);
             offset INT(10);
             preContext UCS2(16);
             postContext UCS2(16);
           END-DS;

           DCL-PR utrans_countAvailableIDs INT(10)
                  EXTPROC('utrans_countAvailableIDs_4_0');
           END-PR;

           DCL-PR utrans_openU POINTER EXTPROC('utrans_openU_4_0');
             id_ UCS2(100) CCSID(*UTF16) OPTIONS(*VARSIZE);
             idLen_ INT(10) VALUE;
             utransdir_ INT(10) VALUE;
             rules_ POINTER VALUE;
             rulesLen_ INT(10) VALUE;
             parseerror_ POINTER VALUE;
             uerrorcode_ INT(10);
           END-PR;

           DCL-PR utrans_transUChars EXTPROC('utrans_transUChars_4_0');
             trans POINTER VALUE;
             text UCS2(1000) CCSID(*UTF16) OPTIONS(*VARSIZE);
             textLen INT(10);
             textCapacity INT(10) VALUE;
             start INT(10) VALUE;
             limit INT(10);
             errorcode INT(10);
           END-PR;

           DCL-PR utrans_close EXTPROC('utrans_close_4_0');
             UTransliterator_ POINTER VALUE;
           END-PR;

           //ICU-END

           DCL-C CAPACITY 4000;
           DCL-C VARLIMIT 1500;

           DCL-PI *N;
             src UCS2(VARLIMIT) CCSID(*UTF16) CONST;
             dst UCS2(VARLIMIT) CCSID(*UTF16);
             transformsIn UCS2(500) CCSID(*UTF16) CONST OPTIONS(*OMIT : *NOPASS);
           END-PI;

           DCL-S nIDs INT(10);
           DCL-S h POINTER;

           DCL-S id UCS2(100) CCSID(*UTF16);
           DCL-S rules UCS2(100) CCSID(*UTF16);
           DCL-S dir INT(10) INZ(UTRANS_FORWARD);
           DCL-S errorCode INT(10);

           DCL-S text UCS2(CAPACITY) CCSID(*UTF16);

           DCL-S textLength INT(10);
           DCL-S textCapacity INT(10);
           DCL-S limit INT(10);
           DCL-S start INT(10) INZ(0);

           DCL-DS parseError LIKEDS(UParseError) INZ;

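           // Default transform: decompose, drop combining marks, recompose
           id = 'NFD;[:Nonspacing Mark:]Remove;NFC';
           IF %PASSED(transformsIn);
             id = transformsIn;
           ENDIF;

           // Work buffer is larger than the input so the result can grow
           textCapacity = CAPACITY;

           // Transform the whole (blank-padded) input: positions 0..VARLIMIT
           textLength = VARLIMIT;
           start = 0;
           limit = VARLIMIT;
           text = src;
           errorCode = U_ZERO_ERROR;

           // Open a transliterator for the compound ID (no custom rules)
           h = utrans_openU(id : %LEN(id) :
                            dir : *NULL : -1 : *NULL : errorCode);

           // Transform in place; textLength and limit are updated by ICU
           utrans_transUChars(h :
                              text  : textLength : textCapacity :
                              start : limit : errorCode);

           // errorCode is not checked here; > U_ZERO_ERROR means failure
           dst = text;

           utrans_close(h);

           *INLR = *ON;
           RETURN;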
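
    For anyone who wants to check what that transform ID is expected to produce without an IBM i at hand, here is a rough sketch in Python. It does not call ICU at all: it only imitates the three steps of the ID (NFD, remove nonspacing marks, NFC) with the standard unicodedata module, and the function name strip_marks is made up for this illustration. Results may differ from QICU 4.0 for characters added to Unicode later.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Imitate the ICU ID 'NFD;[:Nonspacing Mark:]Remove;NFC' (assumption:
    general category Mn is what ICU calls Nonspacing Mark)."""
    # Step 1 (NFD): split each accented letter into base letter + marks
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2 (Remove): drop every combining mark of category Mn
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    # Step 3 (NFC): recompose whatever is left
    return unicodedata.normalize("NFC", "".join(kept))


if __name__ == "__main__":
    print(strip_marks("àéš"))   # accents are dropped
    print(strip_marks("æø"))    # no canonical decomposition, left unchanged
```

    Letters such as æ or ø are not "base letter plus accent" in Unicode, so this transform leaves them alone.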

     



    ------------------------------
    --ft
    ------------------------------



  • 7.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 04:58 AM

    Hi,

    Thanks for sharing your code... I had figured it out in the meantime as well (though I'm going to compare a few things).

    BTW, do you get a value from utrans_countAvailableIDs? I always get zero.

    Kind regards,
    Paul

    PS. I'm trying to push IBM via a case as well to bring this old version to their attention.



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 8.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 05:43 AM

    Hi, 

    yes. Being an old version it lacks some transforms, but when I query the count API it returns many... like "Any-Latin" to transliterate to Latin...

    You should get something less than 200.

    If you don't, share your code...



    ------------------------------
    --ft
    ------------------------------



  • 9.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 05:56 AM

    Hi,

    I probably have an issue with a prototype somewhere... I also changed my utrans_open to utrans_openU, and now the ones with [] work fine as well.

    My main goal, however, is to transliterate Unicode to a specific IBM i code page (e.g. 37, 500, ...): characters unknown to the code page should be converted to their closest equivalent, while known ones are left as is. For example, a simple é should not be stripped of its accent, as the character exists in CCSID 37. On the other hand, a š should have its accent stripped, leaving a normal s, as š doesn't exist in CCSID 37.

    I don't know if this is possible with the transliteration API.

    Kind regards,
    Paul



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 10.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 06:27 AM

    You can indeed get fancy with Unicode, because each character has properties, much like entries in a database, so you can fine-tune ("decompose and remove only the caron accents", etc.)...

    I'm assuming from what you say that you don't handle scripts besides "Latin style" ones (e.g. in the EU there are two countries using Cyrillic or Greek, without going too far east...).

    But to be practical and get some sort of solution that goes all the way down to your desired local CCSID, maybe try this:

    - map the source string via the ICU transliteration engine to a string D1 (a "clean" string)

    - map the same source string via RPG, using straight assignment, to D2 (declared with your specific CCSID).

    RPG should in theory do a decent job of preserving information in the conversion between Unicode and EBCDIC: it keeps "à" if it exists in the destination CCSID and puts x'3F' (the substitution character) when it cannot.

    Then replace each x'3F' with the character at the corresponding position in D1 (better than nothing, and it preserves some of the information.....).

    my 2c...



    ------------------------------
    --ft
    ------------------------------



  • 11.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 08:56 AM

    Hi,

    It seems we're thinking alike as I had a similar idea, apart from the fact that I would do it one character at a time.

    With the whole string at once there's a risk that some characters like æ (a single character) get transliterated to ae (two characters), which would break the position-by-position matching.
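
    The one-character-at-a-time idea can be sketched like this. It is only an illustration in Python, not ICU or RPG: Python's cp037 codec stands in for CCSID 37, the names to_codepage and _fits are made up, and the accent stripping is the plain NFD/remove-marks step rather than a real ICU transliterator.

```python
import unicodedata


def _fits(s: str, codec: str) -> bool:
    """True if every character of s exists in the target code page."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def to_codepage(text: str, codec: str = "cp037", sub: str = "?") -> str:
    """Keep characters the code page knows; otherwise try the unaccented
    base letter; otherwise fall back to a substitution character."""
    out = []
    # Compose first so that 's' + combining caron is seen as one character
    for ch in unicodedata.normalize("NFC", text):
        if _fits(ch, codec):
            out.append(ch)          # e.g. é exists in CCSID 37: keep it
            continue
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if unicodedata.category(c) != "Mn")
        if not base:
            continue                # a lone combining mark: drop it
        out.append(base if _fits(base, codec) else sub)
    return "".join(out)


if __name__ == "__main__":
    print(to_codepage("éšæ"))
```

    Because each input character is handled separately, a one-to-many replacement cannot shift the positions of the characters that follow it.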

    It would be nice, however, if the ICU library could do this by itself, but I doubt it based on the current info.
    Anyway, thanks for sharing your ideas.

    Kind regards,
    Paul



    ------------------------------
    Paul Nicolay
    ------------------------------



  • 12.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted Tue June 03, 2025 10:40 AM

    No prob... Sure, with RPG *NATURAL one should be able to iterate and work per code point, leveraging both ICU and the stock internal conversion tables between CCSIDs.

    Or you can fashion something using the (pretty rich) ICU transform language, e.g. with a negative filter to exclude "whitelisted" characters or certain classes.

    I doubt that ICU could solve such a specific problem with a stock built-in ID, because that would assume that ICU (which is used on almost all platforms) knows an "agreed upon by all" mapping between Unicode and each local single-byte EBCDIC code page (of which there are plenty). It would additionally assume that mapping à to à but š to s, and not, say, to "sh" (to emulate the sound), is OK. That is a local implementor's decision.

    Additionally, some transform IDs documented on the ICU web documentation site are unavailable in the stock IBM i QICU... too old, stuck at the 4.x level I think!



    ------------------------------
    --ft
    ------------------------------



  • 13.  RE: ICU (QICU library) on the system is too obsolete despite being a public user facing API

    Posted 12 days ago

    Hi, did you try implicit code page conversion in RPG?

    Something like:
    **free                        
    dcl-s a char(48) ccsid(*utf8);
    dcl-s b char(48) ccsid(1140); 
    a = 'àáâä?æãå';               
    b = a;                        
    b = 'àáâä?æãå';               
    a = b;                        
    dsply a;                      
    dsply b;                      
    *inlr = *on;                  
    return;                       

    In debug you will see the converted data:
    EVAL a:x                                                             
       00000     C3A0C3A1 C3A2C3A4 3FC3A6C3 A3C3A520   - CµC~CsCu.CwCtCv.
       00010     20202020 20202020 20202020 20202020   - ................
       00020     20202020 20202020 20202020 20202020   - ................
    EVAL b:x                                                             
       00000     44454243 6F9C4647 40404040 40404040   - àáâä?æãå        
       00010     40404040 40404040 40404040 40404040   -                 
       00020     40404040 40404040 40404040 40404040   -                 
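
    The two dumps can be cross-checked off the system with a few lines of Python. This is just a sanity check, assuming Python's utf-8 and cp1140 codecs match CCSID 1208 and CCSID 1140 for these characters; trailing blanks (x'20' and x'40') are left out.

```python
# Same literal as in the RPG sample above (the '?' is a real question mark)
text = "àáâä?æãå"

utf8_hex = text.encode("utf-8").hex().upper()
ebcdic_hex = text.encode("cp1140").hex().upper()

print(utf8_hex)     # compare with the EVAL a:x dump
print(ebcdic_hex)   # compare with the EVAL b:x dump
```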



    ------------------------------
    Ruurd Noppen
    ------------------------------