No problem. Sure, with RPG *NATURAL one should be able to iterate and work per code point, leveraging both ICU and the stock internal conversion tables between CCSIDs.
Or you can fashion something using the (pretty rich) ICU transform language, e.g. with a negative filter to exclude "whitelisted" characters or certain classes.
I doubt that ICU could solve such a specific problem with a stock built-in transform ID, because that would assume that ICU (which is used on almost all platforms) knows an "agreed upon by all" mapping between Unicode and whatever local single-byte EBCDIC code page is in use (and there are plenty). It would also assume agreement that à stays à but š becomes s and not, say, "sh" to emulate the sound; that is a local implementor decision.
Additionally, some transform IDs documented on the ICU web documentation site are unavailable in the stock IBM i QICU: it is too old, stuck at the 4.x level I think!
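To make the per-code-point idea concrete, here is a minimal sketch. It is written in Python rather than RPG purely so it can be run anywhere: the standard `unicodedata` module stands in for ICU, and Python's `cp037` codec stands in for the CCSID 37 conversion table. The `LOCAL_OVERRIDES` table and the function names are hypothetical; the overrides are exactly the kind of local implementor decision mentioned above.

```python
import unicodedata

# Hypothetical local decisions for characters that have no decomposition.
LOCAL_OVERRIDES = {"œ": "oe", "ł": "l"}

def fits(ch, codec):
    """True if the character exists in the target code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def strip_marks(ch):
    """Decompose, drop nonspacing marks (category Mn), recompose."""
    decomposed = unicodedata.normalize("NFD", ch)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)

def to_codepage(text, codec="cp037", overrides=LOCAL_OVERRIDES, sub="?"):
    """Keep what the code page knows, simplify the rest, one code point at a time."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if fits(ch, codec):
            out.append(ch)              # e.g. é exists in CCSID 37: leave it alone
        elif ch in overrides:
            out.append(overrides[ch])   # local, implementor-chosen mapping
        else:
            plain = strip_marks(ch)     # e.g. š becomes s
            # A bare combining mark strips to "" and is simply dropped.
            out.append(plain if all(fits(c, codec) for c in plain) else sub)
    return "".join(out)
```

Because each code point is handled separately, a replacement may be longer than one character (œ to oe) without disturbing its neighbours.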
Original Message:
Sent: Tue June 03, 2025 08:56 AM
From: Paul Nicolay
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
It seems we're thinking alike as I had a similar idea, apart from the fact that I would do it one character at a time.
With the whole string at once there's a risk that a character like æ (a single character) gets transliterated to ae (two characters), which would break the position-by-position matching.
It would however be fine if the ICU library could do this by itself, but I doubt it based on current info.
Anyway, thanks for sharing your ideas.
Kind regards,
Paul
------------------------------
Paul Nicolay
Original Message:
Sent: Tue June 03, 2025 06:26 AM
From: ace ace
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
You can indeed get fancy with Unicode, because each character "has properties", much like a record in a database, so you can fine-tune (e.g. "decompose and remove only the caron accents").
I'm assuming from what you say that you don't handle scripts besides "Latin style" ones (in the EU there are two countries using Cyrillic or Greek, without going too far east...).
But to be practical and get some sort of solution that really comes down to your desired local CCSID, maybe try:
- have the source string mapped via the ICU transform engine to a string D1 (a "clean" string)
- have the same source string mapped via RPG, using straight assignment, to D2 (defined with your specific CCSID).
RPG should in theory do a decent job of preserving information in the conversion between Unicode and EBCDIC: it keeps "à" if it exists in the destination CCSID and puts x'3F' (the substitution character) where it cannot.
Then replace each x'3F' in D2 with the character at the corresponding position in D1 (better than nothing, and it preserves some information).
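The D1/D2 merge can be sketched as follows. This is Python, not RPG, and only an illustration under assumptions: `clean` stands in for the ICU transform, `encode_with_sub` imitates the RPG assignment by emitting x'3F' for unmappable characters, and `cp037` stands in for the target CCSID. All function names are hypothetical. The sketch refuses to merge when the transform changes the string length, since the positions would no longer line up.

```python
import unicodedata

SUB = 0x3F  # EBCDIC substitution character

def clean(text):
    """Stand-in for the ICU transform: NFD, drop nonspacing marks, NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)

def encode_with_sub(text, codec="cp037"):
    """Stand-in for the RPG assignment: one byte per character, x'3F' if unmappable."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out.append(SUB)
    return bytes(out)

def merge(src, codec="cp037"):
    d1 = clean(src)                              # the "clean" string
    d2 = bytearray(encode_with_sub(src, codec))  # the direct conversion
    if len(d1) != len(src):
        raise ValueError("transform changed the length; positions do not line up")
    for i, byte in enumerate(d2):
        if byte == SUB:
            # Take the character at the same position in D1; it may still be x'3F'.
            d2[i] = encode_with_sub(d1[i], codec)[0]
    return bytes(d2)
```

For example, `merge("àéš")` keeps à and é as they are and falls back to a plain s for š, while a Cyrillic letter stays x'3F' because the clean string has no better candidate at that position.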
my 2c...
------------------------------
--ft
Original Message:
Sent: Tue June 03, 2025 05:56 AM
From: Paul Nicolay
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
I probably have an issue with a prototype somewhere... I also changed my utrans_open to utrans_openU and now the transforms with [] work fine as well.
My main goal, however, is to transliterate Unicode to a specific IBM i code page (e.g. 37, 500, ...), so it should convert unknown characters to their closest equivalent but leave the known ones as is. For example, a simple é should not be stripped of its accent, as the character exists in CCSID 37. For a š, on the other hand, the accent should be stripped, leaving a plain s, as š doesn't exist in CCSID 37.
I don't know if this is possible with the transliteration API.
Kind regards,
Paul
------------------------------
Paul Nicolay
Original Message:
Sent: Tue June 03, 2025 05:42 AM
From: ace ace
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
yes, being an old version it lacks some transforms, but the count API does return many, like "Any-Latin" to transliterate to Latin...
You should get something less than 200.
If needed, share your code...
------------------------------
--ft
Original Message:
Sent: Tue June 03, 2025 04:57 AM
From: Paul Nicolay
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
Thanks for sharing your code... but I had it figured out in the meantime as well (however, I'm going to compare a few things).
BTW, do you get a value from utrans_countAvailableIDs? I always get zero.
Kind regards,
Paul
PS. I'm trying to push IBM via a case as well to bring this old version to their attention.
------------------------------
Paul Nicolay
Original Message:
Sent: Tue June 03, 2025 04:47 AM
From: ace ace
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
the QICU version on the system is indeed ancient. The fact that IBM doesn't give a detailed answer leads to speculation, and my speculation is that the ILE C/C++ compilers are ancient too and need to be revamped, so they cannot get it to a sustainable state. ICU is a core part of correct Unicode processing and should be PTF'ed by IBM along with the other OS components.
But you can use it; you need to look into the ICU4C (C interface) documentation.
I'm assuming you want to use it from RPG. Here is an example, copied wholesale from some of my utilities, to get you started.
Assuming that you want to apply the transform "NFD;[:Nonspacing Mark:]Remove;NFC" (this removes accents etc. from a string), the core of the thing is:
h = utrans_openU(id : %LEN(id) :
dir : *NULL : -1 : *NULL : errorCode);
utrans_transUChars(h :
text : textLength : textCapacity :
start : limit : errorCode);
dst = text;
utrans_close(h);
Full source follows; YMMV, use at your discretion, etc.
CTL-OPT DFTACTGRP(*NO) ACTGRP(*CALLER) OPTION(*SRCSTMT);
CTL-OPT BNDDIR('QICU/QXICUAPIBD');
DCL-C UTRANS_FORWARD 0;
DCL-C UTRANS_REVERSE 1;
DCL-C U_ZERO_ERROR 0;
DCL-C U_STRING_NOT_TERMINATED_WARNING -124;
DCL-C U_BUFFER_OVERFLOW_ERROR 15;
DCL-C U_ILLEGAL_ARGUMENT_ERROR 1;
DCL-S UErrorCode INT(10) TEMPLATE;
DCL-S UChar32 INT(10) TEMPLATE;
DCL-S UChar UNS(5) TEMPLATE;
DCL-S UTransliterator POINTER TEMPLATE;
DCL-DS UParseError ALIGN(*FULL) TEMPLATE INZ;
  line INT(10);
  offset INT(10);
  preContext UCS2(16);
  postContext UCS2(16);
END-DS;
// Number of available transliterator IDs.
DCL-PR utrans_countAvailableIDs INT(10)
       EXTPROC('utrans_countAvailableIDs_4_0');
END-PR;
// Open a transliterator from a UTF-16 ID; returns a UTransliterator handle.
DCL-PR utrans_openU POINTER EXTPROC('utrans_openU_4_0');
  id_ UCS2(100) CCSID(*UTF16) OPTIONS(*VARSIZE);
  idLen_ INT(10) VALUE;
  utransdir_ INT(10) VALUE;
  rules_ POINTER VALUE;
  rulesLen_ INT(10) VALUE;
  parseerror_ POINTER VALUE;
  uerrorcode_ INT(10);
END-PR;
// Transliterate a UTF-16 buffer in place.
DCL-PR utrans_transUChars EXTPROC('utrans_transUChars_4_0');
  trans POINTER VALUE;
  text UCS2(1000) CCSID(*UTF16) OPTIONS(*VARSIZE);
  textLen INT(10);
  textCapacity INT(10) VALUE;
  start INT(10) VALUE;
  limit INT(10);
  errorcode INT(10);
END-PR;
// Release the handle returned by utrans_openU.
DCL-PR utrans_close EXTPROC('utrans_close_4_0');
  UTransliterator_ POINTER VALUE;
END-PR;
//ICU-END
DCL-C CAPACITY 4000;
DCL-C VARLIMIT 1500;
DCL-PI *N;
  src UCS2(VARLIMIT) CCSID(*UTF16) CONST;
  dst UCS2(VARLIMIT) CCSID(*UTF16);
  transformsIn UCS2(500) CCSID(*UTF16) CONST OPTIONS(*OMIT : *NOPASS);
END-PI;
DCL-S nIDs INT(10);
DCL-S h POINTER;
DCL-S id UCS2(100) CCSID(*UTF16);
DCL-S rules UCS2(100) CCSID(*UTF16);
DCL-S dir INT(10) INZ(UTRANS_FORWARD);
DCL-S errorCode INT(10);
DCL-S text UCS2(CAPACITY) CCSID(*UTF16);
DCL-S textLength INT(10);
DCL-S textCapacity INT(10);
DCL-S limit INT(10);
DCL-s start INT(10) INZ(0);
DCL-DS parseError LIKEDS(UParseError) INZ;
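// Default transform: decompose, drop nonspacing marks, recompose.
id = 'NFD;[:Nonspacing Mark:]Remove;NFC';
IF %PASSED(transformsIn);
  id = transformsIn;
ENDIF;
textCapacity = CAPACITY;
textLength = VARLIMIT;
start = 0;
limit = VARLIMIT;
text = src;
// Fall back to the unchanged input if the transform cannot be opened.
dst = src;
// ICU functions return immediately if errorCode already signals a failure.
errorCode = U_ZERO_ERROR;
h = utrans_openU(id : %LEN(id) :
                 dir : *NULL : -1 : *NULL : errorCode);
// Values above U_ZERO_ERROR are failures; negative values are warnings.
IF h <> *NULL AND errorCode <= U_ZERO_ERROR;
  utrans_transUChars(h :
                     text : textLength : textCapacity :
                     start : limit : errorCode);
  dst = text;
ENDIF;
IF h <> *NULL;
  utrans_close(h);
ENDIF;
*INLR = *ON;
RETURN;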
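For readers without the service program at hand, the effect of the default transform can be approximated outside ICU. The sketch below is Python, not RPG, and uses the standard `unicodedata` module as a stand-in; it assumes that ICU's [:Nonspacing Mark:] set corresponds to general category Mn. The function name is hypothetical.

```python
import unicodedata

def remove_nonspacing_marks(text):
    """Approximate 'NFD;[:Nonspacing Mark:]Remove;NFC' with the standard library."""
    decomposed = unicodedata.normalize("NFD", text)        # NFD
    kept = "".join(c for c in decomposed
                   if unicodedata.category(c) != "Mn")     # [:Nonspacing Mark:]Remove
    return unicodedata.normalize("NFC", kept)              # NFC
```

Letters such as æ, ø or ł have no canonical decomposition, so this transform leaves them untouched; they need a separate rule or mapping.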
------------------------------
--ft
Original Message:
Sent: Sat May 31, 2025 08:13 AM
From: Paul Nicolay
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hi,
Despite the fact that it is outdated, did anyone get the ICU APIs working? I'm especially looking to do transliteration.
I can get the u_isdigit example working (the sample is not even correct) but that's about it at the moment.
PS. ICU is a requirement for using regular expressions in the SQL functions, so it is a bit strange that their version is outdated.
------------------------------
Paul Nicolay
Original Message:
Sent: Mon May 20, 2024 05:45 AM
From: ace ace
Subject: ICU (QICU library) on the system is too obsolete despite being a public user facing API
Hello...
needing a library for a project to properly handle some tasks involving Unicode string processing for a B2B system, I wanted to leverage from RPG the APIs listed here
https://www.ibm.com/docs/en/i/7.5?topic=interfaces-api-finder
as "International Components for Unicode APIs".
ICU is a library present on a lot of systems to properly handle Unicode processing (e.g. normalization, decomposition, etc.).
But, to my surprise, on our up-to-date V7R4 the SRVPGMs present in the QICU library are more than 10 years old (!); the latest one I can see implements ICU version 4.0.
Despite being a public API of the system, it lacks 15 years of quality features and improvements. As you can imagine, security problems have also surfaced over time: the ICU4C code has had nasty (overruns, overflows...) and public security issues (CVEs), as can be seen here: https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html
While I insist that a public API, publicly exposed in the documentation, should be kept up to date by the vendor with PTFs, one can in theory, as a last resort, compile the library oneself, using the pointers here
https://unicode-org.github.io/icu/userguide/icu4c/build.html#how-to-build-and-install-on-the-ibm-i-family-ibm-i-i5os-os400
BUT
the tools used in those instructions refer to the IFS folder (probably containing the icc compiler...) called
/QIBM/ProdData/DeveloperTools/
which is now obsolete, and was replaced by a product that is itself already obsolete.
How can one properly obtain or compile an up-to-date ICU on IBM i (a task that on other systems and OSes would maybe have taken 5 minutes...) using a supported workflow, and resolve this Kafkaesque situation?
As a customer I've already
- contacted support (which cannot fix or update the libraries, but apparently at least contacted the security team)
- contacted our VAR about how to obtain the compiler needed to build the public ICU project (no answer)
- logged an "idea" on the "ideas" site (yes, apparently for IBM security fixes of a public API are also an "idea", maybe in their ideal world ; ) )
For an OS geared mainly toward business processing (B2C/B2B, EDI, etc.), it is astonishing not to have a proper library for Unicode tasks.
------------------------------
--ft
------------------------------