Db2

Db2

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only

Full reverse: A REVERSE function that handles unicode 

Mon March 09, 2020 03:43 PM

Background

This week a co-worker converting an Oracle application to DB2 contacted me and requested a REVERSE function.
REVERS() is a function which reverses strings by reordering all the characters in reverse order.

For example:
'Hello World!'
becomes
'!dlroW olleH'. 
Not a terribly useful function within an application, generally speaking.

However reversing strings can reduce contention on index pages when inserting rows into a table in ascending order. 
At any rate, I have learned over the years that conversions are much more successful when one does not demand the customer to redesign whatever it is they have been doing for reasons that may elude me.
 

REVERSE written in SQL

It is fairly trivial to write a function in SQL PL which walks an input strings and spits it out in reverse order.
However, such a function, if written in inline SQL PL will be effectively limited to non DPF scenarios since inline SQL PL executes on the coordinator.
In fact my coworker had written such a beast and found it to be too slow for that reason.
Writing a function in compiled SQL PL won't solve the DPF problem and would make it likely even slower.
So the first question is:Can the function be written as SQL without the need for BEGIN.. END.
Here is one possible solution using recursion:
CREATE OR REPLACE FUNCTION REVERSE(INSTR VARCHAR(4000))
  RETURNS VARCHAR(4000) SPECIFIC REVERSE
   DETERMINISTIC NO EXTERNAL ACTION CONTAINS SQL
 RETURN WITH rec(pos, res) AS (VALUES (1, CAST('' AS VARCHAR(4000)))
                               UNION ALL
                               SELECT pos + 1, SUBSTR(INSTR, pos , 1) || res 
FROM rec
WHERE pos <= LENGTH(INSTR)
AND pos < 5000) SELECT res FROM rec WHERE pos > LENGTH(INSTR); / VALUES reverse('Hello World!'); ERROR near line 1: SQL1585N A temporary table could not be created because there is no available system temporary table space that has a compatible page size.
OK, we need to have system temp >4KB to hold teh temp table for the recursion.
CREATE BUFFERPOOL bp32 PAGESIZE 32K;

CREATE SYSTEM TEMPORARY TABLESPACE tsp32 PAGESIZE 32K BUFFERPOOL bp32;

VALUES reverse('Hello World!');
1                                                 
--------------------------------------------------
!dlroW olleH  
Success!
 
We may stop here. However this SQL UDF has a couple of drawbacks:
  • As DB2 informed us, as part of the execution of the recursion a temporary table has to be created.
  • We need to scan that temp table to get the final result.
While this logic is certainly quite fast it is a lot of machinery just to reverse a string.
If this function is called a lot, then it could become a performance drag.
 

REVERSE written in C

When executing heavy logic with no SQL in need for high performance unfenced C-UDF are the better choice.
A C-UDF is quite easy to write and deploy, assuming you have a C-compiler available. 
In this case we need three files:
 
udfreverse.c:
#include <sqludf.h>

#ifdef __cplusplus
extern "C"
#endif
void SQL_API_FN ReverseSBCP(SQLUDF_VARCHAR  *inVarchar,
                            SQLUDF_VARCHAR  *outVarchar,
                            SQLUDF_SMALLINT *inVarcharNullInd,
                            SQLUDF_SMALLINT *outVarcharNullInd,
                            SQLUDF_TRAIL_ARGS)
{
  int inLen, inPos, outPos;

  if (*inVarcharNullInd == -1)
  {
    *outVarcharNullInd = -1;
  }
  else
  {
    inLen = strlen(inVarchar);
    for (inPos = 0, outPos = inLen -1; inPos < inLen; inPos++, outPos--)
	{	
	  outVarchar[outPos] = inVarchar[inPos];
	}	
	outVarchar[inLen] = '\0'; 
    *outVarcharNullInd = 0;
  }
  return;
}
 This file should be copied into the sqllib/samples/c directory.
 
udfreserve.def:
LIBRARY UDFREVERSE
DESCRIPTION 'Library for DB2 REVERSE function'
EXPORTS
        ReverseSBCP
This file is needed by Windows only to export the entry points of the function.
It too needs to be copied to sqllib/samples/c.

If you are on a Unix or Linux system you use the following file instead:

udfreverse.exp
ReverseSBCP
Now run the bldrtn make file to compile, link and copy the executable into sqllib/FUNCTION/ROUTINE:
bldrtn udfreverse
That being done we can register the function and test it:
CREATE OR REPLACE FUNCTION REVERSE(VARCHAR(4000))
  RETURNS VARCHAR(4000)
  SPECIFIC REVERSE
  NO SQL EXTERNAL ACTION DETERMINISTIC
  LANGUAGE C PARAMETER STYLE SQL
  NOT FENCED
  EXTERNAL NAME 'udfreverse!ReverseSBCP';
/

VALUES reverse('Hello World!');
1                                                 
--------------------------------------------------
!dlroW olleH  
Nice.
 
You may have noted that I named the C function ReverseSBCP for "Single Byte Code Page".
The problem is that the function, as implemented, only works if we use a single byte code page.
In a Unicode database, which is default in DB2, this would only work for ASCII characters as we can easily see when we add some German umlaute:
VALUES REVERSE('führen');
1                                                 
--------------------------------------------------
nerh��f         
Back to the drawing board...
 

REVERSE written in C with Unicode support

In Unicode UTF-8 which is what is being used in a DB2 Unicode database for the VARCHAR data type every character is between one and 4 bytes long.
The first few bits of the first byte encode how many more bytes are needed to encode the entire character.
That means our implementation will look like this:

udfreverse.c
#include <sqludf.h>

#ifdef __cplusplus extern "C" #endif void SQL_API_FN Reverse1208(SQLUDF_VARCHAR *inVarchar, SQLUDF_VARCHAR *outVarchar, SQLUDF_SMALLINT *inVarcharNullInd, SQLUDF_SMALLINT *outVarcharNullInd, SQLUDF_TRAIL_ARGS) { int inLen, inPos, outPos; char c; if (*inVarcharNullInd == -1) { *outVarcharNullInd = -1; } else { inLen = strlen(inVarchar); inPos = 0; outPos = inLen - 1; while (inPos < inLen) { c = inVarchar[inPos]; // An ASCII character 00-7F is one byte long if ((c & 0x80) == 0x00) { outVarchar[outPos--] = c; inPos++; } // 110xxxxx introduces a 2 byte character else if ((c & 0xE0) == 0xC0 && (inVarchar[inPos + 1] & 0xC0) == 0x80) { outVarchar[outPos - 1] = c; outVarchar[outPos] = inVarchar[inPos + 1]; inPos += 2; outPos -= 2; } // 1110xxxx introduces a 3 byte character else if ((c & 0xF0) == 0xE0 && (inVarchar[inPos + 1] & 0xC0) == 0x80 && (inVarchar[inPos + 2] & 0xC0) == 0x80) { outVarchar[outPos - 2] = c; outVarchar[outPos - 1] = inVarchar[inPos + 1]; outVarchar[outPos ] = inVarchar[inPos + 2]; inPos += 3; outPos -= 3; } // 11110xxx introduces a 4 byte character else if ((c & 0xF8) == 0xF0 && (inVarchar[inPos + 1] & 0xC0) == 0x80 && (inVarchar[inPos + 2] & 0xC0) == 0x80 && (inVarchar[inPos + 3] & 0xC0) == 0x80) { outVarchar[outPos - 3] = c; outVarchar[outPos - 2] = inVarchar[inPos + 1]; outVarchar[outPos - 1] = inVarchar[inPos + 2]; outVarchar[outPos ] = inVarchar[inPos + 3]; inPos += 4; outPos -= 4; } // anything else is not legal Unicode in UTF-8 else { strcpy(SQLUDF_STATE, "78000"); strcpy(SQLUDF_MSGTX, "REVERSE: Broken UTF-8 character found."); break; } } outVarchar[inLen] = '\0'; *outVarcharNullInd = 0; } return; }
I gave the function a different name so we need to adjust the .def and .exp files accordingly:
 
udfreserve.def:
LIBRARY UDFREVERSE
DESCRIPTION 'Library for DB2 REVERSE function
EXPORTS
        Reverse1208
Or, if you are on a Unix or Linux system you use the following file:

udfreverse.exp
Reverse1208
Now run the bldrtn again:
bldrtn udfreverse
And we need to re-register to the new entry point:
CREATE OR REPLACE FUNCTION REVERSE(VARCHAR(4000))
  RETURNS VARCHAR(4000)
  SPECIFIC REVERSE
  NO SQL EXTERNAL ACTION DETERMINISTIC
  LANGUAGE C PARAMETER STYLE SQL
  NOT FENCED
  EXTERNAL NAME 'udfreverse!Reverse1208';
/

VALUES REVERSE('fhren');
1                                                 
--------------------------------------------------
nerhüf         
Done!
 

Summary

In the actual attached files:
I have provided both implementations together.
So you can pick which suits you best based on the usage requirements.
udfreverse.sql contains all three SQL definitions to choose from.
 

#Db2

Statistics
0 Favorited
10 Views
0 Files
0 Shares
0 Downloads