After 18 years of working heavily with ODBC and VBA in Microsoft Access to import data from .csv and other files into DB2 databases, I ran into an extremely arcane situation in which DB2 considers the length of a string to be different from the length Windows sees. Current DB2 level is 11.1.3030.240 on both the Windows server and the client.
The ODBC driver has always automatically truncated strings in INSERT statements to the length of the DB2 destination column whenever the string in the INSERT exceeds that length. But last week, I found that some INSERT statements were silently failing, and I eventually found this: my INSERT included a string of 40 characters to populate a VARCHAR(40) field, but the string also included an en dash, that is, the medium-length dash at code 150 in the Windows ANSI (1252) character set. The source data was an Excel spreadsheet, and Excel had apparently used an en dash instead of a hyphen in this particular string.
So I went into the UI of the users' application that uses this DB2 database and tried the same thing. It failed, but it returned a SQL0302N / SQLSTATE 22001 error, which roughly equates to "too long for this column", even though the number of characters in the string was equal to, not exceeding, the column length. So I reduced my string length by a few characters, and my INSERT succeeded (same with UPDATE, of course). That told me that it was not that the characters were incompatible but that there was some character-length issue.
My next test was to try the same thing on another client's DB2 system, and I could not replicate the problem. That led me to the database code pages: the failing system is UTF-8 encoded, while the succeeding system is IBM-1252 encoded. So I ran this query in both systems. The dash is not a hyphen; it is an en dash (Windows-1252 code 150), entered in Windows as Alt-0150.
SELECT LENGTH ('–') FROM DUAL
Sure enough, the IBM-1252 system correctly reports 1, but the UTF-8 encoded system reports 3. So I tried the degree symbol (Windows-1252 code 176, entered as Alt-0176):
SELECT LENGTH ('°') FROM DUAL
The IBM-1252 system correctly reports 1, but the UTF-8 encoded system reports 2.
If I am correct, this all tells me the following: Windows (and VBA) strings are UTF-16, in which each of these characters counts as one, but DB2's LENGTH is counting bytes. In a UTF-8 encoded database, any character above U+007F occupies two to four bytes, whereas in an IBM-1252 (single-byte) database every character occupies exactly one byte, so the byte count happens to match the Windows character count.
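Those numbers are consistent with how UTF-8 encodes the two code points: the en dash is U+2013, which UTF-8 stores as three bytes (E2 80 93), and the degree sign is U+00B0, stored as two bytes (C2 B0). For reference, here is a minimal sketch of measuring the UTF-8 byte length from VBA using a late-bound ADODB.Stream; the Utf8ByteLength name is just illustrative, and it assumes the stream writes the usual 3-byte UTF-8 byte-order mark (which ADODB.Stream does for the "utf-8" charset):

' Sketch: how many bytes a VBA (UTF-16) string will occupy once encoded as UTF-8.
' Late-bound ADODB.Stream, so no project reference is required.
Public Function Utf8ByteLength(ByVal s As String) As Long
    Dim stm As Object
    If Len(s) = 0 Then Exit Function
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 2                      ' adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.WriteText s
    stm.Position = 0
    stm.Type = 1                      ' adTypeBinary
    Utf8ByteLength = stm.Size - 3     ' subtract the 3-byte BOM the stream prepends
    stm.Close
End Function

' Immediate window checks (these match what the UTF-8 database reports):
'   ?Utf8ByteLength(ChrW(&H2013))    ' en dash      -> 3
'   ?Utf8ByteLength(ChrW(&HB0))      ' degree sign  -> 2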
By observation, it is apparent that the ODBC driver provided by IBM truncates inbound strings for INSERT/UPDATE based on a comparison of the Windows string length (character count) to the destination column length, not to the length the string will have once encoded in DB2. And this presents a real problem: I have no way of knowing in advance when it will occur. This seems like a serious deficiency in the ODBC driver.
I have thus far been unable to find anything in the Visual Basic documentation describing any way to convert my strings to UTF-8, or even to measure the byte length they will have once encoded.
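As an alternative to the ADODB.Stream sketch above, the Windows API itself (rather than VB) can do this: WideCharToMultiByte with code page 65001 (UTF-8) returns the number of bytes the encoded string would need when you pass a zero-length output buffer. A minimal sketch, assuming VBA7/64-bit declarations; the Utf8ByteCount name is illustrative:

' VBA7 declaration; on older 32-bit VBA hosts, drop PtrSafe and change LongPtr to Long.
Private Declare PtrSafe Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As LongPtr, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As LongPtr, ByVal lpUsedDefaultChar As LongPtr) As Long

Private Const CP_UTF8 As Long = 65001

' Passing cbMultiByte = 0 makes the API return the byte count the UTF-8
' encoding would need, without allocating or filling an output buffer.
Public Function Utf8ByteCount(ByVal s As String) As Long
    If Len(s) = 0 Then Exit Function
    Utf8ByteCount = WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), 0, 0, 0, 0)
End Function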
My solution thus far is very complex and slows down my import process measurably. I had to write code in my VBA application to truncate strings before invoking the INSERT statement: query DB2 for the column length, truncate to that length, then loop through the remaining characters to find any whose code is above 127 and, for each one found, open a connection to DB2, query DUAL for the length of that character, and truncate further by the excess length that character will consume in DB2.
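A possible alternative that avoids the extra round trips entirely is to do the byte accounting on the client: walk the string once, add up the UTF-8 cost of each character (1 byte below U+0080, 2 below U+0800, 3 for the rest of the BMP, 4 for surrogate pairs), and cut the string where the running total would exceed the column length. A self-contained sketch, assuming the destination column length is in bytes (OCTETS) and the input is well-formed UTF-16; the TruncateToUtf8Bytes name is illustrative:

' Truncate s so that its UTF-8 encoding fits within maxBytes bytes
' (e.g. the declared length of a VARCHAR column in a UTF-8 database).
Public Function TruncateToUtf8Bytes(ByVal s As String, ByVal maxBytes As Long) As String
    Dim i As Long, cp As Long, cost As Long, used As Long
    i = 1
    Do While i <= Len(s)
        cp = AscW(Mid$(s, i, 1)) And &HFFFF&    ' AscW can return negative; mask to 0-65535
        If cp >= &HD800& And cp <= &HDBFF& Then
            cost = 4                            ' high surrogate: the pair takes 4 UTF-8 bytes
            If used + cost > maxBytes Then Exit Do
            used = used + cost
            i = i + 2                           ' consume both halves of the surrogate pair
        Else
            If cp < &H80& Then
                cost = 1
            ElseIf cp < &H800& Then
                cost = 2
            Else
                cost = 3
            End If
            If used + cost > maxBytes Then Exit Do
            used = used + cost
            i = i + 1
        End If
    Loop
    TruncateToUtf8Bytes = Left$(s, i - 1)
End Function

' Usage before building the INSERT for a VARCHAR(40) column:
'   s = TruncateToUtf8Bytes(s, 40)

With something like this, the import only needs the column length it already queries once per column, instead of a DUAL query per suspect character.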
Which brings me to my question: is there some encoding-related CLI parameter or other option I can configure in my DSN or on the client computer to ensure that strings passed through the DSN to DB2 are truncated based on the destination (encoded) length rather than the source (character) length?
#Db2 #Db2forLUW #Support #SupportMigration