The compressed output format comprises a sequence of blocks, as Roy describes. Each block starts with a control byte. The high-order bit of this byte determines whether the block encodes a run of one repeated character or a string of input characters copied as-is into the output. The low-order 7 bits of the control byte form a length value in the range 0 to 127. The first byte of the output must be x'80', to indicate that RLE-compressed blocks follow. If the high-order bit of a subsequent control byte is set, the control byte is followed by a single byte holding the character to be repeated, and the low-order 7 bits give the number of repetitions. If the high-order bit is reset, the low-order 7 bits give the number of following bytes that have been copied as-is into the output block.
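Since the original question asks how to expand such data in Java, here is a minimal decoder sketch that follows the description above. It is reconstructed from this description and the worked example, not from IBM documentation. The class and method names (`CsrcesrvRleDecoder`, `expand`) are hypothetical, and it assumes the x'80' header byte appears exactly once, at the very start.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch of an expander for the CSRCESRV RLE layout described above.
 * Assumptions (inferred from the description and example, not IBM docs):
 *  - the data starts with a single header byte x'80';
 *  - control byte with high bit on: next byte is repeated (control & 0x7F) times;
 *  - control byte with high bit off: next (control & 0x7F) bytes are copied as-is.
 */
public final class CsrcesrvRleDecoder {

    private CsrcesrvRleDecoder() {}

    public static byte[] expand(byte[] compressed) {
        if (compressed.length == 0 || (compressed[0] & 0xFF) != 0x80) {
            throw new IllegalArgumentException("missing x'80' header byte");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 1; // skip the header byte
        while (pos < compressed.length) {
            int control = compressed[pos++] & 0xFF; // Java bytes are signed; treat as unsigned
            int length = control & 0x7F;            // low-order 7 bits
            if ((control & 0x80) != 0) {
                // Repeat block: one data byte, written `length` times.
                if (pos >= compressed.length) {
                    throw new IllegalArgumentException("truncated repeat block at offset " + (pos - 1));
                }
                byte value = compressed[pos++];
                for (int i = 0; i < length; i++) {
                    out.write(value);
                }
            } else {
                // Literal block: the next `length` bytes are copied unchanged.
                if (pos + length > compressed.length) {
                    throw new IllegalArgumentException("truncated literal block at offset " + (pos - 1));
                }
                out.write(compressed, pos, length);
                pos += length;
            }
        }
        return out.toByteArray();
    }
}
```

For example, the leading bytes `80 FF 40 FF 40 AD 40` expand to 127 + 127 + 45 = 299 EBCDIC blanks. Literal-block control bytes fall in the range x'00' to x'7F', which would explain bytes such as x'06' and x'7F' looking like ASCII control characters in the compressed data. The result is still raw bytes; if the original string was EBCDIC text, converting it with a charset such as `IBM1047` or `IBM037` (assuming your JDK includes the extended charsets and you know the code page) gives a Java `String`.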
Here's a worked example, using a file that consists of long runs of EBCDIC blanks (x'40') and non-repeating strings, to illustrate what the RLE format looks like:
Input:
000000: 40404040 40404040 40404040 40404040 |@@@@@@@@@@@@@@@@| | |
-------- same as above --------
000120: 40404040 40404040 40404081 82838485 |@@@@@@@@@@@.....| | abcde|
000130: 86878889 91929394 95969798 99A2A3A4 |................| |fghijklmnopqrstu|
000140: A5A6A7A8 A9F1F2F3 F4F5F6F7 F8F9F0C1 |................| |vwxyz1234567890A|
000150: C2C3C4C5 C6C7C8C9 D1D2D3D4 C1D5D6D7 |................| |BCDEFGHIJKLMANOP|
000160: D8E2E3E4 E5E6E7E8 E94E617E 81828384 |.........Na~....| |QSTUVWXYZ+/=abcd|
000170: 85868788 89919293 94959697 9899A2A3 |................| |efghijklmnopqrst|
000180: A4A5A6A7 A8A9F1F2 F3F4F5F6 F7F8F9F0 |................| |uvwxyz1234567890|
000190: C1C2C3C4 C5C6C7C8 C9D1D2D3 D4C1D5D6 |................| |ABCDEFGHIJKLMANO|
0001A0: D7D8E2E3 E4E5E6E7 E8E94E61 7E818283 |..........Na~...| |PQSTUVWXYZ+/=abc|
0001B0: 84858687 88899192 93949596 979899A2 |................| |defghijklmnopqrs|
0001C0: A3A4A5A6 A7A8A9F1 F2F3F4F5 F6F7F8F9 |................| |tuvwxyz123456789|
0001D0: F0C1C2C3 C4C5C6C7 C8C9D1D2 D3D4C1D5 |................| |0ABCDEFGHIJKLMAN|
0001E0: D6D7D8E2 E3E4E5E6 E7E8E94E 617E8182 |...........Na~..| |OPQSTUVWXYZ+/=ab|
0001F0: 83848586 87888991 92939495 96979899 |................| |cdefghijklmnopqr|
000200: A2A3A4A5 A6A7A8A9 F1F2F3F4 F5F6F7F8 |................| |stuvwxyz12345678|
000210: F9F0C1C2 C3C4C5C6 C7C8C9D1 D2D3D4C1 |................| |90ABCDEFGHIJKLMA|
000220: D5D6D7D8 E2E3E4E5 E6E7E8E9 4E617E81 |............Na~.| |NOPQSTUVWXYZ+/=a|
000230: 82838485 86878889 91929394 95969798 |................| |bcdefghijklmnopq|
000240: 99A2A3A4 A5A6A7A8 A9F1F2F3 F4F5F6F7 |................| |rstuvwxyz1234567|
000250: F8F9F0C1 C2C3C4C5 C6C7C8C9 D1D2D3D4 |................| |890ABCDEFGHIJKLM|
000260: C1D5D6D7 D8E2E3E4 E5E6E7E8 E94E617E |.............Na~| |ANOPQSTUVWXYZ+/=|
000270: 40404040 40404040 40404040 40404040 |@@@@@@@@@@@@@@@@| | |
-------- same as above --------
000290: 40E3C8C5 C5D5C4 |@...... | | THEEND |
Output:
000000: 80FF40FF 40AD407F 81828384 85868788 |..@.@.@.........| |.. . . "abcdefgh|
000010: 89919293 94959697 9899A2A3 A4A5A6A7 |................| |ijklmnopqrstuvwx|
000020: A8A9F1F2 F3F4F5F6 F7F8F9F0 C1C2C3C4 |................| |yz1234567890ABCD|
000030: C5C6C7C8 C9D1D2D3 D4C1D5D6 D7D8E2E3 |................| |EFGHIJKLMANOPQST|
000040: E4E5E6E7 E8E94E61 7E818283 84858687 |......Na~.......| |UVWXYZ+/=abcdefg|
000050: 88899192 93949596 979899A2 A3A4A5A6 |................| |hijklmnopqrstuvw|
000060: A7A8A9F1 F2F3F4F5 F6F7F8F9 F0C1C2C3 |................| |xyz1234567890ABC|
000070: C4C5C6C7 C8C9D1D2 D3D4C1D5 D6D7D8E2 |................| |DEFGHIJKLMANOPQS|
000080: E3E4E5E6 E7E8E97F 4E617E81 82838485 |........Na~.....| |TUVWXYZ"+/=abcde|
000090: 86878889 91929394 95969798 99A2A3A4 |................| |fghijklmnopqrstu|
0000A0: A5A6A7A8 A9F1F2F3 F4F5F6F7 F8F9F0C1 |................| |vwxyz1234567890A|
0000B0: C2C3C4C5 C6C7C8C9 D1D2D3D4 C1D5D6D7 |................| |BCDEFGHIJKLMANOP|
0000C0: D8E2E3E4 E5E6E7E8 E94E617E 81828384 |.........Na~....| |QSTUVWXYZ+/=abcd|
0000D0: 85868788 89919293 94959697 9899A2A3 |................| |efghijklmnopqrst|
0000E0: A4A5A6A7 A8A9F1F2 F3F4F5F6 F7F8F9F0 |................| |uvwxyz1234567890|
0000F0: C1C2C3C4 C5C6C7C8 C9D1D2D3 D4C1D5D6 |................| |ABCDEFGHIJKLMANO|
000100: D7D8E2E3 E4E5E647 E7E8E94E 617E8182 |.......G...Na~..| |PQSTUVW.XYZ+/=ab|
000110: 83848586 87888991 92939495 96979899 |................| |cdefghijklmnopqr|
000120: A2A3A4A5 A6A7A8A9 F1F2F3F4 F5F6F7F8 |................| |stuvwxyz12345678|
000130: F9F0C1C2 C3C4C5C6 C7C8C9D1 D2D3D4C1 |................| |90ABCDEFGHIJKLMA|
000140: D5D6D7D8 E2E3E4E5 E6E7E8E9 4E617EA1 |............Na~.| |NOPQSTUVWXYZ+/=~|
000150: 4006E3C8 C5C5D5C4 |@....... | | .THEEND |
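Reading the output above block by block: the 299 leading blanks become `FF 40 FF 40 AD 40` (127 + 127 + 45), the 325 non-repeating bytes become literal blocks with control bytes x'7F', x'7F' and x'47' (127 + 127 + 71), the 33 trailing blanks become `A1 40`, and `THEEND` becomes `06` followed by its 6 bytes. The Java sketch below is an encoder that reproduces exactly this output for this input. Its names are hypothetical, and the minimum run length that triggers a repeat block (`MIN_RUN = 3`) is an assumption: the example only shows that a 33-byte run is encoded as a repeat while the 2-byte run "EE" in `THEEND` stays literal, so the real CSRCESRV threshold and edge-case behaviour may differ.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch of an encoder producing the block layout shown in the example.
 * MIN_RUN is a hypothetical threshold, not confirmed for CSRCESRV.
 */
public final class CsrcesrvRleEncoder {

    private static final int MAX_LEN = 127; // largest length that fits in 7 bits
    private static final int MIN_RUN = 3;   // assumption: shorter runs stay literal

    private CsrcesrvRleEncoder() {}

    public static byte[] compress(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x80);      // header byte
        int literalStart = 0; // first byte of the pending literal data
        int pos = 0;
        while (pos < input.length) {
            // Measure the run of identical bytes starting at pos.
            int run = 1;
            while (pos + run < input.length && input[pos + run] == input[pos]) {
                run++;
            }
            if (run >= MIN_RUN) {
                flushLiterals(out, input, literalStart, pos);
                // Long runs are split into repeat blocks of at most 127.
                for (int remaining = run; remaining > 0; ) {
                    int n = Math.min(remaining, MAX_LEN);
                    out.write(0x80 | n); // high bit on: repeat block
                    out.write(input[pos]);
                    remaining -= n;
                }
                pos += run;
                literalStart = pos;
            } else {
                pos += run; // short run: becomes part of the literal data
            }
        }
        flushLiterals(out, input, literalStart, pos);
        return out.toByteArray();
    }

    /** Emits input[from..to) as literal blocks of at most 127 bytes. */
    private static void flushLiterals(ByteArrayOutputStream out, byte[] input, int from, int to) {
        while (from < to) {
            int n = Math.min(to - from, MAX_LEN);
            out.write(n); // high bit off: literal block
            out.write(input, from, n);
            from += n;
        }
    }
}
```

Feeding the encoder's output back through an expander that follows the format description should return the original bytes, which is a convenient round-trip check when experimenting with real CSRCESRV output.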
------------------------------
Andrew Mattingly
------------------------------
Original Message:
Sent: Tue July 23, 2024 03:08 AM
From: Til Mohr
Subject: Exact Algorithm behind CSRCESRV
Hi there,
I would like to expand a String compressed using the macro CSRCESRV in another programming language (in my particular case: Java). I understand that CSRCESRV applies run-length encoding to save on space. Unfortunately, the exact procedure applied is not documented anywhere, and so far I haven't been successful in reversing this compression myself. I am especially confused by the presence of multiple control bytes in the compressed string (ASCII, bytes 0x00 to 0x1F, as well as 0x7F), which I am confident (yet not 100% sure) are not part of the original uncompressed string.
Could you please state the exact algorithm that CSRCESRV uses to compress a String?
Reference:
- https://www.ibm.com/docs/en/SSLTBW_3.1.0/com.ibm.zos.v3r1.ieaa700/macro.htm
- https://www.ibm.com/docs/en/zos/3.1.0?topic=services-provided-by-csrcesrv
Thank you,
Til
------------------------------
Til Mohr
------------------------------