A Game Changing Series. Part 8
DFSMSdfp CDA Compression
Andrew Wilt
If you haven’t seen the rest of the Game Changing Series blog posts, you can check them out here to get a high-level explanation of CDA and the other enhancements!
The CDA team has more for you! Now it’s time to decompress and enjoy the expanded fruits of our labor. Okay! Enough of the bad puns. The Cloud Data Access team is pleased to announce that CDA now supports the zEDC and gzip compression algorithms when processing user data. This support is available on z/OS 3.1 and higher via the PTF for APAR OA66536.
Data compression is the process of reducing the size of data in such a way that it can be restored to its original size later. It is similar to shipping a mattress: it costs extra to ship such a big item, so the vendor sucks out all the air, making the package smaller. When the package arrives at your house, you open it and the mattress inflates back to its original size.
This is exciting news because with this compression support, you can reduce the amount of space used on the cloud object server, and hopefully reduce the cost when you upload data! Additionally, sending less data over TCP/IP is always good. Likewise, if there is data in the cloud that has been compressed with gzip (e.g., object-name.csv.gz), you can download it to z/OS and expand it as it arrives.
Daaaaaaaaaaaaaaaaaaatttttttttttttttttttttttttttttttttttaaaaaaaaaaaaaaaaaaaaaa
1.1 GDKUTIL Enhancements
We added a new COMPRESS keyword for the UPLOAD command. You can use the zEDC or gzip sub-parameters to indicate the type of compression you want performed on the data set or z/OS UNIX file you are putting in the cloud as an object. (Note that the sub-parameters are case-sensitive.) When you use this keyword, CDA attaches metadata to the object with the key zos-compression and a value of zEDC or gzip, depending on which compression algorithm you requested.
Additionally, the new COMPLEVEL keyword lets you tailor how much work goes into the compression. The options for COMPLEVEL are MAX | SPEED | DEFAULT | {1-9}; a sample job follows the list below.
· MAX indicates that CDA should try to get the best compression ratio for the data being sent. It can result in extra CPU usage as the best compression result is attempted.
· SPEED indicates that CDA should request that the compression be performed as quickly as possible, even if the compression ratio suffers.
· DEFAULT indicates that the default compression level should be used. It is a midpoint between SPEED and MAX.
· 1-9 – A number that indicates the specific compression level to be given to the zlib APIs (1 favors speed, 9 favors compression ratio).
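To make this concrete, here is a minimal sketch of an UPLOAD with compression in a GDKUTIL batch job. The provider name (ibmcos), bucket, object, and data set names are placeholders, and the DD layout follows the typical GDKUTIL pattern; check the GDKUTIL documentation for the exact JCL your environment needs.
Sample GDKUTIL job (sketch):
//STEP1    EXEC PGM=GDKUTIL
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//OBJNAME  DD *
cos://mybucket/payroll.csv
//LOCNAME  DD *
MY.PAYROLL.DATA
//SYSIN    DD *
  UPLOAD PROVIDER(ibmcos) COMPRESS(zEDC) COMPLEVEL(MAX)
/*
With this SYSIN, CDA compresses the data set with zEDC before sending it, and the resulting object carries the zos-compression metadata tag with a value of zEDC.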
When downloading an object from the object store, CDA automatically recognizes the zos-compression metadata associated with it and decompresses the data. However, if the object does not have the zos-compression metadata, or the provider file you are using lacks the METAHEADER instructions needed to recognize that metadata header, you can tell CDA to decompress the object by using the DECOMPRESS keyword.
The DECOMPRESS keyword accepts zEDC | gzip | zlib | NONE as sub-parameters; a sample job follows the list below:
· zEDC – Indicates that you know the data in the object was compressed using the zEDC algorithm. If it was not, the zlib API returns an error and CDA reports it.
· gzip – Indicates that you know the data in the object was compressed using the gzip algorithm. If it was not, the zlib API returns an error and CDA reports it.
· zlib – Indicates that the zlib headers embedded in the object data should be used to determine the algorithm to use.
· NONE – Do not perform any decompression, regardless of the existence of the zos-compression metadata tag on the object.
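Here is a matching sketch (same placeholder caveats as the upload example above) for downloading a gzip-compressed object that lacks the zos-compression metadata tag:
Sample GDKUTIL job (sketch):
//STEP1    EXEC PGM=GDKUTIL
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//OBJNAME  DD *
cos://mybucket/report.csv.gz
//LOCNAME  DD *
/u/myuser/report.csv
//SYSIN    DD *
  DOWNLOAD PROVIDER(ibmcos) DECOMPRESS(gzip)
/*
CDA inflates the data as it arrives, so the z/OS UNIX file ends up holding the expanded contents.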
When would you want to use the DECOMPRESS keyword?
Mostly, I expect you won’t need to use it; everything should be automatic. But because the intent of CDA is to help z/OS get to data in cloud object storage, some objects will not have been created by CDA and therefore may not have the zlib headers or metadata tags that help CDA understand what to do with the data in those objects.
zEDC or gzip are for the times when an object was created by another application, but you are told that the contents were compressed with the zEDC or gzip algorithm. Likewise, an application on another platform may have used its platform’s zlib APIs to compress the data, so the object has zlib headers and trailers embedded in the data; in that case you would use the zlib sub-parameter, which looks at those headers to figure out how to decompress the data.
The NONE sub-parameter is included so that you can download the data as-is without any decompression happening as the data is brought into z/OS.
1.2 CDA API Changes
We changed all three paths through the GDKGET and GDKWRITE code to be able to compress/decompress data. The three paths are:
· GDK_BUFFERDATALOCATION
· GDK_PATHDATALOCATION
· GDK_EXITDATALOCATION
1.2.1 GDKGET
If you are writing your own application to retrieve data from cloud object stores, the GDKGET API is what you call to get the data in a cloud object. The GDKGET API recognizes new optional parameters (a short sketch follows the list):
· “compression” – The value for this key is a null-terminated string with one of two values, indicating the compression algorithm you expect was used on the object’s data, so CDA can decompress it after receiving it. You only need to pass this optional parameter when the object does not have the zos-compression metadata tag.
o “zEDC” means that you expect the data to have been compressed with the zEDC algorithm.
o “gzip” means that you expect the data to have been compressed with the gzip algorithm.
· “decompress” – The value for this key is a null-terminated string with one of the following values:
o “true” means that CDA will decompress the data. If the “compression” optional parameter wasn’t passed, then CDA will assume the data should have the zlib headers so the compression algorithm can be automatically determined.
o “false” means that CDA should not do any decompression of the data, and not even look at the zos-compression metadata value. (If the data is compressed, it will be retrieved as-is.)
o “attempt-inflate” means that CDA should try to decompress the data, but if an error occurs while decompressing the first set of data, the entire object is returned as-is. You would use this option when you aren’t sure whether the object contains zlib-compressed data.
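To make the key/value shape concrete, here is an illustrative C sketch. The KeyValue structure is a hypothetical stand-in for however your application assembles GDKGET’s optional parameters; only the key names and their string values come from the API itself. See the publication updates linked below for the real GDKGET prototype.
#include <stdio.h>

/* Hypothetical stand-in for a GDKGET optional-parameter pair; the
 * key and value strings are real, the structure is illustrative.  */
typedef struct {
    const char *key;    /* optional-parameter name      */
    const char *value;  /* null-terminated string value */
} KeyValue;

/* Download an object known to be gzip-compressed but lacking the
 * zos-compression metadata tag, inflating it as it arrives.       */
static const KeyValue getOptions[] = {
    { "compression", "gzip" },  /* what the data was compressed with   */
    { "decompress",  "true" }   /* ask CDA to decompress on the way in */
};

/* Not sure whether the object is compressed at all?  Ask CDA to try:
 * if inflating the first set of data fails, it is returned as-is.   */
static const KeyValue maybeCompressed[] = {
    { "decompress", "attempt-inflate" }
};

int main(void) {
    for (size_t i = 0; i < sizeof(getOptions) / sizeof(getOptions[0]); i++)
        printf("%s=%s\n", getOptions[i].key, getOptions[i].value);
    printf("%s=%s\n", maybeCompressed[0].key, maybeCompressed[0].value);
    return 0;
}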
Check the publication updates for details on the new return codes: https://public.dhe.ibm.com/eserver/zseries/zos/DFSMS/CDA/OA66536/OA66536_publicationUpdates.pdf
1.2.2 GDKWRITE
The GDKWRITE API now recognizes some new optional parameters that allow you to request compression of the data being sent to the object. The optional parameters are recognized for all three data location types; a short sketch follows below.
· “compression” – The value is a null-terminated string that indicates that compression should be used on the data sent to the object store. The values accepted are:
o “zEDC” – The zEDC compression algorithm should be used to compress the data.
o “gzip” – The gzip compression algorithm should be used.
· “compLevel” – The value is a null-terminated string that tunes the compression processing. The options are:
o “SPEED” – Performs the fastest compression with minimal CPU usage. A lower compression ratio may be seen.
o “MAX” – Maximizes the compression ratio. Higher CPU usage is expected.
o “DEFAULT” – Performs efficient compression while maintaining good speed.
o {1-9} – A number indicating the compression level to be used. This is a zlib parameter (1 favors speed, 9 favors compression ratio).
· “Get-Sent-Data-LengthE” – The value is a pointer to an 8-byte number field where CDA should place the total number of bytes sent to the server (after compression) when the request is successful. One of our API callers wanted a way to calculate the compression ratio; they knew how much data they passed to CDA, but there was no existing way to learn the total amount of data CDA actually sent on to the object server.
Note that users of GDK_EXITDATALOCATION will have their data buffered by CDA as the data is compressed. The size of the buffer is determined by the multipartThreshold and multipartChunksize key/value pairs in the provider file for the WRITELARGEOBJECT operation, or 8*1024*1024 bytes (8 MiB) as a default.
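Here is a similar illustrative C sketch for GDKWRITE; again, the structure is a hypothetical stand-in and only the key names and string values are real. The value for Get-Sent-Data-LengthE is a pointer to an 8-byte field rather than a string, which is why this sketch uses void *.
#include <stdio.h>

/* Hypothetical stand-in for a GDKWRITE optional-parameter pair.
 * value is void * because Get-Sent-Data-LengthE takes a pointer
 * to an 8-byte field, not a null-terminated string.             */
typedef struct {
    const char *key;
    const void *value;
} KeyValue;

int main(void) {
    /* 8-byte field that CDA fills with the post-compression byte
     * count when the request is successful.                      */
    unsigned long long sentBytes = 0;

    const KeyValue writeOptions[] = {
        { "compression",           "zEDC"     }, /* compress with zEDC   */
        { "compLevel",             "MAX"      }, /* favor ratio over CPU */
        { "Get-Sent-Data-LengthE", &sentBytes }  /* out: bytes sent      */
    };

    /* After a successful GDKWRITE, sentBytes divided by the original
     * data length gives the compression ratio the caller wanted.     */
    (void)writeOptions;
    printf("sentBytes before the write: %llu\n", sentBytes);
    return 0;
}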
1.3 Provider File Updates
Earlier, I mentioned that the METAHEADER instructions are needed for CDA to recognize that an object contains compressed data. The CDA provider file contains instructions that tell CDA how to communicate with the object server, and one of those instructions describes what a metadata header looks like. For example, the S3 REST API uses the header prefix x-amz-meta- for metadata headers sent to the object server, so the compression tag travels as the x-amz-meta-zos-compression header. Others, such as the Microsoft Azure Blob service, use x-ms-meta- for their metadata.
To let CDA attach metadata to an object on UPLOAD, the WRITEOBJECT and WRITELARGEOBJECT operations need to have the METAHEADER JSON object in the requestParameters array. An example is:
Provider file excerpt:
"requestParameters": [
{
"mechanism": "MESSAGE_BODY",
"descriptor": "<GDK_DATA>",
"signedS3Payload": "false",
"contentType": "text/plain"
},
{
"mechanism": "METAHEADER",
"descriptor": "x-amz-meta-"
},
{
"mechanism": "HEADER",
"descriptor": "x-amz-date: <DATE_ISO_8601>"
}
]
},
On the DOWNLOAD side, CDA needs to be able to recognize metadata headers, so we need something similar in the responseResults array for CDA to understand what a metadata header starts with. The GETOBJECT operation needs the METAHEADER object in responseResults, and the GETLARGEOBJECT operation needs the METAHEADER object in responseResults in both the getSize and data actions.
Provider file excerpt:
"responseResults": [
{
"mechanism": "HEADER",
"name": "Content-Length",
"content": "GDK_LENGTH"
},
{
"mechanism": "METAHEADER",
"descriptor": "x-amz-meta-"
}
]
Check out our Cloud Data Access content solution page to learn more and get started!
We would love to hear your thoughts on CDA and these new enhancements! Leave a comment on this blog post, start a discussion on the DFSMS Community Page, or join the Mainframe Data Management LinkedIn group to discuss there!
Author:
Andrew Wilt
Editor:
Alexis Kapica