IBM® POWER9™ systems include a new GZIP-based hardware accelerator that is supported on AIX 7.2. This article presents the technology, explains how to use it, and reports a performance study showing a significant compression speedup when the hardware accelerator is used.
A new zlib library (zlibNX), the pigz command, and a new xgzip command are available on AIX® 7.2. They use the accelerator transparently, which significantly speeds up zlib-based compression. The zlibNX library is available on the AIX expansion pack, pigz is available in the AIX Toolbox for Linux Applications, and xgzip is available from the AIX web download packs. pigz and xgzip take advantage of the accelerator, when it is available, through zlibNX.
Contributors: @NICK STILWELL, @Brian F. Veale, @Arnold Flores, and @Carl Burnett
Hardware Acceleration on POWER9 Systems
Each processor chip in a POWER9 server has an on-chip “nest” accelerator called the NX unit that provides specialized functions for general data compression, gzip compression, encryption, and random number generation. These accelerators are used transparently across the systems software stack to speed up operations related to Live Partition Migration, IPSec, JFS2 Encrypted File Systems, PKCS11 encryption, and random number generation through /dev/urandom.
The on-chip NX GZIP accelerator on POWER9 systems implements a high throughput deflate format (RFC 1951) compression engine capable of performing the equivalent work of tens to hundreds of cores.
Acceleration of zlib and gzip operations on AIX
The zlib open source library is a widely used lossless data compression library that implements the DEFLATE (RFC1951), zlib (RFC1950), and gzip (RFC1952) compression formats through software algorithms.
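The three formats wrap the same DEFLATE stream in different containers. The following is a minimal sketch using Python's standard zlib module (an illustration chosen here, not something the article itself uses); the wbits values are the standard zlib convention for selecting the container, and the payload is made up.

```python
import zlib

# wbits selects the container around the same DEFLATE stream:
#   -15 -> raw DEFLATE (RFC 1951), 15 -> zlib (RFC 1950), 31 -> gzip (RFC 1952)
FORMATS = {"deflate": -15, "zlib": 15, "gzip": 31}


def compress_as(data: bytes, wbits: int) -> bytes:
    """Compress data in one shot using the container selected by wbits."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"AIX on POWER9 " * 1000  # made-up sample data
streams = {name: compress_as(payload, wbits) for name, wbits in FORMATS.items()}

for name, blob in streams.items():
    restored = zlib.decompress(blob, wbits=FORMATS[name])
    print(f"{name:8s} {len(blob):5d} bytes, round trip ok: {restored == payload}")
```

A gzip stream produced this way starts with the magic bytes 1f 8b and can be read by any gzip-compatible tool.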
AIX 7.2 Technology Level 4 delivers a new zlibNX library and xgzip command that use NX gzip compression acceleration when running on POWER9 servers with server firmware FW940 or later. This allows faster compression of files and speeds up middleware and applications that either dynamically link to the zlib library or are modified to use the new library.
The zlibNX package on the AIX 7.2 Technology Level 4 Expansion Pack provides a compatible version of zlib which supports the sending of in-memory compression and decompression requests to the nest (NX) accelerator unit on the IBM® POWER9™ processor.
The compressed data formats are portable across platforms. The NX-accelerated zlib library is provided as UNIX archive files that can be statically or dynamically linked to applications that currently use zlib. Because all of the function signatures are the same, existing zlib-enabled programs can use zlibNX.
There are also several open source packages available in the AIX Toolbox for Linux Applications that link to the zlib library. Since packages in the AIX Toolbox are built to dynamically link to the zlib library, they can also take advantage of accelerated compression through zlibNX. Notable packages include: parallel gzip (pigz), MySQL, ClamAV (an antivirus engine), MariaDB, SQLite, PostgreSQL, mongo-c-driver (which is used to access MongoDB), and GIT. A full list of available packages linking to zlib today is available near the end of this article.
Additionally, IBM has released a port of parallel gzip (pigz) and a new xgzip compression utility for AIX. The pigz and xgzip utilities link to the zlibNX library to take advantage of accelerated compression, and they function similarly to the well-known gzip utility.
Installation and Configuration on AIX
System Requirements
The system must be a POWER9 system running FW version FW940 or later and the partition must be configured to run in the new POWER9 Processor Compatibility Mode that is enabled by FW version FW940. Note, this is a different mode than the default or POWER9_base mode that was available at the initial launch of the POWER9 line of systems.
Other system requirements will vary depending on the application workload running on the partition. Minimum recommendations are 1 processor and 6 GB of memory.
The FW level and configuration can be verified from the AIX command line with the prtconf command:
# prtconf
System Model: *
Machine Serial Number: *
Processor Type: PowerPC_POWER9
Processor Implementation Mode: POWER 9
Processor Version: PV_9_Compat
Number Of Processors: 4
Processor Clock Speed: 3000 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: *
Memory Size: 32768 MB
Good Memory Size: 32768 MB
Platform Firmware level: VH940_027
Firmware Version: IBM,FW940.00 (VH940_027)
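As a rough sketch, the same check can be scripted. The field names and the expected mode string "POWER 9" below are taken from the sample output above and are assumptions about what prtconf prints on other systems; the helper names are hypothetical.

```python
import re

# A few lines copied from the sample prtconf output above.
SAMPLE_PRTCONF = """\
Processor Type: PowerPC_POWER9
Processor Implementation Mode: POWER 9
Processor Version: PV_9_Compat
Firmware Version: IBM,FW940.00 (VH940_027)
"""


def parse_prtconf(text: str) -> dict:
    """Turn 'Key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def meets_nx_gzip_requirements(text: str, min_fw: int = 940) -> bool:
    """Check firmware level and processor mode (assumed field formats)."""
    fields = parse_prtconf(text)
    match = re.search(r"FW(\d+)", fields.get("Firmware Version", ""))
    fw_ok = match is not None and int(match.group(1)) >= min_fw
    mode_ok = fields.get("Processor Implementation Mode", "") == "POWER 9"
    return fw_ok and mode_ok


print(meets_nx_gzip_requirements(SAMPLE_PRTCONF))
```

On a live partition, the text could be obtained with subprocess.run(["prtconf"], capture_output=True, text=True).stdout instead of the sample string.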
The current processor compatibility mode can be verified from the HMC that manages the partition. Navigate to the partition’s properties page, click the Processor tab, and click Advanced. The Effective Processor Compatibility Mode must be set to POWER9 for AIX to be able to use the GZIP accelerator. Note that changes to the mode require a reboot of the partition to take effect.
Figure 1. Configuring the Processor Compatibility Mode via the HMC
Installation of the zlibNX library and pigz and xgzip Commands
The zlibNX library is available on the AIX 7.2 TL 5 Expansion Pack as zlibNX.rte. To install the package with the installp command, mount the expansion pack media or copy the zlibNX.rte fileset to the partition, and then run installp. In the example below, the expansion pack media is read from the /dev/cd0 device.
# installp -aXgqY -d/dev/cd0 zlibNX.rte
Verify installation with the lslpp command:
# lslpp -l zlibNX.rte
  Fileset                      Level  State      Description
  ---------------------------------------------------------------------
Path: /usr/lib/objrepos
  zlibNX.rte                 7.2.4.0  COMMITTED  NX accelerated zlib
                                                 compression library
Install the parallel gzip (pigz) package from the AIX Toolbox for Linux Applications using yum as shown below:
# yum install pigz
Verify installation with the yum command:
# yum list installed pigz
Installed Packages
pigz.ppc 2.4-1 @AIX_Toolbox
Add /opt/freeware/bin to your path if it is not already there:
# echo $PATH
/usr/bin:/etc:/usr/sbin:/usr/ucb:/sbin:.
# export PATH=/opt/freeware/bin:$PATH
Next download the xgzip fileset from the AIX Web Download Pack Programs and install it using installp as shown below:
# installp -aXgqY -d ./xgzip.rte xgzip.rte
Verify installation with the lslpp command:
# lslpp -l xgzip.rte
  Fileset                      Level  State      Description
  ---------------------------------------------------------------------------
Path: /usr/lib/objrepos
  xgzip.rte                 4.0.20.0  COMMITTED  A command utility to exploit
                                                 NX accelerated zlib
                                                 compression library
Note, the zlibNX library was first made available on the AIX 7.2 TL 4 Expansion Pack. Versions 7.2.4.0 and 7.2.4.2 have an issue that can affect the integrity of compressed archives. These versions of the library are included in the IBM AIX V7.2 Expansion Pack 11/2019 and the IBM AIX V7.2 Expansion Pack 5/2020. If you are using an affected version, it is recommended that you install the ifix for APAR IJ28579. For more information see: https://www.ibm.com/support/pages/apar/IJ28579
Enabling Existing Applications to Utilize zlibNX
Applications that dynamically link to the standard zlib can be made to link with the accelerated zlibNX without application modification. There are several environment variables that can be set to load the zlibNX shared library.
Set the LDR_PRELOAD or LDR_PRELOAD64 variable:
# LDR_PRELOAD="/usr/opt/zlibNX/lib/libz.a(libz.so.1)" <32-bit application>
# LDR_PRELOAD64="/usr/opt/zlibNX/lib/libz.a(libz.so.1)" <64-bit application>
Set the LD_LIBRARY_PATH variable:
# LD_LIBRARY_PATH=/usr/opt/zlibNX/lib:$LD_LIBRARY_PATH <application>
Set the LIBPATH variable:
# LIBPATH=/usr/opt/zlibNX/lib:$LIBPATH <application>
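When an application is launched from a script rather than from an interactive shell, the preload variable can be set for that one child process only. The sketch below uses Python's standard library; the helper name is hypothetical and the application path in the comment is a placeholder, while the library path is the one shown above.

```python
import os

# Archive member path taken from the LDR_PRELOAD examples above.
ZLIBNX_MEMBER = "/usr/opt/zlibNX/lib/libz.a(libz.so.1)"


def zlibnx_environment(is_64bit: bool, base_env=None) -> dict:
    """Return a copy of the environment with the zlibNX preload variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LDR_PRELOAD64" if is_64bit else "LDR_PRELOAD"] = ZLIBNX_MEMBER
    return env


# Example launch (placeholder path, shown as a comment so the sketch runs anywhere):
#   import subprocess
#   subprocess.run(["/path/to/application"], env=zlibnx_environment(True), check=True)
print(zlibnx_environment(True, {"PATH": "/usr/bin"}))
```

Setting the variable per process avoids exporting it into a login shell, where it would affect every dynamically linked command started afterwards.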
Example Usage with the SQLite Archiver Tool
The SQLite Archiver command (sqlar) is a tool released by the SQLite project. It is a command line utility that takes a file or a list of files and creates an SQLite database with the files stored as BLOBs (binary large objects). By default, the utility compresses files using zlib.
# time sqlar corpus.sqlar silesia_corpus/*
real 0m14.70s
user 0m4.17s
sys 0m0.10s
# export LDR_PRELOAD="/usr/opt/zlibNX/lib/libz.a(libz.so.1)"
# time sqlar corpus.sqlar silesia_corpus/*
real 0m1.40s
user 0m0.12s
sys 0m0.25s
When run with zlibNX instead of the standard zlib, sqlar completes approximately 10x faster and uses 91% less CPU time (user plus sys). Note: time is end-to-end, including I/O time not accelerated by zlibNX.
Using the pigz and xgzip Commands
pigz and xgzip use flags and parameters similar to those of the gzip command. Below is an example of compressing a file (and keeping the original file) using pigz, xgzip, and gzip:
# time pigz -c mybackup > mybackup.gz
# time xgzip -c mybackup > mybackup.gz
# time gzip -c mybackup > mybackup.gz
| | gzip | xgzip (accelerated) | pigz (accelerated) | pigz vs gzip | xgzip vs gzip |
|---|---|---|---|---|---|
| real time | 5m 14.75s | 0m 19.07s | 0m 11.38s | 27.6x faster | 16.5x faster |
| user time | 1m 30.01s | 0m 2.15s | 0m 7.94s | 84% less CPU time | 94% less CPU time |
| sys time | 0m 1.80s | 0m 3.20s | 0m 6.55s | | |
| compressed size (MB), input file: 4287.89 | 1890.64 (56% smaller) | 2133.29 (50% smaller) | 2006.44 (53% smaller) | 3% less compression | 6% less compression |
Notes: Time is end-to-end, including I/O time not accelerated by zlibNX. pigz and xgzip results are for HW accelerated compression using zlibNX. gzip uses software based compression.
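The derived columns can be recomputed from the raw measurements. The sketch below assumes that "CPU time" means user plus sys time, which is the reading that reproduces the 84% and 94% figures; the names are illustrative only.

```python
def seconds(text: str) -> float:
    """Convert a time such as '5m 14.75s' into seconds."""
    minutes, rest = text.split("m")
    return int(minutes) * 60 + float(rest.strip().rstrip("s"))


# Measurements copied from the table above.
TIMES = {
    "gzip":  {"real": "5m 14.75s", "user": "1m 30.01s", "sys": "0m 1.80s"},
    "xgzip": {"real": "0m 19.07s", "user": "0m 2.15s",  "sys": "0m 3.20s"},
    "pigz":  {"real": "0m 11.38s", "user": "0m 7.94s",  "sys": "0m 6.55s"},
}
SIZES_MB = {"input": 4287.89, "gzip": 1890.64, "xgzip": 2133.29, "pigz": 2006.44}


def cpu_seconds(tool: str) -> float:
    """CPU time is assumed here to be user time plus sys time."""
    return seconds(TIMES[tool]["user"]) + seconds(TIMES[tool]["sys"])


summary = {}
for tool in ("xgzip", "pigz"):
    summary[tool] = {
        "speedup": seconds(TIMES["gzip"]["real"]) / seconds(TIMES[tool]["real"]),
        "cpu_saving": 1 - cpu_seconds(tool) / cpu_seconds("gzip"),
        "size_reduction": 1 - SIZES_MB[tool] / SIZES_MB["input"],
    }
    print(tool, {key: round(value, 3) for key, value in summary[tool].items()})
```

The real-time speedups come out close to the published 16.5x and 27.6x; small differences in the last digit are down to rounding.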
Management of AIX Backups: Compressing mksysb files
The GZIP compression accelerator can be used to compress AIX backups generated through the use of the mksysb command. This can significantly reduce the size of the resulting backup file making it easier and faster to transfer to a different system or storage.
The simplest way to do this is to run the mksysb command with packing turned off (-p option specified) and then run xgzip or pigz on the resulting backup file. Note, once you are ready to restore the AIX backup, you will have to uncompress it before restoring it.
To capture a mksysb of your root volume group (rootvg) and compress it using pigz, use the following commands. In this example, the resulting uncompressed mksysb file is written to /data/mksysb and the compressed file is written to /data/mksysb.gz.
# mksysb -p /data/mksysb
# pigz -c /data/mksysb > /data/mksysb.gz
Similarly, these commands can be used to perform compression with xgzip:
# mksysb -p /data/mksysb
# xgzip -c /data/mksysb > /data/mksysb.gz
The resulting compressed backup can be uncompressed with these commands:
Using pigz:
# pigz -d -c /data/mksysb.gz > /data/mksysb
Using xgzip:
# xgzip -d -c /data/mksysb.gz > /data/mksysb
In the above examples, the -c option to pigz and xgzip causes the command to write to stdout and not delete the source file. If you would like pigz or xgzip to delete the source file, specify one of the following forms for compression:
# pigz /data/mksysb
# xgzip /data/mksysb
or one of the following forms for decompression:
# pigz -d /data/mksysb.gz
# xgzip -d /data/mksysb.gz
More information on creating system backups can be found here: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/install/create_sys_backup.html
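Before deleting an uncompressed backup, it can be worth confirming that the compressed copy decompresses to identical content. The following is a minimal sketch using Python's standard gzip and hashlib modules; the function names are hypothetical, and the check is a suggestion rather than part of the documented procedure.

```python
import gzip
import hashlib

CHUNK = 1024 * 1024  # read 1 MB at a time to keep memory use flat


def sha256_of_stream(stream) -> str:
    """Hash a binary stream incrementally."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(CHUNK), b""):
        digest.update(block)
    return digest.hexdigest()


def archive_matches_original(original_path: str, gz_path: str) -> bool:
    """True if the gzip file decompresses to exactly the original bytes."""
    with open(original_path, "rb") as plain, gzip.open(gz_path, "rb") as packed:
        return sha256_of_stream(plain) == sha256_of_stream(packed)
```

For the example above, the call would be archive_matches_original("/data/mksysb", "/data/mksysb.gz"), run after the -c form of pigz or xgzip, which keeps the source file.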
Performance Evaluation
A performance study was conducted on a partition of an E980 POWER9 server running FW version FW940 and AIX 7.2 TL 4, configured with 4 dedicated processors (cores) in SMT-8 mode and 32 GB of dedicated memory. The partition had access to 1 NX unit (1 per multi-core chip) containing one GZIP compression accelerator.
Benchmarking was done using data from the Silesia compression corpus, which includes typical data types used in modern processing, including English text, executable programs, databases, source code, XML, and medical images. The data used is available here: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
Performance Summary
Hardware-accelerated GZIP on POWER9 performs best when compressing larger amounts of data. For a minor decrease in compression ratio, POWER9 GZIP accelerated compression is significantly faster than software-only GZIP: up to 190 times faster when using the zlibNX library compared with the standard zlib library, and up to 29 times faster when using the xgzip command compared with the gzip command.
Decompression is not as CPU intensive as compression and therefore decompression performance benefits less from the POWER9 GZIP accelerator compared to compression.
Detailed Analysis
Compression throughput using zlibNX with a single SMT thread is shown in Figure 2, with a speedup of up to 102 for a 64 MB input buffer. zlibNX performance surpasses that of zlib starting at an input buffer size of 16 KB, where the speedup is 3.3.
Figure 2. GZIP Compression Throughput using zlibNX.
(default compression strategy, single SMT8 thread performance)
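A measurement of this kind can be approximated with a small timing harness. The sketch below uses Python's standard zlib module and synthetic data, so it only illustrates the method; it is not the benchmark used in the study, and the buffer sizes are arbitrary.

```python
import time
import zlib


def compression_throughput(data: bytes, repeats: int = 3) -> float:
    """Return the best observed compression throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.compress(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


sample = bytes(range(256)) * 4096  # 1 MB of synthetic data
results = {}
for size_kb in (1, 16, 256, 1024):
    results[size_kb] = compression_throughput(sample[: size_kb * 1024])
    print(f"{size_kb:5d} KB input buffer: {results[size_kb]:10.1f} MB/s")
```

Running the same script once with the standard library and once with zlibNX preloaded would give a like-for-like comparison, provided the Python interpreter loads zlib dynamically, which is an assumption to verify on the target system.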
Figure 3 shows compression throughput for a multi-threaded application with up to 32 SMT8 threads using 1 to 4 processor cores compressing text data. zlib throughput performance peaks at ~625 MB/second (for 32 threads compressing 2 KB of data each) while zlibNX throughput peaks at ~5.8 GB/second (for 32 threads, 4 MB of data each).
Figure 3. Multithreaded Application Throughput.
(default compression strategy, 1 to 32 SMT8 threads performance, 1 to 4 processor cores)
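The multithreaded pattern, in which each thread compresses its own independent buffer, can be sketched as follows. This uses Python's thread pool purely as an illustration; the thread count and buffer size are arbitrary, and the study's own test program is not described in the article.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_in_parallel(buffers, workers: int):
    """Compress each buffer on a worker thread; return results and MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        compressed = list(pool.map(zlib.compress, buffers))
    elapsed = max(time.perf_counter() - start, 1e-9)
    total_mb = sum(len(buf) for buf in buffers) / (1024 * 1024)
    return compressed, total_mb / elapsed


# 32 independent 256 KB text-like buffers, one per worker thread.
buffers = [(b"line %d of sample text\n" % i) * 12000 for i in range(32)]
buffers = [buf[: 256 * 1024] for buf in buffers]
compressed, throughput = compress_in_parallel(buffers, workers=32)
print(f"aggregate throughput: {throughput:.1f} MB/s")
```

Each buffer is compressed as a separate stream, so the results can be decompressed independently and in any order.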
Both zlibNX and xgzip see similar compression ratios when compared against zlib and gzip, respectively. Compression ratios for several compression options are shown in Figure 4. zlib has a compression ratio of approximately 3.11 compared to a ratio of approximately 2.68 for zlibNX using the default compression strategy. Note, the compression ratio can vary based on the type of data being compressed.
Figure 4. Compression Ratio of silesia.tar.
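The figures quoted above are consistent with the compression ratio being the uncompressed size divided by the compressed size, which is the definition assumed in this sketch. It compares three of the standard zlib strategies on made-up text using Python's zlib module; it does not reproduce the options measured in Figure 4.

```python
import zlib

STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,
}


def compression_ratio(data: bytes, strategy: int) -> float:
    """Uncompressed size divided by compressed size (assumed definition)."""
    compressor = zlib.compressobj(
        level=zlib.Z_DEFAULT_COMPRESSION, strategy=strategy
    )
    compressed = compressor.compress(data) + compressor.flush()
    return len(data) / len(compressed)


text = b"The quick brown fox jumps over the lazy dog. " * 2000  # made-up input
ratios = {name: compression_ratio(text, value) for name, value in STRATEGIES.items()}
for name, ratio in ratios.items():
    print(f"{name:13s} ratio {ratio:8.2f}")
```

Highly repetitive input like this exaggerates the ratios; a mixed corpus such as silesia.tar gives far lower and more representative values.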
Decompression is not as CPU intensive as compression and therefore does not benefit as much from off-loading operations to the accelerator. Figure 5 shows that decompression using zlibNX has a speedup of up to 9 for a 64 MB input buffer. zlibNX performance surpasses that of zlib starting at an input buffer size of 16 KB, where the speedup is 1.4.
Figure 5. GZIP Decompression Throughput using zlibNX.
(default compression strategy, single SMT8 thread performance)
Compression throughput using the xgzip command is shown in Figure 6. It achieves a speedup of up to 31.7 for a file size of 8 MB and surpasses the performance of the gzip command for file sizes of 32 KB and higher. xgzip uses the zlibNX library to leverage the accelerator. xgzip decompression throughput is shown in Figure 7. It achieves a speedup of up to 2.5 for a file size of 8 MB and surpasses gzip for file sizes of 256 KB and higher. The compression ratio for xgzip compared to gzip is similar to that of zlibNX compared to zlib.
Figure 6. GZIP Compression Throughput using the xgzip command
Figure 7. GZIP Decompression Throughput using the xgzip command
Application Buffer Size Effect
Applications using zlibNX (and zlib) control the size of the input data to be compressed or uncompressed and the size of the output space the results are written to. The performance of the compression algorithms depends on the size of these buffers. As shown in Figure 8, for compression, zlibNX is equivalent to or faster than zlib for all reasonable buffer sizes, and sizes of 16 KB and greater perform best. For decompression, zlibNX is equivalent or faster for all reasonable buffer sizes, and sizes of 32 KB and greater perform best. Note that for decompression, equal-size input and output buffers are not optimal when using zlibNX; an input-to-output size ratio close to the compression ratio is best, and a 1:4 ratio (input vs output buffer size) is good for general use.
Figure 8. Application Buffer Size Effect: Ratio of time spent for zlib vs zlibNX processing using a range of application buffer sizes (where input size is the same as output size).
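The 1:4 guideline for decompression can be sketched as a streaming loop that feeds 32 KB of compressed input at a time and caps each decompression call at 128 KB of output. The example uses Python's zlib module, whose max_length argument and unconsumed_tail attribute play the role of the output buffer limit; the sizes and names are illustrative assumptions.

```python
import zlib

IN_CHUNK = 32 * 1024       # compressed bytes fed per step
OUT_CHUNK = 4 * IN_CHUNK   # 1:4 input-to-output ratio


def decompress_bounded(blob: bytes):
    """Yield decompressed pieces; each decompress() call is capped at OUT_CHUNK."""
    inflater = zlib.decompressobj(wbits=31)  # 31 selects the gzip container
    for offset in range(0, len(blob), IN_CHUNK):
        pending = blob[offset : offset + IN_CHUNK]
        while pending:
            piece = inflater.decompress(pending, OUT_CHUNK)
            # Input that did not fit in the output cap is kept for the next call.
            pending = inflater.unconsumed_tail
            if piece:
                yield piece
    tail = inflater.flush()  # drain anything still buffered
    if tail:
        yield tail


original = b"0123456789abcdef" * 70000
gz = zlib.compressobj(wbits=31)
blob = gz.compress(original) + gz.flush()
pieces = list(decompress_bounded(blob))
print(len(pieces), "pieces, round trip ok:", b"".join(pieces) == original)
```

In C code that calls inflate() directly, the same idea corresponds to sizing the buffer behind next_out at about four times the buffer behind next_in.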
Accelerating Applications Using zlibNX
Today, more and more data is transferred between servers in the data center, the cloud, and to/from customer sites than ever before. Compression can provide a significant improvement by reducing the amount of data that has to be transferred. The performance study above of zlibNX shows that even with the default compression strategy a significant improvement can be made in compression throughput compared to the software based zlib library. This opens up new opportunities for utilizing compression in your applications.
Using zlibNX-based compression in your own applications is fairly straightforward, even if they do not already use compression. zlib itself was built to be unencumbered by patents and other legal requirements, and its API is simple. A good example is the code for zpipe.c, which compresses and decompresses a file using the zlib deflate and inflate calls. Sample code is available here: https://zlib.net/zpipe.c
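The chunked streaming pattern that zpipe.c demonstrates in C can be outlined in a few lines. The sketch below is a Python analogue using the standard zlib module, not a translation of zpipe.c; only the 16 KB chunk size is borrowed from it. Whether a given Python build picks up zlibNX depends on how its zlib module is linked, which is an assumption to verify.

```python
import io
import zlib

CHUNK = 16 * 1024  # same chunk size that zpipe.c uses


def deflate_stream(source, dest, level=zlib.Z_DEFAULT_COMPRESSION):
    """Compress source to dest chunk by chunk (zlib format)."""
    compressor = zlib.compressobj(level)
    for block in iter(lambda: source.read(CHUNK), b""):
        dest.write(compressor.compress(block))
    dest.write(compressor.flush())  # emit buffered data and the trailer


def inflate_stream(source, dest):
    """Decompress a zlib-format stream from source to dest chunk by chunk."""
    inflater = zlib.decompressobj()
    for block in iter(lambda: source.read(CHUNK), b""):
        dest.write(inflater.decompress(block))
    dest.write(inflater.flush())


plain = b"data moved between servers " * 10000
packed, restored = io.BytesIO(), io.BytesIO()
deflate_stream(io.BytesIO(plain), packed)
inflate_stream(io.BytesIO(packed.getvalue()), restored)
print(len(plain), "->", len(packed.getvalue()), "bytes")
```

Any binary file objects can be passed in place of the in-memory buffers, for example files opened with open(path, "rb") and open(path, "wb").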
An example of how to use zlibNX to accelerate existing applications that dynamically link to the zlib library is shown above under “Enabling Existing Applications to Utilize zlibNX”.
Open Source Packages in the AIX Toolbox that Dynamically Link to zlib
Many of the open source packages available in the AIX Toolbox for Linux Applications dynamically link to the zlib library. These packages can be used with accelerated compression on POWER9 systems by linking them with the accelerated zlibNX without application modification as discussed above in the section titled: Enabling Existing Applications to Utilize zlibNX.
The following is a list of packages available today that dynamically link to zlib: parallel gzip (pigz), ImageMagick, MySQL, R, bbcp, bind, binutils, cairo, clamav, cups, curl, cvs, freetype2, ganglia, gcc, git, glib2, gnupg2, gnutls, httpd, lftp, libfontenc, libgd, libpng, libssh2, libtiff, libxml2, lynx, mariadb, mkfontscale, mongo-c-driver, neon, nginx, pcre, php, postgresql, proftpd, protobuf, python, python3, rrdtool, ruby, samba, serf, slang, sqlite, subversion, sudo, tcl, tightvnc, and wget.
Conclusion
zlibNX allows new and existing applications to perform high-speed compression, reduce processor utilization, improve disk usage, and optimize cross-platform exchange of data. This can lower the costs associated with data processing and transfer, while maintaining high performance and throughput.
For more information about zlibNX, see Data compression by using the zlibNX library.
Resources