Power9 GZIP Data Acceleration with IBM AIX

By Brian F. Veale posted Mon November 09, 2020 06:29 PM

  

IBM® POWER9™ systems include a new GZIP-based hardware accelerator that is supported on AIX 7.2. This article presents the technology, explains how to use it, and reports a performance study showing a significant compression speedup when the hardware accelerator is used.

A new zlib library (zlibNX), the pigz command, and a new xgzip command are available on AIX® 7.2. They use the accelerator transparently and significantly speed up zlib-based compression. The zlibNX library is available on the AIX Expansion Pack, pigz is available in the AIX Toolbox for Linux Applications, and xgzip is available from the AIX web download packs. pigz and xgzip take advantage of the accelerator, when it is available, through zlibNX.

Contributors: @NICK STILWELL, @Brian F. Veale, @Arnold Flores, and @Rodney Burnett

Hardware Acceleration on POWER9 Systems 

Each processor chip in a POWER9 server has an on-chip “nest” accelerator called the NX unit that provides specialized functions for general data compression, gzip compression, encryption, and random number generation. These accelerators are used transparently across the systems software stack to speed up operations related to Live Partition Migration, IPSec, JFS2 Encrypted File Systems, PKCS11 encryption, and random number generation through /dev/urandom.

The on-chip NX GZIP accelerator on POWER9 systems implements a high throughput deflate format (RFC 1951) compression engine capable of performing the equivalent work of tens to hundreds of cores. 

Acceleration of zlib and gzip operations on AIX

The zlib open source library is a widely used lossless data compression library that implements the DEFLATE (RFC1951), zlib (RFC1950), and gzip (RFC1952) compression formats through software algorithms.
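As a minimal sketch of how the three formats relate, the example below uses the Python standard-library zlib module, which is a binding to the zlib library (whether it ends up on software zlib or zlibNX depends on how Python is linked on a given system). The wbits values select the container format; the payload is an arbitrary illustration.

```python
import zlib

payload = b"AIX zlibNX example payload " * 64  # arbitrary sample data

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data using the container format selected by wbits."""
    compressor = zlib.compressobj(level=6, wbits=wbits)
    return compressor.compress(data) + compressor.flush()

raw = deflate(payload, -15)       # raw DEFLATE stream (RFC 1951), no header or trailer
zlib_fmt = deflate(payload, 15)   # zlib format (RFC 1950): 2-byte header, Adler-32 trailer
gzip_fmt = deflate(payload, 31)   # gzip format (RFC 1952): 10-byte header, CRC-32 and length trailer

# The framing differs, but each stream decompresses to the same payload.
assert zlib.decompress(raw, wbits=-15) == payload
assert zlib.decompress(zlib_fmt, wbits=15) == payload
assert zlib.decompress(gzip_fmt, wbits=31) == payload
print(len(raw), len(zlib_fmt), len(gzip_fmt))
```

The printed sizes differ only by the framing bytes that each container adds around the DEFLATE data.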

 AIX 7.2 Technology Level 4 delivers a new zlibNX library and xgzip command that use NX gzip compression acceleration when running on POWER9 servers with server firmware FW940 or later. This allows faster compression of files and speeds up middleware and applications that either dynamically link to the zlib library or are modified to use the new library.

 The zlibNX package on the AIX 7.2 Technology Level 4 Expansion Pack provides a compatible version of zlib which supports the sending of in-memory compression and decompression requests to the nest (NX) accelerator unit on the IBM® POWER9™ processor.

 The compressed data formats are portable across platforms. The NX-accelerated zlib library is provided as UNIX archive files that can be statically or dynamically linked to applications that currently use zlib. Because all of the function signatures are the same, existing zlib-enabled programs can use zlibNX.

 There are also several open source packages available in the AIX Toolbox for Linux Applications that link to the zlib library. Since packages in the AIX Toolbox are built to dynamically link to the zlib library, they can also take advantage of accelerated compression through zlibNX. Notable packages include parallel gzip (pigz), MySQL, ClamAV (an antivirus engine), MariaDB, SQLite, PostgreSQL, mongo-c-driver (which is used to access MongoDB), and Git. A full list of the packages that link to zlib today appears near the end of this article.

Additionally, IBM has released a port of parallel gzip (pigz) and a new xgzip compression utility for AIX. The pigz and xgzip utilities link to the zlibNX library to take advantage of accelerated compression, and they function similarly to the well-known gzip utility.

Installation and Configuration on AIX 

System Requirements

The system must be a POWER9 system running FW version FW940 or later and the partition must be configured to run in the new POWER9 Processor Compatibility Mode that is enabled by FW version FW940. Note, this is a different mode than the default or POWER9_base mode that was available at the initial launch of the POWER9 line of systems.

Other system requirements will vary depending on the application workload running on the partition. Minimum recommendations are 1 processor and 6 GB of memory.

The FW level and configuration can be verified from the AIX command line with the prtconf command:
# prtconf
System Model: *
Machine Serial Number: *
Processor Type: PowerPC_POWER9
Processor Implementation Mode: POWER 9
Processor Version: PV_9_Compat
Number Of Processors: 4
Processor Clock Speed: 3000 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: *
Memory Size: 32768 MB
Good Memory Size: 32768 MB
Platform Firmware level: VH940_027
Firmware Version: IBM,FW940.00 (VH940_027)
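The firmware check can also be scripted. The sketch below parses text in the layout shown above and tests the firmware number against 940. The function names are hypothetical, and it assumes that the "Firmware Version" field always contains "FW" followed by a number that increases with newer releases. It only reports the processor mode fields, because the HMC remains the authoritative place to confirm the compatibility mode.

```python
import re

# Excerpt of the prtconf output shown above.
SAMPLE = """\
Processor Type: PowerPC_POWER9
Processor Implementation Mode: POWER 9
Processor Version: PV_9_Compat
Platform Firmware level: VH940_027
Firmware Version: IBM,FW940.00 (VH940_027)
"""

def parse_prtconf(text: str) -> dict:
    """Split 'Key: value' lines into a dictionary (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def firmware_at_least(text: str, minimum: int = 940) -> bool:
    """Return True if the 'Firmware Version' field reports FW<minimum> or later."""
    fields = parse_prtconf(text)
    match = re.search(r"FW(\d+)", fields.get("Firmware Version", ""))
    return match is not None and int(match.group(1)) >= minimum

fields = parse_prtconf(SAMPLE)
print("firmware ok:", firmware_at_least(SAMPLE))
print("mode fields:", fields.get("Processor Implementation Mode"),
      "/", fields.get("Processor Version"))
```

On a live partition, the text could come from running prtconf through subprocess.run with capture_output=True and text=True instead of the embedded sample.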


The current processor compatibility mode can be verified from the HMC that manages the partition. Navigate to the partition’s properties page, click the Processor tab, and click Advanced. The Effective Processor Compatibility Mode must be set to POWER9 for AIX to be able to utilize the GZIP accelerator. Note that changes to the mode require a reboot of the partition to take effect.

 Setting the Processor Compatibility Mode to POWER9

Figure 1. Configuring the Processor Compatibility Mode via the HMC

Installation of the zlibNX library and pigz and xgzip Commands

The zlibNX library is available on the AIX 7.2 TL 5 Expansion Pack as zlibNX.rte. To install the package using the installp command, mount the expansion pack media (or copy the zlibNX.rte fileset to the partition) and then run installp. In the example below, the expansion pack media is mounted on /dev/cd0.

# installp -aXgqY -d/dev/cd0 zlibNX.rte

Verify installation with the lslpp command:

# lslpp -l zlibNX.rte
  Fileset                      Level  State      Description
  ---------------------------------------------------------------------
Path: /usr/lib/objrepos
  zlibNX.rte                 7.2.4.0  COMMITTED  NX accelerated zlib
                                                 compression library

 
Install the parallel gzip (pigz) package from the AIX Toolbox for Linux Applications using yum as shown below:

# yum install pigz

Verify installation with the yum command:

# yum list installed pigz
Installed Packages
pigz.ppc                    2.4-1                       @AIX_Toolbox


Add /opt/freeware/bin to your path if it is not already there:

# echo $PATH
/usr/bin:/etc:/usr/sbin:/usr/ucb:/sbin:.
# export PATH=/opt/freeware/bin:$PATH


Next, download the xgzip fileset from the AIX Web Download Pack Programs and install it using installp as shown below:

# installp -aXgqY -d ./xgzip.rte xgzip.rte

Verify installation with the lslpp command:

# lslpp -l xgzip.rte
  Fileset                      Level  State      Description
  ---------------------------------------------------------------------------
Path: /usr/lib/objrepos
  xgzip.rte                 4.0.20.0  COMMITTED  A command utility to exploit
                                                 NX accelerated zlib
                                                 compression library


Note, the zlibNX library was first made available on the AIX 7.2 TL 4 Expansion Pack.  Versions 7.2.4.0 and 7.2.4.2 have an issue that can affect the integrity of compressed archives.  These versions of the library are included in the IBM AIX V7.2 Expansion Pack  11/2019 and the IBM AIX V7.2 Expansion Pack 5/2020.  If you are using an affected version, it is recommended that you install the ifix for APAR IJ28579.  For more information see: https://www.ibm.com/support/pages/apar/IJ28579


Enabling Existing Applications to Utilize zlibNX

Applications that dynamically link to the standard zlib can be made to link with the accelerated zlibNX without application modification. There are several environment variables that can be set to load the zlibNX shared library. 

Set the LDR_PRELOAD or LDR_PRELOAD64 variable:

# LDR_PRELOAD="/usr/opt/zlibNX/lib/libz.a(libz.so.1)" <32-bit application>

# LDR_PRELOAD64="/usr/opt/zlibNX/lib/libz.a(libz.so.1)" <64-bit application>


Set the LD_LIBRARY_PATH variable:

# LD_LIBRARY_PATH=/usr/opt/zlibNX/lib:$LD_LIBRARY_PATH <application>


Set the LIBPATH variable:

# LIBPATH=/usr/opt/zlibNX/lib:$LIBPATH <application>
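When an application is launched from a script rather than from an interactive shell, the same LIBPATH approach can be applied to the child process only. The sketch below is an illustration in Python. The helper names are hypothetical, the library directory is the one used in the commands above, and the command passed to run_with_zlibnx stands in for any application that dynamically links to zlib.

```python
import os
import subprocess

ZLIBNX_LIBDIR = "/usr/opt/zlibNX/lib"  # directory used in the commands above

def zlibnx_env(base=None):
    """Return a copy of the environment with the zlibNX directory first in LIBPATH."""
    env = dict(os.environ if base is None else base)
    current = env.get("LIBPATH", "")
    env["LIBPATH"] = ZLIBNX_LIBDIR + (":" + current if current else "")
    return env

def run_with_zlibnx(argv):
    """Run argv (a placeholder for a zlib-linked application) with the modified LIBPATH."""
    return subprocess.run(argv, env=zlibnx_env(), check=True)

print(zlibnx_env({"LIBPATH": "/opt/freeware/lib"})["LIBPATH"])
# → /usr/opt/zlibNX/lib:/opt/freeware/lib
```

Only the launched process sees the changed LIBPATH, so other applications on the partition keep their existing library search order.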

Example Usage with the SQLite Archiver Tool

The SQLite Archiver command (sqlar) is a tool released by the SQLite project. It is a command line utility that takes a file or a list of files and creates an SQLite database with the files stored as BLOBs (binary large objects). By default, the utility compresses files using zlib.

# time sqlar corpus.sqlar silesia_corpus/*
real    0m14.70s
user    0m4.17s
sys     0m0.10s

 

# export LDR_PRELOAD="/usr/opt/zlibNX/lib/libz.a(libz.so.1)"
# time sqlar corpus.sqlar silesia_corpus/*
real   0m1.40s
user   0m0.12s
system 0m0.25s

When linked with zlibNX versus the standard zlib, sqlar runs approximately 10x faster and reduces CPU time by 91%. Note: time is end-to-end, including I/O time not accelerated by zlibNX.

Using the pigz and xgzip Commands

pigz and xgzip use flags and parameters similar to those of the gzip command. Below is an example of compressing a file (and keeping the original file) using gzip, pigz, and xgzip:

# time pigz -c mybackup > mybackup.gz
# time xgzip -c mybackup > mybackup.gz
# time gzip -c mybackup > mybackup.gz

 

 

                        gzip         xgzip (accelerated)  pigz (accelerated)  pigz vs gzip         xgzip vs gzip
------------------------------------------------------------------------------------------------------------------
real time               5m 14.75s    0m 19.07s            0m 11.38s           27.6x faster         16.5x faster
user time               1m 30.01s    0m 2.15s             0m 7.94s            84% less CPU time    94% less CPU time
sys time                0m 1.80s     0m 3.20s             0m 6.55s
compressed size (MB)    1890.64      2133.29              2006.44             3% less compression  6% less compression
(input file: 4287.89)   56% smaller  50% smaller          53% smaller

Notes: Time is end-to-end, including I/O time not accelerated by zlibNX. pigz and xgzip results are for HW accelerated compression using zlibNX. gzip uses software based compression.
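The comparison figures in the table can be recomputed from the raw measurements. The sketch below does that in Python, under the assumption that "CPU time" means user time plus sys time; that assumption reproduces the 84% and 94% values. The helper names are illustrative only.

```python
import re

def seconds(text: str) -> float:
    """Convert a time such as '5m 14.75s' to seconds."""
    minutes, secs = re.fullmatch(r"(\d+)m\s*([\d.]+)s", text.strip()).groups()
    return int(minutes) * 60 + float(secs)

INPUT_MB = 4287.89  # size of the uncompressed input file

# Measurements copied from the table above.
runs = {
    "gzip":  {"real": "5m 14.75s", "user": "1m 30.01s", "sys": "0m 1.80s", "mb": 1890.64},
    "xgzip": {"real": "0m 19.07s", "user": "0m 2.15s",  "sys": "0m 3.20s", "mb": 2133.29},
    "pigz":  {"real": "0m 11.38s", "user": "0m 7.94s",  "sys": "0m 6.55s", "mb": 2006.44},
}

def summarize(name: str):
    """Return (speedup vs gzip, fraction of CPU time saved vs gzip, size reduction vs input)."""
    base, run = runs["gzip"], runs[name]
    speedup = seconds(base["real"]) / seconds(run["real"])
    cpu_base = seconds(base["user"]) + seconds(base["sys"])
    cpu_run = seconds(run["user"]) + seconds(run["sys"])
    return speedup, 1 - cpu_run / cpu_base, 1 - run["mb"] / INPUT_MB

for name in ("xgzip", "pigz"):
    speedup, cpu_saved, smaller = summarize(name)
    print(f"{name}: {speedup:.2f}x faster, {cpu_saved:.0%} less CPU time, {smaller:.0%} smaller")
```

The computed pigz speedup is about 27.66, so the 27.6x in the table appears to be truncated rather than rounded.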

Management of AIX Backups: Compressing mksysb files

The GZIP compression accelerator can be used to compress AIX backups generated through the use of the mksysb command. This can significantly reduce the size of the resulting backup file making it easier and faster to transfer to a different system or storage.

The simplest way to do this is to run the mksysb command with packing turned off (-p option specified) and then run xgzip or pigz on the resulting backup file.  Note, once you are ready to restore the AIX backup, you will have to uncompress it before restoring it.

To capture a mksysb of your root volume group (rootvg) and compress it using pigz, use the following commands. In this example, the resulting uncompressed mksysb file is written to /data/mksysb and the compressed file is written to /data/mksysb.gz.

# mksysb -p /data/mksysb
# pigz -c /data/mksysb > /data/mksysb.gz

 
Similarly, these commands can be used to perform compression with xgzip:

# mksysb -p /data/mksysb
# xgzip -c /data/mksysb > /data/mksysb.gz

 
The resulting compressed backup can be uncompressed with these commands:

Using pigz:

# pigz -d -c /data/mksysb.gz > /data/mksysb

 Using xgzip:

# xgzip -d -c /data/mksysb.gz > /data/mksysb

 

In the above examples, the -c option causes pigz and xgzip to write to stdout and leave the source file in place. If you would like pigz or xgzip to delete the source file, specify one of the following forms for compression:

# pigz /data/mksysb
# xgzip /data/mksysb

 or one of the following forms for decompression:

# pigz -d /data/mksysb.gz
# xgzip -d /data/mksysb.gz
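Before removing the uncompressed backup, it can be worth confirming that the compressed file decompresses to identical content. The sketch below is one way to do that with the Python standard library (gzip and hashlib), independent of pigz and xgzip. The function names are hypothetical and the paths in the usage note are the ones from the examples above.

```python
import gzip
import hashlib

def sha256_of(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large backups are not loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def archive_matches(original_path: str, gz_path: str) -> bool:
    """Return True if gz_path decompresses to the same bytes as original_path."""
    with open(original_path, "rb") as plain, gzip.open(gz_path, "rb") as packed:
        return sha256_of(plain) == sha256_of(packed)
```

For the backup above, the call would be archive_matches("/data/mksysb", "/data/mksysb.gz"), made while both files still exist.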

 
More information on creating system backups can be found here: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/install/create_sys_backup.html

Performance Evaluation

A performance study was performed on a partition on an E980 POWER9 server running FW version FW940 and AIX 7.2 TL 4, configured with 4 dedicated processors (cores) in SMT-8 mode and 32 GB of dedicated memory. The partition had access to 1 NX unit (1 per multi-core chip) containing one GZIP compression accelerator.

Benchmarking was done using data from the Silesia compression corpus, which includes typical data types used in modern processing, including English text, executable programs, databases, source code, XML, and medical images. The data used is available here: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia

Performance Summary

Hardware-accelerated GZIP on POWER9 performs best when compressing larger amounts of data. For a minor decrease in compression ratio, POWER9 GZIP accelerated compression is significantly faster than software-only GZIP: up to 190 times faster when using the zlibNX library compared against the standard zlib library and up to 29 times faster when using the xgzip command compared against the gzip command.

 Decompression is not as CPU intensive as compression and therefore decompression performance benefits less from the POWER9 GZIP accelerator compared to compression.

Detailed Analysis

Compression throughput using zlibNX with a single SMT thread is shown in Figure 2, with a speedup of up to 102 for a 64 MB input buffer. zlibNX performance surpasses that of zlib starting at an input buffer size of 16 KB, where the speedup is 3.3.



Figure 2. GZIP Compression Throughput using zlibNX.
(default compression strategy, single SMT8 thread performance)


Figure 3 shows compression throughput for a multi-threaded application with up to 32 SMT8 threads using 1 to 4 processor cores compressing text data. zlib throughput performance peaks at ~625 MB/second (for 32 threads compressing 2 KB of data each) while zlibNX throughput peaks at ~5.8 GB/second (for 32 threads, 4 MB of data each).
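The multi-threaded pattern measured here, in which each thread compresses its own independent buffer, can be sketched as follows. This is an illustration with Python's standard zlib binding and a thread pool, not the benchmark program used in the study. The function name and buffer contents are hypothetical, and the parallel speedup relies on the assumption that CPython's zlib module releases the global interpreter lock while compressing.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_all(buffers, threads: int = 4, level: int = 6):
    """Compress independent buffers concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda buf: zlib.compress(buf, level), buffers))

# Eight independent 4 MB buffers of repetitive text-like data (illustrative only).
buffers = [(b"sample text line %d\n" % i) * 200000 for i in range(8)]
results = compress_all(buffers, threads=4)
print([len(r) for r in results])
```

Each buffer becomes a separate zlib stream, so the results can be decompressed independently and in any order.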


Figure 3. Multithreaded Application Throughput.
(default compression strategy, 1 to 32 SMT8 threads performance, 1 to 4 processor cores)


Both zlibNX and xgzip see similar compression ratios when compared against zlib and gzip, respectively. Compression ratios for several compression options are shown in Figure 4. zlib has a compression ratio of approximately 3.11 compared to a ratio of approximately 2.68 for zlibNX using the default compression strategy. Note, the compression ratio can vary based on the type of data being compressed.
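To put the two ratios in more familiar terms, a compression ratio r (uncompressed size divided by compressed size) corresponds to a space saving of 1 - 1/r. The short sketch below applies that to the quoted ratios; the function names are illustrative.

```python
def space_saving(ratio: float) -> float:
    """Fraction of the original size removed by compression at the given ratio."""
    return 1.0 - 1.0 / ratio

def relative_output_growth(reference_ratio: float, other_ratio: float) -> float:
    """How much larger the output at other_ratio is than the output at reference_ratio."""
    return reference_ratio / other_ratio - 1.0

for name, ratio in (("zlib", 3.11), ("zlibNX", 2.68)):
    print(f"{name}: ratio {ratio:.2f} -> {space_saving(ratio):.1%} smaller than the input")

print(f"zlibNX output vs zlib output: {relative_output_growth(3.11, 2.68):.1%} larger")
```

With these ratios, zlib removes about 67.8% of the input and zlibNX about 62.7%, so the accelerated output is roughly 16% larger than the software output for this data set.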



Figure 4. Compression Ratio of silesia.tar.


Decompression is not as CPU intensive as compression and therefore does not benefit as much from off-loading operations to the accelerator as compression does. Figure 5 shows that decompression using zlibNX has a speedup of up to 9 for a 64 MB input buffer. zlibNX performance surpasses that of zlib starting at an input buffer size of 16 KB, where the speedup is 1.4.



Figure 5. GZIP Decompression Throughput using zlibNX.
(default compression strategy, single SMT8 thread performance)


Compression throughput using the xgzip command is shown in Figure 6. It achieves a speedup of up to 31.7 for a file size of 8 MB and surpasses the performance of the gzip command for file sizes of 32 KB and higher. xgzip utilizes the zlibNX library to leverage the accelerator. xgzip decompression throughput is shown in Figure 7. It achieves a speedup of up to 2.5 for a file size of 8 MB and surpasses gzip for file sizes of 256 KB and higher. The compression ratio for xgzip compared to gzip is similar to that of zlibNX compared to zlib.



Figure 6. GZIP Compression Throughput using the xgzip command


Figure 7. GZIP Decompression Throughput using the xgzip command

Application Buffer Size Effect

Applications that use zlibNX (or zlib) control the size of the input data to be compressed or decompressed and the size of the output space the results are written to. The performance of the compression algorithms depends on these sizes. As shown in Figure 8, for compression, zlibNX is as fast as or faster than zlib for all reasonable buffer sizes, and sizes of 16 KB and greater perform best. For decompression, zlibNX is as fast as or faster than zlib for all reasonable buffer sizes, and sizes of 32 KB and greater perform best. Note that for decompression, equal-size input and output buffers are not optimal when using zlibNX; an output buffer that is larger than the input buffer by roughly the compression ratio is best, and a 1:4 ratio (input vs output buffer size) is good for general use.
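The 1:4 guidance for decompression can be expressed as a streaming loop in which each call is offered one input chunk and allowed at most four times that much output. The sketch below uses Python's zlib binding to illustrate the buffer-sizing pattern only; it does not measure accelerator performance. The function name, chunk size, and sample data are assumptions chosen for the example.

```python
import zlib

def inflate_chunked(compressed: bytes, in_size: int = 32 * 1024, ratio: int = 4) -> bytes:
    """Decompress gzip data using input chunks of in_size and output capped at ratio * in_size per call."""
    decompressor = zlib.decompressobj(wbits=31)   # 31 selects the gzip container
    out_cap = in_size * ratio                     # 1:4 input-to-output sizing by default
    pieces = []
    for start in range(0, len(compressed), in_size):
        data = compressed[start:start + in_size]
        while data:
            pieces.append(decompressor.decompress(data, out_cap))
            data = decompressor.unconsumed_tail   # input left over because the output cap was hit
    pieces.append(decompressor.flush())
    return b"".join(pieces)

# Round-trip check on illustrative data.
original = b"buffer size example " * 100000
compressor = zlib.compressobj(wbits=31)
packed = compressor.compress(original) + compressor.flush()
assert inflate_chunked(packed) == original
```

In C, the equivalent choice is the relationship between avail_in and avail_out on the z_stream passed to inflate.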


Figure 8. Application Buffer Size Effect: Ratio of time spent for zlib vs zlibNX processing using a range of application buffer sizes (where input size is the same as output size).

Accelerating Applications Using zlibNX

Today, more and more data is transferred between servers in the data center, the cloud, and to/from customer sites than ever before. Compression can provide a significant improvement by reducing the amount of data that has to be transferred. The performance study above of zlibNX shows that even with the default compression strategy a significant improvement can be made in compression throughput compared to the software based zlib library. This opens up new opportunities for utilizing compression in your applications.

Using zlibNX-based compression in your own applications is fairly straightforward, even if they do not already use compression. zlib itself was built to be unencumbered by patents and other legal requirements, and the zlib API is simple. A good example is zpipe.c, which compresses and decompresses a file using the zlib deflate and inflate calls. Sample code is available here: https://zlib.net/zpipe.c
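For readers who want to see the shape of that API before reading the C source, the sketch below mirrors the compression half of zpipe.c: read a chunk, feed it to the compressor, write whatever comes out, and flush at the end. It is written in Python against the standard zlib binding and is not a translation of zpipe.c. The function name is hypothetical, and the 16384-byte chunk size matches the CHUNK constant in zpipe.c.

```python
import io
import zlib

CHUNK = 16384  # same chunk size as the CHUNK constant in zpipe.c

def deflate_stream(source, dest, level: int = zlib.Z_DEFAULT_COMPRESSION) -> None:
    """Compress everything readable from source into dest, one chunk at a time."""
    compressor = zlib.compressobj(level)
    while True:
        block = source.read(CHUNK)
        if not block:
            break
        dest.write(compressor.compress(block))
    dest.write(compressor.flush())  # emit any buffered data and the stream trailer

# Round trip through in-memory streams.
text = b"hello zpipe " * 10000
packed = io.BytesIO()
deflate_stream(io.BytesIO(text), packed)
assert zlib.decompress(packed.getvalue()) == text
print(len(text), "->", len(packed.getvalue()))
```

The same loop works with files opened in binary mode in place of the BytesIO objects.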

An example of how to use zlibNX to accelerate existing applications that dynamically link to the zlib library is shown above under “Enabling Existing Applications to Utilize zlibNX”. 

Open Source Packages in the AIX Toolbox that Dynamically Link to zlib

Many of the open source packages available in the AIX Toolbox for Linux Applications dynamically link to the zlib library. These packages can be used with accelerated compression on POWER9 systems by linking them with the accelerated zlibNX without application modification as discussed above in the section titled: Enabling Existing Applications to Utilize zlibNX.

The following is a list of packages available today that dynamically link to zlib: parallel gzip (pigz), ImageMagick, MySQL, R, bbcp, bind, binutils, cairo, clamav, cups, curl, cvs, freetype2, ganglia, gcc, git, glib2, gnupg2, gnutls, httpd, lftp, libfontenc, libgd, libpng, libssh2, libtiff, libxml2, lynx, mariadb, mkfontscale, mongo-c-driver, neon, nginx, pcre, php, postgresql, proftpd, protobuf, python, python3, rrdtool, ruby, samba, serf, slang, sqlite, subversion, sudo, tcl, tightvnc, and wget.

Conclusion

zlibNX allows new and existing applications to perform high-speed compression, reduce processor utilization, improve disk usage, and optimize cross-platform exchange of data.  This can lower the costs associated with data processing and transfer, while maintaining high performance and throughput.

For more information about zlibNX, see Data compression by using the zlibNX library.
