Performance Comparison of QAT vs Non-QAT RGW Daemons
This blog post compares the performance of IBM Ceph RGW daemons with and without QAT. QAT setup steps are documented in our official IBM Ceph docs.
Introduction to Ceph RGW QAT:
This feature was introduced in IBM Ceph 7.0. Intel QAT (QuickAssist Technology) accelerates encryption and compression by offloading those requests to hardware QuickAssist accelerators, which are more efficient in terms of cost and power than general-purpose CPUs for these compute-intensive workloads. More details about this feature can be found here
The IBM Ceph ready nodes have QAT hardware onboard, so there is no need to procure external QAT hardware (PCIe-based cards, etc.).
Prerequisites:
- IBM Ceph cluster 7.x or above running on RHEL 9.x
- At least one node in the cluster has QAT hardware set up and configured for use with Ceph RGW.
Compression Tests:
Ceph RGW supports four compression algorithms: Zlib, Zstd, Snappy, and LZ4. Here we compare all four on QAT and non-QAT RGW daemons, looking at both compression ratio and CPU utilization.
| Data type | QAT enabled | RGW compression type | Before compression | After compression* | Ratio (QAT) | Ratio (non-QAT) |
|---|---|---|---|---|---|---|
| ISO | Yes | Zlib | 9.59G | 9.789G | 1:0.968 | 1:0.9675 |
| Video file | Yes | Zlib | 305.22M | 297.72M | 1:0.975 | 1:0.9751 |
| 100G file | Yes | Zlib | 100G | 658M | 1:0.0064 | 1:0.0009 |
| JSON DB file | Yes | Zlib | 413M | 95.32M | 1:0.231 | 1:0.206 |
| Audio | Yes | Zlib | 123M | 106.3M | 1:0.868 | 1:0.867 |
| RGW logs | Yes | Zlib | 335M | 332.7M | 1:0.995 | 1:0.994 |
The table above shows the Zlib compression ratios for a variety of data types.
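The ratio column can be reproduced from the before and after sizes. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the sizes use binary units (1G = 1024M), which matches the 100G row.

```python
# Size suffixes expressed in MiB. Binary units are an assumption,
# chosen because they reproduce the 100G row of the table.
UNIT_IN_MIB = {"M": 1.0, "G": 1024.0}


def to_mib(size: str) -> float:
    """Convert a size string such as '413M' or '100G' to MiB."""
    return float(size[:-1]) * UNIT_IN_MIB[size[-1]]


def compression_ratio(before: str, after: str) -> float:
    """Return after/before, i.e. the x in the table's '1:x' notation."""
    return to_mib(after) / to_mib(before)


def space_saved_percent(before: str, after: str) -> float:
    """Return the percentage of space saved by compression."""
    return (1.0 - compression_ratio(before, after)) * 100.0


# Rows taken from the Zlib table (QAT enabled).
for name, before, after in [
    ("Video file", "305.22M", "297.72M"),
    ("100G file", "100G", "658M"),
    ("JSON DB file", "413M", "95.32M"),
]:
    ratio = compression_ratio(before, after)
    saved = space_saved_percent(before, after)
    print(f"{name}: 1:{ratio:.4f} ({saved:.1f}% saved)")
```

A ratio close to 1:1 (video, audio, ISO) means the data was already compressed or has little redundancy, so compression saves almost no space.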
| Compression algorithm | ISO compression ratio | Video compression ratio | JSON DB file compression ratio | Audio file compression ratio | QAT ISO upload time and max speed | Non-QAT ISO upload time and max speed |
|---|---|---|---|---|---|---|
| Zlib | 1:0.9933 | 1:0.9764 | 1:0.2305 | 1:0.8689 | 15.492s - 194.51 MB/s | 15.019s - 203.26 MB/s |
| Zstd | 1:0.9928 | 1:0.9752 | 1:0.1913 | 1:0.8665 | 13.470s - 236.91 MB/s | 15.024s - 203.07 MB/s |
| Snappy | 1:0.9947 | 1:0.9893 | 1:0.3440 | 1:0.9976 | 13.076s - 246.95 MB/s | 15.107s - 203.43 MB/s |
| LZ4 | 1:0.9984 | 1:0.9823 | 1:0.3362 | 1:1.0014 | 13.038s - 247.90 MB/s | 15.092s - 201.78 MB/s |

We can see that Zlib compression shows a significant improvement in CPU utilization when using QAT.
ISO file : 2.2G Fedora ISO
Video : Meeting recording 306M
JSON DB : LA City records 437M
Audio file : 123M WAV file
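To put the ISO upload times from the algorithm table side by side, the relative change in upload time can be computed per algorithm. This is a small sketch using only the times reported above; the function name is hypothetical, and a negative value means the QAT daemon finished the upload faster.

```python
# (QAT upload time, non-QAT upload time) in seconds, from the table above.
ISO_UPLOAD_SECONDS = {
    "Zlib": (15.492, 15.019),
    "Zstd": (13.470, 15.024),
    "Snappy": (13.076, 15.107),
    "LZ4": (13.038, 15.092),
}


def upload_time_change_percent(qat_s: float, non_qat_s: float) -> float:
    """Percent change in upload time of QAT relative to non-QAT.

    Negative means QAT was faster, positive means QAT was slower.
    """
    return (qat_s - non_qat_s) / non_qat_s * 100.0


for algo, (qat_s, non_qat_s) in ISO_UPLOAD_SECONDS.items():
    change = upload_time_change_percent(qat_s, non_qat_s)
    print(f"{algo}: {change:+.1f}% upload time with QAT")
```

These are single-run timings from the table, so small differences between algorithms should be read with caution.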
Encryption Tests:
Here we tested the number of connections RGW handles per second. The QAT RGW daemon was SSL-enabled, and the non-QAT daemon was running on HTTP port 8080.
Prerequisites:
- We used a benchmark tool called hey; details and setup instructions are here: https://github.com/rakyll/hey?tab=readme-ov-file
- The QAT RGW daemon was SSL-enabled with a self-signed certificate
- The non-QAT RGW daemon was running on HTTP port 8080
Configuration Parameters:
rgw_max_concurrent_requests 8192
rgw_thread_pool_size 8192
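One possible way to apply the two settings above is with `ceph config set`. The sketch below only builds and prints the command lines and does not run them. The `client.rgw` target and the helper name are assumptions for illustration; the post does not say how or against which daemon section the values were applied.

```python
# Option names and values as listed in the post.
RGW_OPTIONS = {
    "rgw_max_concurrent_requests": 8192,
    "rgw_thread_pool_size": 8192,
}


def build_config_commands(target: str, options: dict) -> list:
    """Return one `ceph config set <target> <option> <value>` argv per option."""
    return [
        ["ceph", "config", "set", target, name, str(value)]
        for name, value in options.items()
    ]


# "client.rgw" is an assumed config section; substitute the name of the
# RGW daemon or service actually deployed in the cluster.
commands = build_config_commands("client.rgw", RGW_OPTIONS)
for argv in commands:
    print(" ".join(argv))
```

RGW daemons typically need a restart before thread pool changes take effect, so it is worth confirming the running values before benchmarking.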
CLI Used :
QAT : # ./hey_linux_amd64 -z 20s -c 1000 --cpus 40 https://localhost --disable-compression
Non QAT : # ./hey_linux_amd64 -z 20s -c 1000 --cpus 40 http://localhost:8081 --disable-compression

CPU Util :
QAT : 3300-3400 %
Non QAT : 3400-3500%
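The CPU figures above can be read as a number of busy cores. The sketch below assumes they are top-style percentages, where 100% equals one fully used core; the post does not state which tool reported them, and the helper names are hypothetical.

```python
# CPU utilization ranges (percent) as reported above.
QAT_CPU_PERCENT = (3300, 3400)
NON_QAT_CPU_PERCENT = (3400, 3500)


def midpoint(value_range: tuple) -> float:
    """Midpoint of a (low, high) range."""
    low, high = value_range
    return (low + high) / 2.0


def busy_cores(cpu_percent: float) -> float:
    """Convert a top-style CPU percentage to an equivalent number of cores."""
    return cpu_percent / 100.0


qat_mid = midpoint(QAT_CPU_PERCENT)
non_qat_mid = midpoint(NON_QAT_CPU_PERCENT)

# Relative CPU reduction of the QAT daemon compared with the non-QAT daemon.
reduction_percent = (non_qat_mid - qat_mid) / non_qat_mid * 100.0

print(f"QAT: about {busy_cores(qat_mid):.1f} cores busy")
print(f"Non-QAT: about {busy_cores(non_qat_mid):.1f} cores busy")
print(f"CPU reduction with QAT: {reduction_percent:.1f}%")
```

Keep in mind that the QAT daemon was also terminating SSL during this test, while the non-QAT daemon served plain HTTP.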
Conclusion
With SSL enabled, the QAT RGW daemon handled more connections per second at lower CPU utilization than the non-QAT daemon. On the compression side, Zlib over QAT showed significantly lower CPU utilization.