The AWS S3 API [1] established a de-facto standard to process unstructured data as objects. More and more customers integrate the S3 object access protocol in their workflows to acquire, process and manage unstructured data. To better support these evolving workflow requirements, IBM modernized Storage Scale’s built-in support for S3 access to data.
IBM CES S3 (non-containerized S3) [2] is part of IBM Storage Scale, formerly known as Spectrum Scale or GPFS. CES S3 provides high-performance, scalable S3 object access to data stored in Storage Scale file systems. It is integrated with Cluster Export Services (CES) to provide a highly available S3 access service in a GPFS cluster. It enables clients to access data that is stored in IBM Storage Scale file systems, mapping files to S3 objects and directories to S3 buckets, and vice versa.
The purpose of this blog is to describe the performance evaluation executed on a bare-metal CES S3 environment, including high-level configuration details, the performance tunings applied, and the results. It focuses on the performance of object operations, specifically reading and writing large and small objects of varying sizes, to show how efficiently CES S3 behaves in different scenarios. Small objects are frequently used in AI and analytics workloads, financial services trading, and other use cases that require efficient data access to satisfy the application's requirements. CES S3 (non-containerized S3) is intended to optimize for such workloads.
To assess the performance of IBM Storage Scale CES S3, we set up an environment with a CES cluster and used COSBench to take the measurements. COSBench, an open-source benchmarking tool developed by Intel, is widely used as an industry-standard tool for evaluating the performance of cloud object storage services. It enables us to run write and read tests under various workload conditions and to measure the capabilities and efficiency of the storage solution.
For comparison purposes, the workloads were similar to those presented in “IBM Storage Scale DAS 5.1.3.1 performance evaluation using COSBench” [7] and “IBM Data Access Services 5.1.6 read performance evaluation of small objects using COSBench” [8].
Benchmark Environment
The environment configured for the performance evaluation is illustrated in Figure 1:
- COSBench cluster with each application node having:
  - x86_64 architecture
  - 2x CPU AMD EPYC 7F72 24-Core
  - 512GB Memory
  - Bond – 2x 100GbE 1 port Ethernet
  - RHEL 9.3 OS
  - COSBench v0.4.2
- CES cluster with each CES node having:
  - x86_64 architecture
  - 2x CPU Intel(R) Xeon(R) Gold 6346 CPU 32-Core
  - 256GB Memory
  - Bond1 – 2x 100GbE 1 port Ethernet (to APP nodes)
  - Bond2 – 2x 200GbE 1 port Ethernet (to ESS)
  - RHEL 9.3 OS
  - GPFS 5.2.0
  - Ganesha v5.7
  - Noobaa v5.15.0
- GPFS Storage cluster – Dedicated ESS3200
- 100 GbE switch between Application and CES nodes
- 200 GbE switch between CES nodes and IBM Storage System Cluster
Performance Tuning
In a CES S3 environment tested with the COSBench tool [4], several factors impact performance, especially the number of drivers, the number of workers, and the object sizes. Server hardware performance, network configuration, and storage capability are important factors too. Another source of variability in the performance evaluation is the number of endpoint forks (NooBaa stack endpoints) used in the CES S3 configuration. Performance engineering was done to optimize the benchmark environment, and the result of this study was a collection of parameters fine-tuned across various layers of the setup.
COSBench Configuration (App Nodes side)
In the COSBench controller.conf file, 12 drivers were defined, 2 drivers per APP node.
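The driver layout described above could be expressed in controller.conf roughly as follows. This is a minimal sketch that only renders the file text; the hostnames and the second driver port (18188) are hypothetical placeholders, since the post does not list the values used in the test bed.

```python
# Sketch: render a COSBench controller.conf for 6 APP nodes x 2 drivers each.
# Hostnames and ports are placeholders, not values from the test bed.

APP_NODES = [f"app{n}.example.com" for n in range(1, 7)]  # hypothetical hostnames
DRIVER_PORTS = [18088, 18188]  # assumed: one port per driver process on a node


def render_controller_conf(nodes, ports):
    """Return controller.conf text with one [driverN] section per driver."""
    drivers = [(node, port) for node in nodes for port in ports]
    lines = [
        "[controller]",
        f"drivers = {len(drivers)}",
        "log_level = INFO",
        "log_file = log/system.log",
        "archive_dir = archive",
    ]
    for index, (node, port) in enumerate(drivers, start=1):
        lines += [
            "",
            f"[driver{index}]",
            f"name = driver{index}",
            f"url = http://{node}:{port}/driver",
        ]
    return "\n".join(lines) + "\n"


conf_text = render_controller_conf(APP_NODES, DRIVER_PORTS)
print(conf_text)
```

Generating the file from a node list keeps the driver count in the [controller] section consistent with the number of [driverN] sections, which is easy to get wrong when editing by hand.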
GPFS and CES S3 Configurations (from CES Nodes)
IBM Storage Scale enables customers to change the Storage Scale configuration parameters [6]. The following configuration was used for performance evaluation.
- General GPFS Configuration
  - Remote file systems were mounted in the Storage Scale environment (ess3200hpo1a-hs/ess3200hpo1b).
  - 3 CES IPs were configured in the 3-node CES cluster and were specified as the NooBaa endpoint IPs in the COSBench configuration (App nodes).
  - Storage Scale CES S3 provides the mms3 CLI command, which enables customers to adjust the configuration of the NooBaa S3 service. For this test, the endpoint forks were set to 12. With this setting, CES S3 creates a total of 36 NooBaa endpoints (12 per CES node).
Test Description
A COSBench cluster was defined with 12 drivers, distributed across 6 physical nodes, using 3 CES IP addresses to communicate with the CES S3 cluster.
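To get a feel for how much concurrency each layer sees, the numbers above can be combined into a small calculation. The sketch below assumes workers are spread perfectly evenly across drivers and NooBaa endpoints, which is an idealization; the actual distribution depends on COSBench scheduling and on how connections land on the CES IPs.

```python
# Sketch: average concurrency per COSBench driver and per NooBaa endpoint.
# Assumes a perfectly even spread of workers, which is an idealization.

APP_NODES = 6
DRIVERS = 12
CES_NODES = 3
ENDPOINT_FORKS = 12  # per CES node, as set for this test
WORKER_COUNTS = [1, 8, 32, 64, 128, 256, 512]

drivers_per_node = DRIVERS // APP_NODES
total_endpoints = CES_NODES * ENDPOINT_FORKS

print(f"drivers per APP node: {drivers_per_node}")
print(f"NooBaa endpoints:     {total_endpoints}")
print(f"{'workers':>8} {'per driver':>11} {'per endpoint':>13}")
for workers in WORKER_COUNTS:
    per_driver = workers / DRIVERS
    per_endpoint = workers / total_endpoints
    print(f"{workers:>8} {per_driver:>11.2f} {per_endpoint:>13.2f}")
```

With 1 or 8 workers there are fewer workers than drivers or endpoints, so under this assumption most endpoints would sit idle at those worker counts.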
Performance evaluation was done for large objects and small objects. Measurements for large objects were gathered with the following configuration:
- Buckets: 10
- Objects: 100 (evenly distributed in the 10 buckets)
- Test duration: 5 minutes per work-stage.
- Workers: 1, 8, 32, 64, 128, 256, 512
- Object size: 1GB.
- Operation: 100% read, 100% write
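A COSBench workload file matching the large-object parameters above might look like the output of the following sketch. The credentials, the endpoint URL, the bucket prefix, and the init and prepare stages are assumptions added for illustration; the post does not show the workload files that were actually used.

```python
import xml.etree.ElementTree as ET

# Placeholders: credentials, endpoint and bucket prefix are NOT from the test bed.
STORAGE_CONFIG = (
    "accesskey=ACCESS_KEY;secretkey=SECRET_KEY;"
    "endpoint=http://ces-ip.example.com;path_style_access=true"
)
BUCKETS = 10
OBJECTS_PER_BUCKET = 10  # 100 objects spread evenly over 10 buckets
OBJECT_SIZE = "c(1)GB"   # constant 1GB objects
RUNTIME_SECONDS = 300    # 5 minutes per work-stage
WORKER_COUNTS = [1, 8, 32, 64, 128, 256, 512]
PREFIX = "cprefix=largeobj"  # hypothetical bucket-name prefix


def build_workload(operation):
    """Build a workload with one work-stage per worker count for `operation`."""
    workload = ET.Element("workload", name=f"large-{operation}",
                          description=f"1GB objects, 100% {operation}")
    ET.SubElement(workload, "storage", type="s3", config=STORAGE_CONFIG)
    workflow = ET.SubElement(workload, "workflow")

    # Assumed setup stage: create the buckets before the measured stages.
    all_buckets = f"{PREFIX};containers=r(1,{BUCKETS})"
    init = ET.SubElement(workflow, "workstage", name="init")
    ET.SubElement(init, "work", type="init", workers="1", config=all_buckets)

    # Assumed setup stage: read tests need the objects to exist first.
    if operation == "read":
        prepare = ET.SubElement(workflow, "workstage", name="prepare")
        ET.SubElement(
            prepare, "work", type="prepare", workers=str(BUCKETS),
            config=f"{all_buckets};objects=r(1,{OBJECTS_PER_BUCKET});"
                   f"sizes={OBJECT_SIZE}",
        )

    selection = (f"{PREFIX};containers=u(1,{BUCKETS});"
                 f"objects=u(1,{OBJECTS_PER_BUCKET})")
    if operation == "write":
        selection += f";sizes={OBJECT_SIZE}"
    for workers in WORKER_COUNTS:
        stage_name = f"{operation}-{workers}w"
        stage = ET.SubElement(workflow, "workstage", name=stage_name)
        work = ET.SubElement(stage, "work", name=stage_name,
                             workers=str(workers), runtime=str(RUNTIME_SECONDS))
        ET.SubElement(work, "operation", type=operation, ratio="100",
                      config=selection)
    return workload


read_workload = build_workload("read")
ET.indent(read_workload)  # pretty-print; requires Python 3.9 or later
print(ET.tostring(read_workload, encoding="unicode"))
```

Calling build_workload("write") produces the corresponding 100% write workload, with the object size attached to the write operation instead of a prepare stage.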
Performance metrics for small objects were gathered with the following configuration:
- Buckets: 10
- Objects: 1000 (per bucket / per object size)
- Test duration: 5 minutes per work-stage.
- Workers: 1, 8, 32, 64, 128, 256, 512
- Object size: 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB.
- Operation: 100% read, 100% write
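The small-object parameters above imply a sizeable test matrix. The sketch below estimates the data footprint per object size and the total measured run time. It assumes decimal units (1 KB = 1000 bytes), reads "1000 per bucket / per object size" as 1000 objects in each of the 10 buckets for every size, and assumes one work-stage per combination of size, worker count, and operation; setup and cleanup time is not counted.

```python
# Sketch: size of the small-object test matrix under the stated assumptions.

BUCKETS = 10
OBJECTS_PER_BUCKET = 1000
WORKER_COUNTS = [1, 8, 32, 64, 128, 256, 512]
OPERATIONS = ["read", "write"]
STAGE_MINUTES = 5
OBJECT_SIZES = {  # bytes, decimal units assumed
    "4KB": 4_000, "32KB": 32_000, "64KB": 64_000, "128KB": 128_000,
    "256KB": 256_000, "1MB": 1_000_000, "4MB": 4_000_000, "8MB": 8_000_000,
}

objects_per_size = BUCKETS * OBJECTS_PER_BUCKET
for label, size_bytes in OBJECT_SIZES.items():
    footprint_gb = objects_per_size * size_bytes / 1e9
    print(f"{label:>6}: {objects_per_size} objects, {footprint_gb:8.2f} GB")

stages = len(OBJECT_SIZES) * len(WORKER_COUNTS) * len(OPERATIONS)
total_minutes = stages * STAGE_MINUTES
print(f"work-stages: {stages}, measured run time: {total_minutes / 60:.1f} hours")
```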
Performance Results for Large Objects
This section shows the performance results gathered from the environment previously described.
The following sections describe the performance results for reading and writing large objects of 1GB with different numbers of workers: 1, 8, 32, 64, 128, 256 and 512.
Performance results for Reading Large Objects
For the read tests with 1GB objects, the maximum bandwidth measured was 63.64 GB/s, with a success ratio of 100%, as illustrated in Table 1 and Fig 2.
Table 1. Performance Results for READ Large Objects (1GB) on CES S3 environment.
For read operations with large objects, the bandwidth of more than 60 GB/s that was achieved is not limited by the ESS3200; rather, it is close to the networking limit.
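The networking limit can be estimated from the bond configuration listed in the benchmark environment. The sketch below is a back-of-the-envelope calculation that assumes both links of every bond carry traffic, uses decimal units (8 Gbit/s = 1 GB/s), and ignores protocol overhead, so the real ceiling is somewhat lower.

```python
# Sketch: theoretical line rate of each network segment in the test bed.

def bond_capacity_gbyte(nodes, links_per_bond, link_gbit):
    """Aggregate line rate in GB/s for `nodes` hosts with one bond each."""
    return nodes * links_per_bond * link_gbit / 8


MEASURED_READ_GBYTE = 63.64  # peak large-object read bandwidth from the post

segments = {
    "APP nodes (6 x 2x100GbE)": bond_capacity_gbyte(6, 2, 100),
    "CES Bond1 to APP nodes (3 x 2x100GbE)": bond_capacity_gbyte(3, 2, 100),
    "CES Bond2 to ESS (3 x 2x200GbE)": bond_capacity_gbyte(3, 2, 200),
}
for name, capacity in segments.items():
    print(f"{name}: {capacity:.1f} GB/s")

ceiling = min(segments.values())
share = MEASURED_READ_GBYTE / ceiling
print(f"narrowest segment: {ceiling:.1f} GB/s, measured read = {share:.1%} of it")
```

Under these assumptions the narrowest segment is the CES-side bond facing the application nodes, at 75 GB/s, and the measured 63.64 GB/s corresponds to roughly 85% of that line rate.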
Fig 2. Large objects read performance results: (a) COSBench Web console summary. (b) Bandwidth per work-stage increasing number of COSBench workers.
Fig 3. CPU Utilization of CES nodes when reading large objects using COSBench.
Performance results for Writing Large Objects
For the write tests with 1GB objects, the maximum bandwidth measured was 24.17 GB/s, with a success ratio of 100%, which is close to the maximum that the ESS3200 can provide. This is illustrated in Table 2 and Fig 4.
Table 2. Performance Results for WRITE Large Objects (1GB) on CES S3 environment
Fig 4. Large objects Write performance results: (a) COSBench Web console summary. (b) Bandwidth per work-stage increasing number of COSBench workers.
Fig 5. CPU Utilization of CES nodes when writing large objects using COSBench.
Performance results for Small Objects
For small objects, we evaluate both the throughput, as the number of operations per second (op/s), and the bandwidth (MB/s) for read and write workloads.
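The two metrics are tied together by the object size: bandwidth is approximately the operation rate multiplied by the object size. The sketch below applies this relation to the peak values reported in this section, assuming decimal units (1 KB = 1000 bytes) and a single object size per test; the derived numbers are estimates, not measurements.

```python
# Sketch: convert between op/s and bandwidth for a fixed object size.

def bandwidth_mb_per_s(ops_per_second, object_bytes):
    """Bandwidth in MB/s implied by an operation rate and object size."""
    return ops_per_second * object_bytes / 1e6


def ops_from_bandwidth(bandwidth_gb_per_s, object_bytes):
    """Operation rate in op/s implied by a bandwidth and object size."""
    return bandwidth_gb_per_s * 1e9 / object_bytes


# Peak op/s values from the post, with the implied bandwidth.
print(f"read,  4KB  at 56000 op/s: {bandwidth_mb_per_s(56000, 4_000):.1f} MB/s")
print(f"write, 32KB at 4682 op/s:  {bandwidth_mb_per_s(4682, 32_000):.1f} MB/s")

# Peak bandwidth values from the post, with the implied operation rate.
print(f"read,  4MB at 70 GB/s:  {ops_from_bandwidth(70, 4_000_000):.0f} op/s")
print(f"write, 8MB at 8.7 GB/s: {ops_from_bandwidth(8.7, 8_000_000):.0f} op/s")
```

This is why the smallest objects give the highest op/s but a modest bandwidth, while the multi-megabyte objects reach the highest bandwidth at a much lower operation rate.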
Performance results for Reading Small Objects
The following charts show the performance measurement when reading data with different number of workers: 1, 8, 32, 64, 128, 256, and 512 using small object sizes of 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, and 8MB.
The maximum number of operations per second measured was 56000 for a workload with 512 workers and 4KB object size.
It is notable that for reading small objects, the number of operations per second scales consistently as the number of COSBench workers increases.
Fig 6. Throughput comparison with different object sizes and number of COSBench workers.
For an object size of 4MB and 256 workers, the maximum bandwidth obtained was 70GB/s.
Fig 7. Bandwidth comparison with different object sizes and number of COSBench workers.
Fig 8. CPU Utilization of CES nodes when reading small objects using COSBench
Performance results for Writing Small Objects
The following charts show the performance measurement when writing data with different number of workers: 1, 8, 32, 64, 128, 256, and 512 using small object sizes of 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, and 8MB.
The maximum number of operations per second measured was 4682 for a workload with 128 workers and 32KB object size.
Fig 9. Throughput comparison with different object sizes and number of COSBench workers.
The max bandwidth measured was 8.7GB/s and was obtained when running tests with objects of 8MB and 256 workers.
Fig 10. Bandwidth comparison with different object sizes and number of COSBench workers.
Fig 11. CPU Utilization of CES nodes when writing small objects using COSBench.
Conclusions
This blog entry described a set of tests executed to evaluate the performance of IBM Storage Scale CES S3 using COSBench with large objects (1GB) and small objects. It also described the tunings that were applied to improve performance in the tests.
With the current cluster setup, the bandwidth measured for CES S3 with large objects is 63 GB/s for reads and 24 GB/s for writes.
For reading small objects, the maximum number of operations per second was in the range of 56000, measured with an object size of 4KB. Interesting bandwidth results were observed with a 4MB object size in combination with 256 and 512 workers, with peaks of 70GB/s.
Also, CPU utilization increased with the number of COSBench workers, starting with very low utilization for 1 and 8 workers and reaching maximum utilization for larger numbers of workers.
Performance engineering work will continue with the execution of diverse tests. In future entries, we will describe performance evaluations using COSBench with different workload characteristics [9] as well as other benchmarking tools.
References
[1] Amazon S3 REST API Introduction. https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html
[2] CES S3 support page <link>
[3] NooBaa Software Defined Storage. https://www.noobaa.io/
[4] COSBench - Cloud Object Storage Benchmark. https://github.com/intel-cloud/cosbench
[5] IBM Elastic Storage System 3200 data sheet. https://www.ibm.com/downloads/cas/MQ4MY4WV
[6] IBM Spectrum Scale: multi-connection over TCP (MCOT): tuning may be required. https://www.ibm.com/support/pages/node/6446651
[7] IBM Spectrum Scale DAS 5.1.3.1 performance evaluation using COSBench. https://community.ibm.com/community/user/storage/blogs/silvana-de-gyves-avila1/2022/05/20/ibm-data-access-services-performance-evaluation
[8] IBM Data Access Services 5.1.6 read performance evaluation of small objects using COSBench. https://community.ibm.com/community/user/storage/blogs/silvana-de-gyves-avila1/2023/01/11/ibm-data-access-services-516-read-performance-eval