File and Object Storage

Software-defined storage for building a global AI, HPC and analytics data platform 

IBM Storage Scale CES S3 evaluation of large and small objects using COSBench

By Rogelio Rivera Gutierrez posted Mon September 09, 2024 04:28 PM


The AWS S3 API [1] established a de-facto standard to process unstructured data as objects. More and more customers integrate the S3 object access protocol in their workflows to acquire, process and manage unstructured data. To better support these evolving workflow requirements, IBM modernized Storage Scale’s built-in support for S3 access to data.

IBM CES S3 (non-containerized S3) [2] is part of IBM Storage Scale, formerly known as Spectrum Scale or GPFS. CES S3 provides high-performance, scalable S3 object access to data stored in Storage Scale file systems. It is integrated with Cluster Export Services (CES) to provide a highly available S3 access service in a GPFS cluster. It enables clients to access data stored in IBM Storage Scale file systems, mapping files to S3 objects and directories to S3 buckets, and vice versa.

The purpose of this blog is to describe the performance evaluation executed on a bare-metal CES S3 environment, including high-level configuration details, the performance tunings applied, and the results. This blog also showcases the performance of object operations, specifically focusing on reading and writing both large and small objects of varying sizes, to provide insights into their efficiency and effectiveness in different scenarios. Small objects are frequently used in AI and analytics workloads, financial services trading, and other use cases that require efficient data access to satisfy application requirements. CES S3 (non-containerized S3) is optimized for such workloads.

To assess the performance of IBM Storage Scale CES S3, we established an environment featuring a CES cluster and employed COSBench for conducting the measurements. COSBench, an open-source benchmarking tool developed by Intel, serves as the industry-standard for evaluating the performance of cloud object storage services. This tool enables us to execute write and read tests under various workload conditions, providing valuable insights into the capabilities and efficiency of the storage solution.

For comparison purposes, workloads similar to those presented in “IBM Storage Scale DAS 5.1.3.1 performance evaluation using COSBench” [3] and “IBM Storage Scale CES S3 (Tech preview) Performance evaluation of large and small objects using COSBench” [4] were run.

Benchmark Environment

The environment configured for the performance evaluation is illustrated in Figure 1:

  • COSBench cluster with each application node having:
    • x86_64 architecture
    • 2x CPU AMD EPYC 7F72 24-Core
    • 512GB Memory 
    • Bond – 2x 100GbE 1 port Ethernet
    • RHEL 9.3 OS
    • COSBench v0.4.2
  • CES cluster with each CES node having:
    • x86_64 architecture
    • 2x CPU Intel(R) Xeon(R) Gold 6346 CPU 32-Core
    • 256GB Memory
    • Bond1 – 2x 100GbE 1 port Ethernet (to APP nodes)
    • Bond2 – 2x 200GbE 1 port Ethernet (to ESS)
    • RHEL 9.3 OS
    • GPFS 5.2.1
    • Ganesha v5.7
    • Noobaa v5.15.3
  • GPFS Storage cluster – Dedicated ESS3200
  • 2x Mellanox SN3700 200GBE switches between CES nodes and IBM Storage System Cluster
  • 2x Mellanox SN2700 100GBE switches between Application and CES nodes
  • 1GBE switches for admin networks

Performance Tuning

In a CES S3 environment using the COSBench tool [5], several factors impact performance, especially the number of drivers, the number of workers, and the object sizes. Server hardware performance, network configuration, and storage capability are important factors too. Another source of variability in the performance evaluation is the number of endpoint forks (NooBaa stack endpoints) [6] used in the CES S3 configuration. Performance engineering was done to optimize the benchmark environment. The result of this study was a collection of parameters fine-tuned across various layers of the setup.

COSBench Configuration (App Nodes side)

In the controller.conf file of the COSBench application, 18 drivers were created, for a total of 3 drivers per APP node.
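
As a sketch of that driver layout, the snippet below generates a controller.conf-style configuration with 18 drivers spread over 6 APP nodes (3 per node). The hostnames `app1`..`app6`, the base port 18088, and the log settings are illustrative assumptions, not the actual values used in this evaluation; the exact key names should be checked against the COSBench user guide.

```python
# Sketch: generate a COSBench controller.conf for 18 drivers spread over
# 6 APP nodes (3 drivers per node). Hostnames and ports are assumptions.
def make_controller_conf(nodes, drivers_per_node, base_port=18088):
    lines = ["[controller]",
             f"drivers = {len(nodes) * drivers_per_node}",
             "log_level = INFO",
             "log_file = log/system.log",
             "archive_dir = archive",
             ""]
    idx = 0
    for host in nodes:
        for d in range(drivers_per_node):
            idx += 1
            port = base_port + d * 100  # one port per driver instance on a node
            lines += [f"[driver{idx}]",
                      f"name = driver{idx}",
                      f"url = http://{host}:{port}/driver",
                      ""]
    return "\n".join(lines)

conf = make_controller_conf([f"app{i}" for i in range(1, 7)], 3)
print(conf.count("[driver"))  # 18 driver sections
```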

GPFS and CES S3 Configurations (from CES Nodes)

IBM Storage Scale enables customers to change the Storage Scale configuration parameters [7]. The following configuration was used for performance evaluation.

  • General GPFS configuration:
mmlsconfig
Configuration data for cluster danrf.gpfs:
------------------------------------------
clusterName danrf.gpfs
clusterId 15299468826331659279
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.2.0.0
tscCmdAllowRemoteConnections no
ccrEnabled yes
sdrNotifyAuthEnabled yes
cifsBypassTraversalChecking yes
syncSambaMetadataOps yes
cifsBypassShareLocksOnRename yes
[client]
pagepool 48G
numaMemoryInterleave yes
maxFilesToCache 128k
prefetchPct 50
prefetchThreads 128
maxStatCache 128k
maxMBpS 24000
workerThreads 1024
ignorePrefetchLUNCount yes
[common]
maxTcpConnsPerNodeConn 8
cesSharedRoot /gpfs/rf-cesshared/
cipherList AUTHONLY
cesCidrPool <IP=172.20.100.50><ATTRIBUTE=><GROUP=group1><PREFIX=>+<IP=172.20.100.51><ATTRIBUTE=><GROUP=group1><PREFIX=>+<IP=172.20.100.52><ATTRIBUTE=><GROUP=group1><PREFIX=>
adminMode central

File systems in cluster danrf.gpfs:
-----------------------------------
(none)

  • Remote File systems were mounted in the Storage Scale environment (ess3200hpo1a-hs/ess3200hpo1b).
mmremotecluster show
Cluster name:    test01.test.net
Cluster id:      15069619416586816484
Contact nodes:   ess3200hpo1a-hs,ess3200hpo1b-hs
SHA digest:      aa44835a00656595fda8b79be4efbede71ec56a7e48dbaffe3d3c74cfe4bd23d
Key Expiration:  2034-05-13 14:01:10 (-0400)
File systems:    rf-cesshared (rf-cesshared)  rf_fs1 (rf_fs1)

mmremotefs show all
Local Name  Remote Name  Cluster name       Mount Point        Mount Options    Automount  Drive  Priority
rf-cesshared rf-cesshared test01.test.net    /gpfs/rf-cesshared rw               no           -        0
rf_fs1      rf_fs1       test01.test.net    /gpfs/rf_fs1       rw               no           -        0

  • 3 CES IP addresses were configured on the 3-node CES cluster; these were specified as NooBaa endpoint IPs in the COSBench configuration (APP nodes).
mmlscluster --ces

GPFS cluster information
========================
  GPFS cluster name:         danrf.gpfs
  GPFS cluster id:           15299468826331659279

Cluster Export Services global parameters
-----------------------------------------
  Shared root directory:                /gpfs/rf-cesshared/
  Enabled Services:                     NFS SMB S3
  Log level:                            0
  Address distribution policy:          none

Node   Daemon node name            IP address       CES IP address list
-----------------------------------------------------------------------
   1   dan5ib.gpfs.net             172.20.200.136   172.20.100.50
   2   dan6ib.gpfs.net             172.20.200.137   172.20.100.51
   3   dan7ib.gpfs.net             172.20.200.138   172.20.100.52

  • Storage Scale CES S3 provides the mms3 CLI command, which enables customers to adjust the configuration of the NooBaa S3 service. For this test, the number of endpoint forks was set to 12. With this, CES S3 creates a total of 36 NooBaa endpoints (12 per CES node).
mms3 config list

 S3 Configuration: 
 ======================= 
 ALLOW_HTTP : true
 DEBUGLEVEL : default
 ENABLEMD5 : false
 ENDPOINT_FORKS : 12
 ENDPOINT_PORT : 6001
 ENDPOINT_SSL_PORT : 6443
 GPFSDLPATH : /usr/lpp/mmfs/lib/libgpfs.so
 NC_MASTER_KEYS_GET_EXECUTABLE : /usr/lpp/mmfs/bin/cess3_key_get
 NC_MASTER_KEYS_PUT_EXECUTABLE : /usr/lpp/mmfs/bin/cess3_key_put
 NC_MASTER_KEYS_STORE_TYPE : executable
 NSFS_DIR_CACHE_MAX_DIR_SIZE : 536870912
 NSFS_DIR_CACHE_MAX_TOTAL_SIZE : 1073741824
 NSFS_NC_CONFIG_DIR_BACKEND : GPFS
 NSFS_NC_STORAGE_BACKEND : GPFS
 UVTHREADPOOLSIZE : 16
=======================
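
The `mms3 config list` output above follows a simple `KEY : VALUE` layout, so the total endpoint count can be derived mechanically. The sketch below, using an abridged sample of that output, parses it and computes ENDPOINT_FORKS × CES nodes; the parsing logic is illustrative, not an official IBM tool.

```python
# Sketch: parse `mms3 config list`-style "KEY : VALUE" output and derive
# the total number of NooBaa endpoints (ENDPOINT_FORKS x CES nodes).
sample = """\
ALLOW_HTTP : true
ENDPOINT_FORKS : 12
ENDPOINT_PORT : 6001
UVTHREADPOOLSIZE : 16
"""

def parse_s3_config(text):
    cfg = {}
    for line in text.splitlines():
        if " : " in line:
            key, _, value = line.partition(" : ")
            cfg[key.strip()] = value.strip()
    return cfg

cfg = parse_s3_config(sample)
ces_nodes = 3  # this evaluation's CES cluster size
total_endpoints = int(cfg["ENDPOINT_FORKS"]) * ces_nodes
print(total_endpoints)  # 36
```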

Evaluation Description

A COSBench cluster was defined with 18 drivers, distributed across 6 physical nodes, using 3 CES IP addresses to communicate with the CES S3 cluster. Workloads from COSBench node pairs 7-8, 9-10, and 11-12 were run against the 3 CES nodes Dan5, Dan6, and Dan7, respectively, each pair using the corresponding CES IP address.
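
The pairing described above (COSBench nodes 7-8, 9-10 and 11-12 targeting Dan5, Dan6 and Dan7) can be written down as a small mapping. The node numbers and CES IPs come from the text; the pairing logic itself is just an illustration of how load was spread.

```python
# Sketch of the driver-to-endpoint mapping: each consecutive pair of
# COSBench APP nodes targets one CES node's IP address.
ces_ips = {"Dan5": "172.20.100.50",
           "Dan6": "172.20.100.51",
           "Dan7": "172.20.100.52"}
app_nodes = [7, 8, 9, 10, 11, 12]
ces_order = ["Dan5", "Dan6", "Dan7"]

mapping = {node: ces_ips[ces_order[i // 2]]
           for i, node in enumerate(app_nodes)}
print(mapping[9])  # 172.20.100.51 (Dan6)
```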

This evaluation focused on distinct scenarios: handling large objects and small objects for read and write operations. Each scenario was assessed using different numbers of workers and object sizes to understand how CES S3 manages different data workloads. In addition to covering both large and small objects across a range of configurations, scenarios with low and high numbers of existing objects were evaluated to demonstrate how pagepool cache hits can benefit performance.

Details of these scenarios are described below:

SCENARIO WITH LOW VOLUME OF OBJECTS

LARGE OBJECTS

  • Buckets: 10
  • Objects: 10 (evenly distributed across the 10 buckets)
  • Test duration: 5 minutes per workstage
  • Workers: 1, 3, 6, 12, 24, 96, 192, 384
  • Object size: 1GB
  • Operation: 100% read, 100% write


SMALL OBJECTS

  • Buckets: 10
  • Objects: 1000
  • Test duration: 5 minutes per workstage
  • Workers: 1, 3, 6, 12, 24, 96, 192, 384
  • Object size: 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB
  • Operation: 100% read, 100% write
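
The scenario parameters above map onto COSBench's workload XML. The sketch below builds a minimal read workload for the large-object, low-volume case (10 buckets, 1 x 1GB object each, a 5-minute workstage); the endpoint, credentials, and bucket prefix are placeholder assumptions, while the element names follow COSBench's documented workload format.

```python
# Sketch of a COSBench workload XML for the 1GB read scenario. Endpoint,
# keys and bucket prefix are placeholders, not this evaluation's values.
import xml.etree.ElementTree as ET

def read_workload(workers, runtime_s=300):
    wl = ET.Element("workload", name=f"read-1GB-{workers}w", config="")
    ET.SubElement(wl, "storage", type="s3",
                  config="accesskey=KEY;secretkey=SECRET;"
                         "endpoint=http://172.20.100.50:6001")
    flow = ET.SubElement(wl, "workflow")
    stage = ET.SubElement(flow, "workstage", name="read")
    work = ET.SubElement(stage, "work", name="read",
                         workers=str(workers), runtime=str(runtime_s))
    # 10 buckets with 1 object each: pick containers u(1,10), objects u(1,1)
    ET.SubElement(work, "operation", type="read", ratio="100",
                  config="cprefix=bkt;containers=u(1,10);objects=u(1,1)")
    return ET.tostring(wl, encoding="unicode")

xml_text = read_workload(24)
print("workstage" in xml_text)  # True
```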

SCENARIO WITH HIGH VOLUME OF OBJECTS

LARGE OBJECTS

  • Buckets: 10 
  • Objects: 1000 (evenly distributed in the 10 buckets).
  • Test duration: 5 minutes per workstage
  • Workers: 1, 3, 6, 12, 24, 96, 192, 384
  • Object size: 1GB 
  • Operation: 100% read


SMALL OBJECTS

  • Buckets: 10
  • Objects: Objects needed to achieve high capacity for each object size to minimize pagepool cache hit
  • Test duration: 5 minutes per workstage
  • Workers: 1, 3, 6, 12, 24, 96, 192
  • Object size: 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB
  • Operation: 100% read
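
To get a feel for "objects needed to achieve high capacity", the sketch below estimates how many objects of each size make the working set comfortably exceed the aggregate pagepool of the 3 CES nodes (48 GiB each, per the mmlsconfig output shown earlier). The 4x safety factor is an illustrative choice of ours, not a figure from the evaluation.

```python
# Sketch: object counts needed so the working set dwarfs the aggregate
# pagepool (3 CES nodes x 48 GiB), minimizing cache hits on reads.
import math

PAGEPOOL_PER_NODE = 48 * 2**30
CES_NODES = 3
TARGET = 4 * PAGEPOOL_PER_NODE * CES_NODES  # 576 GiB working set (assumed 4x factor)

sizes = {"4KB": 4 * 2**10, "32KB": 32 * 2**10, "1MB": 2**20, "8MB": 8 * 2**20}
needed = {name: math.ceil(TARGET / size) for name, size in sizes.items()}
print(needed["8MB"])  # 73728 objects of 8MB to reach 576 GiB
```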

Grafana dashboards were used to monitor resource utilization from the NooBaa endpoints' perspective. The COSBench performance data was collected from the COSBench GUI.

Maximum System POSIX Performance (3 CES Nodes / ESS3200)

Maximum system sequential read and write performance capability of the CES Cluster (3 CES nodes) and the ESS3200 [8] storage system was examined using the IOR benchmark. Benchmark results demonstrated that the system can achieve peak throughput of 38 GB/s for sequential read operations and 25 GB/s for sequential write operations.

Performance results for READ Large Objects

This section shows the performance results gathered from the environment previously described and the characteristics of large object storage systems under two distinct scenarios: one with a low volume of objects and another with a high volume of objects. The purpose of these scenarios is to understand how the performance is influenced by the total space occupied by the objects, particularly in relation to the effectiveness of the pagepool cache mechanism employed by CES S3 nodes. We assess the system's scalability, efficiency, and potential bottlenecks based on the number of objects and workers.

The table below shows the performance results related to reading large objects, detailing both minimal and significant pagepool cache hit ratios on the CES nodes.

Table 1. Performance Results for READ large objects (1GB) on CES S3 environment.

For reads, depending on the total space occupied by the objects, performance can benefit from pagepool cache hits on the CES S3 nodes. With a low volume of objects, the pagepool cache hit ratio is about 40% and the maximum read bandwidth reaches 63 GB/s. The pagepool cache hit ratio is close to 0% when a large volume of objects is accessed, and COSBench achieves ~38 GB/s read bandwidth, which is the maximum system bandwidth obtained from the POSIX read evaluation.
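
These two figures are consistent with a simple model: if the backend file system sustains ~38 GB/s and a fraction h of reads is served from the pagepool, only the misses touch the file system, so the client-visible bandwidth is roughly backend_bw / (1 - h). This is a back-of-the-envelope check, not a formula from the blog.

```python
# Worked check: with a ~38 GB/s backend limit and hit ratio h, only
# misses hit the file system, so client read bandwidth ~ backend / (1 - h).
def effective_read_bw(backend_gbps, hit_ratio):
    return backend_gbps / (1.0 - hit_ratio)

print(round(effective_read_bw(38.0, 0.40), 1))  # 63.3 GB/s at a 40% hit ratio
print(round(effective_read_bw(38.0, 0.0), 1))   # 38.0 GB/s with no cache hits
```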

Fig. 2. Performance Results for READ large objects (1GB) on CES S3 environment.  

When a significant number of large objects is read and their capacity exceeds the pagepool size configured on the CES S3 nodes, the cache hit ratio drops to close to zero. This behavior is illustrated in the Grafana dashboard shown below.

Fig. 3. GPFSFileSystem (gpfs_fs_bytes_read) and GPFSFileSystemAPI (gpfs_fis_bytes_read) bandwidth report from Grafana when READING Large objects (1GB) on CES S3 environment when close to 0% pagepool cache hit ratio.

Read performance benefits from pagepool cache hits when reading a low volume of objects whose capacity is considerably less than the pagepool size. This behavior is depicted in the following Grafana dashboard: the file system shows a maximum throughput of about 33 GB/s, while the application layer (FIS) reports a maximum close to 63 GB/s, indicating a pagepool cache hit ratio of about 40%.

Fig. 4. GPFSFileSystem (gpfs_fs_bytes_read) and GPFSFileSystemAPI (gpfs_fis_bytes_read) bandwidth report from Grafana when READING Large objects (1GB) on CES S3 environment when cache hit ratio is about 40%.

Performance results for WRITE Large Objects

The following table shows the performance results associated with writing large objects.

Table 2. Performance Results for WRITE large objects (1GB) on CES S3 environment.

The current CES S3 configuration can also push the storage file system to its limit for writes, peaking at 25.44 GB/s.

Fig. 5. Performance Results for WRITE Large objects (1GB) on CES S3 environment.

Performance results for Small Objects

This section presents the performance results for various small object sizes and worker counts (1, 3, 6, 12, 24, 96, 192). It also examines scenarios where IOs exhibit pagepool cache hit ratios ranging from minimal to significant. The performance of small objects is assessed by measuring the number of operations per second (op/s).

The following figures show the performance results related to reading small objects where IOs have minimal and varying pagepool cache hit ratios.

Fig. 6. Performance Results for READ Small objects on CES S3 environment with minimal and various IOs pagepool hit.        

The maximum number of operations per second measured for READ with a minimal cache hit ratio is 47.8K op/s with 384 workers and a 4KB object size, versus 56K op/s for the same object size and number of workers in a scenario benefiting from a higher cache hit ratio.

The following chart shows the performance results related to writing small objects. In this experiment, the maximum throughput is 5.6K op/s, obtained with a workload of 96 workers and a 4KB object size.
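
Converting these op/s figures into bandwidth shows why small-object workloads are operations-limited rather than bandwidth-limited: even the peak 4KB read rate amounts to a few hundred MiB/s, far below the 38 GB/s large-object ceiling. The conversion below is our own arithmetic on the numbers quoted in the text.

```python
# Sketch: convert small-object op/s into approximate bandwidth.
def ops_to_mib_per_s(ops_per_s, object_bytes):
    return ops_per_s * object_bytes / 2**20

print(ops_to_mib_per_s(56_000, 4 * 2**10))  # 56K x 4KB reads -> ~219 MiB/s
print(ops_to_mib_per_s(5_600, 4 * 2**10))   # 5.6K x 4KB writes -> ~22 MiB/s
```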

Fig. 7. Performance Results for WRITE Small objects on CES S3 environment.

Conclusions 

This blog entry outlined a series of tests designed to evaluate the performance of IBM Storage Scale CES S3, using COSBench for read and write operations with both large and small objects. It also provided some tunings to improve test performance and described the impact of the pagepool cache hit ratio on performance measurements.

Handling a volume of objects that exceeds the pagepool size on the CES S3 nodes results in a cache hit ratio close to 0, yielding performance metrics that match those from the POSIX evaluations: system limits of about 38 GB/s for reads and 25 GB/s for writes. Conversely, a high pagepool cache hit ratio significantly benefits read performance, achieving peak read bandwidth close to 63 GB/s with an approximately 40% cache hit ratio in this environment.

A read performance benefit is also observed for small objects in scenarios with a higher cache hit ratio: the peak number of read operations per second in this experiment, using a 4KB object size and 384 workers, was 56K op/s, versus a maximum of 47.8K op/s when the cache hit ratio is minimal. For writing small objects, a maximum of 5.6K op/s was obtained.

Performance engineering work will continue with the execution of diverse scenarios. In future entries, we will describe performance evaluations using COSBench with different workload characteristics as well as other benchmarking tools.

References

[1] Amazon S3 REST API Introduction. https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html 

[2] CES S3 support page https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=gpfs-s3-support-overview 

[3] IBM Spectrum Scale DAS 5.1.3.1 performance evaluation using COSBench. https://community.ibm.com/community/user/storage/blogs/silvana-de-gyves-avila1/2022/05/20/ibm-data-access-services-performance-evaluation

[4] IBM Storage Scale CES S3 (Tech preview) Performance evaluation of large and small objects using COSBench. https://community.ibm.com/community/user/storage/blogs/rogelio-rivera-gutierrez/2024/04/25/ibm-storage-scale-performance-ces-s3-tech-preview

[5] COSBench - Cloud Object Storage Benchmark. https://github.com/intel-cloud/cosbench

[6] NooBaa Software Defined Storage. https://www.noobaa.io/

[7] IBM Spectrum Scale: multi-connection over TCP (MCOT): tuning may be required. https://www.ibm.com/support/pages/node/6446651

[8] IBM Elastic Storage System 3200 data sheet. https://www.ibm.com/downloads/cas/MQ4MY4WV

