Small objects are frequently used in AI and analytics workloads, financial services trading, and other use cases that require fast data access to satisfy the application’s requirements. IBM Spectrum Scale Data Access Services (DAS) S3 [1] is an extension of IBM Spectrum Scale Container Native (CNSA) [2], optimized for such workloads. It enables clients to access data stored in IBM Spectrum Scale file systems, mapping files to S3 objects and directories to S3 buckets, and vice versa. Data Access Services S3 runs in Kubernetes clusters built using Red Hat OpenShift [3], and leverages Red Hat OpenShift Data Foundation [4] (limited license included with IBM Spectrum Scale).
The purpose of this blog entry is to demonstrate the performance of Spectrum Scale DAS S3 when reading small objects of different sizes. This work is a follow-up to the one presented in “IBM Spectrum Scale DAS 5.1.3.1 performance evaluation using COSBench” [5].
Benchmark Environment
To assess the performance of DAS S3, a dedicated environment (see Figure 1) was configured with the following components:
- COSBench cluster: 6 COSBench [6] nodes, with 18 drivers and 1 controller, running on RHEL 8.6.
- Shared data network: shared 100Gb Ethernet network for S3 access.
- IBM Spectrum Scale DAS cluster: dedicated compact Red Hat OpenShift 4.11 cluster running on bare-metal x86_64 servers. Compact Red Hat OpenShift clusters are three-node clusters in which each Red Hat OpenShift node acts as a combined master and worker node. These nodes are referred to as Data Access Nodes (DAN).
- Dedicated data network: dedicated 200Gb Ethernet network for Spectrum Scale and OpenShift.
- Storage cluster: dedicated ESS 3200 [7], containing 24 x 3.84TB NVMe drives.
Fig. 1. Benchmark environment.
The software levels used for the evaluation are listed as follows:
- Spectrum Scale Storage Cluster: Spectrum Scale 5.1.4-1
- Spectrum Scale CNSA Cluster: Spectrum Scale CNSA 5.1.6
- Spectrum Scale CSI: Spectrum Scale CSI 2.8.0
- Spectrum Scale DAS: Spectrum Scale DAS 5.1.6
- OpenShift: OCP 4.11.0
- OpenShift Storage: ODF 4.11.4
- COSBench: 0.4.2.c4
Performance Tuning
For this test, maxTcpConnsPerNodeConn was adjusted in both Spectrum Scale clusters (storage cluster and DAS cluster), and maxMBpS was also changed in the DAS cluster. In the storage cluster, the following command was executed and Spectrum Scale was then restarted.
mmchconfig maxTcpConnsPerNodeConn=8
In the DAS cluster, the cluster resource was modified by adding the new configuration parameters in the cluster profile.
spec:
  daemon:
    clusterProfile:
      maxTcpConnsPerNodeConn: "8"
      maxMBpS: "25000"
Using the mmdas CLI command, the scale factor was set to 12. With this, DAS creates a total of 36 NooBaa [8] endpoints, evenly distributed among the 3 DAN nodes (12 per node). From a management node, the following command was executed.
mmdas service update s3 --scaleFactor=12
Tests Description
A COSBench cluster was defined with 18 drivers distributed across 6 physical nodes, using 3 IP addresses to communicate with the DAS cluster. The workload from COSBench nodes 1-2, 3-4, and 5-6 was directed to DAN1, DAN2, and DAN3, respectively, using their corresponding IP addresses.
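The node-to-DAN assignment described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the even split of 3 drivers per COSBench node is an assumption (the text states only that 18 drivers were distributed across 6 nodes).

```python
# Sketch of the COSBench-node-to-DAN assignment described in the text.
# Assumption: the 18 drivers are spread evenly, 3 per COSBench node.
COSBENCH_NODES = 6
TOTAL_DRIVERS = 18
DANS = ["DAN1", "DAN2", "DAN3"]


def dan_for_node(node: int) -> str:
    """Return the DAN targeted by a COSBench node (numbered 1..6)."""
    if not 1 <= node <= COSBENCH_NODES:
        raise ValueError(f"node must be in 1..{COSBENCH_NODES}")
    # Nodes 1-2 -> DAN1, nodes 3-4 -> DAN2, nodes 5-6 -> DAN3.
    return DANS[(node - 1) // 2]


def drivers_per_dan() -> dict:
    """Count how many drivers send their workload to each DAN."""
    per_node = TOTAL_DRIVERS // COSBENCH_NODES
    counts = {dan: 0 for dan in DANS}
    for node in range(1, COSBENCH_NODES + 1):
        counts[dan_for_node(node)] += per_node
    return counts


print(drivers_per_dan())
```

Under that assumption, each DAN receives the workload of 6 drivers.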
Two experiments were performed. Measurements for experiment 1 were gathered with the following configuration:
- Buckets: 10
- Objects: 1000 (per bucket)
- Test duration: 5 minutes per workstage
- Workers: 1, 8, 32, 64, 128, 256, 512
- Object size: 1KB, 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB
- Operation: 100% read
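For readers who want to reproduce a similar setup, the following Python sketch builds one COSBench read workstage matching the parameters listed above. The actual workload files used for this evaluation are not shown in this entry, so this is an assumption-laden illustration: the container prefix `dasbench` and the stage naming are hypothetical, and the element and attribute layout follows the sample workloads shipped with COSBench.

```python
import xml.etree.ElementTree as ET


def read_workstage(workers, buckets=10, objects_per_bucket=1000,
                   runtime_s=300, cprefix="dasbench"):
    """Build a COSBench <workstage> element for a 100% read workload.

    cprefix is a hypothetical bucket name prefix; runtime_s=300 matches
    the 5 minutes per workstage stated in the text.
    """
    name = f"read-{workers}w"
    stage = ET.Element("workstage", name=name)
    work = ET.SubElement(stage, "work", name=name,
                         workers=str(workers), runtime=str(runtime_s))
    # u(a,b) asks COSBench to pick uniformly within the range a..b.
    config = (f"cprefix={cprefix};"
              f"containers=u(1,{buckets});"
              f"objects=u(1,{objects_per_bucket})")
    ET.SubElement(work, "operation", type="read", ratio="100", config=config)
    return stage


# One workstage per worker count used in experiment 1.
for workers in (1, 8, 32, 64, 128, 256, 512):
    print(ET.tostring(read_workstage(workers), encoding="unicode"))
```

The object size does not appear in the read stage because, in COSBench, sizes are set when the objects are written in a prepare stage, which is omitted here.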
Measurements for experiment 2 were gathered with the following configuration:
- Buckets: 10
- Objects: 100, 200, 500, 800, 1000 (per bucket)
- Test duration: 5 minutes per workstage
- Workers: 512
- Object size: 1KB, 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB
- Operation: 100% read
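To get a feel for the size of the two test matrices, the sketch below enumerates the parameter combinations listed above. It assumes that every combination of object size, objects per bucket, and worker count runs as one 5-minute workstage, and it ignores any preparation or cleanup stages; both are assumptions, not details confirmed by this entry.

```python
from itertools import product

OBJECT_SIZES = ["1KB", "4KB", "32KB", "64KB", "128KB",
                "256KB", "1MB", "4MB", "8MB"]
STAGE_MINUTES = 5  # test duration per workstage, from the text

EXPERIMENT_1 = {"workers": [1, 8, 32, 64, 128, 256, 512],
                "objects_per_bucket": [1000]}
EXPERIMENT_2 = {"workers": [512],
                "objects_per_bucket": [100, 200, 500, 800, 1000]}


def combinations(experiment):
    """List every (object size, objects per bucket, workers) combination."""
    return list(product(OBJECT_SIZES,
                        experiment["objects_per_bucket"],
                        experiment["workers"]))


def read_minutes(experiment):
    """Total read time, assuming one workstage per combination."""
    return len(combinations(experiment)) * STAGE_MINUTES


for label, exp in (("experiment 1", EXPERIMENT_1),
                   ("experiment 2", EXPERIMENT_2)):
    print(label, len(combinations(exp)), "combinations,",
          read_minutes(exp), "minutes")
```

Under these assumptions, experiment 1 covers 63 combinations (315 minutes of read time) and experiment 2 covers 45 combinations (225 minutes).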
Dashboards from the OpenShift web console were used to monitor resource utilization from the perspective of the NooBaa endpoints. All performance data was collected from the COSBench GUI.
Performance Results
The first experiment was configured with 1000 objects per bucket and executed with different object sizes and an increasing number of COSBench workers. The maximum throughput measured was about 33,942 operations per second, obtained with 512 workers and the smallest object size (1KB). In general, the number of operations per second scales consistently as the number of COSBench workers increases. The throughput results are summarized in the following figure.
Fig. 2. Throughput comparison with different object sizes and number of COSBench workers.
The maximum bandwidth measured was 55 GB/s, obtained with 4MB objects and 512 workers (see Table 1). In terms of response time, as shown in Figure 3, the smaller objects (1KB to 1MB) fall within the same range, from 4ms to 17ms. The larger object sizes follow the same curve, with higher response times.
Table 1. Bandwidth comparison with different object sizes and number of COSBench workers.
Fig. 3. Response time comparison with different object sizes and number of COSBench workers.
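As a rough plausibility check, throughput and response time in a closed-loop benchmark are linked by Little's law: concurrency = throughput x mean response time. The sketch below applies it to the 1KB peak reported above. It assumes that all 512 workers issue requests back to back with no think time and that the reported response time is a mean; the function name is hypothetical.

```python
def mean_response_time_ms(workers: int, ops_per_second: float) -> float:
    """Estimate mean response time from Little's law (N = X * R).

    Assumes a closed loop: each worker has exactly one request in
    flight at any time and issues the next one immediately.
    """
    return workers / ops_per_second * 1000.0


# 512 workers and about 33,942 ops/s for 1KB objects (from the text).
estimate = mean_response_time_ms(512, 33_942)
print(f"{estimate:.1f} ms")  # prints "15.1 ms"
```

The estimate of roughly 15ms falls inside the 4ms to 17ms range reported for the smaller objects, so the throughput and response time figures are consistent with each other under these assumptions.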
The aggregated CPU utilization of the NooBaa endpoints when executing the tests with 1KB objects and 8MB objects is depicted in Figures 4 and 5. In both charts, CPU utilization increases with the number of COSBench workers, starting with very low utilization for 1 and 8 workers and reaching a maximum of around 50 CPUs for 512 workers. The CPU utilization charts for the object sizes in between follow the same behaviour.
Fig. 4. Compute resources pods dashboard - CPU usage for 1KB objects.
Fig. 5. Compute resources pods dashboard - CPU usage for 8MB objects.
The second experiment was configured with different object sizes, a varying number of objects per bucket, and a fixed number of COSBench workers (512). The major difference in throughput (ops/s) was observed with 4MB and 8MB objects. The smaller sizes follow nearly the same trend as in experiment 1, with very similar ops/s and GB/s values (see Figure 6). With larger object sizes and fewer objects per bucket, the number of operations per second increased. Because the number of operations per second is larger, the measured bandwidth also increased, reaching a maximum of 69 GB/s when running the test with 100 objects of 8MB per bucket (see Figure 7).
Fig. 6. Throughput comparison with different object sizes and number of objects.
Fig. 7. Bandwidth comparison with different object sizes and number of objects.
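Bandwidth and throughput are tied together by the object size: bandwidth = ops/s x object size. The sketch below converts between the two for the peak values quoted in this entry. It assumes decimal units (1KB = 1000 bytes, 1GB = 10^9 bytes), which may differ from the convention COSBench uses, so the derived numbers are estimates rather than measured values.

```python
# Assumption: decimal units; COSBench may use a different convention.
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def ops_per_second(bandwidth_gb_s, size_value, size_unit):
    """Operations per second implied by a bandwidth and an object size."""
    object_bytes = size_value * UNIT_BYTES[size_unit]
    return bandwidth_gb_s * UNIT_BYTES["GB"] / object_bytes


def bandwidth_gb_s(ops, size_value, size_unit):
    """Bandwidth in GB/s implied by a throughput and an object size."""
    object_bytes = size_value * UNIT_BYTES[size_unit]
    return ops * object_bytes / UNIT_BYTES["GB"]


# Peak bandwidths quoted in the text, converted to implied ops/s.
print(ops_per_second(55, 4, "MB"))  # experiment 1: 4MB objects -> 13750.0
print(ops_per_second(69, 8, "MB"))  # experiment 2: 8MB objects -> 8625.0

# Peak throughput for 1KB objects, converted to implied bandwidth.
print(bandwidth_gb_s(33_942, 1, "KB"))  # about 0.034 GB/s
```

This illustrates why the small-object tests are bound by operations per second while the large-object tests are bound by bandwidth.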
CPU utilization for the cases with 100 and 1000 objects per bucket is depicted in the following figures. CPU utilization was higher when executing the test with a smaller number of objects per bucket (see Figure 8) than with a larger number (see Figure 9).
Fig. 8. Compute resources pods dashboard - CPU usage for 100 objects per bucket.
Fig. 9. Compute resources pods dashboard - CPU usage for 1000 objects per bucket.
Conclusion
This blog entry described a set of experiments executed to evaluate the read performance of IBM Spectrum Scale Data Access Services (DAS) S3, using COSBench and a variety of small object sizes (1KB, 4KB, 32KB, 64KB, 128KB, 256KB, 1MB, 4MB, 8MB). The number of operations per second scales consistently as the number of COSBench workers increases. For object sizes from 1KB to 1MB, the number of operations per second stays in the same range. With larger objects (4MB and 8MB), the number of ops/s decreased but the bandwidth increased. This is a consequence of the object size: each operation transfers more data.
Running tests with different numbers of objects per bucket also provided interesting results: the tests with larger object sizes and fewer objects per bucket showed higher bandwidth. In future entries, we plan to document tests with different environment setups and other benchmarking tools.
References