Primary Storage


Elastic Storage with GPFS Native RAID performance test results with Tivoli Storage Manager over 40 Gbit Ethernet

By Archive User posted Thu December 18, 2014 04:54 AM

  



By Nils Haustein, Andre Gaschler, Sven Oehme and Joerg Walter

We would like to share the first preliminary results (see Figure 1) of our Proof of Concept (PoC) running Tivoli Storage Manager (TSM) workloads on Elastic Storage with GPFS Native RAID (GNR) over 40 Gbit Ethernet connections. Thanks to Sven Oehme from IBM Research, who provided us with the infrastructure and the expertise for configuring the GNR and TSM server systems.

The peak throughput results are amazing as you can see below:

 

  • Single TSM server BACKUP with multiple sessions: 4017 MB/sec
  • Dual TSM server BACKUP with multiple sessions: 4981 MB/sec
  • Single TSM server RESTORE with multiple sessions: 3834 MB/sec
  • Dual TSM server RESTORE with multiple sessions: 5424 MB/sec
  • Dual TSM server MIXED with multiple sessions: 4821 MB/sec


Figure 1: Summary of the 40 Gbit Ethernet tests with TSM and GNR

Test setup

The test setup is shown in Figure 2. The TSM software (client and server) was installed on separate x86 servers, which were configured as GPFS NSD clients and connected via Ethernet to the Elastic Storage GNR system. Each TSM server used a single, dedicated 40 Gbit link. The Ethernet connection to the GNR system was likewise 40 Gbit/sec per GNR server. The GNR system used for this test was comparable to an Elastic Storage Server (ESS) model GL2, comprising two GNR servers and 116 NL-SAS drives.
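
For readers who want to map this description to a running system, the cluster membership and disk layout can be inspected with standard GPFS administration commands. This is only an illustrative sketch; the output depends on the actual configuration and is not reproduced here.

  # List all cluster nodes and their roles
  # (the two GNR servers plus the two TSM servers acting as NSD clients)
  mmlscluster

  # Verify that the GPFS daemon is active on every node
  mmgetstate -a

  # Show the NSDs (the GNR vdisks) and the servers that provide them
  mmlsnsd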


Figure 2: Proof of Concept Setup Overview

The TSM client workload ran on the same servers as the TSM servers, leveraging shared memory for client/server communication. In addition, the TSM client workload was generated artificially in memory using a special workload generator program. This eliminated TSM client bottlenecks caused by limited disk and network performance.
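
For illustration, shared memory communication between a TSM client and a TSM server running on the same host is enabled through the communication method options, roughly as sketched below. This is a generic example, not the exact configuration used in the PoC; the server name and port number are made up.

  # dsmserv.opt (TSM server)
  COMMMETHOD   SHAREDMEM
  SHMPORT      1510

  # dsm.sys (TSM client, stanza for the co-located server; name is hypothetical)
  SERVERNAME     TSMSRV1
    COMMMETHOD   SHAREDMEM
    SHMPORT      1510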

The GNR system provided a single shared GPFS file system for the TSM servers. Both TSM servers were configured as GPFS NSD clients; as members of the GNR cluster, they mounted the file system provided by the GNR system.
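
Mounting the shared file system on the TSM servers then only requires the standard GPFS mount commands, roughly as follows; the file system and node names are hypothetical.

  # Mount the shared GPFS file system on both TSM servers (NSD clients)
  mmmount gnrfs -N tsmsrv1,tsmsrv2

  # Verify on which nodes the file system is mounted
  mmlsmount gnrfs -L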

As shown in Figure 3, the single GPFS GNR file system used by the TSM servers comprised two GPFS pools, each configured on virtual disks (vdisks) that tolerate two concurrent disk failures. One pool, the system pool, was configured on two vdisks with 3-way replication to store the file system metadata. The other pool, the data pool, was configured on two vdisks with 8+2P RAID protection to store the file system data, including the TSM database and storage pools. Thus, the TSM database and storage pools resided in the same GPFS file system pool and on the same vdisks, which were distributed over all physical disk drives in the GNR system.
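
The vdisk and pool layout described above can be expressed with GNR vdisk stanzas, roughly as in the following sketch. All names, sizes and block sizes are invented for illustration and are not the values used in the PoC; the exact stanza attributes and command options may differ between GPFS releases.

  # vdisk.stanza - two metadata vdisks (3-way replication, system pool) and
  # two data vdisks (8+2P, data pool), one of each per recovery group
  %vdisk: vdiskName=rg1meta rg=rg1 da=DA1 blocksize=1m size=500g raidCode=3WayReplication diskUsage=metadataOnly pool=system
  %vdisk: vdiskName=rg2meta rg=rg2 da=DA1 blocksize=1m size=500g raidCode=3WayReplication diskUsage=metadataOnly pool=system
  %vdisk: vdiskName=rg1data rg=rg1 da=DA1 blocksize=8m raidCode=8+2p diskUsage=dataOnly pool=data
  %vdisk: vdiskName=rg2data rg=rg2 da=DA1 blocksize=8m raidCode=8+2p diskUsage=dataOnly pool=data

  # Create the vdisks, the corresponding NSDs and the file system
  mmcrvdisk -F vdisk.stanza
  mmcrnsd   -F vdisk.stanza
  mmcrfs gnrfs -F vdisk.stanza -B 8m --metadata-block-size 1m -T /gnrfs

  # Direct all newly created files to the data pool
  echo "RULE 'default' SET POOL 'data'" > policy.txt
  mmchpolicy gnrfs policy.txt

With such a layout the TSM database and storage pool volumes end up on the data vdisks and are striped across all physical drives of both recovery groups.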

For each GPFS pool, two vdisks were configured, one per recovery group. Each of the two recovery groups was configured with one declustered array using half of the physical disks in the GNR system (58 disks per recovery group). Each recovery group was managed by one GNR server.
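
The two recovery groups themselves are created and inspected along these lines; again only a sketch with hypothetical server and recovery group names, and the pdisk/declustered array stanzas are omitted.

  # Each recovery group is defined by a stanza file describing its pdisks
  # and its single declustered array (stanza content not shown here)
  mmcrrecoverygroup rg1 -F rg1.stanza --servers gnrsrv1,gnrsrv2
  mmcrrecoverygroup rg2 -F rg2.stanza --servers gnrsrv2,gnrsrv1

  # Verify: each recovery group should report one declustered array with 58 pdisks
  mmlsrecoverygroup rg1 -L
  mmlsrecoverygroup rg2 -L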

 


Figure 3: GNR file system layout

Summary

To summarize the important findings from the tests with TSM on the Elastic Storage GPFS Native RAID system using 40 Gbit Ethernet connections:

  • The single-server tests achieved approximately 4 GB/sec; the backup tests were slightly faster at 4017 MB/sec than the single-server restore tests at 3834 MB/sec. The single-server tests are limited by the single 40 Gbit Ethernet connection between the TSM server and the GNR system, which provides a theoretical throughput of 5 GB/sec (see the short calculation after this list). The key limitation is most likely the overhead of the TCP/IP protocol used for communication between the TSM server and the GNR system.
  • The dual-server tests used two 40 Gbit Ethernet connections, one for each TSM server and one for each GNR server. The theoretical bandwidth of the Ethernet connections was therefore 2 x 40 Gbit/sec (approximately 10 GB/sec).
  • The dual-server tests achieved 5+ GB/sec; the restore tests were slightly faster at 5424 MB/sec than the dual-server backup tests at 4981 MB/sec. The dual-server tests are limited by this particular GNR system with 116 NL-SAS disks, which provides a maximum throughput of 5+ GB/sec. When higher throughput is required, an ESS model GL4 or GL6 or a combination of multiple ESS building blocks could be used.
  • The key differences from the previous PoC*, which used InfiniBand connections between the TSM servers and the GNR system, were:
    • The connection between the TSM servers and the GNR system was based on 40 Gbit Ethernet compared to 56 Gbit InfiniBand. The InfiniBand connection not only offers higher bandwidth but also provides Remote Direct Memory Access (RDMA). The Ethernet connection between the TSM servers and the GNR system was configured to use TCP/IP, causing additional protocol overhead.
    • The GNR system used in this PoC was configured with 116 NL-SAS disks, providing a nominal throughput of 5+ GB/sec. The GNR system used in the previous PoC was equipped with 348 NL-SAS disks, providing a nominal throughput of 10+ GB/sec.
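
For reference, the theoretical numbers quoted in this summary follow directly from the link speeds (decimal units, ignoring Ethernet and TCP/IP protocol overhead):

  40 Gbit/sec / 8 bit per byte  =  5 GB/sec per link   (single-server tests)
  2 links x 5 GB/sec            = 10 GB/sec            (dual-server tests)

  4017 MB/sec measured single-server backup is roughly 80% of the 5 GB/sec line rate of one link.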

As a next step we plan to test the TSM server workloads over 10 Gbit Ethernet connections. Stay tuned!









Comments

Mon May 11, 2015 01:17 PM

Originally posted by: unixanalyst


Hi, any ideas on a likely release date for the 10 GbE version?

Wed December 24, 2014 01:56 AM

Originally posted by: NareshDalal


Fantastic, waiting for the results on 10 Gbit Ethernet, as 40 Gbit is hardly available with customers. Now that GPFS file systems are at the backend of the TSM server, can we achieve LAN-free backup in this setup without any further configuration?