IBM Storage with NVIDIA DGX A100 Systems

By DOUGLAS O'FLAHERTY posted Fri February 12, 2021 02:38 PM

  

IBM Storage has been working with NVIDIA for many years. As a leader in scalable and high-performance storage for HPC, AI, enterprise cluster computing, and big data, we knew that NVIDIA GPU-accelerated computing was going to need very high data throughput. In fact, the development program for IBM Spectrum Scale 5.0 (November 2017) was focused on delivering the performance needed for the fastest GPU-accelerated supercomputers in the world, Summit and Sierra.

This week we completed our most recent reference architecture, IBM Storage with NVIDIA DGX POD. The document gives IT administrators a prescriptive, validated design for deploying shared, extensible storage for deep learning, inference, data exploration, and other computationally intensive and I/O-intensive work.

[Figure: Half rack of DGX A100 Systems with IBM Storage]

Organizations that need to support multiple workloads, from “as-a-Service” access for small interactive jobs to cluster-wide jobs that make full use of multi-GPU and multi-node resources, will find answers to their questions in this document.

Because many of our clients are concerned about how to start and how to grow, we demonstrated scaling both performance and capacity from two NVIDIA DGX A100 systems to a full rack of eight NVIDIA DGX A100 systems connected with NVIDIA Mellanox InfiniBand networking. For the full rack, IBM achieved a data throughput rate of over 90 GB/s, delivered by a pair of IBM Elastic Storage System 3000 units that require only 4U of rack space. The design of IBM Spectrum Scale delivers nearly linear performance scalability and supports data tiering to save storage costs.
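
To put the full-rack figure in perspective, the short Python sketch below divides the quoted throughput across the components named above. It is a back-of-the-envelope illustration only: "over 90 GB/s" is treated as a lower bound, the even split across systems is an assumption, and the constant and function names are hypothetical rather than taken from the reference architecture.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the post.
# "Over 90 GB/s" is treated as a lower bound, and an even split across
# systems is an assumption made purely for illustration.

FULL_RACK_THROUGHPUT_GBPS = 90.0  # GB/s, lower bound for the full rack
DGX_SYSTEMS = 8                   # NVIDIA DGX A100 systems in the full rack
ESS_3000_UNITS = 2                # the pair of IBM Elastic Storage System 3000s
STORAGE_RACK_UNITS = 4            # total rack space quoted for the storage


def per_unit(total: float, count: int) -> float:
    """Return an even share of `total` across `count` units."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total / count


per_dgx = per_unit(FULL_RACK_THROUGHPUT_GBPS, DGX_SYSTEMS)
per_ess = per_unit(FULL_RACK_THROUGHPUT_GBPS, ESS_3000_UNITS)
per_rack_unit = per_unit(FULL_RACK_THROUGHPUT_GBPS, STORAGE_RACK_UNITS)

print(f"Per DGX A100 system:   at least {per_dgx:.2f} GB/s")
print(f"Per ESS 3000 unit:     at least {per_ess:.2f} GB/s")
print(f"Per storage rack unit: at least {per_rack_unit:.2f} GB/s")
```

Under these assumptions, each DGX A100 system sees at least about 11 GB/s of storage throughput when all eight are reading at once. Actual per-system rates depend on the workload and network configuration described in the reference architecture.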

For the IT administrator, this reference architecture provides the specifications for networking, storage, and infrastructure that have been proven to enable improved scalability, performance, and cost-effective manageability.

The IBM Storage Reference Architecture with NVIDIA DGX A100 Systems will also help data scientists improve team productivity, data reuse, and logical data locality through shared storage that can integrate with the deep learning (DL) workflow.

IBM Storage provides a storage framework with performance, scalability, extensibility, and enterprise attributes that include data protection, data tiering, and hybrid cloud integration.

Read the IBM Storage with NVIDIA DGX POD reference architecture to learn more: https://www.ibm.com/downloads/cas/MJLMALGL

To learn more about IBM Storage for data and AI, read: https://www.ibm.com/it-infrastructure/storage/ai-infrastructure
