File and Object Storage

Optimize NVIDIA GPU-enabled AI workloads with a data orchestration solution

By Pallavi Galgali, posted 26 days ago


As AI adoption becomes mainstream in organizations, the storage infrastructure supporting the large datasets that fuel AI models becomes critical. IBM Storage has been working with NVIDIA on NVIDIA DGX POD reference architectures built with NVIDIA DGX systems and IBM Spectrum Scale storage. IBM is simplifying AI development by delivering high-performance storage for NVIDIA GPUs in enterprise AI workflows. IBM Spectrum Scale combined with ESS 3000 2U NVMe building blocks delivers up to 40 GB/s of throughput, scaling linearly as additional ESS 3000 nodes are attached to an NVIDIA DGX cluster.

IBM Storage for Data and AI now brings data orchestration capabilities that differentiate our solution from other storage options, transforming ESS 3000 from high-performance storage into a high-performance smart storage tier. It can connect ESS 3000 to an organization’s file and object store data lakes and cache the required data for AI modeling in an NVIDIA DGX POD. Only IBM can intelligently cache required data from file and object storage within a global federated namespace that can span up to 8 YB. This approach allows IBM customers to optimize TCO and speed productivity while enjoying superior performance for their AI workloads.

IBM Storage for Data and AI offers customers the unique capabilities listed below:

  • Optimize TCO – This approach allows enterprises to eliminate unnecessary data copies and the associated data management, ultimately helping to minimize TCO for AI infrastructure.
  • Speed productivity – The caching functionality allows enterprises to prefetch required data or automatically evict unused data to create space for active data. This reduces the manual efforts involved in data movement and speeds overall productivity of the data pipeline.
  • Improve agility – NVIDIA DGX A100 systems use Multi-Instance GPU (MIG) technology to allocate resources on each NVIDIA A100 GPU. MIG can partition an NVIDIA A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Customers can mix analytics, training, and inference jobs on every DGX system in a cluster, making it a universal system for AI. Data orchestration supports this agility on the storage side by bringing the right data for the right workload into the high-performance storage tier.
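The prefetch-and-evict pattern described above can be pictured as a fixed-capacity fast tier with least-recently-used eviction. The sketch below is a toy model for illustration only, not IBM Spectrum Scale's actual API; the class and sizes are invented:

```python
from collections import OrderedDict

class CacheTier:
    """Toy model of a fast storage tier that prefetches hot datasets and
    evicts the least recently used ones when capacity is reached.
    Illustrative only -- not the IBM Spectrum Scale AFM interface."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0
        self.datasets = OrderedDict()  # name -> size_gb, kept in LRU order

    def prefetch(self, name, size_gb):
        """Bring a dataset into the fast tier, evicting LRU entries as needed."""
        if name in self.datasets:
            self.datasets.move_to_end(name)  # mark as recently used
            return
        while self.used_gb + size_gb > self.capacity_gb and self.datasets:
            evicted, evicted_size = self.datasets.popitem(last=False)
            self.used_gb -= evicted_size
            print(f"evicted {evicted} ({evicted_size} GB)")
        self.datasets[name] = size_gb
        self.used_gb += size_gb
        print(f"prefetched {name}, tier at {self.used_gb}/{self.capacity_gb} GB")

tier = CacheTier(capacity_gb=100)
tier.prefetch("images-train", 60)
tier.prefetch("images-val", 30)
tier.prefetch("audio-train", 40)  # forces eviction of the LRU dataset
```

The point of the automation is that this bookkeeping, which a data engineer would otherwise do by hand, happens transparently in the storage tier.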

To deliver this differentiated value, we have enhanced the IBM Spectrum Scale AFM (Active File Management) capability to become IBM’s native data mover. This data mover can now connect to any NAS or S3-based data store, such as IBM Cloud Object Storage, alongside the IBM Spectrum Scale global federated namespace.
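As a rough illustration of the workflow, an AFM cache fileset can be created against a NAS home and populated ahead of a run. The commands below are a hedged sketch of cluster configuration: filesystem, fileset, export, and path names are hypothetical, and exact options vary by Spectrum Scale release.

```shell
# Create a read-only AFM cache fileset backed by an NFS home
# (names "fs1", "aiCache", and the export path are hypothetical).
mmcrfileset fs1 aiCache \
    -p afmMode=ro,afmTarget=nfs://homeserver/export/datasets \
    --inode-space new

# Link the fileset into the namespace so jobs read through the cache.
mmlinkfileset fs1 aiCache -J /gpfs/fs1/aiCache

# Prefetch a list of files into the cache ahead of a training run.
mmafmctl fs1 prefetch -j aiCache --list-file /tmp/needed-files.txt
```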

IBM Spectrum Discover is the brain behind this data orchestration, providing the ability to select the right dataset for movement based on its metadata indexing. The IBM Spectrum Scale data mover combined with IBM Spectrum Discover is designed to offer a comprehensive data orchestration solution that serves use cases such as active archiving, data migration, caching, and AI acceleration. This allows IBM Spectrum Scale and ESS to deliver value not just for traditional on-premises AI, HPC, and HPDA workloads, but also for hybrid cloud and edge computing workloads.
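Metadata-driven selection can be pictured as a query over a catalog: filter records on metadata tags, then hand the matching paths to the data mover as a prefetch list. The sketch below is a toy stand-in for what Spectrum Discover indexes at scale; the catalog schema and tags are invented for illustration:

```python
# Toy metadata catalog -- in the real solution, IBM Spectrum Discover
# indexes this information at scale; records and tags here are invented.
catalog = [
    {"path": "/cos/scans/001.dcm", "modality": "CT",  "labeled": True,  "size_gb": 2},
    {"path": "/cos/scans/002.dcm", "modality": "MRI", "labeled": True,  "size_gb": 3},
    {"path": "/nas/scans/003.dcm", "modality": "CT",  "labeled": False, "size_gb": 2},
]

def select_for_prefetch(catalog, **criteria):
    """Return paths of records whose metadata matches all criteria."""
    return [rec["path"] for rec in catalog
            if all(rec.get(k) == v for k, v in criteria.items())]

# Pick only labeled CT scans for the next training run.
prefetch_list = select_for_prefetch(catalog, modality="CT", labeled=True)
print(prefetch_list)  # ['/cos/scans/001.dcm']
```

The resulting list is exactly the kind of input a data mover can consume, so only the data a workload actually needs lands in the fast tier.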

To continue delivering best-in-class storage performance, we are working to support NVIDIA GPUDirect Storage (GDS) with IBM Spectrum Scale. Additionally, IBM has released a deployment guide that demonstrates using IBM Spectrum Scale storage with Red Hat OpenShift on NVIDIA DGX systems.

“Our strategy at IBM when designing storage for modern data and AI workloads goes a step beyond just building a high-performance NVMe storage box. We build storage solutions to support our customers in their journey to AI that offer performance, scalability, cost optimization, and the integrations required to build enterprise-class infrastructure,” said Sam Werner, Vice President of Storage Offering Management at IBM.

“AI is helping enterprises create new business opportunities and transform their industries,” said Tony Paikeday, Senior Director of Product Marketing for Artificial Intelligence Systems at NVIDIA. “The combination of NVIDIA DGX systems with IBM Spectrum Scale storage provides customers with a flexible, intelligent solution for powering the entire AI data lifecycle.”

Check out the following resources to learn more: