File and Object Storage


A Global Data Platform: Unifying Unstructured Data

By Matthew Geiser posted Thu March 24, 2022 08:00 AM


Introduction

As artificial intelligence and machine learning become more pervasive and IT infrastructure options continue expanding, the need for a thoughtful information architecture for storing, securing, accessing, and using data becomes clearer. Data storage and management challenges are amplified by aggressive application development schedules and the growing demand for data-intensive AI/ML use cases.

Data Virtualization and Global Data Platform

IT infrastructure choices for business applications continue expanding and evolving, from traditional data centers to public cloud and out to the edge and remote offices. These choices expand further as businesses begin using GPU-accelerated infrastructure for data-intensive use cases such as AI and machine learning. Together, they highlight a critical requirement: better methods for accessing data and collaborating securely while eliminating data redundancy and inconsistency.

As new applications and use cases roll out, data silos can appear across the various IT infrastructure options. These silos present challenges such as operational/cost inefficiencies and lack of data governance, highlighting a need for storage solutions that can unify unstructured data repositories across IT infrastructures.

A global data platform for file and object data can address these challenges.


To be successful, a global data platform must offer three key characteristics:

  1. Shared multi-protocol access, with simultaneous access to the same data using whichever protocol each workload requires. A global data platform must be able to “speak the language” of its applications and use cases. Data access must also be “multi-lingual”: one application may create and access data via a certain protocol, while another requires access to the same data via a different protocol.

  2. Data caching for a single source of truth from edge to core to cloud, ensuring users have access to the most up-to-date version of the data. A global data platform must offer data access independent of where the data physically resides. For example, when a cloud-native application requires high-performance access to data stored in an S3 bucket, the global data platform should seamlessly execute “vertical data caching” and automatically fetch the data into its high-performance tier. Likewise, for cloud-bursting use cases, the global data platform should transparently execute “horizontal data caching” and make on-premises data available to public cloud infrastructure.
    • Vertical caching to virtualize existing file and object storage while accelerating the AI data pipeline by using a high-performance tier to feed data-hungry GPUs
    • Horizontal caching for agility to bring required data to applications when needed and enabling collaboration on a consistent cache easily and efficiently, independent of IT infrastructure choices
  3. Data orchestration for cataloging and managing an organization’s unstructured data. A global data platform must offer visibility, control, and automation, facilitating data awareness, movement, and policy-driven data life-cycle management.
    • Abstracting file and object storage repositories and enabling users to easily find required data without being aware of the IT infrastructure where the data is physically stored, whether NVMe, fast disk, slow disk, tape, or public cloud.
    • Keeping a single copy of data, combined with the ability to tier data to the best storage, prefetch data as needed to high-performance storage, and enforce policies for data protection, enabling organizations to optimize and lower their storage costs.
    • Implementing metadata augmentation and automated data enrichment to contextualize data with semantics and knowledge, optimizing time to results for data-intensive workloads.
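For characteristic 3, Spectrum Scale expresses policy-driven life-cycle management in an SQL-like rule language. The fragment below is an illustrative sketch, not a tested production policy; the pool names and rule names are assumptions.

```sql
/* Illustrative Spectrum Scale ILM-style rules; pool and rule names
   are assumptions for this sketch. */

/* Move files untouched for 30 days from the fast pool to capacity disk. */
RULE 'tier_cold' MIGRATE FROM POOL 'system'
  TO POOL 'capacity'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30

/* Place new AI training data on the fast pool from the start. */
RULE 'place_training' SET POOL 'system'
  WHERE PATH_NAME LIKE '/gpfs/training/%'
```

Rules like these are what let a single copy of the data live on the most cost-effective tier while still being prefetched to high-performance storage when a workload needs it.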
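To make characteristic 1 concrete, here is a minimal Python sketch of shared multi-protocol access. It is a toy in-memory model, not Spectrum Scale's actual implementation: the class and method names are invented for illustration. Two protocol "front ends" (S3-style and POSIX-style) resolve to the same underlying store, so data written through one is immediately visible through the other, with no copy.

```python
# Toy model of shared multi-protocol access: two protocol front ends
# over a single shared store. Illustrative only; all names are invented.

class SharedStore:
    """Single source of truth holding object data keyed by path."""
    def __init__(self):
        self._data = {}

class S3Gateway:
    """S3-style view: bucket/key addressing."""
    def __init__(self, store):
        self.store = store
    def put_object(self, bucket, key, body: bytes):
        self.store._data[f"{bucket}/{key}"] = body
    def get_object(self, bucket, key) -> bytes:
        return self.store._data[f"{bucket}/{key}"]

class PosixGateway:
    """POSIX-style view: file-path addressing over the same store."""
    def __init__(self, store, mount="/gpfs"):
        self.store = store
        self.mount = mount
    def read(self, path) -> bytes:
        # "/gpfs/bucket/key" -> "bucket/key"
        return self.store._data[path[len(self.mount) + 1:]]

store = SharedStore()
s3 = S3Gateway(store)
fs = PosixGateway(store)

s3.put_object("experiments", "run1.csv", b"epoch,loss\n1,0.9\n")
print(fs.read("/gpfs/experiments/run1.csv"))  # same bytes, no second copy
```

The design point is that neither gateway owns the data; both are views onto one store, which is what eliminates the redundancy and inconsistency that per-protocol silos create.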
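The "vertical data caching" described in characteristic 2 can be sketched the same way. This toy model (again, invented names, not how Spectrum Scale's caching is coded) shows a fast tier that transparently fetches from a slower backing tier on first access and serves every later access from cache, so the expensive backend is touched only once.

```python
# Toy sketch of vertical data caching: a fast tier in front of a slow
# capacity tier. Illustrative only; names are assumptions for this sketch.

class CapacityTier:
    """Slow backing store (e.g., object storage or tape)."""
    def __init__(self, objects):
        self.objects = objects
        self.reads = 0
    def fetch(self, name):
        self.reads += 1          # count expensive backend reads
        return self.objects[name]

class PerformanceTier:
    """Fast NVMe-like cache in front of the capacity tier."""
    def __init__(self, backing):
        self.backing = backing
        self.cache = {}
    def read(self, name):
        if name not in self.cache:        # miss: fetch up into fast tier
            self.cache[name] = self.backing.fetch(name)
        return self.cache[name]           # hit: served from cache

cold = CapacityTier({"model.bin": b"\x00" * 8})
hot = PerformanceTier(cold)

hot.read("model.bin")   # first access: fetched from capacity tier
hot.read("model.bin")   # second access: served from cache
print(cold.reads)       # -> 1: backend touched only once
```

Horizontal caching follows the same pattern, except the "backing" side is an on-premises cluster and the cache sits in public cloud infrastructure.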

IBM’s Global Data Platform

IBM’s global data platform is based on Spectrum Scale and offers industry-leading performance and scalability coupled with enterprise-ready reliability and resiliency. IBM’s Elastic Storage System (ESS) offers the simplest and fastest way to deploy IBM’s global data platform. In a 2U form factor, the ESS 3200 delivers up to 80 GB/sec of read performance from NVMe flash. Upcoming ESS releases will continue offering industry-leading performance and storage capacity.


As organizations work to define and execute a data fabric strategy, they find it insufficient without an approach for unifying unstructured data. A global data platform built on IBM Spectrum Scale and ESS provides this unification and single source of truth, offering an unstructured data foundation for successful enterprise AI implementation. In addition, IBM’s close partnership with NVIDIA and the joint IBM Storage for AI and NVIDIA DGX reference architecture helps customers get started easily and quickly on their AI journey.
