
Operationalize AI with a data lake for IBM watsonx.data that is designed to improve with scale

By David Wohlford posted Tue August 08, 2023 12:16 AM

  

Many organizations today are moving from pilot projects to operationalizing AI, integrating it into their core business processes. Their growing catalog of AI workloads now requires access to massive and continually expanding repositories of highly diverse datasets.

Managing all this data presents costs, risks, and challenges: what the datasets contain, where the data is stored, who has access, which regulatory and compliance requirements apply, and more. As organizations operationalize AI, these challenges grow, making it essential that they identify infrastructure solutions that are optimized for both performance and cost.

A data lakehouse architecture is well suited to these needs and can be built with open-source software and commodity x86 hardware. Organizations can deploy quickly yet still scale massively by combining IBM Storage Ceph, an enterprise-grade software-defined storage platform, and IBM watsonx.data, an open, hybrid, governed, fit-for-purpose data store.

Together, IBM Storage Ceph and IBM watsonx.data provide a highly performant data lakehouse that allows organizations to:

Consolidate multiple data types into a single elastic repository that expands capacity online as requirements grow;
Access data across on-premises and cloud environments through a single entry point with a shared metadata layer leveraging open data and open table formats;
Connect to storage and analytics environments in minutes and enhance trust in data with built-in governance, security, and automation;
Reduce data lake and lakehouse costs with unified data storage options and fit-for-purpose compute engines that are clustered and capable of scaling automatically.

IBM Storage Ceph Highlights

Scalable: Non-disruptively grow from as few as three nodes to thousands, capable of addressing billions of pieces of information.
Affordable: IBM Storage Ceph is software-defined storage built with open standards, which helps keep CAPEX and OPEX costs in line with underlying commodity hardware prices.
Data Availability: Using a fault-tolerant architecture, data is distributed across multiple disks over multiple servers to provide a single storage cluster with no single point of failure, so data remains accessible when individual disks or servers fail.
Feature Rich: Includes data reduction for disk usage optimization, partial or complete reads/writes with atomic transactions, replication and erasure coding for data protection, policy-based optimization, and much more.
Optimized: Self-healing and self-managing; the cluster rebalances data distribution automatically and handles failures without interruption, recovering to the predefined data resilience level.
Capacity Planning: Built-in storage analytics help organizations monitor capacity utilization and growth so they can plan for near-term and long-term capacity needs.
Secured: Object lock for write-once-read-many (WORM) data governance and protection; FIPS 140-2 cryptography; key management integration and server-side encryption.

IBM watsonx.data includes a 768 TB software license and support for IBM Storage Ceph to quickly get started with compliant object storage using your choice of x86 servers.
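The highlights above list replication and erasure coding as data protection options, and the choice between them changes how much of a cluster's raw capacity is usable. The sketch below works through that arithmetic. It is a rough illustration only: the function names are hypothetical, the 768 TB figure is borrowed as an example raw capacity (the post does not say whether the license is measured in raw or usable terms), and the 3x replication and 4+2 erasure-coding settings are assumed example values, not recommendations from the post. Real clusters also reserve headroom for metadata and recovery, which is ignored here.

```python
def usable_with_replication(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def usable_with_erasure_coding(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity when each object is split into k data chunks
    plus m coding chunks, so k out of every k + m chunks hold data."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must not be negative")
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw_tb = 768  # assumed raw capacity, for illustration only
    print("3x replication:", usable_with_replication(raw_tb, 3), "TB usable")
    print("4+2 erasure coding:", usable_with_erasure_coding(raw_tb, 4, 2), "TB usable")
```

Under these assumptions, erasure coding yields twice the usable capacity of three-way replication from the same raw capacity, at the cost of more work during writes and recovery.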

  Learn how easy it is to get started

  Read the data sheet
