IBM Storage Ceph IBM Redbooks

By Daniel Alexander Parkes posted 21 days ago

  

Introducing the latest addition to the IBM Storage Ceph family of IBM Redbooks: "IBM Storage Ceph as a Data Lakehouse Platform for IBM watsonx.data and Beyond". This new publication explains how IBM Storage Ceph, with its robust and scalable object storage capabilities, is ideally suited to modern data lakehouse environments. Paired with Iceberg, which brings efficiency and performance to curated tabular data, IBM Storage Ceph stands out as a premier on-premises object store solution. To quote the book's opening chapter:

"This is where IBM Storage Ceph takes the stage. IBM Storage Ceph is open source, software-defined, runs on industry-standard hardware, and has best-in-class coverage of the lingua franca of object storage, the Amazon S3 API. It was designed from the ground up as an object store, contrasting with approaches that bolt-on S3 API servers to a distributed filesystem. With Ceph, data placement is by algorithm instead of by lookup. This allows Ceph to scale well into the billions of objects, even on modestly sized clusters. Data stored in Ceph is protected with efficient erasure coding, in-flight and at-rest checksums and encryption, and robust access control that thoughtfully integrates with enterprise identity systems."
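To make "placement by algorithm instead of by lookup" a little more concrete, here is a minimal sketch of the general idea in Python. It uses rendezvous (highest-random-weight) hashing, which is not Ceph's actual CRUSH algorithm; the `place` function and the `osd.N` node names are hypothetical and chosen only for illustration. The point is that any client can compute where an object lives from its name and the list of nodes, without consulting a central index.

```python
import hashlib


def place(object_name, nodes, replicas=3):
    """Pick `replicas` nodes for an object using rendezvous hashing.

    Illustrative only: Ceph uses CRUSH, which also accounts for failure
    domains and device weights. This sketch shows just the core idea of
    computing a location instead of looking it up.
    """
    def score(node):
        # Hash the (node, object) pair and treat the first 8 bytes as an integer.
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The nodes with the highest scores for this object hold its replicas.
    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = [f"osd.{i}" for i in range(8)]  # hypothetical node names
print(place("raw-logs/2024/01/01/browser.json", nodes))
```

Because the result depends only on the object name and the node list, no per-object table has to grow as the object count climbs, and removing a node that does not hold a given object leaves that object's placement untouched.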

Chapter Three of the book demonstrates how the IBM Storage Ceph object storage feature set is a strong fit for your on-premises data lake and lakehouse requirements, with sections covering scale, performance, security, efficiency, resiliency, cost-effectiveness, and management.

In Chapter Six of the book, we work through a hands-on retail use case, walking step by step through the setup and configuration of the required data pipelines and transformations. We start with raw, unstructured data (browser logs, transaction data, and customer feedback) and end with easily consumable curated data that feeds visualization dashboards for interpreting and acting on the data.

Chapter Six also serves as a detailed configuration guide, with comprehensive, hands-on examples of configuring various IBM Storage Ceph object storage features. Here is a brief list of some of the features we cover:

  • Object Storage S3 lifecycle policy expiration, filtered by tags.
  • Object data-at-rest encryption using S3 SSE-KMS.
  • Object Storage Secure Token Service (STS), with IdP authentication through SSO.
  • Object Storage IAM role-based (RBAC) authorization.
  • Object Storage bucket notification integration with Knative and Kafka.
  • S3 Object Lock and versioning configuration.
  • Object Multi-Factor Authentication (MFA) delete.
  • Object cold storage classes and data tiering configuration.
  • Early object data query and filtering with S3 Select.
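As a rough illustration of the first item in the list, the sketch below builds an S3 lifecycle configuration that expires objects carrying a particular tag. It is not taken from the book or its repository: the rule ID, the tag key and value, the bucket name, and the endpoint URL are all hypothetical. The dictionary follows the shape the S3 `PutBucketLifecycleConfiguration` API expects, and the commented-out lines show how it could be applied with boto3 against an S3-compatible endpoint.

```python
def tag_expiration_rule(rule_id, tag_key, tag_value, days):
    """Build one lifecycle rule that expires objects carrying a given tag."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Only objects tagged tag_key=tag_value are matched by this rule.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        # Matched objects expire this many days after creation.
        "Expiration": {"Days": days},
    }


# Hypothetical policy: drop raw objects 30 days after creation once they
# have been tagged as processed.
lifecycle = {
    "Rules": [tag_expiration_rule("expire-processed-raw", "processed", "true", 30)]
}
print(lifecycle)

# Applying the policy requires boto3 and a reachable S3 endpoint.
# The endpoint URL and bucket name below are placeholders.
# import boto3
# s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="raw-logs", LifecycleConfiguration=lifecycle
# )
```

Keeping the rule builder separate from the client call makes it easy to review the policy before it is sent to the cluster.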

Also, remember that the book has an associated GitHub repo containing the code and example configurations used in Chapter Six: https://github.com/IBMRedbooks/IBM-Storage-Ceph-as-a-Data-Lakehouse-platform-for-IBM-watsonx.data-and-Beyond

Two other IBM Redbooks on IBM Storage Ceph are already available.

For more information, visit the official IBM Redbooks website.

Enjoy!
