S3 tiering to tape with NooBaa Part 1 – Introduction

By Nils Haustein posted Wed February 21, 2024 05:34 AM

  

Authored by Nils Haustein, Jan-Frode Myklebust, Guy Margalit, Khanh V Ngo

Introduction

Object storage solutions that provide S3 capabilities are becoming increasingly relevant for new use cases around hybrid cloud storage and storage for data and AI. Modern data lakehouses use object storage as the backend storage for structured and unstructured data. Backup applications use object storage to back up and offload data. One common requirement is to tier objects to tape. This requires an object storage architecture that provides scalable, high-performance storage in the front end and allows aged objects to be tiered to tape.

In this series of blog articles, we introduce a new and modern S3 object storage service that can be installed on IBM Storage Scale. The S3 object storage service is provided by open-source software called NooBaa. NooBaa, in combination with IBM Storage Scale and IBM Storage Archive Enterprise Edition, allows transparent tiering of objects and buckets to tape.

This series consists of multiple parts, which will be linked here once published.

Part 1 (this article): Explains what NooBaa is and highlights the architecture of the solution integrating NooBaa with IBM Storage Scale.

Part 2: Demonstrates how to install and configure NooBaa on IBM Storage Scale file systems and use NooBaa services to PUT and GET objects and buckets using S3 clients.

Part 3: Provides a brief introduction to IBM Storage Scale information lifecycle management, which allows data to be tiered to tape in combination with IBM Storage Archive. It also demonstrates how S3 buckets and objects can be tiered to tape while providing seamless access to data.

Part 4: Highlights capabilities that use S3 object metadata and tags to improve usability for buckets and objects stored on relatively slow tape devices.

Part 5: Explains the fundamentals of the AWS S3 Glacier API and how it can be used with NooBaa on IBM Storage Scale to manually control migration and recalls.

Part 6 (not published yet): Demonstrates how NooBaa plugins can be used to automate the migration of S3 objects to tape and their recall from tape. Planned for April 2024, stay tuned.

What is NooBaa?

NooBaa is a highly customizable and dynamic data gateway that provides S3 data services over any storage resource, including S3, GCS, Azure Blob, file systems, and others [1]. NooBaa provides S3 endpoints to users and applications and allows full control over data placement with dynamic policies per bucket or account.

NooBaa grew up in the “container world” and is an integral part of Red Hat OpenShift Data Foundation (ODF). NooBaa is open source [2], and NooBaa-core can also be provided as a standalone software package for Red Hat Enterprise Linux.

The NooBaa-core standalone software package can be deployed on an IBM Storage Scale cluster, where it provides the S3 endpoints while the buckets and objects are stored in IBM Storage Scale file systems. This is the foundation of modern S3 object storage services on IBM Storage Scale. Furthermore, by leveraging the integration of IBM Storage Scale with IBM Storage Archive, S3 buckets and objects stored in the file system can be tiered to tape.

Disclaimer

When deploying open-source NooBaa-core standalone, there are a few things to consider:

  • NooBaa is open source and can be used by anybody who respects the associated open-source license.
  • As open-source software, NooBaa is a work in progress with varying stability and functions.
  • Problems discovered with NooBaa can be reported as issues in the repository on GitHub [2].

There are plans to integrate NooBaa in IBM Storage Scale as the new modernized S3 object stack. Initially, this integration will not support tiering to tape. However, you can use NooBaa as an open-source solution to provide S3 object storage with tiering to tape.

Architecture with NooBaa on IBM Storage Scale

IBM Storage Scale is a modern data platform providing a comprehensive set of storage services, including but not limited to:

  • Clustered file systems with no single point of failure
  • Parallel, high-performance data access on disk
  • Active-active stretched cluster architecture with synchronous replication and site failure tolerance
  • Data protection through snapshots, backup, and asynchronous replication across large distances
  • Data lifecycle management across different tiers of storage powered by an intelligent policy engine

The picture below shows the architecture of the solution that allows S3 objects to be tiered to tape:

The open-source software NooBaa is installed on one or more IBM Storage Scale cluster nodes and provides the S3 object storage endpoints to the S3 users and applications. NooBaa is configured in namespace file system (NSFS) mode, which stores buckets and objects in file systems. These file systems are provided by IBM Storage Scale, so objects and buckets are stored on disk managed by IBM Storage Scale. The IBM Storage Scale policy engine, in combination with IBM Storage Archive, is used to tier objects and buckets to tape. IBM Storage Archive manages the tape resources and writes the objects and buckets in LTFS format to tapes.
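To make the idea of storing buckets and objects in a file system more concrete, here is a small Python sketch. It is not NooBaa code and not the IBM Storage Scale policy engine. It assumes a simplified convention in which each bucket is a directory and each object key is a relative path below it, and it uses a hypothetical age threshold to pick files that could be candidates for tiering. The function names object_path and tape_candidates are our own illustration.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def object_path(bucket_dir: Path, key: str) -> Path:
    """Map an S3 object key to a file path below the bucket directory.

    Assumption: bucket = directory, object key = relative path below it.
    """
    root = bucket_dir.resolve()
    candidate = (root / key).resolve()
    # Reject keys that would escape the bucket directory (for example "../x").
    if root not in candidate.parents:
        raise ValueError(f"key escapes bucket directory: {key!r}")
    return candidate


def tape_candidates(bucket_dir: Path, min_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Return files below bucket_dir last modified more than min_age_days ago.

    This only imitates an age-based selection rule; in the real solution the
    selection and the data movement are done by the IBM Storage Scale policy
    engine together with IBM Storage Archive.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    return sorted(
        p for p in bucket_dir.resolve().rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

For example, tape_candidates(Path("/gpfs/fs1/buckets/bucket1"), 30) would list the files in a hypothetical bucket directory that have not been modified for 30 days. The path and the threshold are placeholders, not values from this article.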

The NooBaa configuration files and object data are stored in distinct shared directories of IBM Storage Scale file systems that are accessible by all NooBaa nodes. The NooBaa configuration files include account and bucket configuration as well as NooBaa service customization. The NooBaa configuration directory can be in the same file system where the buckets and objects are stored or in a different file system. It is recommended to use a different file system for the NooBaa configuration files.
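If you want to verify that the configuration directory and the data directory are really placed as recommended, a generic check could look like the following Python sketch. It is an illustration under assumptions: it compares the device IDs reported by os.stat, which is a common POSIX way to tell whether two paths are on the same mounted file system, and the function name check_layout is hypothetical. It is not part of NooBaa or IBM Storage Scale.

```python
import os
from pathlib import Path
from typing import List


def check_layout(config_dir: Path, data_dir: Path) -> List[str]:
    """Return findings about the placement of the two directories.

    An empty list means the directories are distinct and on different
    file systems, as recommended in the text above.
    """
    findings = []
    cfg, data = config_dir.resolve(), data_dir.resolve()
    # The directories must be distinct and must not be nested in each other.
    if cfg == data or cfg in data.parents or data in cfg.parents:
        findings.append("configuration and data directories overlap")
    # Equal device IDs indicate that both paths are on the same file system.
    if os.stat(cfg).st_dev == os.stat(data).st_dev:
        findings.append("configuration and data directories share a file system")
    return findings
```

A call such as check_layout(Path("/gpfs/fs1/noobaa-config"), Path("/gpfs/fs2/buckets")) uses placeholder paths. Sharing a file system is allowed according to the text above, so the second finding is a hint rather than an error.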

In the next article in this series, we demonstrate how to install, configure, and use NooBaa S3 object storage services on IBM Storage Scale. If you want to try it out, you need an IBM Storage Scale cluster (a single node is sufficient). We recommend IBM Storage Scale version 5.1.8 or later on RHEL 8 or 9.

References

[1] NooBaa documentation
https://www.NooBaa.io/

[2] NooBaa-core open source repository on GitHub
https://github.com/noobaa/noobaa-core
