S3 tiering to tape with NooBaa Part 5 - AWS S3 Glacier API fundamentals

By Nils Haustein posted Fri March 22, 2024 07:20 AM

In part 3 of this article series, we explained basic techniques to tier (migrate and recall) S3 objects from disk to tape using NooBaa, IBM Storage Scale and IBM Storage Archive EE. In part 4, we demonstrated techniques leveraging object metadata and tags to better control migrations and recalls. Object metadata and tagging require discipline from the S3 user, who must apply these metadata fields and tags consistently. In addition, unless transparent recalls are disabled using Storage Archive options, there is still the danger that many transparent recalls lead to recall storms.

To avoid recall storms, NooBaa supports the AWS S3 Glacier API. Glacier provides additional operations on top of the S3 API that facilitate the retrieval or restoration of objects from high-latency media like tape. Objects stored in a Glacier storage class cannot be retrieved using standard GET operations. Instead, the S3 user must first use the restore-object operation provided by the Glacier API to request the recall from tape. After the object has been recalled from tape to disk, the user can retrieve it with a standard GET operation.

In this part of the blog article series, we explain the fundamentals of AWS S3 Glacier and demonstrate how it works with NooBaa in combination with Storage Scale. We do not migrate and recall objects during this demonstration because it requires manual interventions. We will elaborate on automation of migration and recalls along with Glacier operations in the next part of this series.

NooBaa Glacier fundamentals

In this section we demonstrate the fundamental behavior of Glacier with NooBaa on IBM Storage Scale. We show the manual steps to prepare Glacier objects for restoration to foster understanding of the Glacier implementation with NooBaa NSFS. In the next article, we will demonstrate automation for the manual steps shown below.

Enable Glacier in NooBaa

To enable the basic Glacier function in NooBaa, two parameters must be added to the NooBaa configuration file (for more details about the NooBaa configuration file, refer to the section "Customize and start the NooBaa service" of part 2 in this series). The two configuration parameters are NSFS_GLACIER_ENABLED=true and NSFS_GLACIER_LOGS_ENABLED=false, as shown in the example below:

# cat /ibm/cesshared/noobaa/config.json
{
    "ENDPOINT_FORKS": 0,
    "UV_THREADPOOL_SIZE": 64,
    "ALLOW_HTTP": true,
    "NSFS_GLACIER_ENABLED": true,
    "NSFS_GLACIER_LOGS_ENABLED": false
}

Note that setting NSFS_GLACIER_LOGS_ENABLED to false disables logging of the objects to be migrated or recalled to a log file. We use this setting to demonstrate the manual steps. When automating migration and recalls of Glacier objects, this option is set to true.

If the parameter NSFS_GLACIER_LOGS_ENABLED is set to true (the default), or when running a downstream version of NooBaa, an additional directory must be created. Check whether the following directory exists:

# ls -l /var/run/noobaa-nsfs/wal/

If this directory does not exist, create it and grant the user running the NooBaa service read and write access:

# mkdir -p /var/run/noobaa-nsfs/wal
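The mkdir command only creates the directory. Ownership and permissions then need to be adjusted so that the NooBaa service user can read and write. A minimal sketch, assuming the service runs under a dedicated user and group named noobaa (hypothetical names; use the user configured in your environment):

# chown noobaa:noobaa /var/run/noobaa-nsfs/wal
# chmod 750 /var/run/noobaa-nsfs/wal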

To make the new configuration parameters effective, the NooBaa service must be restarted:

# systemctl restart noobaa_nsfs
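Optionally, verify that the service came back up after the restart:

# systemctl status noobaa_nsfs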

Objects are associated with Glacier through the storage class GLACIER. The storage class is provided by the S3 user during the PUT operation. Objects associated with the storage class GLACIER cannot be retrieved with GET operations without first being restored.

Store and migrate Glacier objects

As S3 user, let’s create a new bucket glacier1 where we store the objects associated with the storage class GLACIER:

# s3u1 mb s3://glacier1

Now, we PUT an object into the new bucket and associate the storage class GLACIER:

# s3u1 cp coldfile0 s3://glacier1 --storage-class GLACIER

To determine the storage class of an object, the S3 API head-object operation can be used:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

Objects in the storage class GLACIER can be migrated by using the Storage Scale policy engine. The file associated with the object has the user attribute user.storage_class set to GLACIER, as shown below:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"

It is also possible to store non-Glacier objects in the bucket glacier1. This can be accomplished by omitting the parameter --storage-class GLACIER during the PUT operation. As a result, NooBaa sets the file attribute user.storage_class=STANDARD. Objects that are stored in the storage class STANDARD can be retrieved with normal GET operations.
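As a sketch, assuming a hypothetical file named warmfile0 and the same s3u1 alias used throughout this series, storing and retrieving a STANDARD object in the same bucket looks like this:

# s3u1 cp warmfile0 s3://glacier1
# s3u1 cp s3://glacier1/warmfile0 warmfile0.copy

Because no storage class is given on the PUT, NooBaa assigns STANDARD and the subsequent GET succeeds without a prior restore-object request.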

The policy engine can be programmed to select files for migration that have the attribute user.storage_class set to GLACIER and that are not already migrated. Here is an example of this policy:

/* MACRO: defining migrated state */
define(is_migrated, (MISC_ATTRIBUTES LIKE '%V%'))

/* RULE 1: define external pool */
RULE 'extpool' EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/eeadm'
OPTS '-p pool1@lib1'

/* RULE 2: Migration rule */
RULE 'mig' MIGRATE FROM POOL 'system' TO POOL 'ltfs' WHERE
  NOT (is_migrated) AND
  XATTR('user.storage_class') like 'GLACIER'

The storage admin executes this policy stored in file mig-glacier.policy by using the following command:

# mmapplypolicy fs1 -P mig-glacier.policy
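To preview which files the policy would select without actually migrating them, the policy can first be run in test mode:

# mmapplypolicy fs1 -P mig-glacier.policy -I test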

In the next section we elaborate on getting objects from Glacier.

Get Glacier objects

Normal GET operations on objects in the storage class GLACIER do not work, as shown below:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
Warning: Skipping file s3://glacier1/coldfile0. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

Instead, the S3 user must trigger a restoration of the object using the restore-object operation:

# s3u1api restore-object --bucket glacier1 --key coldfile0 \
  --restore-request Days=1

Days=1 means that after the file has been restored, it remains on disk for one day before it is re-migrated to tape.

The subsequent head-object operation shows that the object is scheduled for restore:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"true\"",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

The tag ongoing-request=true means that the object must be recalled from tape. Since we have not migrated the object in this demonstration, no recall is needed.
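After a restore-object request, the object is expected to be recalled from tape. In a real deployment where the object had been migrated, the storage admin would now recall the corresponding file, for example with a selective recall using eeadm. A minimal sketch, assuming a file list containing the full path of the file:

# echo "/ibm/fs1/buckets/glacier1/coldfile0" > /tmp/recall.list
# /opt/ibm/ltfsee/bin/eeadm recall /tmp/recall.list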

Since we skipped migration and recall in this demonstration, we emulate the state after a recall by adjusting some attributes of the file in the Storage Scale file system; this is what enables the GET operation. Before we adjust the attributes, let’s take a look at the current attributes of our Glacier object from a file system perspective. This requires administrative privileges in the Storage Scale file system:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"
user.noobaa.restore.request: "1"

As shown above, NooBaa added two additional attributes: user.storage_class and user.noobaa.restore.request. The attribute user.storage_class was added when the object was PUT into the bucket, and the attribute user.noobaa.restore.request was added when the restore-object operation was performed.

To signal to NooBaa Glacier that the file was recalled from tape, we must manually add a new attribute user.noobaa.restore.expiry that encodes the date and time when the restore expires and the file can be remigrated. To set this attribute, we take the 1-day period encoded in the attribute user.noobaa.restore.request into account. Setting file attributes must be done by a Storage Scale administrative user with sufficient privileges:

# mmchattr --set-attr \
  user.noobaa.restore.expiry=$(date -u -d "1 days" +"%Y-%m-%dT00:00:00.000Z") \
  /ibm/fs1/buckets/glacier1/coldfile0

Afterwards, we manually delete the attribute user.noobaa.restore.request as administrative user of the file system:

# mmchattr --delete-attr user.noobaa.restore.request /ibm/fs1/buckets/glacier1/coldfile0

The file attributes look like this from a file system perspective:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"
user.noobaa.restore.expiry: "2024-03-13T00:00:00.000Z"

As S3 user, let’s look at the object using the head-object operation:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 13 Mar 2024 00:00:00 GMT\"",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

The tag ongoing-request is set to false, indicating that the object can be retrieved. The expiry date matches the file attribute user.noobaa.restore.expiry. The user must GET the object prior to the expiration date, because afterwards the object might have been migrated again.

Let’s get the object:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
download: s3://glacier1/coldfile0 to ./coldfile0

The object can be remigrated to tape in accordance with the restore expiration time, which specifies the date and time until which the object can reside on disk. After remigration, the file attribute user.noobaa.restore.expiry must be manually deleted. The example below shows how to delete this attribute from the file.

# mmchattr --delete-attr user.noobaa.restore.expiry /ibm/fs1/buckets/glacier1/coldfile0

This command must be executed by an administrative user of the Storage Scale file system who has permission to manipulate the file metadata.
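The remigration itself could be done by re-running the migration policy shown earlier, or directly with eeadm. A minimal sketch, assuming the same tape pool pool1@lib1 used above and a file list containing the full path of the file:

# echo "/ibm/fs1/buckets/glacier1/coldfile0" > /tmp/remig.list
# /opt/ibm/ltfsee/bin/eeadm migrate /tmp/remig.list -p pool1@lib1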

Subsequently the S3 user cannot retrieve the object using the normal GET operation:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
Warning: Skipping file s3://glacier1/coldfile0. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

The head-object operation shows that there is no restore request:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

Summary

As shown above, NooBaa automatically adds extended attributes to the files stored in the Storage Scale file system in accordance with the S3 Glacier operations. These attributes can be used to control automatic migration (user.storage_class=GLACIER) and recalls (user.noobaa.restore.request). To facilitate GET operations for Glacier objects, further attributes must be manually set (user.noobaa.restore.expiry) and deleted (user.noobaa.restore.request) after the recall. The following list summarizes the flow of operations from an S3 user and storage admin perspective, together with the resulting file attributes:

1. S3 user: PUT the object with --storage-class GLACIER; NooBaa sets user.storage_class=GLACIER.
2. Storage admin: migrate the file to tape with the policy engine (mmapplypolicy).
3. S3 user: request restoration with restore-object; NooBaa sets user.noobaa.restore.request.
4. Storage admin: recall the file from tape, set user.noobaa.restore.expiry and delete user.noobaa.restore.request.
5. S3 user: GET the object before the expiration date.
6. Storage admin: remigrate the file after the expiration date and delete user.noobaa.restore.expiry.
7. S3 user: to GET the object again, a new restore-object request is required.

Manually setting attributes after recall operations is cumbersome. Therefore, NooBaa provides a flexible plugin architecture to facilitate automatic migration and recalls. We will elaborate on this innovative architecture in the next part of this blog article series. Stay tuned!

