S3 tiering to tape with NooBaa Part 5 - AWS S3 Glacier API fundamentals

By Nils Haustein posted Fri March 22, 2024 07:20 AM

In part 3 of this article series, we explained basic techniques to tier (migrate and recall) S3 objects from disk to tape using NooBaa, IBM Storage Scale and IBM Storage Archive EE. In part 4, we demonstrated techniques leveraging object metadata and tags to better control migrations and recalls. Object metadata and tagging require discipline from the S3 user, who must apply these metadata fields and tags consistently. In addition, unless transparent recalls are disabled using Storage Archive options, there is still the danger that many transparent recalls lead to recall storms.

To avoid recall storms, NooBaa supports the AWS S3 Glacier API. Glacier provides additional operations on top of the S3 API that facilitate the retrieval or restoration of objects from high-latency media like tape. Objects stored in a Glacier storage class cannot be retrieved using standard GET operations. Instead, the S3 user must first use the restore-object operation provided by the Glacier API to request the recall from tape. After the object has been recalled from tape to disk, the user can retrieve it with a standard GET operation.

In this part of the blog article series, we explain the fundamentals of AWS S3 Glacier and demonstrate how it works with NooBaa in combination with Storage Scale. We do not migrate and recall objects during this demonstration because it requires manual interventions. We will elaborate on automation of migration and recalls along with Glacier operations in the next part of this series.

NooBaa Glacier fundamentals

In this section we demonstrate the fundamental behavior of Glacier with NooBaa on IBM Storage Scale. We show the manual steps to prepare Glacier objects for restoration to foster understanding of the Glacier implementation with NooBaa NSFS. In the next article, we will demonstrate automation for the manual steps shown below.

Enable Glacier in NooBaa

To enable the basic Glacier function in NooBaa, two parameters must be added to the NooBaa configuration file (for more details about the NooBaa configuration file, refer to the section "Customize and start the NooBaa service" of part 2 in this series). The two configuration parameters are NSFS_GLACIER_ENABLED=true and NSFS_GLACIER_LOGS_ENABLED=false, as shown in the example below:

# cat /ibm/cesshared/noobaa/config.json
{
    "ENDPOINT_FORKS": 0,
    "UV_THREADPOOL_SIZE": 64,
    "ALLOW_HTTP": true,
    "NSFS_GLACIER_ENABLED": true,
    "NSFS_GLACIER_LOGS_ENABLED": false
}

Note that setting NSFS_GLACIER_LOGS_ENABLED to false disables logging of the objects to be migrated or recalled to a log file. We use this setting to demonstrate the manual steps. When automating migration and recalls of Glacier objects, this option is set to true.

If the parameter NSFS_GLACIER_LOGS_ENABLED is set to true (the default), or when running a downstream version of NooBaa, an additional directory must be created. Check whether the following directory exists:

# ls -l /var/run/noobaa-nsfs/wal/

If this directory does not exist, create it and grant the user running the NooBaa service read and write access:

# mkdir -p /var/run/noobaa-nsfs/wal
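The mkdir command only creates the directory. Ownership and permissions then need to be adjusted so that the NooBaa service user can read and write. A minimal sketch, assuming the service runs under a dedicated user and group named noobaa (hypothetical names; use the user configured in your environment):

# chown noobaa:noobaa /var/run/noobaa-nsfs/wal
# chmod 750 /var/run/noobaa-nsfs/wal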

To make the new configuration parameters effective, the NooBaa service must be restarted:

# systemctl restart noobaa_nsfs
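Optionally, verify that the service came back up after the restart:

# systemctl status noobaa_nsfs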

Objects are associated with Glacier through the storage class GLACIER. The storage class is provided by the S3 user during the PUT operation. Objects associated with the storage class GLACIER cannot be retrieved with GET operations without first being restored.

Store and migrate Glacier objects

As S3 user, let’s create a new bucket glacier1 where we store the objects associated with the storage class GLACIER:

# s3u1 mb s3://glacier1

Now, we PUT an object into the new bucket and associate the storage class GLACIER:

# s3u1 cp coldfile0 s3://glacier1 --storage-class GLACIER

To determine the storage class of an object, the S3 API head-object operation can be used:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

Objects in the storage class GLACIER can be migrated by using the Storage Scale policy engine. The file associated with the object has the user attribute user.storage_class set to GLACIER, as shown below:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"

It is also possible to store non-Glacier objects in the bucket glacier1. This can be accomplished by omitting the parameter --storage-class GLACIER during the PUT operation. As a result, NooBaa sets the file attribute user.storage_class=STANDARD. Objects that are stored in the storage class STANDARD can be retrieved with normal GET operations.
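As a sketch, assuming a hypothetical file named warmfile0 and the same s3u1 alias used throughout this series, storing and retrieving a STANDARD object in the same bucket looks like this:

# s3u1 cp warmfile0 s3://glacier1
# s3u1 cp s3://glacier1/warmfile0 warmfile0.copy

Because no storage class is given on the PUT, NooBaa assigns STANDARD and the subsequent GET succeeds without a prior restore-object request.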

The policy engine can be programmed to select files for migration that have the attribute user.storage_class set to GLACIER and that are not already migrated. Here is an example of this policy:

/* MACRO: defining migrated state */
define(is_migrated, (MISC_ATTRIBUTES LIKE '%V%'))

/* RULE 1: define external pool */
RULE 'extpool' EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/eeadm'
OPTS '-p pool1@lib1'

/* RULE 2: Migration rule */
RULE 'mig' MIGRATE FROM POOL 'system' TO POOL 'ltfs' WHERE
  NOT (is_migrated) AND
  XATTR('user.storage_class') like 'GLACIER'

The storage admin executes this policy stored in file mig-glacier.policy by using the following command:

# mmapplypolicy fs1 -P mig-glacier.policy
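To preview which files the policy would select without actually migrating them, the policy can first be run in test mode:

# mmapplypolicy fs1 -P mig-glacier.policy -I test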

In the next section we elaborate on getting objects from Glacier.

Get Glacier objects

Normal GET operations on objects in the storage class GLACIER do not work, as shown below:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
Warning: Skipping file s3://glacier1/coldfile0. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

Instead, the S3 user must trigger a restoration of the object using the restore-object operation:

# s3u1api restore-object --bucket glacier1 --key coldfile0 \
  --restore-request Days=1

Days=1 means that after the file has been restored, it remains on disk for one day before it is re-migrated to tape.

The subsequent head-object operation shows that the object is scheduled for restore:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"true\"",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

The tag ongoing-request=true means that the object must be recalled from tape. Since we have not migrated the object in this demonstration, no recall is needed.
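After a restore-object request, the object is expected to be recalled from tape. In a real deployment where the object had been migrated, the storage admin would now recall the corresponding file, for example with a selective recall using eeadm. A minimal sketch, assuming a file list containing the full path of the file:

# echo "/ibm/fs1/buckets/glacier1/coldfile0" > /tmp/recall.list
# /opt/ibm/ltfsee/bin/eeadm recall /tmp/recall.list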

Since we skipped migration and recall in this demonstration, we emulate the state after a recall by adjusting some attributes of the file in the Storage Scale file system; this is what enables the GET operation. Before we adjust the attributes, let’s take a look at the current attributes of our Glacier object from a file system perspective. This requires administrative privileges in the Storage Scale file system:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"
user.noobaa.restore.request: "1"

As shown above, NooBaa added two additional attributes: user.storage_class and user.noobaa.restore.request. The attribute user.storage_class was added when the object was PUT into the bucket, and the attribute user.noobaa.restore.request was added when the restore-object operation was performed.

To signal to NooBaa Glacier that the file was recalled from tape, we must manually add a new attribute user.noobaa.restore.expiry that encodes the date and time when the restore expires and the file can be remigrated. To set this attribute, we take the 1-day period encoded in the attribute user.noobaa.restore.request into account. Setting file attributes must be done by a Storage Scale administrative user with sufficient privileges:

# mmchattr --set-attr \
  user.noobaa.restore.expiry=$(date -u -d "1 days" +"%Y-%m-%dT00:00:00.000Z") \
  /ibm/fs1/buckets/glacier1/coldfile0

Afterwards, we manually delete the attribute user.noobaa.restore.request as administrative user of the file system:

# mmchattr --delete-attr user.noobaa.restore.request /ibm/fs1/buckets/glacier1/coldfile0

The file attributes look like this from a file system perspective:

# mmlsattr -L -d /ibm/fs1/buckets/glacier1/coldfile0
file name:            /ibm/fs1/buckets/glacier1/coldfile0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         buckets
snapshot name:
creation time:        Mon Mar 11 15:30:28 2024
Misc attributes:      ARCHIVE
Encrypted:            no
user.noobaa.content_type: "application/octet-stream"
user.storage_class:   "GLACIER"
user.noobaa.restore.expiry: "2024-03-13T00:00:00.000Z"

As S3 user, let’s look at the object using the head-object operation:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 13 Mar 2024 00:00:00 GMT\"",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

The tag ongoing-request is set to false, indicating that the object can be retrieved. The expiry date matches the file attribute user.noobaa.restore.expiry. The user must GET the object prior to the expiration date, because afterwards the object might have been migrated again.

Let’s get the object:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
download: s3://glacier1/coldfile0 to ./coldfile0

The object can be remigrated to tape in accordance with the restore expiration time, which specifies the date and time until which the object can reside on disk. After remigration, the file attribute user.noobaa.restore.expiry must be manually deleted. The example below shows how to delete this attribute from the file.

# mmchattr --delete-attr user.noobaa.restore.expiry /ibm/fs1/buckets/glacier1/coldfile0

This command must be executed by an administrative user of the Storage Scale file system who has permission to manipulate the file metadata.
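The remigration itself could be done by re-running the migration policy shown earlier, or directly with eeadm. A minimal sketch, assuming the same tape pool pool1@lib1 used above and a file list containing the full path of the file:

# echo "/ibm/fs1/buckets/glacier1/coldfile0" > /tmp/remig.list
# /opt/ibm/ltfsee/bin/eeadm migrate /tmp/remig.list -p pool1@lib1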

Subsequently the S3 user cannot retrieve the object using the normal GET operation:

# s3u1 cp s3://glacier1/coldfile0 coldfile0
Warning: Skipping file s3://glacier1/coldfile0. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

The head-object operation shows that there is no restore request:

# s3u1api head-object --bucket glacier1 --key coldfile0
{
    "AcceptRanges": "bytes",
    "LastModified": "Mon, 11 Mar 2024 14:30:28 GMT",
    "ContentLength": 6449152,
    "ETag": "\"mtime-czqzravhx5a8-ino-mxj\"",
    "ContentType": "application/octet-stream",
    "Metadata": {
        "storage_class": "GLACIER"
    },
    "StorageClass": "GLACIER"
}

Summary

As shown above, NooBaa automatically adds extended attributes to the files stored in the Storage Scale file system in accordance with the S3 Glacier operations. These attributes can be used to control automatic migration (user.storage_class=GLACIER) and recalls (user.noobaa.restore.request). To facilitate GET operations for Glacier objects, further attributes must be manually set (user.noobaa.restore.expiry) and deleted (user.noobaa.restore.request) after the recall. The following list summarizes the flow of operations from an S3 user and storage admin perspective, together with the resulting file attributes:

1. S3 user: PUT the object with --storage-class GLACIER; NooBaa sets user.storage_class=GLACIER.
2. Storage admin: migrate the file to tape with the policy engine (mmapplypolicy).
3. S3 user: request restoration with restore-object; NooBaa sets user.noobaa.restore.request.
4. Storage admin: recall the file from tape, set user.noobaa.restore.expiry and delete user.noobaa.restore.request.
5. S3 user: GET the object before the expiration date.
6. Storage admin: remigrate the file after the expiration date and delete user.noobaa.restore.expiry.
7. S3 user: to GET the object again, a new restore-object request is required.

Manually setting attributes after recall operations is cumbersome. Therefore, NooBaa provides a flexible plugin architecture to facilitate automatic migration and recalls. We will elaborate on this innovative architecture in the next part of this blog article series. Stay tuned!

