File and Object Storage


S3 tiering to tape with NooBaa Part 3 - Tiering S3 objects to tape

By Nils Haustein posted Wed February 28, 2024 06:01 AM

  

In part 2 of this series we explained how to deploy and use NooBaa in an IBM Storage Scale cluster. In this part 3, we elaborate on tiering objects from the file system to tape using IBM Storage Archive.

The tiering capabilities rely on the IBM Storage Scale policy engine and the integration of IBM Storage Archive Enterprise Edition with IBM Storage Scale. The picture below illustrates the IBM Storage Scale lifecycle management architecture:

As shown in the picture above, all cluster nodes can access the file system where the data is stored. In our example, the S3 buckets and objects are stored in file system fs1, which is mounted under /ibm/fs1. The file system stores files and objects in internal pools that reside on disk. An internal pool is a collection of disks with the same characteristics. In our example there is one internal pool named system, which stores both data and metadata.

IBM Storage Scale data lifecycle management can migrate files and objects between pools. Besides internal pools, there can be external pools. An external pool allows data to be migrated to tape and can be provided by IBM Storage Archive Enterprise Edition. In the picture above, the pool Tape is an external pool managed by IBM Storage Archive.

Data lifecycle management is implemented using the IBM Storage Scale policy engine. The policy engine executes rules that are written into a policy file (a text file). Rules describe the action (such as migration) and the selection criteria for the files and objects to be migrated. There is a comprehensive set of selection criteria for files; the most commonly used criteria are:

  • Last access time indicating when a file was last accessed,
  • File types based on file extensions,
  • Directories where files are stored.

More details about rules, policies and the policy engine can be found in [6].

When files are migrated to tapes that are managed by IBM Storage Archive, these files remain visible in the file system. Upon access, a migrated file is recalled from tape and placed back on disk. Files can also be pre-migrated, whereby a copy of the file resides on disk and on tape. All this can be accomplished using the IBM Storage Scale policy engine or IBM Storage Archive commands.

In the section below we elaborate on “archive” functions: list, migrate and recall.

List migration states

To list file migration states, you must log on to an IBM Storage Scale cluster node with the privileges to run the appropriate commands.

Listing the migration states of files is essential to confirm that migration and pre-migration operations succeeded. To inspect the migration states, we look at the directory where the object buckets are stored.

In section Create accounts and buckets we created an S3 account named user1. In section Use AWS cli to access NooBaa S3 endpoints we created two buckets named test1 and test2 for this user. The buckets are stored in the file system fs1 under the path /ibm/fs1/buckets. Let's look at the buckets in the file system:

# tree -d /ibm/fs1/buckets/
/ibm/fs1/buckets/
├── test1
└── test2

As shown above, there are two sub-directories named in accordance with the bucket names (test1 and test2).

Within these directories there are files that were ingested as object via the NooBaa S3 service (see section Use AWS cli to access NooBaa S3 endpoints):

# tree /ibm/fs1/buckets/
/ibm/fs1/buckets/
├── test1
│   └── filename
└── test2
    ├── file0
    ├── file1
    ├── file2
    ├── file3
    └── file4

As shown above, bucket test1 with bucket path /ibm/fs1/buckets/test1 contains one file, and bucket test2 contains five files stored in path /ibm/fs1/buckets/test2.

IBM Storage Archive provides a sub-command to check the migration state of a file. Here is an example of checking the migration state of the file in bucket test1:

# eeadm file state /ibm/fs1/buckets/test1/filename
Name: /ibm/fs1/buckets/test1/filename
State: resident

The file is in state resident, which means its content is not on tape.

Checking the state of the files is important to test the effectiveness of the migration policies. Let's migrate some files.
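When a bucket holds many files, checking them one by one is tedious. The small helper below condenses the eeadm file state output into a count per state. It is only a sketch: the summarize_states function name is our own convention, the bucket path is taken from our example, and it assumes the eeadm command is available on the node.

```shell
#!/bin/sh
# Summarize "eeadm file state" output into a count per migration state.
# Sketch only: /ibm/fs1/buckets is the example path; eeadm must be on PATH.
summarize_states() {
  # Reads eeadm output on stdin and counts the "State:" lines.
  awk '/^State:/ {count[$2]++} END {for (s in count) printf "%s %d\n", s, count[s]}'
}

# Example usage on a cluster node:
# find /ibm/fs1/buckets -type f | xargs eeadm file state | summarize_states
```

The output is one line per state, for example "migrated 5" or "resident 1", which makes it easy to verify a policy run at a glance.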

Migrate objects

To migrate files, you must log on to an IBM Storage Scale cluster node with the privileges to run the appropriate command.

During file migration, the content of the files is moved to tape. The files remain visible in the file system and in the S3 buckets while the file content resides on tape. Within the file metadata that is stored on disk there is a pointer to the tape where the file content is stored.

We can migrate the file in the test1 bucket using the command eeadm migrate. This command can read the filenames to be migrated from a pipe, which is filled using the find command. The files are migrated to a tape in tape pool pool1:

# find /ibm/fs1/buckets/test1 -type f | eeadm migrate -p pool1

To check the migration state of the file in the test1 bucket, we can use this command:

# eeadm file state /ibm/fs1/buckets/test1/filename
Name: /ibm/fs1/buckets/test1/filename
State: migrated
ID: 8642704550425025024-3560302634690375993-1469765045-1049860-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

The file was migrated to tape with ID DO0060L7.

Migrating files using the find command is easy. However, it does not provide much flexibility for selecting files. For many use cases, files to be migrated are selected based on specific attributes such as:

  • Last access time indicating when a file was last accessed,
  • File types based on file extensions,
  • Directories where files are stored.

To accommodate dynamic file selection based on file properties, the IBM Storage Scale policy engine can be used. 

Policy-based migration

Let’s use the policy engine to migrate files from the test2 bucket. First, we create a policy with two rules that migrate all files from the path of bucket test2:

/* RULE 1: define external pool */
RULE 'extpool' EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/eeadm' OPTS '-p pool1@lib1'

/* RULE 2: Migration rule */
RULE 'mig' MIGRATE FROM POOL 'system' TO POOL 'ltfs'
WHERE (KB_ALLOCATED > 0) AND PATH_NAME LIKE '%/ibm/fs1/buckets/test2/%'

The first rule defines the external pool named “ltfs”, which is represented by the eeadm command invoked with the parameter -p pool1@lib1. This parameter ensures that the files are migrated to tape pool pool1. The second rule specifies that files under the path /ibm/fs1/buckets/test2/ are migrated to the “ltfs” pool managed by IBM Storage Archive.
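Selection criteria such as last access time can be added to the WHERE clause of the migration rule. The variant below is a sketch that migrates only files that have not been accessed for more than 30 days; the pool names and path come from the example above, and the 30-day threshold is an assumption chosen for illustration.

```sql
/* RULE 1: define external pool (same as above) */
RULE 'extpool' EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/eeadm' OPTS '-p pool1@lib1'

/* RULE 2: migrate only files not accessed for more than 30 days */
RULE 'mig30d' MIGRATE FROM POOL 'system' TO POOL 'ltfs'
WHERE (KB_ALLOCATED > 0)
  AND PATH_NAME LIKE '%/ibm/fs1/buckets/test2/%'
  AND (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
```

Before running such a policy against production data, the selection can be verified without moving anything by invoking mmapplypolicy with the -I test option, which only evaluates the rules and lists the candidate files.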

The policy is stored in file mig.policy and executed with the following command:

# mmapplypolicy fs1 -P mig.policy

GLESL700I: Task migrate was created successfully, task ID is 1527.

GLESL839I: All 5 file(s) has been successfully processed.
[I] A total of 5 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;

The shortened output of the command above shows that 5 files were migrated.

Now we can check the state of the files in bucket test2:

# eeadm file state /ibm/fs1/buckets/test2/*
Name: /ibm/fs1/buckets/test2/file0
State: migrated
ID: 8642704550425025024-3560302634690375993-1624576163-1075970-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

Name: /ibm/fs1/buckets/test2/file4
State: migrated
ID: 8642704550425025024-3560302634690375993-1961220157-1075972-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

The shortened output above shows that file0 and file4 are migrated. All other files are in migrated state as well.

In this section we presented some easy ways to migrate objects from IBM Storage Scale to tape using IBM Storage Archive. In the next section we recall objects using the AWS CLI.

Recalling objects

To recall files, we use the S3 protocol leveraging the AWS CLI. Let’s first list the bucket test1.

# s3u1 ls s3://test1

2024-02-06 09:32:55    8688646 filename

There is one object in this bucket with a size of about 8 MB. There is no hint whether the object is migrated or not.

Let’s GET the object that we just migrated from bucket test1 by using the AWS CLI together with the time command:

# time s3u1 cp s3://test1/filename ./

download: s3://test1/filename to ./filename

real    1m05.214s
user    0m0.630s
sys     0m0.131s

As shown above, it took about 1 minute to GET the object, because it was stored on tape and the tape had to be mounted.

When checking the file state on an IBM Storage Scale cluster node, we see the file is premigrated:

# eeadm file state /ibm/fs1/buckets/test1/filename
Name: /ibm/fs1/buckets/test1/filename
State: premigrated
ID: 8642704550425025024-3560302634690375993-1469765045-1049860-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)
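The premigrated state can also be produced proactively, without recalling a file first: IBM Storage Archive provides a premigrate sub-command that works like eeadm migrate but leaves the file content on disk. The wrapper below is a sketch; the function name premigrate_bucket is our own convention, and the path and pool come from the example.

```shell
#!/bin/sh
# Pre-migrate all files in a bucket path: copy the content to tape while
# keeping the files resident on disk.
# Sketch only: assumes the eeadm premigrate sub-command is available on PATH
# and uses the pool name from our example.
premigrate_bucket() {
  bucket_path=$1
  pool=$2
  find "$bucket_path" -type f | eeadm premigrate -p "$pool"
}

# Example usage on a cluster node:
# premigrate_bucket /ibm/fs1/buckets/test1 pool1
```

After such a run, eeadm file state should report the files as premigrated, with a copy on disk and on tape.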

We can recall all the files from bucket test2 using the recursive copy command provided with the AWS CLI. First, we list the content of the bucket:

# s3u1 ls s3://test2

2024-02-06 11:01:20    8688640 file0
2024-02-06 11:01:20    2721792 file1
2024-02-06 11:01:20    2636800 file2
2024-02-06 11:01:20    4058112 file3
2024-02-06 11:01:20   10463232 file4

Now, we GET the entire content of bucket test2 into a local file system:

# time s3u1 cp --recursive s3://test2/ ./

download: s3://test2/file0 to ./file0
download: s3://test2/file1 to ./file1
download: s3://test2/file2 to ./file2
download: s3://test2/file3 to ./file3
download: s3://test2/file4 to ./file4

real    0m37.239s
user    0m0.672s
sys     0m0.140s

This operation took 37 seconds because the objects were recalled from a tape that was already loaded in a tape drive.

Looking into the file system, the files in bucket test2 are in state premigrated:

# eeadm file state /ibm/fs1/buckets/test2/*

Name: /ibm/fs1/buckets/test2/file0
State: premigrated
ID: 8642704550425025024-3560302634690375993-1748409866-1088774-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

Name: /ibm/fs1/buckets/test2/file4
State: premigrated
ID: 8642704550425025024-3560302634690375993-130656382-1072644-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

The shortened output above only shows the states of file0 and file4; all the other files are in premigrated state as well. This indicates that the files were recalled from tape and are now dual resident, on disk and on tape.

Accessing migrated objects through the S3 service provided by NooBaa is seamless because the objects are automatically recalled from tape. Of course, it may take some time to get files from tape. Care must be taken when many S3 users access migrated objects at the same time: this can lead to a recall storm, in which many files residing on different tapes are recalled simultaneously. To mitigate recall storms, bulk recalls can be used. Bulk recalls are much faster and more resource efficient [7] because they sort the files to be recalled by tape ID and position on tape. In the next section we provide more details on bulk recalls.

Bulk recalls

IBM Storage Archive provides a bulk recall function. With bulk recalls, many files or objects are recalled in the order of the tape ID and their position on tape. Bulk recalls are much faster than transparent recalls triggered by S3 GET [7]. Bulk recalls, however, cannot be executed via the S3 API; they must be executed by an administrator who has permission to run the eeadm recall command on an IBM Storage Archive node.

Let’s demonstrate bulk recalls. Assume the 5 objects in bucket test2 are migrated again. An S3 user wants all files back on disk and issues a request to the administrator to recall all files from bucket test2. The administrator knows that bucket test2 for user1 is in path /ibm/fs1/buckets/test2. With this information the administrator can issue the bulk recall:

# find /ibm/fs1/buckets/test2 -type f | eeadm recall

This command string first finds all files in the bucket path and pipes the filenames to the eeadm recall command, which sorts them by tape ID and location on tape and then performs the recalls in order, processing the tapes in parallel.
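Feeding every file in the bucket to eeadm recall works, but files that are already resident or premigrated do not need to be recalled. The sketch below filters the file list so that only files in migrated state are passed on; the only_migrated function name is our own convention, and the path and eeadm availability are assumptions from our example.

```shell
#!/bin/sh
# Pass through only those files whose eeadm state is "migrated".
# Sketch only: assumes eeadm is on PATH; the path is from our example.
only_migrated() {
  while IFS= read -r f; do
    state=$(eeadm file state "$f" | awk '/^State:/ {print $2; exit}')
    if [ "$state" = "migrated" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Example usage on a cluster node:
# find /ibm/fs1/buckets/test2 -type f | only_migrated | eeadm recall
```

Querying the state of each file adds some overhead, so this filter pays off mainly when a large share of the files is already on disk.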

Afterward the file state is premigrated, as shown below for file0 and file4 as examples:

# eeadm file state /ibm/fs1/buckets/test2/*

Name: /ibm/fs1/buckets/test2/file0
State: premigrated
ID: 8642704550425025024-3560302634690375993-1748409866-1088774-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

Name: /ibm/fs1/buckets/test2/file4
State: premigrated
ID: 8642704550425025024-3560302634690375993-130656382-1072644-0
Replicas: 1
Tape 1: DO0060L7@pool1@lib1 (tape state=appendable)

Now the S3 user can perform the GET operation for all objects using the recursive copy function:

# time s3u1 cp --recursive s3://test2/ ./

download: s3://test2/file0 to ./file0
download: s3://test2/file1 to ./file1
download: s3://test2/file2 to ./file2
download: s3://test2/file3 to ./file3
download: s3://test2/file4 to ./file4

real    0m0.795s
user    0m0.643s
sys     0m0.170s

As shown above, the GET operation took less than a second because the objects did not have to be recalled from tape.

The disadvantage of bulk recalls is that the S3 users and the administrator must communicate to get batches of objects recalled.

In this part of the blog article series, we demonstrated how easy it is to migrate and recall S3 objects to and from tape using the IBM Storage Scale policy engine in combination with IBM Storage Archive. In the next part of this series, we look at functions that allow the S3 object client to control migration and recalls. Stay tuned!

References

[1] NooBaa documentation
https://www.NooBaa.io/

[2] NooBaa-core open source repository on GitHub
https://github.com/noobaa/noobaa-core

[3] Building rpm packages for NooBaa-core
https://github.com/noobaa/noobaa-core/pull/7291

[4] NooBaa nightly builds (this URL spills out XML-content showing the available builds):
https://noobaa-core-rpms.s3.amazonaws.com/

[5] NooBaa service customization parameter
https://github.com/noobaa/noobaa-core/blob/master/docs/dev_guide/NonContainerizedDeveloperCustomizations.md

[6] IBM Storage Scale policy guide
https://www.ibm.com/support/pages/node/6260749

[7] Comparison of transparent and bulk recalls
https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2022/01/07/duration-of-optimized-and-normal-recalls



Comments

Thu March 07, 2024 03:01 PM

Thanks YouQing, absolutely. There are new and different use case asking for S3 object storage tiering to tape capabilities. We have it available as open-source with NooBaa on Storage Scale and Storage Archive. 

Thu March 07, 2024 01:49 AM

This is an interesting product where we can win HDD object storage