Originally posted by: Nils Haustein
By Nils Haustein, Khanh V Ngo
A tiered storage system provides lower total cost of ownership (TCO) for large volumes of data by storing data on the most appropriate storage tier (flash, disk and tape). Independent studies have demonstrated that a tape solution has an expected TCO that is more than 80% lower than that of an all-disk solution.
While tape storage is suitable for storing large volumes of data over long periods of time at lower cost, access time to data on tape is significantly higher than to data on disk. This can cause a negative user experience, which is amplified by standard file system protocols like NFS and SMB that do not provide functions to efficiently manage data on tape. Many users of tiered storage file systems have experienced these implications when using tapes and are looking for alternatives.
This article explains the architecture and tape operations in a tiered storage file system (see section Tiered storage file systems with tapes) and highlights the challenges when using tapes (see section Challenges). To address these challenges, it presents best practices built on solution architectures, operational processes and tools that allow data on tape to be managed efficiently within tiered storage file systems (see section Best practices).
Tiered storage file systems with tapes
A tiered storage file system (Figure 1) provides disk and tape storage within a global file system namespace where a space management component transparently moves files from disk to tape and vice versa. The file system name space is accessible by users and applications through standard file system protocols such as NFS, SMB and POSIX.
Figure 1: Tiered storage file system
Tiered storage file systems do not always include tape storage. In this article, however, we address the challenges and solutions when using tape storage. Tiered storage file systems with tape storage are primarily used for archiving large volumes of data for long periods of time.
The movement of files to tape is called migration. During migration the file is copied to tape and subsequently stubbed on disk. The file stub on disk is represented by the file's inode, which includes metadata about the file such as time stamps, extended attributes, size, path and file name, and a reference to the tape-ID where the file is stored. The file data, however, is freed up on disk and resides on tape. The space management component uses policies to automatically migrate files between disk and tape. These policies can select files for migration based on file metadata and other attributes.
A special form of migration is pre-migration. With pre-migration the file is dual resident: on disk and on tape. Hence pre-migration only copies the file to tape but does not stub it. The advantage of pre-migration is that a subsequent migration is fast because the file does not have to be copied to tape again; it is just stubbed on disk. Pre-migration, however, does not free up space on disk.
A migrated file is visible within the file system name space and can be transparently accessed by users and applications. When a migrated file is accessed, the space management component intercepts the access request and copies the file back from tape to disk. Once the file is back on disk, the access request is granted. This process is called recall.
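The file states and operations described above can be summarized as a small state machine. The following is a minimal sketch, not the implementation of any space management product. The names `FileState`, `TRANSITIONS` and `apply_operation` are made up for illustration, and the assumption that a recalled file ends up in the pre-migrated state (its data is again on disk while the copy on tape remains) follows the API example later in this article.

```python
from enum import Enum


class FileState(Enum):
    RESIDENT = "resident"        # data on disk only
    PREMIGRATED = "premigrated"  # data on disk and on tape (dual resident)
    MIGRATED = "migrated"        # stub on disk, data on tape only


# Allowed transitions: (current state, operation) -> next state.
TRANSITIONS = {
    (FileState.RESIDENT, "premigrate"): FileState.PREMIGRATED,  # copy to tape
    (FileState.RESIDENT, "migrate"): FileState.MIGRATED,        # copy, then stub
    (FileState.PREMIGRATED, "migrate"): FileState.MIGRATED,     # stub only, no copy
    (FileState.MIGRATED, "recall"): FileState.PREMIGRATED,      # assumption, see text
}


def apply_operation(state: FileState, operation: str) -> FileState:
    """Return the state after the operation, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} a {state.value} file") from None
```

Modeling the transitions as a lookup table makes it easy to see why migrating a pre-migrated file is cheap: it changes the state without another copy to tape.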
There are two kinds of recalls: transparent recall and tape optimized recall. A transparent recall is triggered when a migrated file is transparently accessed in the file system. The space management component loads the tape referenced by the tape-ID in the file stub, locates the file on tape and reads the file back to disk. When multiple migrated files are accessed at the same time, multiple independent transparent recalls are triggered, each requiring a tape to be loaded and a file to be located and read. If multiple files are on the same tape, these files are not recalled in the order in which they are stored on tape. This causes tape start-stop operations, which are usually time consuming. In addition, transparent recalls can cause chaotic tape load operations because files are not sorted by their tape-ID. This is even more time consuming.
Tape optimized recalls are much faster and gentler on tape resources. The tape optimized recall is triggered by a command that includes a list of file names to be recalled. This command is passed to the space management component, which sorts these file names by the tape-ID (located in the file stub) and the position on tape. Afterwards it mounts the required tapes and copies the files back to disk in parallel. Hence, all files that are on one tape are copied back in the order in which they are stored on tape. In addition, it prevents chaotic tape loads because each required tape is loaded and processed only once. The downside is that tape optimized recalls in a tiered storage file system are not transparent to the user and require an administrative command to be issued.
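The difference between the two recall types can be illustrated with a small sketch. It assumes a simplified model with a single tape drive, in which a tape load happens whenever the next file is on a different tape than the one currently loaded. The `Stub` type, its `position` field and the function names are hypothetical and not taken from any product.

```python
from typing import List, NamedTuple


class Stub(NamedTuple):
    path: str
    tape_id: str
    position: int  # hypothetical: where the file starts on the tape


def count_tape_loads(stubs: List[Stub]) -> int:
    """Count tape loads when files are recalled in the given order (one drive)."""
    loads = 0
    loaded = None
    for stub in stubs:
        if stub.tape_id != loaded:
            loads += 1
            loaded = stub.tape_id
    return loads


def tape_optimized_order(stubs: List[Stub]) -> List[Stub]:
    """Sort by tape-ID, then by position on tape, as a tape optimized recall does."""
    return sorted(stubs, key=lambda s: (s.tape_id, s.position))


requests = [
    Stub("/archive/f1", "T1", 30),
    Stub("/archive/f2", "T2", 10),
    Stub("/archive/f3", "T1", 5),
    Stub("/archive/f4", "T2", 2),
]
print(count_tape_loads(requests))                        # 4 loads in access order
print(count_tape_loads(tape_optimized_order(requests)))  # 2 loads after sorting
```

In this example the sorted order loads each tape once and reads the files on it in ascending position, which avoids both the repeated loads and the back-and-forth seeks of the unsorted order.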
Challenges
Tiered storage file systems with tape storage are a blessing and a curse. The blessing is that the user can see all files regardless of whether they are stored on disk or tape. The cursing starts when the user opens a file that is stored on tape because the recall takes one or more minutes. Unfortunately, the user is not aware that the file is on tape because standard file systems do not indicate whether a file is on disk or on tape. The user must wait and does not know why.
It gets even worse if the user simultaneously opens several files that are on tapes. This causes even longer waiting times because transparent recalls are not tape optimized. Standard file systems cannot leverage tape optimized recalls where files are sorted by their tape-ID and location on tape before the recall is initiated in this order.
These challenges are amplified by standard file systems that make the user think all files are instantly available but access to files on tape takes time. Making standard file systems tape aware is not feasible because it requires changes in hundreds of heterogeneous applications, file system implementations and operating systems that use standard file systems according to the current specification.
Some operating system tools also cause transparent recalls without the user noticing it. For example, the Finder on macOS causes transparent recalls when it accesses a tiered storage file system via SMB because it opens all files in a directory to read the thumbnail information. Opening a migrated file causes a transparent recall, so the Finder can cause recall storms.
To address these challenges, architectures and operational processes supported by tools can be established to facilitate archiving and retrieval in a tiered storage file system. The next section, Best practices, provides guidance for architecting tiered storage file systems with tape and presents processes for archival and retrieval as well as a Tape archive REST API tool to support these processes. We also reference object storage solutions that have built-in tape awareness.
Best practices
Using tapes within tiered storage file systems requires some fundamental architectural decisions combined with operational processes to overcome the challenges with tape. Below are some key recommendations and high-level guidance for implementing operational processes.
Use tiered storage file systems with tape for archiving large volumes of data over long periods of time to achieve the cost savings of tape compared to disk storage.
- Storing small volumes of data on tape will not save cost compared to disk storage because the initial investment requires a tape library, tape drives and tape cartridges. The cost for this is comparable to disk storage.
- Storing data on tape for short periods of time requires frequent reclamation of tapes, which is a resource intensive process because it requires two tape drives.
The data stored on tape should be rarely accessed.
- Accessing data on tape takes a long time because of the recall.
- Recalls scale with the number of tape drives. The more recalls there are, the more tape drives are required, which increases cost.
Provision an archive file system with a disk buffer
- The disk buffer in the archive file system is the landing zone for all incoming archived data. It assures that the user does not directly access the tape resources during archiving.
- Provide enough capacity in the disk buffer to ensure that migration to tape takes place after the ingest process.
Separate production and archive file systems
- Using tape within a production file system may not be feasible because of the high access latency.
- Separating the archive file system with tape from the production file system without tape allows different file system configuration parameters and operational processes on a file system level.
Allow users to browse the archive file system but prevent transparent recalls
- If the user can browse the archive file system, then he can see all his archived files which makes him happy.
- Preventing transparent recalls mitigates the challenges with tape and requires additional operational processes for retrieval. Further considerations about preventing transparent recalls can be found in section Prevent transparent recalls.
Implement simple and well-defined archival and retrieval processes
- The archival process facilitates the ingest of files into the archive file system and the migration to tape.
- The retrieval process facilitates the retrieval of data from the archive file system and the optimized recall from tape.
- The archival and retrieval processes must be simple so that users with different IT skills can follow them.
- For more recommendations regarding these processes, see sections Archival process and Retrieval process.
Prevent transparent recalls
The objective is to prevent the user from triggering a transparent recall when accessing the file and at the same time to allow the user to see the file in the file system name space. There are two ways to prevent transparent recalls for files.
One way to prevent transparent recalls of migrated files is to remove read and write permissions for the user and group after the file has been migrated to tape. Ideally this should be done only for migrated files so that the user can still access resident files. The implementation requires periodically identifying files that are migrated and removing read and write permissions for the user. Likewise, for migrated files that have been recalled, the read and write permissions must be restored to facilitate access. Automated processes can be used to identify migrated and recalled files and adjust the file permissions.
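The permission adjustment can be sketched as follows. This is a minimal illustration under stated assumptions: how a file is identified as migrated depends on the space management component and is outside the sketch, so the caller passes it in as `is_migrated`. The function name `adjust_permissions` is hypothetical. Restoring simply sets read and write for user and group, whereas a real process would probably record and restore the original mode.

```python
import os
import stat

# Read and write bits for the owning user and group.
RW_USER_GROUP = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP


def adjust_permissions(path: str, is_migrated: bool) -> int:
    """Remove user/group read-write bits for migrated files, restore them otherwise.

    Returns the resulting permission bits.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if is_migrated:
        new_mode = mode & ~RW_USER_GROUP  # block access, prevents transparent recall
    else:
        new_mode = mode | RW_USER_GROUP   # file is on disk again, allow access
    if new_mode != mode:
        os.chmod(path, new_mode)
    return new_mode
```

Such a function would be called periodically for every file whose state has changed since the last run.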
Another, more efficient way depends on the implementation of the space management component. If the space management component allows transparent recalls to be prevented, no file permissions need to be changed. The space management component just cancels the access request and does not execute the transparent recall.
The prevention of transparent recalls is the foundation for the archival and retrieval processes discussed below.
Archival process
The archival process lays out how files are moved to the archive file system and when they are migrated to tape. The user can access the archive file system, which should be separated from the production file system and should provide a disk buffer as a landing zone for the incoming data. This ensures that the user does not send the archive data directly to tape.
A simple implementation of an archival process is that all users mount the archive file system and whenever a project is finished the user moves the associated files and directories to the archive file system. This way the user is actively involved in the archival process and knows that his data is in the archive file system.
The subsequent migration of files to tape should not require user intervention, which keeps the archival process simple. The migration can be automated and policy-driven. Policies define the selection criteria for files to be migrated. These selection criteria should match the expected access pattern of files in the archive file system. For example, if files are expected to be accessed mostly within the first 30 days after archival, then files should be selected for migration if they were last accessed more than 30 days ago. This ensures that files that have been migrated to tape are rarely accessed.
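The 30-day selection criterion can be illustrated with a generic sketch. It is written in plain Python and does not use the policy language of any space management component. The function name and parameters are hypothetical. It also assumes that the file system records access times, which depends on mount options.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def select_for_migration(root: str, min_age_days: float = 30,
                         now: Optional[float] = None) -> List[str]:
    """Return files under root that were last accessed more than min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Reading the inode metadata does not access the file data.
            if os.stat(path).st_atime < cutoff:
                selected.append(path)
    return sorted(selected)
```

The resulting list would then be handed to the space management component for migration in a single request.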
In some instances, it might be useful if the user triggers the migration to tape, for example, if the file access pattern is specific and cannot be implemented in automated policies. In this case the user can leverage extra tools like the Tape archive REST API to order the migration to tape (see section Tape archive REST API).
Once files are archived in the archive file system the user can see all his files and follow the retrieval process if he needs to access files.
Retrieval process
The retrieval process defines how files are retrieved by the user from the archive file system. The user has access to the archive file system and can see all the files he has archived. One important prerequisite is that the user cannot transparently access a file that is migrated; see section Prevent transparent recalls for more details.
Preventing transparent recalls when the user accesses migrated files may require establishing a service level agreement (SLA) that defines how long the user must wait before he can access a migrated file. This defines the time interval of automated or manual tape optimized recall operations. The definition of such an SLA is an inherent part of the retrieval process. One example of such an SLA is that the user must wait 2 hours before a migrated file has been recalled using tape optimized recalls.
The retrieval process has two steps. In the first step, the user browses his files in the archive file system. In the second step, once he has found the file, he accesses it. If the file is not migrated to tape, the file access succeeds; otherwise, the file access fails because the user is not allowed to cause transparent recalls.
If file access fails because the file is migrated and transparent recall is prevented, then the process defines how the user can get to his data. There are different implementations. One implementation is that the user sends an email to the archive file system administrators including a list of files to be retrieved. The administrator collects file lists from multiple users over a given time period and initiates the tape optimized recall. This requires establishing an SLA that defines how long it takes to retrieve a file. This implementation requires manual steps.
To automate the retrieval of files without giving the user the ability to cause transparent recalls, an extra Tape archive REST API can be used (see section Tape archive REST API). With this API, the user can order the retrieval of migrated files through an HTTP call. The user sends the list of file names to be recalled along with the HTTP call. This call can recall the files immediately in a tape optimized manner or, even better, it can store the file list provided by this user and collect file lists from other users over a certain time period before recalling many files in a tape optimized manner. Again, this requires establishing an SLA that defines how long it can take before a file is recalled. This SLA defines the time period during which file lists from different users are collected before an automated tape optimized recall is initiated.
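Collecting file lists over a time window can be sketched as follows. This is an illustration of the idea only, not the behavior of the Tape archive REST API. The class `RecallBatcher`, its methods and the injected `recall` callable are hypothetical, and the caller supplies the current time in seconds.

```python
from typing import Callable, Iterable, List, Optional, Set


class RecallBatcher:
    """Collect recall requests and release them as one batch per time window."""

    def __init__(self, window_seconds: float,
                 recall: Callable[[List[str]], None]) -> None:
        self.window_seconds = window_seconds
        self.recall = recall              # hypothetical: runs the tape optimized recall
        self.pending: Set[str] = set()    # a set removes duplicate requests
        self.window_start: Optional[float] = None

    def submit(self, paths: Iterable[str], now: float) -> None:
        """Store a user's file list; the first request opens the window."""
        if self.window_start is None:
            self.window_start = now
        self.pending.update(paths)

    def flush_if_due(self, now: float) -> List[str]:
        """Recall all collected files once the window has elapsed."""
        if self.window_start is None or now - self.window_start < self.window_seconds:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.window_start = None
        self.recall(batch)
        return batch
```

With a window of 7200 seconds, this matches an SLA in which a file is recalled at most about 2 hours after the first request in a batch, plus the duration of the recall itself.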
To prevent the user from running into an error when accessing a migrated file, the Tape archive REST API can be used to inquire the file status before accessing a file. The user can use an HTTP call to inquire the state of a file and, if the file is in migrated state, order the recall without trying to access the file.
Tape archive REST API
The Tape archive REST API is an open source project that facilitates tape optimized operations in an archive file system by giving control to the user for managing his files. It can be used to support the archival and retrieval processes. The efficient use of the Tape archive REST API requires that transparent recalls are prevented.
The current implementation of the REST API is based on IBM Spectrum Scale providing the archive file system and IBM Spectrum Archive Enterprise Edition version 18.104.22.168 as the space management component. The Tape archive REST API is not part of any IBM product and can be used under the open source license without support. Conceptually, this REST API can be adapted for many types of archive file systems and space management components. Contact the author of this article if you need professional help to fit the API to your requirements and environment.
The API in action
A typical use case is that the user can see all his files through an NFS export, an SMB share or directly in the Spectrum Scale file system. Access to files that are migrated is not possible because transparent recalls are prevented. Preventing transparent recalls requires additional measures (see section Prevent transparent recalls).
If the access to a file fails, then the user can determine the file state using the following API-call:
The fully qualified path and file name is given within the URL. As shown in the Response of this call, the file is in state “migrated”.
Because transparent recalls should be prevented the user can now use the following API-call to request recall of the files:
# curl -X PUT http://localhost/recall -d "
The fully qualified path and file name(s) are given in the body of this request, with each path and file name placed on a separate line. The response indicates that the recall has succeeded. To leverage the tape optimized recall, multiple file names should be given with the recall request.
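For scripted use, the same recall request can be prepared in Python. The sketch below only builds the request from what is described above: a PUT to the `/recall` endpoint with one path per line in the body. The function name and the default `base_url` are assumptions. Whether the API expects a trailing newline and what the response looks like are not covered here.

```python
import urllib.request
from typing import Iterable


def build_recall_request(paths: Iterable[str],
                         base_url: str = "http://localhost") -> urllib.request.Request:
    """Build (but do not send) a PUT request for the /recall endpoint."""
    # One fully qualified path and file name per line, as described in the text.
    body = "\n".join(paths).encode("utf-8")
    return urllib.request.Request(f"{base_url}/recall", data=body, method="PUT")
```

The request can then be sent with `urllib.request.urlopen(request)`. Passing several file names in one request lets the recall be tape optimized.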
The user can now determine the file state again using the appropriate API-call:
As shown in the response the file is in state “premigrated”.
The user can also order the migration of a pre-migrated or resident file using the following API call:
# curl -X PUT http://localhost/migrate?pool1=pool1 -d "
The URL identifier “pool1=pool1” is required and defines the target tape pool for this migration request. Up to three pools can be defined as migration targets. The fully qualified path and file name(s) are given in the body of this request, with each path and file name placed on a separate line. The response to this request shows that the migration has succeeded. For tape optimized operations, multiple file names should be given with this migrate request.
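A migrate request with its pool parameters can be prepared in a similar way. The sketch below builds the URL and body without sending anything. The parameter name `pool1` comes from the example above, while the names `pool2` and `pool3` for additional pools are an assumption, as are the function name and the default `base_url`.

```python
import urllib.parse
import urllib.request
from typing import Iterable, Sequence


def build_migrate_request(paths: Iterable[str], pools: Sequence[str],
                          base_url: str = "http://localhost") -> urllib.request.Request:
    """Build (but do not send) a PUT request for the /migrate endpoint."""
    if not 1 <= len(pools) <= 3:
        raise ValueError("between one and three target pools are required")
    # pool1 is required; pool2 and pool3 are assumed names for further pools.
    query = urllib.parse.urlencode(
        {f"pool{i}": pool for i, pool in enumerate(pools, start=1)}
    )
    body = "\n".join(paths).encode("utf-8")
    return urllib.request.Request(f"{base_url}/migrate?{query}", data=body, method="PUT")
```

Validating the number of pools before building the URL catches a missing target pool on the client side instead of relying on an error from the server.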
The Tape archive REST API provides additional functions to inquire the status of certain IBM Spectrum Archive EE components. It can be deployed on IBM Spectrum Archive EE servers or on remote servers that communicate with the IBM Spectrum Archive EE servers via ssh. It can even be deployed as a Docker container. For more information refer to the git repository.
Tape aware object storage
Many of the challenges associated with tape storage in a tiered storage file system have been addressed with OpenStack Swift High Latency Middleware (Swift HLM).
The OpenStack Swift HLM middleware is an OpenStack Swift associated project and is useful for tiered object storage systems including disk and tape. OpenStack Swift HLM is designed to manage operations on high latency media such as tape. OpenStack Swift HLM allows explicit control of Swift objects and container locations by providing tape specific functions for migration, recall and status. With the “status” call the user can easily determine whether or not objects have been migrated to tape. The user can initiate the migration of an object or container using the “migrate” call. When the user initiates a recall for a migrated object via the “recall” call, he is aware that it takes a while because he knows the object is migrated. The HLM backend can also implement tape optimized recalls when one or more users recall multiple objects at the same time.
OpenStack Swift HLM provides a tape aware object storage protocol to the user and closes the inherent gaps we have described for file systems (see section Challenges). However, it is not a file system; it is object storage. File systems are omnipresent in many IT environments, and it may require huge effort to shift the workload from file to object-oriented data access.
Unfortunately, the spread of Swift HLM in the market is limited. The dominant object storage protocol is Amazon Web Services S3 (Amazon S3). With S3 Glacier, Amazon provides a low-cost object storage service for archiving where the retrieval of objects can take minutes to hours, depending on the selected service level. The Amazon S3 Glacier API extends the Amazon S3 object storage protocol with additional calls that facilitate the retrieval of archived objects. When the user wants to retrieve objects, he starts a job using the Glacier API. The user can monitor the job status and can be informed (e.g. by email) when the job has finished. Once the retrieval job has finished, the user can download his objects. The Amazon S3 Glacier API is more complex than Swift HLM but also provides more comfort to the user.
 Disk and Tape TCO study by ESG:
 Github repository for Tape archive REST API
 OpenStack Swift associated projects providing alternative APIs:
 Efficient data tiering with OpenStack Swift HLM
 Amazon S3 Glacier