Tape Storage

Driving adoption of tape storage in the Cloud

By Nils Haustein posted Thu November 26, 2020 10:24 AM

  

By Nils Haustein and Harald Seipp

Executive summary

A tiered storage system provides lower total cost of ownership for large volumes of data by storing data on the most appropriate storage tier (flash, disk or tape). Independent studies have shown that the total cost of ownership of combined disk and tape solutions is 3 to 7 times lower than that of disk-only solutions [1]. In addition, the failure rate of tape is much lower than that of disk because tapes do not contain continuously spinning mechanical parts.

While tape storage offers a better total cost of ownership for storing large volumes of data over long periods of time, access to data on tape takes significantly longer than access to data on disk. This can cause a negative user experience, which is amplified by standard file system protocols such as NFS and SMB that are not tape aware.

Implementing tape awareness in standard file system protocols is not easy because these protocols are so widely distributed and accepted within existing IT solutions. However, newer data interfaces such as object storage APIs can easily be adapted to the characteristics of tape [2]. Consequently, integrating a tape aware object storage API for data access into a tiered object storage system has the potential to significantly improve the user experience while lowering total cost of ownership with tape. This will eventually drive the adoption of the tape tier for cloud storage.

Tiered storage

A tiered storage system provides disk and tape storage within a global file system namespace and transparently moves data from disk to tape and vice versa. The file system namespace is accessible to users and applications through a file system protocol such as NFS, SMB or POSIX. The automatic movement of data is based on policies that take into account retention times, sizes, data types and other data attributes. Access to data is transparent for the user and the application through the file system protocol, regardless of whether the data is stored on disk or on tape.
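To make the idea of policy-based movement more concrete, the following is a minimal sketch of what a migration rule might look like. It is not taken from any particular tiered storage product: the `FileInfo` fields, the thresholds and the excluded suffix are all assumptions chosen for illustration, and idle time stands in here for the retention-related attributes a real policy would use.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical file attributes a policy engine might evaluate."""
    path: str
    size_bytes: int
    days_since_access: int


def should_migrate(f, min_size_bytes=1_000_000, min_idle_days=30,
                   excluded_suffixes=(".tmp",)):
    """Illustrative rule: migrate large files that have been idle for a while.

    All thresholds are assumptions, not values from a real product.
    """
    if f.path.endswith(excluded_suffixes):
        return False  # data type excluded from migration
    return f.size_bytes >= min_size_bytes and f.days_since_access >= min_idle_days


files = [
    FileInfo("/archive/scan-2019.tif", 250_000_000, 400),
    FileInfo("/archive/notes.txt", 2_000, 400),
    FileInfo("/work/render.tmp", 900_000_000, 90),
    FileInfo("/work/model.bin", 80_000_000, 2),
]

to_tape = [f.path for f in files if should_migrate(f)]
print(to_tape)
```

Only the large, long-idle file is selected in this example; small files, recently used files and excluded data types stay on disk.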

A tiered storage system, especially one that combines disk and tape, provides lower total cost of ownership for large volumes of data by moving the data to the most appropriate storage technology [1]. The cost savings of tape storage in a tiered storage system come with a downside: the higher latency of data access on tape, combined with file systems that are unaware of tape storage, causes a negative user experience, as described below. This challenge can be addressed with tape aware data interfaces provided by tiered storage systems.

File systems are a blessing and a curse for tiered storage

Tiered storage systems provide access to files via standard file system protocols such as POSIX, NFS and SMB. The file system, however, is both a blessing and a curse [4]. The blessing is that the user sees all files regardless of the storage tier on which they are stored. This blessing is also the curse, because the user cannot easily identify whether a file has been migrated to tape. When the user opens a migrated file, the user must wait a couple of minutes until the tape is loaded and positioned and the file has been recalled. Unfortunately, the user is not aware that the file open operation takes time because the file system has no way to let the user know. After exceeding the human acceptance factor (~20 seconds), the user may get impatient and attempt to cancel the operation or reboot the system.

It gets even worse if the user simultaneously opens several migrated files that happen to be stored on different tapes. This causes even longer waiting times because tapes are mounted and dismounted in random order to serve individual files. Standard file systems cannot leverage tape optimized recalls, in which files are sorted by their tape ID and position on tape and then recalled in that order. This technique dramatically reduces the access and transfer time because tapes are loaded and read in an optimized manner.
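The effect of sorting recalls by tape ID and position can be sketched in a few lines. This is an illustration only, not the implementation of any tape management software: the `RecallRequest` fields, the tape names and the positions are assumed values, and the mount count is a simplified model in which every change of tape costs one mount.

```python
from typing import NamedTuple


class RecallRequest(NamedTuple):
    """One file to recall; all field names are hypothetical."""
    path: str
    tape_id: str
    position: int  # location of the file on its tape


def count_mounts(requests):
    """Count tape mounts when requests are served strictly in the given order."""
    mounts = 0
    current_tape = None
    for req in requests:
        if req.tape_id != current_tape:
            mounts += 1
            current_tape = req.tape_id
    return mounts


def tape_optimized_order(requests):
    """Sort by tape ID, then by position, so each tape is mounted once
    and read from front to back."""
    return sorted(requests, key=lambda r: (r.tape_id, r.position))


requests = [
    RecallRequest("/data/a.dat", "TAPE02", 900),
    RecallRequest("/data/b.dat", "TAPE01", 300),
    RecallRequest("/data/c.dat", "TAPE02", 100),
    RecallRequest("/data/d.dat", "TAPE01", 50),
]

ordered = tape_optimized_order(requests)
print(count_mounts(requests), "mounts in arrival order")
print(count_mounts(ordered), "mounts in tape optimized order")
```

In this small example the arrival order alternates between two tapes and needs four mounts, while the sorted order needs only two and reads each tape in a single forward pass.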

The negative user experience described above is amplified by standard file systems that make the user think all files in a tiered storage system are instantly available, although access to files on tape takes time. Making standard file system protocols tape aware is not feasible because it would require changes in hundreds of heterogeneous applications, file system implementations and operating systems that use standard file systems according to the current specification.

Tiered Object Storage improves the user experience

Object storage provides object APIs for accessing data as objects. Two widely accepted object storage APIs are OpenStack Swift and Amazon S3. Unlike standard file system protocols, object storage APIs can easily be made tape aware. The OpenStack Swift High Latency Media (Swift HLM) middleware [2] enables this by providing tape specific API functions that address the challenges associated with tape storage.

The Swift HLM middleware is an OpenStack Swift associated project [3] and is useful for running OpenStack Swift with high latency media (HLM), such as tape storage. Swift HLM can be added to OpenStack Swift and allows explicit control of the location of Swift objects and containers by providing tape specific functions for migration, recall and status.

OpenStack Swift with Swift HLM can be easily integrated into a tiered object storage system comprised of different storage tiers including flash, disk and tape [4]. The OpenStack Swift object API along with the HLM extension allows the management of data on the storage tiers, while the HLM back-end adapts to the specific tape characteristics and functions. The object-like characteristics of files on tape (they are always written and read as a whole) perfectly match the characteristics of tiered object storage.

The OpenStack Swift HLM extension provides the following additional calls, each shown with an example:

With the “status” call the user can easily determine whether objects have been migrated to tape.

curl -i "http://hostname/hlm/v1/status/account/container/object" -X GET -H "X-Auth-Token: $token"

The user can initiate the migration of an object or container using the “migrate” call.

curl -i "http://hostname/hlm/v1/migrate/account/container/object" -X POST -H "X-Auth-Token: $token"

Because the user knows that the object is migrated, the user also knows that a recall initiated via the “recall” call will take a while.

curl -i "http://hostname/hlm/v1/recall/account/container/object" -X POST -H "X-Auth-Token: $token"

The “request” call allows the user to determine which operations (migrate and recall) are in progress:

curl -i "http://hostname/hlm/v1/request/account/container/object" -X GET -H "X-Auth-Token: $token"
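For applications that call these functions from code rather than from the command line, the four calls above can be wrapped in a small helper. The sketch below uses only the Python standard library and follows the URL pattern and HTTP methods of the curl examples; the function name `build_hlm_request`, the host name and the token value are assumptions made for illustration. The request is built but deliberately not sent, so the example runs without a Swift cluster.

```python
import urllib.parse
import urllib.request

# HTTP methods as used in the curl examples of this article.
HLM_GET_CALLS = {"status", "request"}
HLM_POST_CALLS = {"migrate", "recall"}


def build_hlm_request(base_url, call, account, container, obj, token):
    """Build (but do not send) a request for one of the four HLM calls."""
    if call in HLM_GET_CALLS:
        method = "GET"
    elif call in HLM_POST_CALLS:
        method = "POST"
    else:
        raise ValueError(f"unknown HLM call: {call}")
    # Percent-encode each path part; "/" inside object names is preserved.
    path = "/".join(urllib.parse.quote(part) for part in (account, container, obj))
    url = f"{base_url.rstrip('/')}/hlm/v1/{call}/{path}"
    return urllib.request.Request(url, method=method,
                                  headers={"X-Auth-Token": token})


# "hostname" and "TOKEN" are placeholders, as in the curl examples.
req = build_hlm_request("http://hostname", "recall",
                        "account", "container", "object", "TOKEN")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to check which objects are about to be migrated or recalled before any tape is touched.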

Migration and recall requests can be executed synchronously or asynchronously. It is recommended to perform these operations asynchronously using the HLM back-end, which is provided by the tape management software. This allows the implementation of tape optimized recalls, whereby the HLM back-end collects recall requests over a predefined period, stores the names of the objects to be recalled in a list and performs the tape optimized recall using this list.
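The collection step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual HLM back-end: the class name `RecallBatcher`, the 60 second window and the object names are hypothetical, and time is passed in explicitly so that the behavior is easy to follow and to test.

```python
class RecallBatcher:
    """Collect recall requests for `window_seconds`, then release them as one list."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._pending = []         # object names in arrival order
        self._seen = set()         # used to drop duplicate requests
        self._window_start = None  # arrival time of the first request in the batch

    def add(self, object_name, now):
        """Queue a recall request that arrived at time `now` (in seconds)."""
        if self._window_start is None:
            self._window_start = now
        if object_name not in self._seen:
            self._seen.add(object_name)
            self._pending.append(object_name)

    def flush_if_due(self, now):
        """Return the collected list once the window has elapsed, else None."""
        if self._window_start is None:
            return None
        if now - self._window_start < self.window_seconds:
            return None
        batch = self._pending
        self._pending, self._seen, self._window_start = [], set(), None
        return batch


batcher = RecallBatcher(window_seconds=60)
batcher.add("account/container/object1", now=0)
batcher.add("account/container/object2", now=20)
batcher.add("account/container/object1", now=30)  # duplicate, ignored
print(batcher.flush_if_due(now=45))  # window still open
print(batcher.flush_if_due(now=60))  # window elapsed, list is released
```

The released list is what a back-end would then sort by tape and position before starting the recall, so that requests from many users share the same tape mounts.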

Such a tiered object storage system is well suited for cloud deployments because it provides the cost advantages of a tiered storage system with a much-improved user experience. It leverages standardized APIs such as OpenStack Swift in combination with automated data management across the various storage tiers. Furthermore, the object storage architecture is designed for storing billions of objects for many users and applications while providing continuous availability and extensive scalability.

 

References

[1] ESG validation report: Quantifying the Economic Benefits of LTO-8 Technology
https://www.lto.org/wp-content/uploads/2018/08/ESG-Economic-Validation-Summary.pdf

IBM Tape TCO cost calculator
https://www.ibm.com/it-infrastructure/resources/tools/storage-tco-calculator/index.htm

[3] OpenStack Swift associated projects providing alternative APIs:
http://docs.openstack.org/developer/swift/associated_projects.html#alternative-api

[4] Challenges and solutions with tiered storage file systems
https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2020/01/14/managing-files-in-tiered-storage
