Originally posted by: Nils Haustein
By Nils Haustein and Takeshi Ishimoto
In this blog entry we would like to share our experience upgrading Spectrum Archive Enterprise Edition version 1.1 to version 1.2. Before we do this we give a short introduction to Spectrum Archive EE and elaborate on new functions introduced with version 1.2.
Introduction
IBM Spectrum Archive™ Enterprise Edition - a member of the IBM Spectrum Storage™ family - enables you to automatically move infrequently accessed data from disk to tape to lower costs while retaining ease of use. Storing large volumes of data that must be retained for years on tape is more cost efficient than storing it on disk because tape does not consume power when not in use. In addition, the failure rate of tape is much lower than that of disk because tapes do not keep mechanical parts spinning all the time and use read-after-write verification during data recording. Independent studies have shown that the total cost of ownership of combined disk and tape solutions is lower than that of disk-only solutions by a factor of 3 to 7 [1].
Spectrum Archive Enterprise Edition (hereafter named Spectrum Archive EE) integrates the standardized Linear Tape File System™ (LTFS) [5] in the global file system name space provided by IBM Spectrum Scale™. Spectrum Scale – also known as General Parallel File System (GPFS) – is a clustered file system providing tiered storage capabilities. This allows an administrator to configure rules and policies controlling the automated placement and migration of files throughout their lifecycles. It enables organizations to easily implement data lifecycle management resulting in better storage efficiency and cost savings. Spectrum Archive EE manages the data on tape according to the standardized LTFS format, preventing vendor lock-in and assuring easy data exchange using tapes.
With Spectrum Archive EE version 1.2 (V1.2) up to two tape libraries can be connected and used from one Spectrum Scale & Archive cluster. This allows storing files on up to three tapes in two different tape libraries, facilitating better data availability and disaster protection. In this blog entry we present the functional highlights of Spectrum Archive EE V1.2 and elaborate on the upgrade path from Spectrum Archive EE version 1.1 (V1.1).
New concepts and functions in V1.2
Spectrum Archive EE V1.2 became available in December 2015 and introduced the following new functions:
- Multi-library support for up to two libraries
- TS1150 WORM tape support
- Support for LTO-7 tape drive technology
- Support for Red Hat 7
In addition significant performance improvements have been implemented, accelerating migration jobs by aggregating files and eliminating the wait time for the LTFS tape index synchronization. Also the footprint of the Spectrum Archive EE metadata has been significantly reduced.
The deployment of V1.2 was initially limited to new installations, which means that upgrades from previous V1.1 releases to V1.2 were not possible. With Spectrum Archive EE V1.2.1 IBM now offers an upgrade path to version 1.2. Before we elaborate on the upgrade procedure to V1.2, we highlight some architectural changes in Spectrum Archive that facilitate multi-library support.
Multi-library support
Spectrum Archive EE V1.2 and above supports up to two IBM tape libraries within one Spectrum Scale & Archive cluster as shown in picture 1 below. Each tape library is accessed by one Spectrum Archive node group. A Spectrum Archive node group is comprised of one control node and zero or more additional nodes. A Spectrum Archive node is a Spectrum Scale cluster node with the Spectrum Archive software configured and with access to one tape library. A Spectrum Archive control node manages the processing queue for the tape library. Each node within one node group has access to dedicated tape drives in that library. This means tape drives cannot be shared between nodes in a node group. Tape cartridges within the tape library however are shared by all nodes in a node group.

Picture 1: Spectrum Scale & Archive cluster with two tape libraries and node groups
As shown in picture 1, in a multi-library setup with two tape libraries there are two node groups. In node group A there is one Spectrum Archive control node with access to tape library lib0. In node group B there are two Spectrum Archive nodes with access to library lib1. One node is the control node and one node is an additional node.
In a multi-library environment migration policies can be configured to store migrated files on tapes in both tape libraries. To accommodate this, at least one tape pool has to be defined for each library with at least one tape per pool. In our example below, the pool named pool1 is created in library lib0 and the pool named pool2 is created in library lib1, using the following commands:
# ltfsee pool create pool1 -l lib0
# ltfsee pool add pool1 -l lib0 -t LIBA00L7
# ltfsee pool create pool2 -l lib1
# ltfsee pool add pool2 -l lib1 -t LIBB00L7
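When more pools are involved, the same commands can be generated from a small table. The following is a minimal sketch that only prints the commands for review and does not execute them. It uses nothing beyond the two ltfsee pool invocations shown above. The function name print_pool_commands and the "pool library tape" input layout (one line per pool) are our own assumptions for illustration and are not part of Spectrum Archive EE.

```shell
# Hypothetical helper: reads "pool library tape" lines from stdin and prints
# (does not execute) the matching pool setup commands, one pool per line.
print_pool_commands() {
    while read -r pool lib tape; do
        # Skip empty lines
        [ -n "$pool" ] || continue
        echo "ltfsee pool create $pool -l $lib"
        echo "ltfsee pool add $pool -l $lib -t $tape"
    done
}

# Example input: the two pools used in this blog entry
print_pool_commands <<EOF
pool1 lib0 LIBA00L7
pool2 lib1 LIBB00L7
EOF
```

Printing first makes it easy to check the pool, library and tape names before any command is run as root.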
To migrate files to two tape libraries the pool and library name can be directly specified in the migration policy as shown below:
The first rule defines the external pool provided by Spectrum Archive and the destination tape pools named pool1 in lib0 and pool2 in lib1:
RULE EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/ltfsee'
OPTS '-p pool1@lib0 pool2@lib1'
The subsequent migration rule migrates all files that have been last accessed more than 30 days ago to two tapes, one in lib0 and one in lib1:
RULE 'mig30days' MIGRATE FROM POOL 'system' TO POOL 'ltfs'
WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30) AND
NOT (MISC_ATTRIBUTES LIKE '%V%')
To run this policy, store the rules in a file and pass that file to the mmapplypolicy command:
# mmapplypolicy fs-name -P policy-file -N nodes
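As a minimal sketch, the two rules above can be written into a policy file by a small shell function that takes the destination pools as arguments. The function name write_migration_policy and the file location are our own assumptions. The age test is written with the DAYS() function of the policy language so that the comparison is explicitly in days.

```shell
# Hypothetical helper: write_migration_policy FILE POOL@LIB [POOL@LIB ...]
# Writes the external pool rule and the 30-day migration rule into FILE.
write_migration_policy() {
    policy_file=$1
    shift
    # "$*" joins the remaining arguments (the pool@library pairs) with spaces
    cat > "$policy_file" <<EOF
RULE EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/ltfsee'
OPTS '-p $*'
RULE 'mig30days' MIGRATE FROM POOL 'system' TO POOL 'ltfs'
WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30) AND
NOT (MISC_ATTRIBUTES LIKE '%V%')
EOF
}

# Example: generate the policy for both libraries and show the result
example_policy="${TMPDIR:-/tmp}/mig30days.pol"
write_migration_policy "$example_policy" pool1@lib0 pool2@lib1
cat "$example_policy"
```

Before moving any data, the generated file can be evaluated in a trial run by adding the option -I test to the mmapplypolicy command shown above. In this mode the policy engine reports what it would select without performing the migration.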
Alternatively, use the ltfsee migrate command. This command takes a file list generated by a list policy as input and migrates the files identified by that policy to two tape libraries, as shown in the example below:
Create a list policy that identifies all files last accessed more than 30 days ago and write it to a file (in this case the file is named list_30days.pol):
/* define external list */
RULE EXTERNAL LIST '30days' EXEC ''
/* define list policy to list files last accessed more than 30 days ago */
RULE 'list30day' LIST '30days' WHERE
(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30) AND
NOT (MISC_ATTRIBUTES LIKE '%V%')
Run the list policy (stored in file list_30days.pol) and let it create the output file. The output file is named ./mylist.list.30days.
# mmapplypolicy filesystem -P list_30days.pol -f ./mylist -I defer
Using the output file from the list policy (named ./mylist.list.30days) run the migration to two tape libraries using the ltfsee migrate command:
# ltfsee migrate -s ./mylist.list.30days -p pool1@lib0 pool2@lib1
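The link between the two steps is the name of the list file: mmapplypolicy appends ".list." and the name of the external list to the prefix given with -f. The following sketch prints both commands with a consistent file name so they can be reviewed before being run. The function name print_two_step_migration and its argument order are assumptions made for this illustration.

```shell
# Hypothetical helper:
#   print_two_step_migration FILESYSTEM POLICY_FILE LIST_PREFIX LIST_NAME POOL@LIB...
# Prints (does not execute) the list policy run and the matching migrate command.
print_two_step_migration() {
    fs=$1
    policy=$2
    prefix=$3
    list_name=$4
    shift 4
    # Step 1: run the list policy; -I defer only writes the file list
    echo "mmapplypolicy $fs -P $policy -f $prefix -I defer"
    # Step 2: migrate the listed files; the list file is PREFIX.list.LIST_NAME
    echo "ltfsee migrate -s $prefix.list.$list_name -p $*"
}

# Example with the names used in this blog entry
print_two_step_migration filesystem list_30days.pol ./mylist 30days pool1@lib0 pool2@lib1
```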
As demonstrated above, there are two noteworthy changes with multi-library support. The first is that the configuration model has changed. Spectrum Archive EE V1.2 introduced the concept of node groups with control nodes and additional nodes. Nodes in one node group have access to the same library. The second change is that migration requires specifying the destination library associated with each tape pool.
For more details about using Spectrum Scale ILM policies with Spectrum Archive refer to this whitepaper [6].
Upgrading from V1.1 to V1.2
In this section we share our experience performing an upgrade from Spectrum Archive EE V1.1 to Spectrum Archive EE V1.2, in particular to version 1.2.1. Thanks to the Spectrum Archive development team around Takeshi Ishimoto, who provided the code and guidance for this test before it became available at the beginning of July 2016. The upgrade went well and we were surprised how fast it went. The official procedure can be found at [2].
We had a small test environment set up with a single Spectrum Archive node and one tape library with two LTO-5 tape drives. The Spectrum Scale cluster was running GPFS 4.1.0.8 on RHEL 6.5 and Spectrum Archive 1.1.1.3 build 9400. In this environment we only had 5 tapes, so we created 4 tape pools and migrated different types of data to each pool. In total we had 10,000 migrated files, distributed across 5 tapes.
Planning is key. Accordingly, the first step is to identify which components need to be upgraded besides the Spectrum Archive software. The version of Spectrum Archive EE V1.2 we used supported the following levels (refer to [3] for the current interoperability matrix of Spectrum Archive):
- GPFS level: 3.5.0.x, 4.1.1.x and 4.2.0.1
- RHEL: 6.7, 7.1, 7.2 and SLES 11 SP4
Consequently, we had to upgrade GPFS from 4.1.0.8 to 4.2.0.4 and RHEL from 6.5 to 6.7. This upgrade was done in phase 2 of the Spectrum Archive upgrade process, which is explained further below.
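The check of installed levels against the supported ones can be scripted. The sketch below compares a level string with a list of supported levels, treating a trailing "x" as a wildcard. The function name level_supported is our own, and the level lists are simply the ones quoted above, so they should be replaced with the current values from [3].

```shell
# Hypothetical helper: level_supported LEVEL PATTERN...
# Returns 0 if LEVEL matches one of the patterns, 1 otherwise.
level_supported() {
    level=$1
    shift
    for pattern in "$@"; do
        # Turn a trailing "x" (as in 4.1.1.x) into a shell wildcard
        case $pattern in
            *.x) glob="${pattern%x}*" ;;
            *)   glob=$pattern ;;
        esac
        case $level in
            $glob) return 0 ;;
        esac
    done
    return 1
}

# Example: the levels of our test environment before the upgrade
if level_supported 4.1.0.8 3.5.0.x 4.1.1.x 4.2.0.1; then
    echo "GPFS 4.1.0.8: listed"
else
    echo "GPFS 4.1.0.8: not listed, upgrade needed"
fi
if level_supported 6.5 6.7 7.1 7.2; then
    echo "RHEL 6.5: listed"
else
    echo "RHEL 6.5: not listed, upgrade needed"
fi
```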
1. Preparation and saving configuration
In this phase the tape inventory needs to be checked. Tapes that are exported offline, or that are not in a pool but still contain data, need to be identified and their IDs must be written to a file list. This file list is used in phase 3 to assign these tapes to valid pools.
After extracting the Spectrum Archive packages the new tool named “ltfsee_save_config” needs to be run. This saves the Spectrum Archive configuration in a file.
One important check to mention here concerns the size of the Spectrum Archive metadata file system or directory (dcache). The sizing recommendations for this file system or directory have changed between V1.1 and V1.2, with significantly lower space consumption in V1.2. So please review the sizing recommendations for Spectrum Archive EE V1.2 and change the size in your cluster where appropriate [4].
At the end of this phase Spectrum Archive is stopped. This is the start of the downtime.
2. Upgrading operating system and Spectrum Scale
Once the first phase has been completed successfully, the operating system and Spectrum Scale versions can be upgraded according to the levels supported by Spectrum Archive EE [3].
One thing we learned was that there is no easy upgrade path from RHEL 6 to RHEL 7. Typically this requires a new installation. So we upgraded to RHEL 6.7 rather than 7.1.
The upgrade of Spectrum Scale can be done node by node and requires shutting down GPFS on the node being upgraded. Make sure that the upgrade level is compatible with the rest of the cluster.
During our test this step took the longest time.
3. Installing and configuring Spectrum Archive EE V1.2
This step starts after all Spectrum Archive nodes have been upgraded to the appropriate operating system and Spectrum Scale level. The Spectrum Scale cluster must be online and the Spectrum Archive nodes must be active with all file systems mounted.
The first task in this step is to install the already extracted Spectrum Archive EE V1.2 packages using the utility “ltfsee_install” with a new option named “-upgrade”.
After this step has successfully completed, the V1.1 configuration data that was saved in phase 1 needs to be imported to V1.2. This is done using another tool named “ltfsee_config_upgrade”. This utility uses the saved configuration data of phase 1 – including the file lists created for exported offline tapes and tapes that are not in a pool - and converts it to the V1.2 format. In addition it creates a metadata cache with content information for each tape in a pool. The duration of this step depends on the number of tapes and files on tape. Our tests have shown that this takes approx. 10 - 20 minutes for 1 million files.
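To get a rough idea of the downtime for this step, the observed rate can be scaled to the number of files on tape. The sketch below assumes that the duration scales linearly with the file count, which our tests did not verify, and ignores any fixed overhead per tape. The function name estimate_upgrade_seconds is our own.

```shell
# Hypothetical helper: estimate_upgrade_seconds FILES
# Prints "MIN MAX" in seconds, scaling 10 to 20 minutes per 1 million files
# linearly (an assumption, not a measured relationship).
estimate_upgrade_seconds() {
    files=$1
    # 10 minutes = 600 seconds, 20 minutes = 1200 seconds per 1,000,000 files
    echo "$(( files * 600 / 1000000 )) $(( files * 1200 / 1000000 ))"
}

# Example: the 10,000 migrated files of our test environment
estimate_upgrade_seconds 10000
# Example: a hypothetical installation with 50 million files on tape
estimate_upgrade_seconds 50000000
```

The result is only an order-of-magnitude estimate and is best validated against a test run on a copy of the saved configuration.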
After the configuration upgrade is complete you can start LTFS and Spectrum Archive and enjoy the new functionality and performance. Keep in mind that some commands have changed to reflect the use of multiple tape libraries. In addition, the help for the command “ltfsee” has been greatly improved. For example, you can enter the following command to get the help screen for the command “ltfsee info”:
# ltfsee info help
We recommend running some migration jobs and observing the performance improvement. Now you can also change the configuration and add a second node group with a second library.
Summary
Spectrum Archive EE version 1.2 allows transparent migration of files to two tape libraries. It supports higher levels of Spectrum Scale software and allows using TS1150 WORM tapes. It provides significant performance improvements for migration and recall and reduces the footprint of the metadata.
The upgrade process from version 1.1 to version 1.2 is documented very well (see [2]) and can be done quickly depending on the additional software upgrades for operating systems and Spectrum Scale. Our upgrade took just a little bit more than 1 hour including the additional software upgrades.
References
[1] The Clipper Group - study comparing total cost of ownership for disk and tape
www.clipper.com/research/TCG2010054.pdf
[2] Spectrum Archive EE upgrade procedure from V1.1 to V1.2
http://www.ibm.com/support/knowledgecenter/en/ST9MBR_1.2.1/ltfs_ee_how_to_upgrade_from_version_11x.html
[3] Spectrum Scale and operating system levels supported by Spectrum Archive EE:
http://www.ibm.com/support/knowledgecenter/en/ST9MBR_1.2.1/ltfs_ee_system_reqs.html
[4] Spectrum Archive metadata capacity sizing:
http://www.ibm.com/support/knowledgecenter/en/ST9MBR_1.2.1/ltfs_ee_configuring_prepare_gpfs_graphs.html
[5] Linear Tape File System (LTFS) Standard maintained by SNIA and ISO:
http://www.snia.org/tech_activities/standards/curr_standards/ltfs
http://www.iso.org/iso/catalogue_detail.htm?csnumber=69458
[6] Spectrum Scale ILM Policies – A practical guide
http://w3-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102642