Using IBM Spectrum Discover to facilitate a workflow for tape optimized recalls in tiered storage file systems with tape
Introduction
This article presents a solution that addresses challenges of tiered storage file systems with tape by leveraging metadata management. Tiered storage file systems with tape are primarily used for archiving large volumes of data for long periods of time because tape storage is cheap. A key challenge of tiered storage file systems with tape is transparent recalls, which are triggered when users access migrated files. To learn more about these challenges, have a look at this blog article: [https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2020/01/14/managing-files-in-tiered-storage].
A metadata management system like IBM Spectrum Discover automatically catalogs the system metadata of files stored in a tiered storage file system and allows users to display the migration states for files. The user can add additional metadata that can be used to recall files in an optimized manner. This article describes the solution architecture, workflows, and programming resources for optimizing access to files on tape. First, I summarize the architecture and challenges of tiered storage file systems with tape.
Tiered storage file system with tape
The foundation of the solution architecture is a tiered storage file system with tape. A tiered storage file system - shown in Figure 1 - provides disk and tape storage within a global file system namespace. A Hierarchical Storage Management (HSM) component transparently moves files from disk to tape and vice versa. The file system namespace is accessible by users and applications through standard file system protocols such as NFS, SMB and POSIX.
The movement of files between disk and tape is managed by the HSM component. The process of moving files to tape is called migration. After migration the file is still visible in the file system; however, the content of the file resides on tape. When a migrated file is accessed in the file system, the HSM component recalls the file.
There are two types of recalls: transparent and tape optimized recall. The transparent recall is performed when the user accesses the file in the file system. The tape optimized recall is initiated with an administrative command provided by the HSM component. The tape optimized recall is much faster when multiple files must be recalled because it sorts the files provided in a file list by the tape ID and the location on tape. This blog article gives an idea of the duration of transparent and tape optimized recalls: [https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2022/01/07/duration-of-optimized-and-normal-recalls].
Challenges
As described in the blog article [https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2020/01/14/managing-files-in-tiered-storage], tiered storage file systems with tape are a blessing and a curse. The blessing is that the user can see all files regardless of whether files are stored on disk or on tape. The curse starts when the user opens a file that is stored on tape because the recall from tape takes one or more minutes. Unfortunately, the user is not aware that the file is on tape because standard file systems do not provide information about the storage location of the file (disk or tape).
It gets worse if the user simultaneously opens several files that are on tape. This causes even longer waiting times because transparent recalls are not tape optimized. These challenges are amplified by standard file systems that make the user think all files are instantly available, although access to files on tape takes time.
The solution
To address the challenges associated with recalls of many files in a tiered storage file system, the user of the file system can leverage metadata management. With metadata management the user can easily determine if files are stored on tape. Furthermore, the user can add custom tags to files that must be recalled in an optimized manner.
As shown in Figure 2, the tiered storage file system is provided by IBM Spectrum Scale. The HSM component providing the tape storage tier within the tiered storage file system is represented by IBM Spectrum Archive Enterprise Edition. It can also be IBM Spectrum Protect for Space Management. Metadata management is provided by IBM Spectrum Discover. These components are briefly described below.
Figure 2: The solution architecture
IBM Spectrum Scale is a clustered parallel file system that can be used to store billions of files on the most appropriate storage tier based on Information Lifecycle Management (ILM) policies. Migration of files to tape can be automated using the IBM Spectrum Scale policy engine. The policy engine executes rules provided by the administrator. For example, a rule may specify that files that are larger than 10 MB and have not been accessed for more than 30 days are migrated. The policy engine selects files that match the rule and passes the selected file names to the HSM component.
IBM Spectrum Archive Enterprise Edition is the HSM component, providing a storage tier on tape within the IBM Spectrum Scale file system. IBM Spectrum Archive Enterprise Edition manages access to the tapes. In fact, it writes the files to tapes in the standardized Linear Tape File System (LTFS) format. IBM Spectrum Archive also provides a tape optimized recall function that allows recalling many files in a tape optimized manner.
IBM Spectrum Discover is an extensible metadata management software that provides data insight for unstructured data by scanning and cataloging metadata from storage systems, like IBM Spectrum Scale file systems. Each file in the IBM Spectrum Scale file system has system metadata, such as file and path names, time stamps, file size, storage tier, migration state, permissions, and other attributes. This system metadata is captured and cataloged by IBM Spectrum Discover through automated scans or live events. Each file in the IBM Spectrum Scale file system is represented by one metadata record in the IBM Spectrum Discover metadata catalog. Users can search for metadata records using SQL queries through the IBM Spectrum Discover Graphical User Interface (GUI) or REST APIs.
In addition, IBM Spectrum Discover allows enriching existing metadata records with additional information that is provided by custom tags. A tag is a key-value pair which can be added manually or automatically to a selected set of metadata records. Auto-tagging policies are used to add custom tags to metadata records. An auto-tagging policy can be configured to run periodically or on-demand through the IBM Spectrum Discover Graphical User Interface (GUI) or REST APIs.
IBM Spectrum Discover provides role-based access for users to manage metadata records via the GUI or REST APIs. For example, users with the Data Admin role can create data source connections, manage all metadata records, and create collections. Users with the Data User role can manage a subset of metadata records that is provided in a collection. A collection groups a subset of metadata records that are selected by filter criteria. For example, the metadata visibility for a user with the Data User role can be limited to a fileset of an IBM Spectrum Scale file system.
Preparation
Once the solution shown in Figure 2 is deployed, IBM Spectrum Discover can scan and catalog the metadata from the IBM Spectrum Scale file system which is space managed by IBM Spectrum Archive Enterprise Edition.
This section describes the preparation tasks that allow the user to query the migration state of files and to add tags for recalls. These preparation tasks create the required components in the IBM Spectrum Discover server, including an administrative user, the data source connection for the IBM Spectrum Scale file system, a collection including all relevant metadata records for the partition in the file system that is space managed, tags, and auto-tagging policies. The next section Workflows builds upon these components and describes the workflows accommodating tape optimized recalls leveraging metadata management.
The preparation tasks are conducted using the IBM Spectrum Discover GUI.
Create an administrative user
To perform the subsequent preparation tasks a new user is created in the IBM Spectrum Discover server. The new user must have the privileges to create data source connections, create collections and policies, create tags, and create auto-tagging policies. Figure 3 gives an example for the user creation using the IBM Spectrum Discover GUI:
Figure 3: Create administrative user
The username of the new user is archiveadmin and the user has the role Data Admin. With this role the user has the privileges to perform all required tasks.
Note, the GUI user that is used to create the new user must have the role Admin, because only this role allows creating new users. The default user with the Admin role is: sdadmin.
Create data source connection
The user archiveadmin with Data Admin privileges can now create a data source connection. Figure 4 shows an example for configuring an IBM Spectrum Scale file system as data source using the IBM Spectrum Discover GUI:
Figure 4: Create data source connection for IBM Spectrum Scale file system
The data source connection name is archive. The IBM Spectrum Scale file system to be scanned is named fs1 and mounted under /ibm/fs1 within the IBM Spectrum Scale cluster. IBM Spectrum Discover scans the file system daily at 6 AM.
Once the data source connection for the IBM Spectrum Scale file system has been established and scanned, the metadata records cataloged by IBM Spectrum Discover can be queried. Metadata records can be queried by using the IBM Spectrum Discover GUI or REST APIs. In the example shown in Figure 5, the visual query builder of the GUI is used to query for all metadata records in the data source connection created before. In the visual query builder the tag Data Source with the value fs1 was selected:
Figure 5: Query metadata records in data source
The query condition for the query shown in Figure 5 was: datasource IN ('fs1'). The name 'fs1' is the name of the IBM Spectrum Scale file system data source. A total of 632 files were cataloged by IBM Spectrum Discover in the data source connection archive. The individual records provide information about the migration state (column state). Notice that some files were migrated by IBM Spectrum Archive EE, and some are in state pre-migrated.
Create collection with policy
The IBM Spectrum Scale file system fs1 has multiple partitions (filesets). One partition - the fileset named discover1 under path /ibm/fs1/discover1 - is managed by IBM Spectrum Archive EE. To limit the metadata record scope for subsequent queries to files stored in fileset discover1, a collection can be created. A collection is a subset of metadata records that are selected by a filter. Figure 6 shows the definition of the collection named archivecollection that contains metadata records for files stored in fileset discover1. This collection can be created by the archiveadmin user.
Figure 6: Create Collection
The collection archivecollection contains metadata records that match the filter. The filter shown in Figure 6 selects files that are stored in the Spectrum Scale file system data source fs1 and fileset discover1. All metadata records matching this filter automatically get a custom tag with key collection and value archivecollection assigned. This is accomplished by an auto-tagging policy that is automatically created and executed when defining the collection policy shown in Figure 6.
The automatically created auto-tagging policy is named archivecollection_tagpolicy. Figure 7 shows the properties of the collection policy archivecollection_tagpolicy:
Figure 7: Collection policy that tags files with the collection tag
As shown in Figure 7, the auto-tagging policy of the collection archivecollection tags all metadata records that match the filter with the collection tag set to archivecollection. This auto-tagging policy automatically runs every day at 7 AM, after the scan of the data source.
The content of the collection archivecollection can be checked with the visual query builder, as shown in Figure 8:
Figure 8: Query metadata records in collection
The query performed was: collection IN ('archivecollection'). A total of 511 files have the collection tag set to archivecollection. The individual records provide information about the migration state (column state). Some files were migrated by IBM Spectrum Archive EE and some are in state resident.
Create tag
Tags are used to add custom metadata to metadata records in an IBM Spectrum Discover system. As shown above, metadata records pertaining to a collection have a collection tag with the collection name as value. Likewise, the user can create their own tags and assign these to metadata records using policies. Figure 9 shows an example for creating a tag with the key recallMe:
Figure 9: Create tag
The tag name or key is recallMe. The type of the tag is open. An open tag can have any string value with a length of up to 256 bytes. In the subsequent examples Boolean values (true and false) are used. Note that another reasonable tag type in this context is restricted. A restricted tag has a fixed set of allowed values, which must be defined when the tag is created. For example, the values of a restricted tag can be true or false, or yes or no. To simplify the operations, we use the tag type open.
Create auto-tagging policy
The tag recallMe is used to indicate that a file must be recalled, when the tag value is set to true, and the file state is migrated. The assignment of tags is accommodated by auto-tagging policies. Figure 10 shows an auto-tagging policy that adds the custom tag recallMe=true for files in collection archivecollection that are migrated and stored in path /ibm/fs1/discover1/test1:
Figure 10: Auto-tagging policy for migrated files in path test1
The auto-tagging policy is named recall-test1. After creating and running this policy, all files in path /ibm/fs1/discover1/test1 that are migrated have the tag recallMe=true. This can be checked using the following query:
collection='archivecollection' and path like '%/ibm/fs1/discover1/test1/%'
Figure 11 shows the query output including the value of the tag recallMe:
Figure 11: Query for files in path test1
As shown in Figure 11, all files that are in state migrated have the tag recallMe=true. Files that are not migrated have the tag recallMe=false or null.
Based on this custom tag recallMe, files can be recalled in a tape optimized manner. The next section explains how this can be done in an automated fashion.
Workflow
In this section a workflow is explained that allows users to tag migrated files to be recalled in a tape optimized manner. The execution of the workflow requires the solution to be prepared (see section Preparation).
After preparing the solution, IBM Spectrum Discover has cataloged the metadata from the IBM Spectrum Scale file system that is space managed by IBM Spectrum Archive Enterprise Edition. Furthermore, the collection named archivecollection represents the metadata records of the space-managed fileset. The user of the archivecollection can display the migration state for each file in this fileset using queries. Leveraging tags and auto-tagging policies, the user can add the tag recallMe=true for migrated files. An automated process queries the IBM Spectrum Discover server for file names in the collection archivecollection that are in migrated state and have the tag recallMe set to true. These files are then recalled in a tape optimized manner.
The foundation of the workflow presented here is that the user cannot open migrated files in the tiered storage file system. This can be achieved by configuring the HSM component to prevent transparent recalls (see step Preventing transparent recalls). If the user tries to open a migrated file, an error will be presented. This is where the workflow starts:
- The file system user displays the migration state of files in the tiered storage file system by leveraging the metadata catalog in IBM Spectrum Discover (see step Determine migration state)
- The user requires access to migrated files, and tags these files with the custom tag recallMe=true in the metadata catalog of the IBM Spectrum Discover server (see step Tag a file with recall tag).
- A background process periodically queries the metadata catalog to identify files that have the custom tag recallMe set to true and that are in migrated state. When files match these search criteria, the background process recalls these files in a tape optimized manner (see step Recall tagged files).
- Now the user can open the required files that were tagged previously.
- After the recall has completed, the background process updates the metadata catalog in the IBM Spectrum Discover server. It also sets the recallMe tag to false for files that were recalled (see step Update metadata catalog).
The workflow examples presented below are based on the IBM Spectrum Discover REST APIs. The API calls are initiated with the curl command. These curl-commands can be easily embedded in programs providing the user command line tools to execute the workflows. These programs are explained in the section Programming resources.
Authenticating with the Spectrum Discover REST API
Before working with IBM Spectrum Discover using the REST APIs, the administrative user must be authenticated. During the preparation, the administrative user archiveadmin was created (see section Create an administrative user). To authenticate this user with the REST APIs, the following API call can be used:
# curl -k -u archiveadmin:$password https://$sdServer/auth/v1/token
The variable $password is the password for the user archiveadmin that was provided when this user was created. The variable $sdServer is the IBM Spectrum Discover server address. The API call above returns an authentication token that can be used for subsequent API calls. This token expires after one hour and must be refreshed. In the subsequent examples the API authentication token is stored in variable $token.
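The following is a minimal sketch of how the token could be captured in the variable $token. It assumes that the server returns the token in an X-Auth-Token response header when the response headers are requested with curl -I; this should be verified against the IBM Spectrum Discover version in use. The function name extract_token is hypothetical and not part of any IBM tooling.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the X-Auth-Token header.
# Assumption: the token is returned in that header.
extract_token() {
  # Remove the carriage returns that HTTP header lines carry, then match
  # the header name case-insensitively and print its value.
  tr -d '\r' | awk 'tolower($1) == "x-auth-token:" { print $2; exit }'
}

# Usage against a real server (not executed here):
#   token=$(curl -k -s -I -u "archiveadmin:$password" \
#           "https://$sdServer/auth/v1/token" | extract_token)
```

Because the token expires after one hour, a long-running script would have to call this step again before the token becomes invalid.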
Determine migration state
To determine the migration state of files the user provides a path and file name pattern of the files. With this information a SQL query can be executed to list the migration state of the files. The user may be interested in additional information, such as the size, the value of the custom tag recallMe, and the path and file name. This additional information can be displayed with the query results. The API call to select metadata records looks like this:
curl -k -H "Authorization: Bearer ${token}" https://$sdServer/db2whrest/v1/sql_query -X POST \
  -d"select path, filename, size, state, collection, $tagName from $sdDb where path like $pName and filename like $fName and collection in ('archivecollection')"
The select command is initiated with the sql_query endpoint of the API and uses the authentication $token that was obtained before. Let’s take a closer look at the SQL statement given with the -d parameter:
select path, filename, size, state, collection, recallMe from metaocean
where path like $pName and filename like $fName and collection in ('archivecollection')
This SQL statement selects the metadata fields path, filename, size, state, collection and recallMe from the IBM Spectrum Discover database which is named metaocean. These metadata fields are selected for all metadata records that match the path name and the file name provided by the user and the collection name archivecollection that was previously created (see section Create collection with policy). The path name is encoded in variable $pName, and the file name is encoded in variable $fName.
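Because $pName and $fName are used as SQL string literals, they must carry their own single quotes. The sketch below shows one way the statement could be assembled from a user-provided path and file name. The function name build_state_query is hypothetical, the pattern style (trailing % wildcard on the path) is an assumption, and no SQL escaping is done, so inputs containing single quotes are not handled.

```shell
#!/bin/sh
# Hypothetical helper: build the SQL statement for a path and a file name.
# Assumptions: catalog table metaocean, tag recallMe, collection
# archivecollection, and inputs without single quotes (no escaping here).
build_state_query() {
  path="$1"
  file="$2"
  # Quote the LIKE patterns as SQL string literals. The path pattern also
  # matches subdirectories; an empty file name matches all files.
  pName="'${path%/}/%'"
  fName="'${file:-%}'"
  printf "select path, filename, size, state, collection, recallMe from metaocean where path like %s and filename like %s and collection in ('archivecollection')" "$pName" "$fName"
}

# Usage (not executed here):
#   curl -k -H "Authorization: Bearer ${token}" \
#     "https://$sdServer/db2whrest/v1/sql_query" -X POST \
#     -d "$(build_state_query /ibm/fs1/discover1/test1 file_0.pdf)"
```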
If the path name is /ibm/fs1/discover1/test1 and the file name is file_0.pdf, then the query returns the following output:
0,"/ibm/fs1/discover1/test1/","file_0.pdf",236550,"migrtd","archivecollection","false"
This output is still hard to read. It can be formatted and may look like this (see section lstag.sh - display migration status and tags for more details about formatting the output from the SQL query):
State   Size    recallMe  Collection         Path-and-Filename
------  ------  --------  -----------------  -----------------------------------
migrtd  236550  false     archivecollection  /ibm/fs1/discover1/test1/file_0.pdf
This output shows the user that the file /ibm/fs1/discover1/test1/file_0.pdf is in state migrated and the recallMe tag is set to false.
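As a rough illustration of the formatting step, the comma-separated records returned by the query can be turned into a table with awk. This is a sketch, not the implementation used by lstag.sh: the function name format_records and the column widths are hypothetical, and it assumes that path and file names contain neither commas nor double quotes.

```shell
#!/bin/sh
# Hypothetical helper: format query records read from stdin as a table.
# Expected input fields: index, path, filename, size, state, collection, tag.
# Assumption: no commas or double quotes inside path and file names.
format_records() {
  printf '%-8s %-10s %-8s %-20s %s\n' State Size recallMe Collection Path-and-Filename
  awk -F',' 'NF >= 7 {
    gsub(/"/, "")    # strip the double quotes; awk re-splits the fields
    printf "%-8s %-10s %-8s %-20s %s%s\n", $5, $4, $7, $6, $2, $3
  }'
}

# Usage (not executed here):
#   curl ... "https://$sdServer/db2whrest/v1/sql_query" ... | format_records
```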
Tag a file with recall tag
In the previous section, the user has determined that file_0.pdf is migrated and the tag recallMe is set to false. To get this file recalled along with other files in a tape optimized manner, the user can set the recallMe tag to true.
Setting the recallMe tag to true for a set of metadata records requires an auto-tagging policy, just like the policy recall-test1 that was created in section Create auto-tagging policy. The filter of this auto-tagging policy must be adjusted to select the required metadata records. If the user wants to set the recallMe tag to true for the file /ibm/fs1/discover1/test1/file_0.pdf, then the filter looks like this:
path like '%/ibm/fs1/discover1/test1/%' and filename like 'file_0.pdf' and state like 'migrtd' and collection in ('archivecollection')
This filter takes into account that the state of the file must be migrated, because it does not make sense to tag a file for recall that is not migrated. Furthermore, the scope of the metadata records is the archivecollection.
This filter can be applied for the existing auto-tagging policy named recall-test1, causing the old filter to be replaced with this new filter. The following API call updates the existing auto-tagging policy recall-test1 and applies a new filter:
curl -k -H "Authorization: Bearer ${token}" \
  https://$sdServer/policyengine/v1/policies/recall-test1 \
  -d@policy_update.json -X PUT -H "Content-type: application/json"
The updated policy definition including the new filter is provided in file policy_update.json, which must be created beforehand. The policy update looks like this:
# cat policy_update.json
{
  "pol_id": "recall-test1",
  "pol_filter": "path like '%/ibm/fs1/discover1/test1/%' and filename like 'file_0.pdf' and state like 'migrtd' and collection in ('archivecollection')",
  "pol_state": "Active",
  "action_id": "AUTOTAG",
  "action_params": { "tags": {"recallMe": "true"} },
  "schedule": "NOW"
}
This policy update is applied to the policy recall-test1. The new filter selects file_0.pdf from the archivecollection and sets the tag recallMe=true, if the file is in state migrated. The policy state is active, and the policy will be executed immediately (schedule=now).
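Since only the filter changes between invocations, the file policy_update.json can be generated from the path and file name. The following sketch uses a shell here-document for this. The function name write_policy_update is hypothetical, and the sketch assumes that the inputs contain no quotes or other characters that need JSON or SQL escaping.

```shell
#!/bin/sh
# Hypothetical helper: print the policy update for a path and a file name.
# Assumption: the inputs need no JSON or SQL escaping.
write_policy_update() {
  path="$1"
  file="$2"
  # The here-document expands ${path%/} and ${file}; quotes stay literal.
  cat <<EOF
{
  "pol_id": "recall-test1",
  "pol_filter": "path like '%${path%/}/%' and filename like '${file}' and state like 'migrtd' and collection in ('archivecollection')",
  "pol_state": "Active",
  "action_id": "AUTOTAG",
  "action_params": { "tags": {"recallMe": "true"} },
  "schedule": "NOW"
}
EOF
}

# Usage (not executed here):
#   write_policy_update /ibm/fs1/discover1/test1 file_0.pdf > policy_update.json
```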
After executing this policy with the new filter, the recallMe=true tag is set for the file file_0.pdf. To check this, the migration state of the file can be queried (see section Determine migration state) with the following result:
0,"/ibm/fs1/discover1/test1/","file_0.pdf",236550,"migrtd","archivecollection","true"
This output is hard to read and can be formatted like this:
State   Size    recallMe  Collection         Path-and-Filename
------  ------  --------  -----------------  -----------------------------------
migrtd  236550  true      archivecollection  /ibm/fs1/discover1/test1/file_0.pdf
As shown above, the tag recallMe=true was set for the file.
In the next step files with the tag recallMe=true that are migrated can be recalled.
Recall tagged files
Metadata records for files with the tag recallMe set to true can be queried and recalled. To query metadata records with the tag recallMe=true and state=migrated in the collection archivecollection, the following SQL query can be used leveraging the REST API:
curl -k -H "Authorization: Bearer ${token}" https://$sdServer/db2whrest/v1/sql_query -X POST -d"select path, filename from metaocean where collection in ('archivecollection') and recallMe='true' and state='migrtd'"
The select command is initiated with the sql_query endpoint of the API. Let’s take a closer look at the SQL statement given with the -d parameter:
select path, filename from metaocean where collection in ('archivecollection') and recallMe='true' and state='migrtd'
This SQL statement selects the metadata fields path and filename from the IBM Spectrum Discover database metaocean. These metadata fields are selected for all metadata records in the collection archivecollection where the tag recallMe is true and the state is migrated. The result of this API call is:
0,"/ibm/fs1/discover1/test1/","file_8.pdf"
1,"/ibm/fs1/discover1/test1/","file_9.pdf"
2,"/ibm/fs1/discover1/test1/","file_0.pdf"
3,"/ibm/fs1/discover1/test1/","file_2.pdf"
4,"/ibm/fs1/discover1/test1/","file_4.pdf"
5,"/ibm/fs1/discover1/test1/","file_5.pdf"
6,"/ibm/fs1/discover1/test1/","file_6.pdf"
7,"/ibm/fs1/discover1/test1/","file_7.pdf"
The selected metadata records can be stored in a list with one concatenated path and file name per line and passed to the tape optimized recall command. With IBM Spectrum Archive EE, the tape optimized recall is executed with the command:
eeadm recall filelist
The parameter filelist is the name of the file that includes the selected path and file names. This list looks like this:
/ibm/fs1/discover1/test1/file_8.pdf
/ibm/fs1/discover1/test1/file_9.pdf
/ibm/fs1/discover1/test1/file_0.pdf
/ibm/fs1/discover1/test1/file_2.pdf
/ibm/fs1/discover1/test1/file_4.pdf
/ibm/fs1/discover1/test1/file_5.pdf
/ibm/fs1/discover1/test1/file_6.pdf
/ibm/fs1/discover1/test1/file_7.pdf
The recall of tagged migrated files can be executed by a scheduled program and must run on an IBM Spectrum Archive EE server to execute the tape optimized recall command.
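The conversion of the query result into the file list can be sketched with a small filter. The function name records_to_filelist is hypothetical, and the sketch assumes that path and file names contain neither commas nor double quotes.

```shell
#!/bin/sh
# Hypothetical helper: convert query records (index, path, filename) read
# from stdin into one concatenated path and file name per line.
# Assumption: no commas or double quotes inside path and file names.
records_to_filelist() {
  awk -F',' 'NF >= 3 {
    gsub(/"/, "")    # strip the double quotes; awk re-splits the fields
    print $2 $3      # the path already ends with a slash
  }'
}

# Usage on an IBM Spectrum Archive EE server (not executed here):
#   curl ... "https://$sdServer/db2whrest/v1/sql_query" ... \
#     | records_to_filelist > filelist
#   eeadm recall filelist
```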
Update metadata catalog
When the recall has finished successfully, the recalled files are in state pre-migrated. This state is not immediately reflected in the IBM Spectrum Discover server; the data source must be scanned first. After scanning the data source, the collection archivecollection must be updated to reflect the metadata changes in the file system. This requires running the collection policy for the archivecollection. Finally, the recallMe tag that was set to true for migrated files should be set to false for all files in the collection that are no longer in state migrated. This can be done with an additional auto-tagging policy. These three steps are described below.
Scanning the data source named archive can be accomplished with the following API call. The data source archive was created in section Create data source connection.
curl -H "Authorization: Bearer ${token}" -k https://$sdServer/connmgr/v1/scan/archive -X POST \
  -H "Content-type: application/json"
The scan updates the migration state for the recalled files to pre-migrated in the IBM Spectrum Discover catalog. The scan status can be monitored with the following API call:
curl -H "Authorization: Bearer ${token}" -k https://$sdServer/connmgr/v1/scan/archive
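To update the collection that was created in section Create collection with policy, the collection policy for the archivecollection must be executed. The following API call can be used for this, where the variable $collPolicy holds the name of the collection policy (archivecollection_tagpolicy in this example).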
curl -H "Authorization: Bearer ${token}" -k https://$sdServer/policyengine/v1/policies/$collPolicy/start -X POST
Newly added files are now available in the archivecollection.
Setting the tag recallMe to false for files that were recalled can be done with an additional auto-tagging policy. This auto-tagging policy can be created and executed using the GUI or REST API. The following example shows how to create and execute this policy using the REST API.
curl -k -H "Authorization: Bearer ${token}" https://$sdServer/policyengine/v1/policies -d@policy.json -X POST -H "Content-Type: application/json"
The policy definition is provided in file policy.json and looks like this:
# cat policy.json
{
  "pol_id": "recall-reset-test1",
  "pol_filter": "collection in ('archivecollection') and recallMe='true' and state not like 'migrtd'",
  "pol_state": "Active",
  "action_id": "AUTOTAG",
  "action_params": { "tags": {"recallMe": "false"} },
  "schedule": "NOW"
}
The auto-tagging policy is named recall-reset-test1. This policy selects metadata records in archivecollection that have the recallMe tag set to true and where the state is not migrated. For the selected metadata records the tag recallMe is set to false. This auto-tagging policy is executed immediately (schedule=now).
In summary, the three steps required to update the IBM Spectrum Discover metadata catalog after the recall comprise:
- Scanning the data source to update the migration state of metadata records after tape optimized recalls
- Running the collection policy to reflect the changes in the data source
- Setting the recallMe tag to false for all files that are not migrated
These steps can run in an automated fashion, perhaps right after the recall of tagged files has finished. These steps can run on a server that can access the IBM Spectrum Discover system.
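The three steps could be chained in a script along the following lines. This is only a sketch: the function names run and update_catalog are hypothetical, the variables $token, $sdServer and $collPolicy must be set by the caller, and the file policy.json must exist. The sketch defaults to a dry run that only prints the API calls, and it does not wait for the scan to finish; a real script would poll the scan status before starting the collection policy.

```shell
#!/bin/sh
# Hypothetical helper: print the command (DRY_RUN=1, the default) or run it.
# The printed form loses the original quoting and is for inspection only.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: issue the three catalog update calls in order.
# Assumption: token, sdServer and collPolicy are set by the caller.
update_catalog() {
  auth="Authorization: Bearer ${token}"
  # Step 1: scan the data source named archive.
  run curl -k -H "$auth" -H "Content-type: application/json" -X POST \
    "https://$sdServer/connmgr/v1/scan/archive"
  # Step 2: run the collection policy (no wait for scan completion here).
  run curl -k -H "$auth" -X POST \
    "https://$sdServer/policyengine/v1/policies/$collPolicy/start"
  # Step 3: create and run the policy that resets the recallMe tag.
  run curl -k -H "$auth" -H "Content-Type: application/json" -X POST \
    -d@policy.json "https://$sdServer/policyengine/v1/policies"
}

# Usage (dry run): DRY_RUN=1; update_catalog
```

Keeping the dry run as the default makes it possible to inspect the generated calls before anything is sent to the IBM Spectrum Discover server.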
Preventing transparent recalls
To leverage the workflows above, transparent recalls in the tiered storage file system with tape should be disabled. This can be done by leveraging capabilities of the HSM components.
With IBM Spectrum Archive Enterprise Edition, transparent recalls can be permanently disabled by running the command:
eeadm cluster set -a allow_transparent_recall -v no
Once this command has run, IBM Spectrum Archive EE will automatically cancel transparent recalls before these recalls are queued. The user gets an error message and can execute the workflows explained in section Workflows. The current setting of the parameter controlling transparent recalls can be displayed using the following command:
eeadm cluster show
With IBM Spectrum Protect for Space Management transparent recalls can be permanently disabled by using the following client option. This option must be entered into the dsm.sys file:
hsmoptimizedrecallonly yes
After setting this option on all servers running the IBM Spectrum Protect for Space Management client, the recall daemons must be restarted. Ensure that there are no recalls in progress when doing this. To restart the recall daemons, run the following commands on one server:
# dsmkilld
# sleep 3
# dsmrecalld
Disabling transparent recalls applies to all file systems that are space managed by the HSM component. Users of the file system should be informed that they cannot cause transparent recalls. Instead, they should leverage the workflows above to check the migration state for files and tag files that are required with the recallMe tag.
The functions explained in section Workflows can be implemented in programs, so the user does not have to deal with API calls. The next section presents some programming resource examples.
Programming resources
The workflows presented in section Workflows can be wrapped in programs that can be executed by the user via the command line. In fact, the GitHub repository [https://github.com/IBM/discover-tape-recall-integration] provides example programs for this.
Find below a short explanation of these programs.
lstag.sh - display migration status and tags
This program allows the user to display the migration state and the value of the tag recallMe for a given path and file name specification. It queries the IBM Spectrum Discover metadata catalog with the filter provided by the user. The filter is a path and file name specification and can either be a fully qualified path name or a fully qualified file name. Wildcards are not currently supported.
The example below shows the selected metadata fields for files in path /ibm/fs1/discover1/test1:
# lstag.sh /ibm/fs1/discover1/test1
State Size recallMe Collection Path-and-Filename
------ ------- -------- ---------- -------------------
migrtd 857088 true archivecollection /ibm/fs1/discover1/test1/file_8.pdf
migrtd 788480 true archivecollection /ibm/fs1/discover1/test1/file_9.pdf
migrtd 236550 true archivecollection /ibm/fs1/discover1/test1/file_0.pdf
premig 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf
migrtd 290816 true archivecollection /ibm/fs1/discover1/test1/file_2.pdf
premig 599040 false archivecollection /ibm/fs1/discover1/test1/file_3.pdf
migrtd 386048 true archivecollection /ibm/fs1/discover1/test1/file_4.pdf
migrtd 795648 true archivecollection /ibm/fs1/discover1/test1/file_5.pdf
migrtd 644096 true archivecollection /ibm/fs1/discover1/test1/file_6.pdf
migrtd 117760 true archivecollection /ibm/fs1/discover1/test1/file_7.pdf
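Because lstag.sh prints a plain table, its output can be post-processed with standard tools. The following is a minimal sketch, assuming the column layout shown above (two header lines, state in the first column, recallMe in the third column). The function name summarize_lstag is hypothetical and not part of the GitHub repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the GitHub repository.
# Reads lstag.sh output on stdin and prints, per migration state, the number
# of files and how many of them carry recallMe=true.
# Assumed column layout: State, Size, recallMe, Collection, Path-and-Filename.
summarize_lstag() {
  awk '
    NR <= 2 { next }                 # skip the header and separator lines
    NF < 5  { next }                 # skip blank or incomplete lines
    {
      count[$1]++
      if ($3 == "true") tagged[$1]++
    }
    END {
      for (s in count)
        printf "%s total=%d tagged=%d\n", s, count[s], tagged[s] + 0
    }
  ' | sort                           # awk array order is unspecified
}
```

A possible invocation is ./lstag.sh /ibm/fs1/discover1/test1 | summarize_lstag. For the listing above, this would print migrtd total=8 tagged=8 and premig total=2 tagged=0.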
ftag.sh - set the recallMe tag to true
This program allows the user to tag metadata records for a given path and file name specification with the tag recallMe=true. It updates and executes an auto-tagging policy in the IBM Spectrum Discover server that adds the tag recallMe=true to metadata records that match the path and file name specification and whose state is migrated. The path and file name specification provided by the user can either be a fully qualified path name or a fully qualified file name. Wildcards are not currently supported.
In the example below, the file /ibm/fs1/discover1/test1/file_1.pdf is tagged with recallMe=true. Before the tag is added, IBM Spectrum Discover shows the following state for the file:
# ./lstag.sh /ibm/fs1/discover1/test1/file_1.pdf
State Size recallMe Collection Path-and-Filename
----- ------- -------- ---------- -----------------
migrtd 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf
Adding the tag:
# ./ftag.sh /ibm/fs1/discover1/test1/file_1.pdf
Info: checking if tag recallMe exists.
Info: creating and executing policy to tag the files
Finally, check the state again. The tag was successfully added:
# ./lstag.sh /ibm/fs1/discover1/test1/file_1.pdf
State Size recallMe Collection Path-and-Filename
----- ------- -------- ---------- -----------------
migrtd 848896 true archivecollection /ibm/fs1/discover1/test1/file_1.pdf
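Since wildcards are not supported, tagging several individual files means calling ftag.sh once per name. The following is a minimal sketch of such a loop, not part of the GitHub repository. The function name tag_files and the variable FTAG_CMD (location of ftag.sh) are assumptions made for illustration, as is the expectation that ftag.sh returns a non-zero exit code on failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the GitHub repository.
# Reads fully qualified path or file names from stdin (one per line) and
# calls the tagging program once per entry.
# FTAG_CMD is an assumed variable that points to ftag.sh.
tag_files() {
  local ftag="${FTAG_CMD:-./ftag.sh}"
  local path failed=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue              # ignore empty lines
    case "$path" in
      /*) ;;                                # fully qualified name, accept
      *)  echo "Skipping relative name: $path" >&2
          failed=1
          continue ;;
    esac
    # Redirect stdin so the called program cannot consume the name list.
    "$ftag" "$path" < /dev/null || { echo "Tagging failed: $path" >&2; failed=1; }
  done
  return "$failed"
}
```

A possible invocation is tag_files < files-to-recall.txt, where files-to-recall.txt is a hypothetical text file with one fully qualified name per line. The function returns a non-zero code if any name was skipped or could not be tagged.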
recallTagged.sh - recall tagged files
This program queries the metadata catalog for files in a specified collection that have the tag recallMe set to true and recalls these files. This program must run on an IBM Spectrum Archive server because it uses the eeadm recall command. The user provides the collection as an input parameter.
The example below recalls all tagged files in the archivecollection:
# ./recallTagged.sh archivecollection
Info: Checking configuration parameters.
Info: obtaining file list from Spectrum Discover.
Info: recalling 10 files.
2021-12-31 10:33:51 GLESL268I: 10 file name(s) have been provided to recall.
2021-12-31 10:33:54 GLESL839I: All 10 file(s) has been successfully processed.
This program is not intended for use by the users of the file system. It is an administrative program that the administrator of the IBM Spectrum Archive EE system should use. It can be scheduled to run at regular intervals.
Note that after files are recalled using IBM Spectrum Archive EE, the metadata records in the IBM Spectrum Discover catalog are not updated automatically. An additional program is used to update the catalog.
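The recall step itself can be pictured as follows. This is a minimal sketch and not the code of recallTagged.sh. It assumes that the file names have already been obtained from IBM Spectrum Discover and that eeadm recall accepts a file containing one file name per line, which should be verified against the eeadm recall documentation of the installed release. The function name recall_file_list and the variable EEADM_CMD are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the implementation of recallTagged.sh.
# Reads file names from stdin (one per line), removes empty lines and
# duplicates, and passes the resulting list to eeadm recall in one call
# so that the recall can be tape optimized.
# Assumption: "eeadm recall <listfile>" takes a file with one name per line.
# EEADM_CMD is an assumed variable that points to the eeadm command.
recall_file_list() {
  local eeadm="${EEADM_CMD:-eeadm}"
  local list rc=0
  list="$(mktemp)" || return 1
  # Keep non-empty lines only and drop duplicates while preserving order.
  awk 'NF && !seen[$0]++' > "$list"
  if [ ! -s "$list" ]; then
    echo "Info: no files to recall."
    rm -f "$list"
    return 0
  fi
  echo "Info: recalling $(wc -l < "$list" | tr -d ' ') files."
  "$eeadm" recall "$list" || rc=$?
  rm -f "$list"
  return "$rc"
}
```

Passing all names in a single list, rather than recalling file by file, is what allows the files to be sorted by tape and position before they are read.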
scancol.sh - Update metadata catalog
This program updates the IBM Spectrum Discover catalog for a specified data source and collection. It first scans the data source that the user provides as an input parameter. Then it runs the collection policy for the collection that the user provides as an input parameter. Finally, it runs an auto-tagging policy that sets the recallMe tag to false for all files in the collection that are not migrated.
The example below shows how to run this program for the data source archive (created in section Create data source connection) and the collection archivecollection (created in section Create collection with policy):
# ./scancol.sh archive archivecollection
Info: Checking configuration parameters.
--------------------------------------------------------------------
Info: checking and scanning data source connection archive
Info: Data source connection archive exists, scanning it.
Info: status: Complete
--------------------------------------------------------------------
Info: checking if collection policy exists.
Info: Collection policy archivecollection_tagpolicy exists, starting it.
Info: status: complete
---------------------------------------------------------------------
Info: checking if policy to remove tag recallMe exists.
Info: Starting policy recallMeNot-policy to remove the tag recallMe.
Info: status: complete
After running the program scancol.sh, the state and the recallMe tag are adjusted, as shown in the lstag.sh output below:
# lstag.sh /ibm/fs1/discover1/test1
State Size recallMe Collection Path-and-Filename
------ ------- -------- ---------- -------------------
premig 857088 false archivecollection /ibm/fs1/discover1/test1/file_8.pdf
premig 788480 false archivecollection /ibm/fs1/discover1/test1/file_9.pdf
premig 236550 false archivecollection /ibm/fs1/discover1/test1/file_0.pdf
premig 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf
premig 290816 false archivecollection /ibm/fs1/discover1/test1/file_2.pdf
premig 599040 false archivecollection /ibm/fs1/discover1/test1/file_3.pdf
premig 386048 false archivecollection /ibm/fs1/discover1/test1/file_4.pdf
premig 795648 false archivecollection /ibm/fs1/discover1/test1/file_5.pdf
premig 644096 false archivecollection /ibm/fs1/discover1/test1/file_6.pdf
premig 117760 false archivecollection /ibm/fs1/discover1/test1/file_7.pdf
This program is not intended for use by the users of the file system. It is an administrative program that the administrator of the IBM Spectrum Archive EE system should use. It can be scheduled to run at regular intervals, or it can be executed right after the tape optimized recall.
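Running the catalog update right after the recall can be sketched as a small wrapper. This is a minimal example and not part of the GitHub repository. It assumes that recallTagged.sh returns a non-zero exit code on failure. The function name recall_and_rescan and the variables RECALL_CMD and SCAN_CMD are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the GitHub repository.
# Runs the tape optimized recall for a collection and, only if the recall
# succeeds, updates the IBM Spectrum Discover catalog.
# RECALL_CMD and SCAN_CMD are assumed variables that point to the programs.
# Assumption: recallTagged.sh returns a non-zero exit code on failure.
recall_and_rescan() {
  local datasource="${1:-}" collection="${2:-}"
  local recall="${RECALL_CMD:-./recallTagged.sh}"
  local scan="${SCAN_CMD:-./scancol.sh}"

  if [ -z "$datasource" ] || [ -z "$collection" ]; then
    echo "usage: recall_and_rescan <datasource> <collection>" >&2
    return 2
  fi

  if ! "$recall" "$collection"; then
    echo "Recall failed for collection $collection, catalog not updated." >&2
    return 1
  fi
  # Refresh the catalog so that state and recallMe reflect the recall.
  "$scan" "$datasource" "$collection"
}
```

With the names used in this article, the call would be recall_and_rescan archive archivecollection. Such a wrapper could be started by a scheduler on the IBM Spectrum Archive EE server.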
Appendix
Blog article: Best practices for managing files in tiered storage file systems with tape
https://community.ibm.com/community/user/storage/blogs/nils-haustein1/2020/01/14/managing-files-in-tiered-storage
GitHub repository with programming resources
https://github.com/IBM/discover-tape-recall-integration
Solution integrating IBM Spectrum Scale with IBM Spectrum Archive Enterprise Edition:
https://www.ibm.com/support/pages/node/6355579
Redbook: IBM Spectrum Discover metadata management
http://www.redbooks.ibm.com/abstracts/redp5550.html?Open
Linear Tape File System standard
https://www.snia.org/education/what-is-lfts
IBM Spectrum Archive EE option to prevent transparent recalls:
https://www.ibm.com/docs/en/spectrum-archive-ee/1.3.2?topic=cluster-eeadm-set
IBM Spectrum Protect for Space Management client option to prevent transparent recalls:
https://www.ibm.com/support/knowledgecenter/SSERBH_8.1.10/hsmul/r_opt_hsmoptimizedrecallonly.html
Disclaimer
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information provided, it is provided “as is” without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Copyright license
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis.