
Challenges and solution for backing up data managed by IBM Spectrum Scale Transparent Cloud Tiering

By Nils Haustein posted Thu October 13, 2022 09:42 AM

  

In this article I describe challenges and solutions related to backing up an IBM Spectrum Scale file system that is managed by IBM Spectrum Scale Transparent Cloud Tiering.

Introduction

In this section I briefly describe the relevant components: IBM Spectrum Scale TCT, the IBM Spectrum Scale backup function, and the file attributes that are used to manage migrated files during backup.

TCT

IBM Spectrum Scale Transparent Cloud Tiering (IBM Spectrum Scale TCT) frees up IBM Spectrum Scale file system storage capacity by moving cooler data to cloud storage [1]. For this purpose, one or more IBM Spectrum Scale cluster nodes are connected to cloud storage. Supported cloud storage providers are IBM Cloud® Object Storage, Amazon Web Services S3, and Microsoft Azure object storage service (block 'blob' only) [2].

IBM Spectrum Scale TCT can leverage the existing ILM policy engine available in IBM Spectrum Scale, so administrators can define policies to migrate files to cloud storage. After migration the file is still visible in the name space, but the file content resides in cloud storage. The file attributes stored in the name space include a pointer to the cloud storage. When a migrated file is accessed, TCT intercepts the access request, recalls the file content from cloud storage into the name space and provides access to the file data. Files can also be pre-migrated from the IBM Spectrum Scale name space to the cloud storage. A pre-migrated file is dual resident: it resides on disk in the name space and in the cloud storage. When a pre-migrated file is migrated, the file content does not need to be sent to cloud storage again. The file is just stubbed by deleting the file content from disk and adjusting the file attributes.

Conceptually, IBM Spectrum Scale TCT works like hierarchical storage management components (HSM components) such as IBM Spectrum Archive Enterprise Edition and IBM Spectrum Protect for Space Management. However, there are some significant differences regarding file attributes that influence the backup behavior.

Backup

IBM Spectrum Scale provides a backup function that integrates with IBM Spectrum Protect. The backup function allows backing up files from file systems and independent filesets to one or more IBM Spectrum Protect servers. The IBM Spectrum Scale backup function is represented by the command mmbackup [3].

The IBM Spectrum Scale backup function is scalable because it leverages the IBM Spectrum Scale policy engine to quickly identify files that are candidates for backup. The backup process can be scaled across multiple IBM Spectrum Scale cluster nodes, whereby each backup node backs up a subset of the files that were identified as candidates.

When a file system is managed by an HSM component, it is important not to automatically back up files that are migrated, because this can cause recall storms. The IBM Spectrum Scale backup function has built-in awareness of files that were migrated by an HSM component. If migrated files are selected as candidates for backup, then mmbackup by default skips these files. This behavior can be overruled with the mmbackup parameter --backup-migrated, which is not set by default and must be specified explicitly.

However, the IBM Spectrum Scale backup function is not aware of files that were migrated by TCT. The reason is that TCT uses different file attributes than HSM components. Consequently, files migrated by TCT that are selected as backup candidates are recalled from cloud storage and backed up. This can cause unwanted recall storms.

 

File attributes

In this section we take a deeper look into the file attributes that are managed by HSM components and TCT.

HSM components that integrate with IBM Spectrum Scale such as IBM Spectrum Archive Enterprise Edition and IBM Spectrum Protect for Space Management share a common implementation. They use the same file attributes to mark the file migration status and supply information about the location of the file content.  The file attributes of a file migrated by IBM Spectrum Archive Enterprise Edition look like this:

# mmlsattr -L -d /gpfs/archive/file0
file name:            /gpfs/archive/file0
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:               
storage pool name:    system
fileset name:         root
snapshot name:
creation time:        Wed Jun  1 15:29:09 2022
Misc attributes:      ARCHIVE OFFLINE
Encrypted:            no
dmapi.IBMSGEN#:       "1"
dmapi.IBMUID:         "7449889632718415285-5327770825292684041-44978310-22300-0"
dmapi.PREMTM0:        "1654090149"
dmapi.PRETPS0:        "A10020JC@b3f37b95-76b9-4169-9802-a8a1aa7f4130:b9f525a7-29fa-4966-b17e-282f8e0ebbe0@ON"
dmapi.IBMProv:        "ltfs????"
dmapi.IBMTPS:         "1 A10020JC@b3f37b95-76b9-4169-9802-a8a1aa7f4130@b9f525a7-29fa-4966-b17e-282f8e0ebbe0"
dmapi.IBMObj:         "??????????)q??Pz?|???????#?P?(?????? ???????????????????????????????????????????????????????????????????????????????????????"
gpfs.dmapi.region:    0x000000000000000000000000000000000700000000000000

 

The file attributes that describe the migration state are the ones in the dmapi and gpfs.dmapi name spaces. For example, the attribute dmapi.IBMTPS includes the tape ID, the pool ID and the library ID where the file content was migrated to. The attribute dmapi.IBMObj is only set when a file is in migrated state. The file attribute dmapi.IBMPMig (not shown above) is only set for files in pre-migrated state.

 

In contrast, IBM Spectrum Scale TCT does not use the same file attribute names. The file attributes of a file migrated by TCT look like this:

# mmlsattr -L -d /gpfs/tct/file1
file name:            /gpfs/tct/file1
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         root
snapshot name:
creation time:        Tue May  3 20:57:42 2022
Misc attributes:      ARCHIVE OFFLINE
Encrypted:            no
gpfs.dmapi.region:    0xFFFFFFFFFFFFFFFF00000000000000000700000000000000
dmapi.MCEA:           "REV1N???????br??????(??$????br??????(??$????????????????t?B?????L6qb??+Z??????????????????S?bq5?L6qb??+Z??????????????????S?bq5???????l????t$???$???????"

 

TCT does not use the attribute dmapi.IBMObj to mark a migrated file or the file attribute dmapi.IBMPMig to mark a pre-migrated file. It uses the attribute dmapi.MCEA to encode the migration state and location of the file data in cloud storage.

This incompatibility between TCT and HSM components causes challenges with backup.
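The difference between the two attribute sets can be checked mechanically. The following Python sketch is only my illustration and is not part of mmbackup or TCT. It assumes text in the "name: value" layout of the mmlsattr -L -d output shown above and reports which of the attributes discussed in this section is present. The function names and the abbreviated sample values are hypothetical.

```python
def attribute_names(mmlsattr_output):
    """Return the set of attribute names found in 'name: value' lines."""
    names = set()
    for line in mmlsattr_output.splitlines():
        name, sep, _value = line.partition(":")
        if sep:  # lines without a colon (for example the command line) are skipped
            names.add(name.strip())
    return names


def migration_marker(mmlsattr_output):
    """Report which migration attribute from this section is present."""
    names = attribute_names(mmlsattr_output)
    if "dmapi.MCEA" in names:
        return "TCT (dmapi.MCEA)"
    if "dmapi.IBMObj" in names:
        return "HSM migrated (dmapi.IBMObj)"
    if "dmapi.IBMPMig" in names:
        return "HSM pre-migrated (dmapi.IBMPMig)"
    return "none"


# Abbreviated samples; the attribute values are placeholders.
hsm_sample = """file name:            /gpfs/archive/file0
Misc attributes:      ARCHIVE OFFLINE
dmapi.IBMObj:         "..."
"""
tct_sample = """file name:            /gpfs/tct/file1
Misc attributes:      ARCHIVE OFFLINE
dmapi.MCEA:           "REV1N..."
"""

print(migration_marker(hsm_sample))
print(migration_marker(tct_sample))
```

Both samples show ARCHIVE OFFLINE in the Misc attributes line, so only the dmapi attributes tell the two cases apart in this sketch.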

 

File migration state

There are also differences between TCT and HSM regarding file migration states. The following figure shows the file migration states and transitions with TCT:


It is important to note that writing or changing a migrated file does not transition the file to resident state, but to co-resident state. Reading a migrated file also transitions it to co-resident state. Thus, a co-resident file might or might not have been changed. This state is ambiguous.

 

The file migration states and transitions with HSM are shown in the following figure:

As shown in the figure above, a migrated file transitions to resident state if it was written or changed. A migrated file transitions to pre-migrated state if it was read or recalled.
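To make the difference concrete, here is a small Python sketch of the transitions out of the migrated state. It covers only the four transitions stated in the text of this section, not the complete state diagrams, and the names are my own.

```python
# Transitions out of the migrated state, as described in this section.
TRANSITIONS = {
    "TCT": {"read": "co-resident", "write": "co-resident"},
    "HSM": {"read": "pre-migrated", "write": "resident"},
}


def state_after(manager, access):
    """State of a previously migrated file after a read or write access."""
    return TRANSITIONS[manager][access]


def change_is_visible_in_state(manager):
    """True if the resulting state reveals whether the file was changed."""
    return state_after(manager, "read") != state_after(manager, "write")


for manager in ("TCT", "HSM"):
    print(manager,
          state_after(manager, "read"),
          state_after(manager, "write"),
          change_is_visible_in_state(manager))
```

With TCT both accesses end in the same state, which is the ambiguity mentioned above. With HSM the resulting state distinguishes a changed file from one that was only read.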

 

The backup challenge and solution

The IBM Spectrum Scale mmbackup process determines whether a file is migrated by evaluating the presence of the file attribute dmapi.IBMObj. If this attribute is not set, then the backup process assumes the file is not migrated. Likewise, mmbackup evaluates the presence of the file attribute dmapi.IBMPMig to determine if the file is pre-migrated. If neither attribute is set, then mmbackup concludes the file is in resident state.

Because TCT does not use the file attributes dmapi.IBMObj and dmapi.IBMPMig, the backup process assumes that a file migrated by TCT is not migrated and recalls it if it requires a backup. This can cause recall storms in TCT managed name spaces.
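The decision logic described above can be modeled in a few lines of Python. This is a sketch of the behavior as described in this article, not the actual mmbackup implementation, and the function name is hypothetical.

```python
def state_seen_by_mmbackup(xattr_names):
    """Migration state derived only from the HSM attributes."""
    if "dmapi.IBMObj" in xattr_names:
        return "migrated"
    if "dmapi.IBMPMig" in xattr_names:
        return "pre-migrated"
    return "resident"


# Attribute names taken from the two mmlsattr examples in this article.
hsm_migrated = {"dmapi.IBMObj", "dmapi.IBMTPS", "gpfs.dmapi.region"}
tct_migrated = {"dmapi.MCEA", "gpfs.dmapi.region"}

print(state_seen_by_mmbackup(hsm_migrated))  # skipped by default
print(state_seen_by_mmbackup(tct_migrated))  # treated as resident, so it is recalled
```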

 

Solution

The goal of this solution is to make mmbackup aware of files that are migrated by TCT. For this purpose, mmbackup offers the option to leverage a custom policy file that is used for file identification. The custom policy file can be adjusted to provide mmbackup a hint that a file is migrated.

 

Note, changes to the policy file used by mmbackup are at your own risk. IBM cannot be held liable for damages. Furthermore, the content of the policy file may change in the future with the result that the adjustments described below have no effect.

 

The first step is to obtain the policy file that mmbackup uses to identify files that are candidates for backup. This can be done by setting an environment variable [3]:

# export DEBUGmmbackup=2

And performing a normal backup operation for the subject name space:

# mmbackup fsname [further-mmbackup-parameters]

The environment variable causes the backup operation to store the policy file in /var/mmfs/mmbackup/.mmbackupRules.fsname

The suffix fsname is the name of the file system in scope for the backup operation.

 

The next step is to copy the policy file to a different directory and make the following adjustment:

The second rule in the policy may look like this:

RULE 'BackupRule' LIST 'mmbackup.1.tsm01' DIRECTORIES_PLUS
     SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME)
     || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME)
     || ' ' || 'resdnt' )
WHERE … 

Replace the token 'resdnt' at the end of the SHOW clause of this rule with a case-statement:

RULE 'BackupRule' LIST 'mmbackup.1.tsm01' DIRECTORIES_PLUS
     SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME)
     || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME)
     || ' ' || (CASE WHEN XATTR('dmapi.MCEA') IS NOT NULL
     AND (KB_ALLOCATED = 0) AND (FILE_SIZE > 0 ) THEN 'migrat'
     WHEN XATTR('dmapi.MCEA') IS NOT NULL AND (KB_ALLOCATED > 0)
     AND (FILE_SIZE > 0 ) THEN 'premig'
     ELSE 'resdnt' END) )
  WHERE …

The case-statement causes an additional parameter to be added for each entry in the file list that reflects the file migration state (migrat, premig, resdnt). The file migration state is derived based on the presence of the file attribute dmapi.MCEA, the file size and the blocks allocated on disk. This additional parameter is interpreted by the mmbackup process and causes files that have the migration state migrat to be skipped.

Note, the case-statement above does not work for backups from snapshots, because the file attribute KB_ALLOCATED is set to 0 in a snapshot. When this policy is used for backups from snapshots, files in co-resident state are marked as migrated and are skipped when selected as backup candidates.
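To see how the case-statement classifies individual files, its logic can be mirrored in Python. This is only a sketch for reasoning about the rule. The function name and the sample values are hypothetical, and the real evaluation is done by the policy engine.

```python
def backup_state(has_mcea, kb_allocated, file_size):
    """Mirror of the case-statement: returns 'migrat', 'premig' or 'resdnt'."""
    if has_mcea and kb_allocated == 0 and file_size > 0:
        return "migrat"
    if has_mcea and kb_allocated > 0 and file_size > 0:
        return "premig"
    return "resdnt"


# A file migrated by TCT: dmapi.MCEA is set and no blocks are allocated.
print(backup_state(True, 0, 1048576))
# A file with dmapi.MCEA whose content is on disk.
print(backup_state(True, 1024, 1048576))
# A file that was never handed to TCT.
print(backup_state(False, 1024, 1048576))
# The same on-disk file seen through a snapshot, where KB_ALLOCATED is 0.
print(backup_state(True, 0, 1048576))
```

The first and the last call return the same result, which is why the rule cannot tell a migrated file from a co-resident file in a snapshot.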

 

From now on, the mmbackup process is started with the adjusted policy file:

# mmbackup fsname -P adjusted-policy-file [further-mmbackup-parameters]
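Creating the adjusted policy file can also be scripted. The following Python sketch is an assumption on my side about how this could be done and is not an IBM-provided tool. It replaces the token 'resdnt' with the case-statement shown above, and it refuses to change a file in which the token does not occur exactly once, because the content of the generated policy file may differ between releases. The function names are hypothetical.

```python
from pathlib import Path

# The case-statement from this article, inserted in place of 'resdnt'.
CASE_EXPRESSION = (
    "(CASE WHEN XATTR('dmapi.MCEA') IS NOT NULL\n"
    "     AND (KB_ALLOCATED = 0) AND (FILE_SIZE > 0 ) THEN 'migrat'\n"
    "     WHEN XATTR('dmapi.MCEA') IS NOT NULL AND (KB_ALLOCATED > 0)\n"
    "     AND (FILE_SIZE > 0 ) THEN 'premig'\n"
    "     ELSE 'resdnt' END)"
)


def adjust_policy(policy_text):
    """Return the policy text with 'resdnt' replaced by the case-statement."""
    if "dmapi.MCEA" in policy_text:
        return policy_text  # looks already adjusted, leave it unchanged
    if policy_text.count("'resdnt'") != 1:
        raise ValueError("expected exactly one 'resdnt' token; edit the file by hand")
    return policy_text.replace("'resdnt'", CASE_EXPRESSION)


def write_adjusted_copy(source, destination):
    """Read the original policy file and write the adjusted copy elsewhere."""
    Path(destination).write_text(adjust_policy(Path(source).read_text()))


# The rule head from this article; the WHERE clause is omitted here.
sample_rule = (
    "RULE 'BackupRule' LIST 'mmbackup.1.tsm01' DIRECTORIES_PLUS\n"
    "     SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME)\n"
    "     || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME)\n"
    "     || ' ' || 'resdnt' )\n"
)
print(adjust_policy(sample_rule))
```

The original file under /var/mmfs/mmbackup is only read. The adjusted copy should still be reviewed by hand before it is passed to mmbackup with the -P parameter.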

 

If mmbackup finds migrated files as candidates, then these migrated files are skipped with a warning indicating:

mmbackup:Some changed files will be skipped to avoid HSM recall. See /gpfs/tct/mmbackup.hsmMigFiles.tsm01 for detail

The file referenced by this warning (/gpfs/tct/mmbackup.hsmMigFiles.tsm01) contains the names of the migrated files that were skipped.

Subsequently, the administrator can recall the skipped files and run another backup using the adjusted policy. This subsequent backup operation backs up the skipped and recalled files and clears the warning message. Afterwards the administrator can migrate the recalled files again and continue the periodic backup operations using the adjusted policy file.

Some limitations apply to this solution.

Limitations

The solution presented above has some limitations:

  • Only files with a size larger than 4 KB should be migrated. Otherwise, the file may be stored in the inode and the allocated capacity on disk is 0. As a result, such a small file is always identified as a migrated file, even though it may not be in migrated state.
  • The stub size configured with TCT must be 0. Otherwise, portions of the file are stored on disk after migration and the allocated capacity of a migrated file is not 0. As a result, migrated files with a stub size larger than 0 are not identified as migrated files and are recalled from cloud storage. If the stub size is larger than 0 KB, then consider the alternative case statement described at the end of this article.
  • The backup operation does not work from snapshots, because the file attribute KB_ALLOCATED is always set to 0 in snapshots. This causes files in co-resident state to be skipped when selected as backup candidates.
  • Changes to the policy file used by mmbackup are at your own risk. IBM cannot be held liable for damages. Furthermore, the content of the policy file may change in the future with the result that the adjustments described above have no effect.
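The first three limitations can be turned into a simple pre-flight check. The Python sketch below is only an illustration. The parameter names are hypothetical and the values must be supplied by the administrator, because the sketch does not query TCT or the file system.

```python
def limitation_warnings(smallest_migrated_file_kb, stub_size_kb, from_snapshot):
    """Return a warning for each limitation that applies to the planned backup."""
    warnings = []
    if smallest_migrated_file_kb <= 4:
        warnings.append("files of 4 KB or less may be stored in the inode "
                        "and are then always identified as migrated")
    if stub_size_kb > 0:
        warnings.append("stub size is larger than 0, so migrated files are not "
                        "identified as migrated and are recalled")
    if from_snapshot:
        warnings.append("KB_ALLOCATED is 0 in snapshots, so co-resident files "
                        "are skipped")
    return warnings


# Example: files larger than 64 KB are migrated, stub size 0, no snapshot.
print(limitation_warnings(64, 0, False))
# Example: a configuration that runs into all three limitations.
for warning in limitation_warnings(4, 1, True):
    print(warning)
```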

 

References

[1] Introduction to IBM Spectrum Scale transparent cloud tiering

https://www.ibm.com/docs/en/spectrum-scale/5.1.3?topic=overview-introduction-cloud-services#concept_ty4_jhz_v5

[2] Support cloud storage provider for IBM Spectrum Scale TCT

https://www.ibm.com/docs/en/spectrum-scale/5.1.3?topic=services-supported-cloud-providers#concept_o33_qhs_lv

[3] IBM Spectrum Scale mmbackup command:

https://www.ibm.com/docs/en/spectrum-scale/5.1.3?topic=reference-mmbackup-command

 

Alternative case statement

An alternative way to derive the migration state evaluates the file attribute MISC_ATTRIBUTES. The migration state is directly encoded in this attribute. Other hierarchical storage management clients also use MISC_ATTRIBUTES to encode the migration state. This case statement could be used if the stub size is larger than 0 KB. The alternative case-statement looks like this:

RULE 'BackupRule' LIST 'mmbackup.1.tsm01' DIRECTORIES_PLUS
     SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME)
     || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME)
     || ' ' || (CASE WHEN (MISC_ATTRIBUTES LIKE '%V%') THEN 'migrat'
     WHEN (MISC_ATTRIBUTES LIKE '%M%' AND MISC_ATTRIBUTES NOT LIKE '%V%')
     THEN 'premig'
     ELSE 'resdnt' END) )
  WHERE …

 

Note, the case-statement above does not work for backups from snapshots, because the file attribute MISC_ATTRIBUTES does not reflect the migration state in snapshots. When this policy is used for backups from snapshots, files in co-managed and non-resident state are marked as resident and are backed up.
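The alternative case-statement can be mirrored in Python in the same way. Again, this is only a sketch for reasoning about the rule. The function name is hypothetical and the sample strings are made up for illustration, not taken from a real file system.

```python
def backup_state_from_misc(misc_attributes):
    """Mirror of the alternative case-statement based on MISC_ATTRIBUTES."""
    if "V" in misc_attributes:   # LIKE '%V%'
        return "migrat"
    if "M" in misc_attributes:   # LIKE '%M%', and V is known to be absent here
        return "premig"
    return "resdnt"


for sample in ("FMV", "FM", "F"):
    print(sample, backup_state_from_misc(sample))
```

Unlike the first case-statement, this logic does not depend on KB_ALLOCATED, which is why it can still be used when the stub size is larger than 0 KB.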



Comments

Wed November 30, 2022 11:29 AM

Thanks Naren, good point. I updated the article to reflect stub sizes greater than 0.

Mon November 28, 2022 12:37 AM

Hi Nils,

Thanks for the detailed and very useful explanation. I note that the script checks for KB allocated > 0. However, what about situations where there is a stubSize that is > 0 KB, as is increasingly becoming a requirement for applications to be able to show thumbnails etc.? Would it be better to rely only on the alternate method which uses MISC_ATTRIBUTES?

Thanks,

-Naren