Configuring Elastic Storage ILM with more than two disk pools and a tape pool

By Archive User posted Mon December 15, 2014 04:55 AM

  

Originally posted by: Nils Haustein


I would like to share some experiences configuring Information Lifecycle Management (ILM) with Elastic Storage GPFS 4.1, in particular regarding automatic migration with more than two storage pools, one of them being tape. The ILM function in GPFS makes it possible to migrate data seamlessly and transparently from one file system storage pool to another. A file system storage pool is a collection of storage devices with the same characteristics.

 

GPFS ILM Function and Policies

For example, a file system might have two internal pools on disk. One internal pool can be based on SSD drives, allowing fast data processing, and another internal pool can be based on normal disk drives, allowing greater capacity at lower cost. Let’s name the SSD pool “system” and the disk pool “silver”. The goal of the two pools is to store the important data on SSD and move this data to the disk pool (silver) when the SSD pool becomes 90% full. The GPFS ILM function allows creating placement and migration policies to implement this behavior.

 

Placement policies assign the internal pool for a newly created file. Let’s assume that all files except *.mp3 files should be placed on the fast SSD pool (system). The corresponding placement policy looks like this:

 

RULE 'mp3' SET POOL 'silver' WHERE (LOWER(NAME) LIKE '%.mp3')

RULE 'default' SET POOL 'system'

 

This placement policy indicates that files ending with .mp3 are placed on pool silver, while all other files (default) are placed on pool system (SSD pool).

 

With migration policies, the files that have not been accessed for the longest time can be migrated from the SSD pool to the disk pool when the SSD pool becomes 90% full. The corresponding migration policy looks like this:

 

RULE 'systemMigration' MIGRATE FROM POOL 'system' THRESHOLD(90,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'silver'

 

This migration policy instructs GPFS to migrate files from pool ‘system’ (SSD pool) to pool ‘silver’ when the system pool reaches a threshold of 90%. The selection of files is based on access time, as denoted by the WEIGHT parameter: the files that were accessed least recently are migrated first. The migration stops when the occupancy of the system pool has dropped to 70%.
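As a rough worked example of what THRESHOLD(90,70) means in terms of data volume, the following sketch computes how much data a single threshold-driven run has to move. The function name and the 1000 GB pool size are assumptions made for illustration only; this is not a GPFS tool, and the real amount depends on the actual occupancy when the policy runs.

```shell
#!/bin/sh
# Estimate how much data one threshold-driven migration run has to move.
# Arguments: pool capacity in GB, high threshold (%), low threshold (%).
# Illustrative helper only, not part of GPFS.
migration_volume_gb() {
    capacity_gb=$1
    high_pct=$2
    low_pct=$3
    # Data held at the high mark minus data allowed at the low mark.
    echo $(( capacity_gb * (high_pct - low_pct) / 100 ))
}

# Hypothetical 1000 GB 'system' pool with THRESHOLD(90,70):
# migration starts near 900 GB used and stops near 700 GB used.
migration_volume_gb 1000 90 70
```

For this hypothetical pool the run has to move about 200 GB to the silver pool, which is worth keeping in mind when sizing the destination pool and its own thresholds.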

 

In order to start this migration automatically, the policy above must be applied to the file system and a callback for migration must be configured. The following command applies the policy to the file system. Note that the policy file must include both the placement and the migration policies outlined above:

 

mmchpolicy filesystem-name policy-file

GPFS migration callback

A callback in GPFS is triggered upon certain events and invokes a script. The callback also allows passing additional parameters to this script. To configure a callback for migration, the events “lowDiskSpace” and “noDiskSpace” must be specified. The standard script /usr/lpp/mmfs/bin/mmstartpolicy starts the policy configured for the file system. The following command shows the configuration of a standard migration callback:

 

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace,noDiskSpace --parms "%eventName %fsName"

 

This callback is named MIGRATION and is triggered by the events “lowDiskSpace” and “noDiskSpace”. When GPFS raises one of these events, the callback invokes the script /usr/lpp/mmfs/bin/mmstartpolicy with the parameters %eventName and %fsName. The script /usr/lpp/mmfs/bin/mmstartpolicy then invokes the mmapplypolicy command with the file system name as parameter. The mmapplypolicy command uses the policy that was applied with mmchpolicy.

 

The challenge with the third pool on tape

Now let’s assume that a third pool should be used which represents a tier on tape. This third pool can be provided by LTFS Enterprise Edition (LTFS EE). LTFS EE integrates with GPFS and can be used as an external storage pool, which allows files to be migrated from an internal storage pool, such as the silver pool, to LTFS tape.

 

Picture 1 shows such a setup with three pools, two internal and one external:

[Picture 1: file system with two internal pools (system, silver) and one external pool (LTFS EE)]

 

The goal of this setup is to store and process the important data on the system pool represented by the SSD drives and migrate from the system pool to the silver pool (represented by disk) when the system pool is 90% full, as shown above. Furthermore the silver pool needs to be migrated at 90% to the external pool represented by LTFS EE.

 

The policy to migrate from the silver pool to LTFS EE looks like this:

 

RULE EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/ltfsee' OPTS 'tapepool'

RULE 'SilverMigration' MIGRATE FROM POOL 'silver' THRESHOLD(90,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'ltfs'

 

The policy above has two rules. The first rule defines the external pool ‘ltfs’, which is represented by the script /opt/ibm/ltfsee/bin/ltfsee. The OPTS parameter (tapepool) denotes the LTFS EE tape pool name, which must be set up in LTFS EE. The second rule migrates files from pool ‘silver’ (disk pool) to pool ‘ltfs’ when the silver pool reaches a threshold of 90%. The selection of files is based on access time, whereby the least recently accessed files are migrated first.

 

When configuring multiple migration policies for more than two pools within a file system, the parameters for the policy engine (mmapplypolicy command) might have to differ from one migration to the next. In particular, when migrating from one internal pool to another internal pool, all GPFS nodes can be involved. When migrating from an internal pool to an external pool that is represented by LTFS EE, only those nodes that have LTFS EE installed can be involved. For this reason the mmapplypolicy command, which is invoked by the callback, may need different parameters depending on the source and destination of the migration.

 

On the other hand, it is not possible to configure callbacks per pool. A callback is configured for events, and the event is identical regardless of which pool triggers it. So the challenge is to create a callback that can differentiate between the pools and pass the proper parameters to the policy engine. One answer to this is the parameter %storagePool, which can be passed to the callback script. Based on the storage pool name, the callback script sets the appropriate options for the policy engine.

 

Referring to the example above, if the source storage pool is ‘system’, the policy engine runs with its default parameters because the migration goes from one internal pool to another. If the source storage pool is ‘silver’, the migration goes from an internal pool to the external pool represented by LTFS EE, and the defaults are not appropriate. In this case the policy engine can be instructed to run the migration jobs only on certain nodes (parameter -N nodename), with a maximum of 3 threads during execution (parameter -m 3) and as a single instance (parameter --single-instance). The callback script shown later also sets -n 1, which limits the directory scan to one thread.

 

In order to implement this, the callback script must be adjusted to implement the logic above. The callback is configured to invoke the adjusted callback script (/root/silo/callback/mystartpolicy) with the additional parameter %storagePool:

 

mmaddcallback MIGRATION --command /root/silo/callback/mystartpolicy \
--event lowDiskSpace,noDiskSpace \
--parms "%eventName %fsName %storagePool"

 

The script /root/silo/callback/mystartpolicy is a copy of the standard script /usr/lpp/mmfs/bin/mmstartpolicy with the following two changes:

 

The first change is in the area where the parameters are parsed and assigned. The parameter %storagePool arrives as arg3 and is assigned to the variable pool. In addition, the remaining options must be shifted by 3 instead of 2, as marked by the comments below:

 

eventType=$arg1

device=$arg2

deviceName=${device##+(/)dev+(/)}  # Name stripped of /dev/ prefix.

#New get arg3 to be the pool

pool=$arg3

#Changed: shift 3 instead of 2

#shift 2

shift 3

options=$@

 

The next change is at the beginning of the main section. Here the variable “options”, which holds the mmapplypolicy options, is set according to the pool variable. In this example, when the pool is system, no special options are set because the system pool is migrated to the silver pool, which is also an internal pool. For the silver pool, special mmapplypolicy parameters are appended to the options variable to meet the requirements of LTFS EE. Note that the script tests for any pool other than system, so every pool except system receives these options. The new lines, which follow the “main body of code” comment in the callback script, are marked with #NEW below:

 

# main body of code

print -- "$(date): $mmcmd: $device generated event $eventType on node $ourNodeName"

 

#NEW depending on the pool set the right options

print -- "$(date): Poolname triggering the policy is $pool"

if [[ "$pool" != "system" ]]

then

  print -- "Info: setting mmapplypolicy options=$options -N node1 -m 3 -n 1 --single-instance"

  options=$options" -N node1 -m 3 -n 1 --single-instance"

else

  print -- "Info: setting mmapplypolicy options=$options"

  options=$options

fi
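With more than three pools, the if/else above can become hard to read. A minimal sketch of the same selection written as a case statement is shown here, assuming the pool names ‘system’ and ‘silver’ and the node ‘node1’ from the example; the pool ‘gold’ and the function name options_for_pool are hypothetical and only illustrate how a further internal pool could be added. Unlike the script above, this sketch applies no extra options to unknown pools and prints a warning instead, which is a design choice rather than the behavior of the original script.

```shell
#!/bin/sh
# Map the pool that triggered the event to extra mmapplypolicy options.
# 'system', 'silver' and 'node1' come from the example in the text;
# 'gold' is a hypothetical additional internal pool.
options_for_pool() {
    case "$1" in
        system|gold)
            # Internal-to-internal migration: keep the defaults.
            printf '\n'
            ;;
        silver)
            # Internal-to-LTFS EE migration: restrict nodes and threads.
            printf '%s\n' "-N node1 -m 3 -n 1 --single-instance"
            ;;
        *)
            # Unknown pool: warn on stderr and fall back to the defaults.
            echo "Warning: no options defined for pool '$1'" >&2
            printf '\n'
            ;;
    esac
}

# Example: options that would be appended for the silver pool.
options_for_pool silver
```

Inside the adjusted callback script, the result could be appended with a line such as options="$options $(options_for_pool "$pool")".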

 

With this adjusted callback script, the migration from pool system to pool silver is invoked with different parameters than the migration from pool silver to LTFS EE. The combined policy for this file system, which needs to be activated using the mmchpolicy command, looks like this:

 

/* migration for system to silver */

RULE 'systemMigration' MIGRATE FROM POOL 'system' THRESHOLD(90,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'silver'

 

/* migration from silver to ltfs */

RULE EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/ltfsee' OPTS 'tapepool'

RULE 'SilverMigration' MIGRATE FROM POOL 'silver' THRESHOLD(90,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
TO POOL 'ltfs'

 

/* placement policies */

RULE 'mp3' SET POOL 'silver' WHERE (LOWER(NAME) LIKE '%.mp3')

RULE 'default' SET POOL 'system'

 

 

Summary

To efficiently migrate data in a file system with more than two file system storage pools and one of them being represented by LTFS EE, the callback configuration and script can be slightly adjusted. The callback script can be invoked with one additional parameter – the storage pool name – which is used to determine the proper options for the policy engine within the callback script.

 

Another approach, which requires more programming but is more flexible, is to pass the parameters for the policy engine to the callback script using the --parms option and decode them in the callback script based on the storage pool name.
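A minimal sketch of such a decoding step is shown below. It assumes a hypothetical encoding in which every additional callback parameter has the form pool=opt,opt,... (for example silver=-N,node1,-m,3,--single-instance); neither this encoding nor the function name decode_pool_options is part of GPFS, and how the --parms string is split into arguments should be verified on the target system.

```shell
#!/bin/sh
# Pick the mmapplypolicy options for one pool out of the remaining
# callback arguments.
# Hypothetical encoding: each argument is "<pool>=<opt>,<opt>,...".
decode_pool_options() {
    pool=$1
    shift
    for entry in "$@"; do
        case "$entry" in
            "$pool"=*)
                # Strip the "<pool>=" prefix, turn commas back into spaces.
                printf '%s\n' "${entry#*=}" | tr ',' ' '
                return 0
                ;;
        esac
    done
    # No entry for this pool: fall back to the defaults (empty string).
    printf '\n'
}

# Example: options encoded for 'silver', none for 'system'.
decode_pool_options silver "system=" "silver=-N,node1,-m,3,--single-instance"
```

Because the comma serves as the separator, this encoding cannot express option values that themselves contain commas, such as a node list for -N; a different separator would be needed in that case.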

 
