Archiving data using Storage Scale to two Object stores

By MAARTEN KREUGER posted Wed May 21, 2025 09:48 AM

  


IBM Storage Scale is a versatile tool for managing data and placing it on the optimal storage tier. Within a file system this can be achieved using placement or migration policies that transparently move the data of a file from one storage pool to another, while keeping the file metadata (filename, timestamps, ACLs, etc.) in the original place. These policies can apply to a single file, a whole fileset, or a set of files selected by an SQL-like rule.
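
As an illustration, a migration rule in the Scale policy language could look like the sketch below; the pool names 'system' and 'nearline' are placeholders for pools defined in your own file system, and the rule would be applied with mmapplypolicy:

RULE 'migrate_cold' MIGRATE FROM POOL 'system' TO POOL 'nearline'
WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30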

Moving data out of a Scale file system is currently supported by defining external storage pools, which need a transport mechanism to move the data to an external storage provider. At the moment IBM Storage Protect (formerly TSM) and IBM Storage Archive (formerly LTFS EE) are supported, both of which can offload data to tape. IBM Storage Ceph can also replicate data across two sites using multi-zone replication. These additional software products need to be acquired, installed, configured, and managed, which adds cost and complexity to the overall solution. That can certainly be worth it at scale, but it can be too much for a small installation. There is a need to duplicate data across data centers while keeping it simple. Enter AFM to Cloud Object Store.

The Scale AFM to Cloud Object Store feature allows a Scale Active File Management (AFM) fileset (not an entire file system) to be mapped to an S3 object store. All files written to this fileset are automatically uploaded to a bucket, and if the fileset quota is exceeded the file data is evicted and space is freed up. A bucket can be accessed from multiple Scale clusters if needed, allowing access to the data for DR or transport purposes. The per-TB Scale licensing fee applies only to the space available in the fileset, not to the space provided by the object store, making this a cost-effective and simple offload solution.

When using the AFM2COS feature for archiving, a choice needs to be made which object store to use. This can be either a local S3 store such as IBM Storage Ceph or the IBM Storage Scale S3 protocol service, or a cloud-based object store such as Google Cloud, IBM Cloud, or Azure. A common requirement we see is that the archived data must be written to two separate locations. With IBM Storage Protect or IBM Storage Archive this is solved internally by those products, by replicating the data over two tape libraries or pools. To do this within Scale itself requires the AFM2COS multi-site replication feature, which uploads each file to two different object stores at the same time.

This blog demonstrates how to set up and use this feature.

To create this setup we need two object stores and a Scale cluster. We will set up a local Scale cluster acting as an object store, which we will call the archive cluster, plus an IBM Cloud based S3 object store. We then need to create a bucket with the same name in both stores, accessible over the same port. We also need another Scale cluster that holds the AFM2COS fileset; we will call this the protocol cluster, as it shares the fileset using NFS.

Step 1: Create a NFS protocol cluster and an S3 archive cluster

Download the latest Scale software for Linux (developer edition will work too) and extract.

We will build two single-node clusters, each with a file system called scalefs on the second disk. We need a base IP for the cluster and a Virtual IP (VIP) for the protocol access. The base IP is whatever the node already has configured. Both are in the same subnet.

We define the s3prot host for the NFS access and the s3arch host for the S3 access with the following hosts table (example):

192.168.1.101 s3prot

192.168.1.201 s3prot-vip

192.168.1.102 s3arch

192.168.1.202 s3arch-vip

Run the following commands on both nodes to install Scale and create the cluster and file system:

yum -y install gcc gcc-c++ kernel-headers kernel-devel cpp binutils nfs-utils ethtool rpcbind psmisc iputils boost-regex elfutils-devel make numactl awscli

cd /usr/lpp/mmfs/5.*/ansible-toolkit

./spectrumscale setup -s <INSERT IP HERE>

./spectrumscale node add `hostname -s` -mqnapg

./spectrumscale nsd add "/dev/sdb" -p `hostname -s` -fs scalefs

./spectrumscale config gpfs -e 61000-62000

./spectrumscale callhome disable

./spectrumscale install -pr

./spectrumscale install

The next step is to configure and deploy the protocol services:

mkdir /ibm/scalefs/ces

./spectrumscale config protocols -f scalefs -m /ibm/scalefs/ces

./spectrumscale config protocols -e <INSERT VIP HERE>

./spectrumscale enable nfs s3

./spectrumscale deploy -pr

./spectrumscale deploy

Now for some post-install configuration:

mmchconfig autobuildGPL=yes

mmchnode --gateway -N `hostname -s`

/usr/lpp/mmfs/gui/cli/mkuser admin -g SecurityAdmin -p notahardpassword

mmuserauth service create --data-access-method file --type userdefined

We need a user account and location to put the S3 buckets in:

groupadd -g 10001 s3admin

useradd -r -u 10001 -g 10001 s3admin

mkdir -p /ibm/scalefs/buckets

chmod 1777 /ibm/scalefs/buckets

We'll use NFS to export the buckets directory on the protocol node, and mount the NFS export on a client system of your choice:

mmnfs export add /ibm/scalefs/buckets -c "*(Access_Type=RW,Squash=no_root_squash)"
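
On the client system, the mount could look like this minimal sketch; the /mnt/buckets mount point is hypothetical, and s3prot-vip is the VIP from the hosts table above:

mkdir -p /mnt/buckets
mount -t nfs s3prot-vip:/ibm/scalefs/buckets /mnt/buckets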

Next, on the archive node, we configure the S3 service. First, however, we need to move the GUI to a new port, because it occupies port 443 (which we want for the S3 endpoint) and runs on the same node:

mkdir /etc/scale-gui-configuration/

echo "445" > /etc/scale-gui-configuration/scale_gui_port

Edit /usr/lpp/mmfs/gui/conf/gpfsgui.properties and change the port line to:

GUI_HTTPS_PORT=445

systemctl restart gpfsgui

You can log in as admin on the GUI at https://yourservername:445

We can now change the S3 port and create an S3 account:

mms3 config change ENDPOINT_SSL_PORT=443

mms3 account create s3admin --uid 10001 --gid 10001 --newBucketsPath /ibm/scalefs/buckets

  • access key: xxx
  • secret key: yyyyyyyy

Copy the generated access and secret key somewhere safe! Next we can create a testbucket:

mms3 bucket create testbucket1 --accountName s3admin --filesystemPath /ibm/scalefs/buckets/testbucket1
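
To verify that the local S3 service works, you can point the AWS client (installed earlier via the awscli package) at the archive cluster VIP using the s3admin keys. This is a sketch; enter the s3admin access and secret key when prompted by aws configure, and --no-verify-ssl may be needed if the S3 endpoint uses a self-signed certificate:

aws configure
aws --endpoint-url https://s3arch-vip --no-verify-ssl s3 ls

If everything is set up correctly, the output should include testbucket1.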

This finishes the setup on both clusters. We have a functioning NFS and S3 object store on both.

Step 2: Configure an Object store in the IBM Cloud 

1. Log in with your IBM ID at https://cloud.ibm.com

2. Create a new instance at Infrastructure/Storage/Object Storage

3. Create a new bucket; the name has to be unique.


My bucket name: bucket-5l8zxstc3elov67

Next, get the public endpoint in the configuration tab:


My public endpoint: s3.eu-de.cloud-object-storage.appdomain.cloud

In your Cloud Object Storage instance, select the Service credentials tab and create a new credential, making sure to activate the HMAC switch:


Note the keys and store them somewhere safe. My keys were:

  • access key: 4f75cb403c9940c98c8d6ac115a25242
  • secret key: fb16c837fccc4cc993af54584fbdd64aad2c593d0f999870

Now we can go back to the archive node and create a bucket with the same name:

mms3 bucket create bucket-5l8zxstc3elov67 --accountName s3admin --filesystemPath /ibm/scalefs/buckets/bucket-5l8zxstc3elov67

This ends the preparation of the object stores.

Step 3: Prepare the AFM fileset on the protocol cluster node

The first thing to do is to define the keys for the buckets. We do this using the mmafmcoskeys command on the protocol node:

mmafmcoskeys bucket-name:endpoint set access-key secret-key

mmafmcoskeys bucket-5l8zxstc3elov67:<local S3 VIP hostname> set s3admin-access-key s3admin-secret-key

mmafmcoskeys bucket-5l8zxstc3elov67:s3.eu-de.cloud-object-storage.appdomain.cloud set 4f75cb403c9940c98c8d6ac115a25242 fb16c837fccc4cc993af54584fbdd64aad2c593d0f999870
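
To double-check what was stored, recent Scale releases let you query the keys back with the get action of mmafmcoskeys (treat this as an assumption and check the documentation for your release), for example:

mmafmcoskeys bucket-5l8zxstc3elov67:s3.eu-de.cloud-object-storage.appdomain.cloud get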

Because we are writing each object twice, a map is required, which is defined using mmafmconfig:

mmafmconfig add map-name --multi-target-map endpoint1/gateway,endpoint2/gateway

mmafmconfig add localandcloud --multi-target-map <local S3 VIP hostname>/`hostname -s`,s3.eu-de.cloud-object-storage.appdomain.cloud/`hostname -s`
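
To confirm that the map exists, you can display it; the show action of mmafmconfig is assumed to be available in your release:

mmafmconfig show localandcloud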

The endpoints are now defined; next we create and link (mount) the AFM fileset:

mmafmcosconfig filesystem-name fileset-name --endpoint https://map-name --bucket bucket-name --mode mu --uid UID --gid GID --xattr --dir path-to-bucket-fileset

mmafmcosconfig scalefs bucket-5l8zxstc3elov67 --endpoint https://localandcloud --bucket bucket-5l8zxstc3elov67 --mode mu --uid 10001 --gid 10001 --xattr --dir /ibm/scalefs/buckets/bucket-5l8zxstc3elov67

mmlinkfileset scalefs bucket-5l8zxstc3elov67 -J /ibm/scalefs/buckets/bucket-5l8zxstc3elov67
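
At this point you can check the state of the new AFM fileset and its gateway; once the fileset is linked, mmafmctl should report a state such as Inactive or Active:

mmafmctl scalefs getstate -j bucket-5l8zxstc3elov67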

Test the fileset by copying some data into the AFM fileset directory, either directly on the protocol node, or on your NFS client system:

cp /boot/initramfs-0-rescue-*.img /ibm/scalefs/buckets/bucket-5l8zxstc3elov67/testfile1

Because the AFM fileset is in Manual Update mode (mandatory in this configuration), we need to initiate the upload ourselves using mmafmcosctl:

mmafmcosctl filesystem-name bucket-name bucket-path upload --all

mmafmcosctl scalefs bucket-5l8zxstc3elov67 /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 upload --all

   Queued     Failed           TotalData

                             (approx in Bytes)

         1          0           124342502

Object Upload successfully queued at the gateway.

When checking the directory on the S3 archive cluster node the file should be visible in the bucket directory:

ls -al /ibm/scalefs/buckets/bucket-5l8zxstc3elov67/

total 121433

drwxrwx---. 2 s3admin s3admin      4096 May 20 13:50 .

drwxrwxrwx. 4 root    root         4096 May 20 13:20 ..

-rw-rw----. 1 s3admin s3admin 124342502 May 20 13:50 testfile1

We can check the IBM Cloud bucket using the web GUI or by using the AWS client tool:

# aws configure

AWS Access Key ID [****************PhHu]: 4f75cb403c9940c98c8d6ac115a25242

AWS Secret Access Key [****************U+IV]: fb16c837fccc4cc993af54584fbdd64aad2c593d0f999870

Default region name [None]: eu-de

Default output format [None]:

# aws --endpoint https://s3.eu-de.cloud-object-storage.appdomain.cloud s3 ls s3://bucket-5l8zxstc3elov67

2025-05-20 13:52:42  124342502 testfile1

Step 4: Using the AFM2COS archiving fileset

We can manually push files from the AFM2COS fileset to the object stores, and we can download them too. The mmafmcosctl command provides data and metadata downloads, uploads, eviction, and reconciliation. For example, when a new cluster wants to connect to an existing object store, it can do so and download only the metadata. When a file is then accessed, its data is pulled from the object store on demand.

mmafmcosctl scalefs bucket-5l8zxstc3elov67 /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 download --all --uid 10001 --gid 10001 --metadata
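
A quick way to see this behaviour (assuming the paths from this example) is to compare the allocated block count before and after reading a file; a metadata-only file shows 0 blocks, and reading it pulls the object data in:

ls -ls /ibm/scalefs/buckets/bucket-5l8zxstc3elov67/testfile1
md5sum /ibm/scalefs/buckets/bucket-5l8zxstc3elov67/testfile1
ls -ls /ibm/scalefs/buckets/bucket-5l8zxstc3elov67/testfile1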

One drawback of the manual update mode is that files are not evicted automatically when the fileset exceeds its quota; this has to be done manually. For archiving purposes this can make sense, as it allows the S3 upload to be scheduled at a convenient time, but it does mean that files are not uploaded automatically and cannot have their data evicted until they have been uploaded. We can, however, use policies to make this happen.

Create a policy file called list1hour.pol with the following content:

RULE 'filesRule' LIST 'newfiles'

WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME < INTERVAL '1' HOURS)

Apply this policy to the fileset to generate a list of files, convert it to a usable format and use it to upload the files:

mmapplypolicy /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 -P /root/list1hour.pol -I defer -f /root

cat /root/list.newfiles | awk -F '[ ]' '{ for(i=7; i<=NF; i++) printf "%s",$i (i==NF?ORS:OFS) }' > /root/list.objects

mmafmcosctl scalefs bucket-5l8zxstc3elov67 /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 upload --object-list /root/list.objects

We could use this list of files to evict the files' data directly, or we can use a separate policy for that:

Create a policy file called listnotevicted.pol that searches for files that are not offline (i.e. not yet evicted):

RULE 'filesRule' LIST 'notevicted'

WHERE (MISC_ATTRIBUTES NOT LIKE '%o%')

Apply this policy to the fileset to generate a list of files, convert it to a usable format, and evict the files:

mmapplypolicy /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 -P /root/listnotevicted.pol -I defer -f /root

cat /root/list.notevicted | awk -F '[ ]' '{ for(i=7; i<=NF; i++) printf "%s",$i (i==NF?ORS:OFS) }' > /root/list.notevicted.objects

mmafmcosctl scalefs bucket-5l8zxstc3elov67 /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 evict --object-list /root/list.notevicted.objects

The data of a file is not evicted if the file has not been uploaded first. Such eviction attempts create an alert message, but they can be tried again later.
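
Putting this together, the upload step can be scheduled at a convenient time. A minimal sketch, assuming the upload commands from above are collected in a hypothetical wrapper script /root/afm-upload.sh:

#!/bin/bash
# /root/afm-upload.sh - list files modified in the last hour and upload them
mmapplypolicy /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 -P /root/list1hour.pol -I defer -f /root
cat /root/list.newfiles | awk -F '[ ]' '{ for(i=7; i<=NF; i++) printf "%s",$i (i==NF?ORS:OFS) }' > /root/list.objects
mmafmcosctl scalefs bucket-5l8zxstc3elov67 /ibm/scalefs/buckets/bucket-5l8zxstc3elov67 upload --object-list /root/list.objects

A crontab entry to run it every hour, matching the one-hour window of the policy, could then be:

0 * * * * /root/afm-upload.sh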

Documentation links

https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=cacos-configuring-multi-site-replication-afm-cloud-object-storage

https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=administering-afm-cloud-object-storage

https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=aacos-evicting-data-objects-by-using-manual-updates-mode-afm-cloud-object-storage
