
Finalizing Spectrum Scale Updates

By MAARTEN KREUGER posted 26 days ago

  

After upgrading Spectrum Scale, the procedure urges you to upgrade both the cluster and filesystem levels, but warns that you cannot revert to the old level once you do. I often see very old cluster and filesystem levels in the field, so why not complete the update? The changes can be made online, but I suspect there is hesitancy because the cluster currently "works" and the benefits are not obvious.

So why update the cluster version? The answer is of course in the documentation, but in short, it enables features such as new ciphers or new log file formats. The filesystem level enables features such as Fastcreate on AFM DR, or AFM to COS; the full list is in the documentation.

Finalizing the upgrade within a cluster can only be done after all cluster nodes have been upgraded; see the documentation for details. Remote clusters are different: they cannot participate in cluster management, so they only need NSD and metadata node access, which is granted through backwards compatibility, provided the remote cluster is not too far behind (see FAQ Q2.8). Filesystem access is stricter for remote clusters, as there is no forward compatibility when accessing a filesystem with a level higher than the remote cluster knows about.

Not every patch update brings a new cluster version or a new filesystem version. Each minor release (and of course each major release) does, and between major releases we need to observe the N-1 compatibility rule:

(Figure: Spectrum Scale compatibility)
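To make the N-1 rule concrete, here is a rough sketch of the idea. The `compatible` helper is my own illustration, not a Spectrum Scale tool, and it simplifies by treating each minor release in the 5.x stream as one compatibility step:

```shell
# Rough sketch (not a Spectrum Scale tool): treat each minor release as one
# step and allow a difference of at most one step (N-1 coexistence).
compatible() {
    # $1, $2: version strings such as 5.0.5.10 or 5.1.1.4
    local a b diff
    a=$(echo "$1" | awk -F. '{print $1 * 100 + $2}')
    b=$(echo "$2" | awk -F. '{print $1 * 100 + $2}')
    diff=$((a - b))
    [ "${diff#-}" -le 1 ]
}

compatible 5.0.5.10 5.1.1.4 && echo "can coexist" || echo "too far apart"
compatible 4.2.3.0  5.1.1.4 && echo "can coexist" || echo "too far apart"
```

So 5.0.5 nodes can coexist with 5.1.1 nodes during a rolling upgrade, but a 4.2.3 node is more than one step behind and cannot. Always check the FAQ for the authoritative table.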

 

In the examples below we have two clusters. We start with GPFS 5.0.5.10 installed on all nodes, and we are going to upgrade to version 5.1.1.4 so we can use the AFM to COS feature.

cluster1: remnode with remotely mounted filesystem from cluster2 /scale2

cluster2: scalenode-1,scalenode-2 with filesystem /scale2

mmlsfs scale2 -V                  23.00 (5.0.5.0)         

mmlsconfig minReleaseLevel -V     23.00 (5.0.5.1)         

           

The first step is to upgrade scalenode-1 on cluster2 to version 5.1.1.4:

scalenode-1: /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --repo && yum -y  update gpfs.*  &&  /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --clean && mmbuildgpl && mmstartup                                
scalenode-1:  mmchconfig release=latest     
Verifying that all nodes in the cluster are up-to-date ...                        
mmchconfig: The following nodes must be upgraded to GPFS release 5.1.1.0 or higher:                        
scalenode-2                         
mmchconfig: Command completed: Not all required changes were made.

We're not allowed to finalize the upgrade yet; we still need to update scalenode-2:

scalenode-2: /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --repo && yum -y  update gpfs.*  &&  /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --clean && mmbuildgpl && mmstartup                                            
scalenode-1:  mmchconfig release=latest                                  
Verifying that all nodes in the cluster are up-to-date ...                        
mmchconfig: Command successfully completed                                  

 The cluster version has now been upgraded successfully:                     

mmlsfs scale2 -V                  23.00 (5.0.5.0)         

mmlsconfig minReleaseLevel -V     25.00 (5.1.1.0)         

The next step in finalizing is to upgrade the filesystem version, so we can use the new AFM to COS functionality. When updating the filesystem version, we have the option of a "full" or a "compat" update. The documentation says:

mmchfs scale2 -V full|compat

full
    Enables all new functionality that requires different on-disk data structures. Nodes in remote clusters that are running an earlier version of IBM Spectrum Scale will no longer be able to mount the file system. With this option, the command fails if it is issued while any node that has the file system mounted is running an earlier version of IBM Spectrum Scale.

compat
    Enables only backward-compatible format changes. Nodes in remote clusters that were able to mount the file system before the format changes can continue to do so afterward.
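The decision can be sketched as a small check. This is assumed logic mirroring the documentation quoted above, with the NN.NN level strings taken from the mmlsfs examples in this post:

```shell
# Sketch of the full-vs-compat decision, mirroring the documentation above.
# Level strings (NN.NN) are taken from the mmlsfs examples in this post.
local_level="25.00"
remote_levels="23.00"    # format versions known to nodes mounting remotely

local_n=$(echo "$local_level" | tr -d .)
mode="full"
for lvl in $remote_levels; do
    lvl_n=$(echo "$lvl" | tr -d .)
    if [ "$lvl_n" -lt "$local_n" ]; then
        mode="compat"    # an older mounter exists: -V full would cut it off
    fi
done
echo "$mode"
```

With a 23.00-level remote mounter present, only a compat upgrade keeps everyone mounted, which is exactly the situation we are in here.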

This does not bode well for the AFM to COS functionality: we have a remote cluster running 5.0.5, so we can only do compat. But let's try anyway:

scalenode-1: mmchfs scale2 -V compat                          
You have requested that the file system version for local access be upgraded to version 25.00 (5.1.1.0). This will enable some new functionality but will prevent local nodes from using the file system with earlier releases of GPFS.  Remote nodes are not affected by this change.                           

Do you want to continue? yes                              
Successfully upgraded file system format version to 25.00 (5.1.1.0).                       

mmlsfs scale2 -V                   25.00 (5.1.1.0)        

mmlsconfig minReleaseLevel -V      25.00 (5.1.1.0) Current local access file system version                              
                                   23.00 (5.0.5.0) Current remote access file system version
                                   23.00 (5.0.5.0) Original file system version                                  
So the local filesystem level is 5.1.1.0, which is what we need to enable AFM to COS, but remote access still uses the old version. Let's try to configure AFM to COS:

scalenode-1: mmafmcosconfig scale2 cos-scale2 --endpoint https://s3.eu-de.cloud-object-storage.appdomain.cloud:443 --bucket scale51 --mode iw                           
mmafmcosconfig: AFM object filesets cannot be created for file system scale2 because the file system version is less than 24.00 (5.1.0.0).                               

So compat does indeed not enable the new feature. Let's upgrade with "full" and get this done:

scalenode-1: mmchfs scale2 -V full  
mmchfs:  Attention:  One or more remote clusters are granted access to the file system. After the file system is migrated, remote nodes that are running earlier GPFS versions will not be able to mount the file system any more.

File system version 25.00 is not supported on the following 1 remote nodes mounting the file system:
172.17.146.5    remnode                          cluster1.remnode         23.10
mmchfs: Command failed. Examine previous error messages to determine cause.                 

So that did not work. We'll have to upgrade the remote node before we can use AFM to COS, or use a different filesystem.

remnode: /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --repo && yum -y  update gpfs.*  &&  /usr/lpp/mmfs/5.1.1.4/tools/repo/local-repo --clean && mmbuildgpl && mmstartup
remnode: mmchconfig release=latest                            
scalenode-1: mmchfs scale2 -V full                                 
mmchfs:  Attention:  One or more remote clusters are granted access to the file system. After the file system is migrated, remote nodes that are running earlier GPFS versions will not be able to mount the file system any more.        
You have requested that the file system be upgraded to                                
version 25.00 (5.1.1.0). This will enable new functionality but will                             
prevent you from using the file system with earlier releases of GPFS.                                 
Do you want to continue? yes                              

Successfully upgraded file system format version to 25.00 (5.1.1.0).                       

Now, with both the local and remote clusters upgraded to version 5.1.1.4 and the upgrade finalized, we are ready:

mmlsfs scale2 -V                  25.00 (5.1.1.0)         

mmlsconfig minReleaseLevel -V     25.00 (5.1.1.0)           

Let's try to configure AFM to COS again:

scalenode-1: mmafmcosconfig scale2 cos-scale2 --endpoint https://s3.eu-de.cloud-object-storage.appdomain.cloud:443 --bucket scale51 --mode iw                                

scalenode-1: mmlsfileset scale2 cos-scale2
Filesets in file system 'scale2':
Name                     Status    Path                                   
cos-scale2               Linked    /scale2/cos-scale2                               

So, upgrading filesystem levels can run into issues when enabling functionality, but Spectrum Scale protects you by checking levels both locally and remotely.

But what if we want to add nodes with an old level to the cluster? Let's see what happens when we have a new node running 5.0.5 and we try to add it to our 5.1.1 cluster:

scalenode-1: mmaddnode newnode        
Tue Nov  9 10:39:53 CET 2021: mmaddnode: Processing node newnode 
mmremote: Incorrect keyword: checkNewClusterNode510     
mmremote:incompatibleGPFScode::3::2310:5.0.5.10:Linux:1:1:    
mmremote: Command failed. Examine previous error messages to determine cause
mmaddnode: The level of GPFS on node newnode does not support the requested action.
mmaddnode: mmaddnode quitting.  None of the specified nodes are valid.
mmaddnode: Command failed. Examine previous error messages to determine cause.

As newnode does not satisfy the minReleaseLevel, it may not join. So let's create a new cluster on the new 5.0.5 node and use multicluster instead.

newnode: mmremotecluster add cluster2.scalenode-1 -n scalenode-1,scalenode-2 -k id_rsa.pub.scale51                               
mmremotecluster: Command successfully completed                         
mmremotecluster: mmsdrfs propagation completed.                           

The join succeeds, as our version is N-1 relative to the remote cluster, and minReleaseLevel is not relevant for remote nodes.

newnode: mmremotefs add scale2 -f scale2 -C cluster2.scalenode-1 -T /scale2
mmremotefs: mmsdrfs propagation completed.                        

The filesystem definition works as well, even though the filesystem level is too high. But alas, the mount fails, as the filesystem cannot be read correctly:

newnode: mmmount scale2                                   
Tue Nov  9 10:28:52 CET 2021: mmmount: Mounting file systems ...                      
mount: /scale2: mount(2) system call failed: Wrong medium type.                            
mmmount: Command failed. Examine previous error messages to determine cause.                   

The remote cluster can access the NSDs fine, as that is backwards compatible, but the filesystem layout is not recognized, so the mount ends in an error.
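You can anticipate this failure before mounting by comparing the filesystem's format version with the highest format version the mounting node's release understands. A rough sketch, with the version numbers hard-coded from the examples above (the real release-to-format mapping is in the documentation):

```shell
# Rough pre-mount check, using the format versions from the examples above.
# A node cannot mount a filesystem whose format version is newer than the
# highest format version its own release understands.
fs_format="25.00"    # from: mmlsfs scale2 -V on the owning cluster
node_max="23.00"     # highest format a 5.0.5 node understands (per this post)

if [ "$(echo "$fs_format" | tr -d .)" -gt "$(echo "$node_max" | tr -d .)" ]; then
    echo "mount will fail: format $fs_format is newer than supported $node_max"
else
    echo "mount should succeed"
fi
```

Running `mmlsfs <fs> -V` on the owning cluster before defining the remote mount would have told us the same thing without the "Wrong medium type" surprise.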

If you know beforehand that you will need to mount a filesystem on a lower-level cluster, you can create the filesystem at a lower level using the --version parameter.
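For example, something like the following (a sketch: the filesystem name and stanza file are made up, and you should check the documentation for the exact format-version string your oldest remote release supports):

```shell
# Hypothetical example: create a new filesystem at the 5.0.5.0 format level so
# a 5.0.5 remote cluster can still mount it. Device name and stanza file are
# made up for illustration:
#
#   mmcrfs scale3 -F /tmp/scale3.stanza --version 5.0.5.0
#
# Deriving a --version string from the oldest release that must mount it:
oldest_remote="5.0.5.10"
fs_version=$(echo "$oldest_remote" | awk -F. '{printf "%s.%s.%s.0", $1, $2, $3}')
echo "$fs_version"
```

The trade-off, of course, is that the new filesystem stays at the old format level and forgoes the newer features until you upgrade it with mmchfs -V later.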
