Updating Spectrum Scale
We'll kick this blog off by updating Spectrum Scale, which, as usual, requires new software. Normally an IBM customer would download this from the IBM Fix Central website, but you're not a paying customer, so you won't get access there. However, you can download a newer version from the same place you downloaded the original code:
https://www.ibm.com/products/spectrum-scale
Download, unzip, and run the self-extracting archive. Ignore all the information about the installation toolkit; we go direct! Your new code is now in /usr/lpp/mmfs/<code version>.
The first step is to create the apt sources for Spectrum Scale (or yum repos if you are on RHEL/SLES):
NB: There is a bug in this script for Ubuntu in v5.1.1.0 (Sorry, it will be fixed, really, I've been promised!)
Change line 142 in /usr/lpp/mmfs/5.1.1.0/tools/repo/local-repo:
from: osVersion = linux_dist[1][:2]
to: osVersion = ""
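If you'd rather not edit the file by hand, the change above could be scripted roughly like this. It is only a sketch: patch_local_repo is a made-up helper name, and it matches the statement text instead of relying on the line number, on the assumption that the statement appears exactly as quoted above.

```shell
#!/bin/sh
# Sketch: apply the osVersion workaround to the local-repo script.
# patch_local_repo is a hypothetical helper, not part of Spectrum Scale.
patch_local_repo() {
    script="$1"
    # Keep a backup so the change can be reverted once the bug is fixed.
    cp -p "$script" "$script.orig" || return 1
    # Match the statement itself rather than line 142, so the indentation
    # in front of it is left untouched.
    sed 's/osVersion = linux_dist\[1]\[:2]/osVersion = ""/' \
        "$script.orig" > "$script"
}

# Example (as root):
#   patch_local_repo /usr/lpp/mmfs/5.1.1.0/tools/repo/local-repo
```

The original file is kept next to it as local-repo.orig, so you can compare the two with diff before running the script.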
# /usr/lpp/mmfs/5.1.1.0/tools/repo/local-repo --clean
# apt clean
# /usr/lpp/mmfs/5.1.1.0/tools/repo/local-repo --repo
# apt -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true update
# mmshutdown
Shutting down the following quorum nodes will cause the cluster to lose quorum:
scalenode1
Do you wish to continue [yes/no]: yes
vr 11 jun 2021 13:53:06 CEST: mmshutdown: Starting force unmount of GPFS file system
# apt --only-upgrade install gpfs*
...
The following packages will be upgraded:
gpfs.base gpfs.compression gpfs.docs gpfs.gpl gpfs.gskit gpfs.gss.pmcollector gpfs.gss.pmsensors gpfs.gui gpfs.java gpfs.license.dev gpfs.nfs-ganesha gpfs.nfs-ganesha-dbgsym gpfs.nfs-ganesha-doc gpfs.nfs-ganesha-gpfs gpfs.smb gpfs.smb-dbg
16 upgraded, 0 newly installed, 0 to remove and 221 not upgraded.
Need to get 0 B/215 MB of archives.
After this operation, 2.160 kB of additional disk space will be used.
Do you want to continue? [Y/n]
...
# /usr/lpp/mmfs/5.1.1.0/tools/repo/local-repo --clean
The easiest way to start GPFS now is to run mmstartup, but because we shut down the previous version earlier, its kernel extensions may still be in memory, so the new ones may not load. You can do either of the following:
# mmbuildgpl
# mmstartup
# mmgetstate
# tail /var/mmfs/gen/mmfslog
- OR -
# mmbuildgpl
# reboot
Running mmbuildgpl is important, as it validates whether your current kernel works with your version of Spectrum Scale. If it doesn't, you will see compilation errors and will need to downgrade your kernel. See the first episode of this blog for instructions on that.
Spectrum Scale will start automatically after the reboot.
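For the first option (no reboot), a small wrapper can make sure GPFS is only started when the build step went through. This is a sketch under two assumptions: the mm* commands live in /usr/lpp/mmfs/bin, and mmbuildgpl exits with a non-zero status when compilation fails. build_then_start and MM_BIN are names invented for this example.

```shell
#!/bin/sh
# Sketch: only start GPFS when the portability layer builds cleanly.
# MM_BIN is an assumption: the usual location of the mm* commands.
MM_BIN="${MM_BIN:-/usr/lpp/mmfs/bin}"

build_then_start() {
    if "$MM_BIN/mmbuildgpl"; then
        # mmstartup returns before the daemon is fully up, so the state
        # printed here may not be "active" yet.
        "$MM_BIN/mmstartup" && "$MM_BIN/mmgetstate"
    else
        echo "mmbuildgpl failed: check your kernel version before starting GPFS" >&2
        return 1
    fi
}

# Example (as root):
#   build_then_start
```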
Troubleshooting
So, what can go wrong? I have a few common issues lined up here:
- IP address change because of DHCP
- Temporary loss of a GPFS disk device
- Permanent loss of a GPFS disk device
- Loss of the operating system disk
IP address change because of DHCP
If this happens to your system: first of all, servers need a fixed IP! To fix it, change the IP address back to the original one and reboot the system. That does not mean you can never change the IP address, but you need to take some extra steps, using IP aliases, to make sure communication is not interrupted. The same goes for hostname changes:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=topics-changing-ip-addresses-host-names-cluster-nodes
Temporary loss of a GPFS disk device
When an external disk experiences a power failure or disconnect, GPFS loses access to the NSD device. Each filesystem has a quorum, and a majority of the voting disks must always be online for this filesystem quorum to be achieved; otherwise GPFS brings the filesystem down. With mmlsdisk you can see the voting disks marked with "desc":
# mmlsdisk nas1 -L
disk driver sector failure holds holds storage
name type size group metadata data status availability disk id pool remarks
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ ---------
ssd1 nsd 512 1 yes yes ready up 2 system desc
sata1 nsd 512 1 no yes ready up 3 sata desc
Number of quorum disks: 2
Read quorum value: 2
Write quorum value: 2
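To check this quickly, you could pipe the mmlsdisk output through a small awk filter. The sketch below assumes the column layout shown above (availability in the 8th column, "desc" as the last word of the line) and applies the simple majority rule from the text. desc_quorum_ok is a name made up for this example.

```shell
#!/bin/sh
# Sketch: count descriptor ("desc") disks and how many of them are up.
# Reads `mmlsdisk <fs> -L` output on stdin. The column positions are an
# assumption based on the sample output above.
desc_quorum_ok() {
    awk '
        $NF == "desc" { total++; if ($8 == "up") up++ }
        END {
            printf "%d of %d descriptor disks up\n", up, total
            # Exit status 0 only when a strict majority is up.
            exit !(total > 0 && up * 2 > total)
        }'
}

# Example:
#   mmlsdisk nas1 -L | desc_quorum_ok || echo "filesystem quorum at risk"
```

With the two disks shown above, both must be up, which matches the read and write quorum values of 2.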
As an example, I can power off the disk called sata1, as this is an external USB drive. Mind you, this drive does not hold any metadata, only data. Only the system pool holds metadata, and if you lost that disk, the filesystem would definitely not be available. If the disk holding your metadata is powered off, your best bet is to reboot the system after you have fixed it.
With data-only pools it's different: if you try to create a file in storage pool "sata" you get an I/O error, the same as when you try to read a file that is on the missing disk. After a while the disk may be marked down. To get it working again, plug it back in and run:
# mmchdisk nas1 start -a
mmnsddiscover: Attempting to rediscover the disks. This may take a while ...
mmnsddiscover: Finished.
scalenode1: Rediscovered nsd server access to ssd1.
scalenode1: Rediscovered nsd server access to sata1.
If you tried to create files while the sata1 disk was down, they may be ill-placed. You can fix this with mmrestripefs (which may really take a while):
# mmrestripefs nas1 -p
Scanning file system metadata, phase 1 ...
...
100.00 % complete on Fri Jun 11 15:38:08 2021 ( 2906112 inodes with total 3614 MB data processed)
Scan completed successfully.
Permanent loss of a GPFS disk device
Losing a disk permanently is another matter.
If you lose the system pool disk, and you have not replicated the metadata and data, you will have to rebuild from scratch and restore from backup. (You have made a backup, haven't you?)
A data disk loss, if not replicated, means loss of all files in that pool, as GPFS stripes data across all disks in a pool. The metadata is still there, so all files seem OK until you access them, at which point they return I/O errors. You will have to delete all affected files and restore them from backup. You can run a deletion policy script to do this.
You can create this policy using the GUI, but as it's really simple, we'll use the command line and test it before we really run it:
# cat > delete_sata.pol << EOF
RULE 'delete_sata' DELETE FROM POOL 'sata'
EOF
# mmapplypolicy nas1 -I test -P delete_sata.pol -L 2
...
WEIGHT(inf) DELETE /nas1/file1 SHOW()
WEIGHT(inf) DELETE /nas1/file2 SHOW()
WEIGHT(inf) DELETE /nas1/test2.zip SHOW()
WEIGHT(inf) DELETE /nas1/Documents/test.zip SHOW()
# mmapplypolicy nas1 -I yes -P delete_sata.pol -L 2
...
[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.
Operating system disk failure
Lastly, what would happen if your Linux OS disk failed, or you accidentally deleted too many files? If you have not made a backup with something like "Relax And Recover" (https://relax-and-recover.org) you’ll have to install from scratch and restore the configuration files. Sounds difficult, but luckily Spectrum Scale uses only a few locations:
- /usr/lpp/mmfs : contains the installation files
- /opt/IBM/zimon : Performance monitoring files
- /var/lib/postgresql: Event and GUI database
- /var/mmfs : contains configuration files
- /var/adm/ras : contains log files
If your system is the only node in the cluster, you can just archive the contents of /var/mmfs and restore them on a newly built system with the same name, IP address, SSH host key, and the GPFS binaries installed, and then reboot. The cluster repository (CCR) for the NFS protocol access and the configuration file /var/mmfs/gen/mmsdrfs should contain everything the cluster needs to start again. You will need to reconfigure the GUI.
If you have more nodes in the cluster, the configuration survives on the other nodes. The failed system never officially left the cluster, so the config files can be copied back with the mmsdrrestore command.
In the next blog we'll discuss tuning, some analysis commands, and quotas: part 8: Tuning and Quotas
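Archiving /var/mmfs can be as simple as a tar call. Here is a minimal sketch: backup_scale_config is a hypothetical helper name, the destination path in the example is made up, and it defaults to /var/mmfs when no directories are given. Store the archive somewhere other than the OS disk you are trying to protect.

```shell
#!/bin/sh
# Sketch: archive Spectrum Scale configuration directories into one tarball.
# backup_scale_config is a hypothetical helper name.
backup_scale_config() {
    dest="$1"
    shift
    # Default to the configuration directory named in the text.
    [ "$#" -gt 0 ] || set -- /var/mmfs
    tar -czf "$dest" "$@"
}

# Example (as root, /backup is an assumed location on another disk):
#   backup_scale_config /backup/scalenode1-mmfs.tar.gz /var/mmfs
```

To restore on the rebuilt system, unpack the archive relative to / (GNU tar strips the leading slash when it creates the archive).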