IBM Spectrum Scale is an enterprise Software Defined Storage product, which means it is scalable, high performance, full featured, complex, and normally never seen in a home environment. But since last year there has been a special Developer Edition of Spectrum Scale which is available free for home or development use and contains all features, capped at 12 TiB of storage.
So, why would you want to run an enterprise product like Spectrum Scale at home? Just use a Synology, a QNAP, or a storage-focused Linux distribution like FreeNAS. But where is the fun in that? If you want total control over where your data is stored, and want to use enterprise-level replication and cloud access features, this is your chance. Download at: https://www.ibm.com/products/spectrum-scale
You can install the software on any x86_64 Linux-based system, but some distributions are better supported than others. RHEL/CentOS works best, followed by SLES, then Ubuntu. Some features do not work on some OSes, and some only work on certain levels of Spectrum Scale or the OS. It's complicated. Check tables 13 and 17 at Q2.1: https://www.ibm.com/docs/en/spectrum-scale?topic=STXKQY/gpfsclustersfaq.html
My test system is a simple NUC running Ubuntu 20.04 LTS with an Intel Celeron and 4 GB of RAM, a built-in SSD and a USB-attached SATA drive. More RAM/CPU is always welcome, depending on how many and which features you're using. For production use at least 64 GB of RAM is recommended, or 256 GB if you run a lot of services. Also, Spectrum Scale is a clustered filesystem, so you can have something like a thousand systems in your cluster; we'll just use one to start with.
Let's jump straight into the installation procedure. Unzip the download and extract the RPM/Debian packages:
# unzip Spectrum_Scale_Developer-<version>-x86_64-Linux.zip
# chmod +x Spectrum_Scale_Developer-<version>-x86_64-Linux-install
# ./Spectrum_Scale_Developer-<version>-x86_64-Linux-install
(Substitute <version> with the release you downloaded.)
This unpacks the software RPMs/Debs into /usr/lpp/mmfs/<version>. Why /usr/lpp/mmfs/ and not /opt? That's because Spectrum Scale was originally developed for use on AIX in the nineties, and that's where it went. So. Tradition.
You can look up the full installation instructions in the documentation: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=installing
Or follow along with my steps. There are two ways to install: manual or automatic. If you have a lot of systems to install, the automatic route is really nice, but we'll create a singleton cluster, so manual it is.
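For reference, the automatic route uses the installation toolkit shipped inside the extracted directory (under /usr/lpp/mmfs/<version>/installer or ansible-toolkit, depending on the release). A rough sketch for a single node, using the IP and hostname I set up further below, would look something like this:
# cd /usr/lpp/mmfs/<version>/installer
# ./spectrumscale setup -s 192.168.178.199
# ./spectrumscale node add scalenode1 -a
# ./spectrumscale install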
First step is to create the apt sources for Spectrum Scale (or yum repos if on RHEL/SLES):
NB: There is a bug in this script in this release (sorry). Change line 142:
from: osVersion = linux_dist[:2]
to: osVersion = ""
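Or, as a one-liner patch (adjust the path to the version you extracted; this assumes the line looks exactly like the above):
# sed -i 's/osVersion = linux_dist\[:2\]/osVersion = ""/' /usr/lpp/mmfs/<version>/tools/repo/local-repo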
# /usr/lpp/mmfs/<version>/tools/repo/local-repo --repo
Creating repo: /etc/apt/sources.list.d/ganesha.list
Creating repo: /etc/apt/sources.list.d/gpfs.list
Creating repo: /etc/apt/sources.list.d/object.list
Creating repo: /etc/apt/sources.list.d/smb.list
Creating repo: /etc/apt/sources.list.d/zimon.list
Creating repo: /etc/apt/sources.list.d/gpfs2.list
As these repositories are not signed, we need to enable unsigned repos:
# apt -o Acquire::AllowInsecureRepositories=true \
-o Acquire::AllowDowngradeToInsecureRepositories=true update
We can then install Spectrum Scale. Any dependencies should be resolved automatically with your standard repos. Updating works the same way.
# apt install gpfs.base gpfs.gpl gpfs.gskit gpfs.license.dev gpfs.docs gpfs.afm.cos gpfs.compression gpfs.gui gpfs.protocols-support gpfs.pmsensors gpfs.pmcollector gpfs.nfs-ganesha\* gpfs.smb\* gpfs.pm-ganesha\*
Before we start building the cluster, we need to prepare the system.
First, add Spectrum Scale to the PATH:
# echo "export PATH=/usr/lpp/mmfs/bin:\$PATH" > /etc/profile.d/gpfs.sh
# source /etc/profile.d/gpfs.sh
Next we'll manually build the kernel extension (the GPL portability layer) for GPFS to test that it works. Perhaps a C compiler is not installed, or there is a kernel problem that needs fixing. We'll make this process automatic at daemon start time later.
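On Ubuntu the build needs a compiler and the kernel headers; a minimal check looks like this (mmbuildgpl is the command that builds the portability layer):
# apt install build-essential linux-headers-$(uname -r)
# mmbuildgpl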
Next step is to get the pre-requisites in order:
- Make the IP address you're using static or fix the assignment in your DHCP server. (really important, do not skip this step!)
- NTP time synchronization, check with timedatectl
- DNS/hosts, check with ping `hostname` and/or host `hostname`
- I'm adding two entries to /etc/hosts, one static for GPFS, one floating for NAS access:
- 192.168.178.199 scalenode1
- 192.168.178.200 nas1
- Firewall, either disable or configure correctly
- Ubuntu: ufw disable
- RHEL/SLES: systemctl disable firewalld --now
- SSH permissions for issuing commands as root to all my cluster nodes:
- ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
- cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
- ssh -o StrictHostKeyChecking=no `hostname` date
Now we're ready to create the Spectrum Scale cluster:
# mmcrcluster -N `hostname`:manager-quorum -A
# mmchlicense server --accept -N all
# mmchconfig autoBuildGPL=yes
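The GPFS daemon is not started automatically right after cluster creation (the -A flag only makes it start at boot), so review the cluster definition and start the daemon:
# mmlscluster
# mmstartup -a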
Check the state of the node; it may be in "arbitrating" mode for a while, but it should go "active" automatically. If not, check your firewall, or check the logfile /var/mmfs/gen/mmfslog.
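Query the state with mmgetstate (-a reports all nodes, which is just the one here):
# mmgetstate -a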
Node number Node name GPFS state
1 scalenode1 active
Now that the cluster is running, we can create a filesystem. For this we need block devices; these can be all kinds of whole devices (iSCSI, USB, SAS, SAN, NVMe) or partitions on those devices. Run lsblk to get a list.
I have an internal MMC device, which is a bit of an issue, as Spectrum Scale only looks for "regular" devices. Check your devices with mmdevdiscover. As my device is not listed, I'll need to add it using a custom script:
# cat > /var/mmfs/etc/nsddevices <<EOF
cat /proc/partitions | grep -v loop | grep '[0-9]' | while read x x x part
do
  echo \$part generic
done
return 0
EOF
# chmod +x /var/mmfs/etc/nsddevices
The return 0 at the end tells GPFS to use this list instead of its built-in device discovery.
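After installing the script, the MMC device should show up in the discovery output; a quick check:
# mmdevdiscover | grep mmcblk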
Now we can define the partition as an NSD (Network Shared Disk) which we do with a stanza file:
# cat > local.nsd << EOF
%nsd:
  nsd=mmc
  device=/dev/mmcblk1p2
  servers=scalenode1
  usage=dataAndMetadata
  failureGroup=1
  pool=system
EOF
We name the NSD "mmc", and specify the partition. The servers option is the list of servers that have direct access to this device, which is just this system; you need iSCSI or a FC-SAN to have shared access from multiple systems. The usage is the default: we'll put both data (file data) and metadata (directories, inodes, structures, logs) on this device. The failureGroup is 1, as this is our first and only server. This value guides the data replication feature of GPFS to place copies on multiple systems. The pool is the default "system" pool, which is where metadata must go.
# mmcrnsd -F local.nsd
# mmlsnsd -M
Disk name NSD volume ID Device Node name Remarks
mmc C0A8B2C760746935 /dev/mmcblk1p2 scalenode1 server node
The NSD is now created, which means an NSD Identification number is written to the partition, and the device is registered in the cluster administration. Next job is to create a filesystem using this NSD.
We'll build a default file system, with automount on, NFSv4 ACLs enabled, and default replicas set to 1 for data and metadata with 3 as the maximum. The mountpoint is set to /nas1, which neatly matches the special device name. You can change these settings later if you want, but not the maximum replica settings.
# mmcrfs nas1 -F local.nsd -A yes -k nfs4 -r 1 -R 3 -m 1 -M 3 -T /nas1
The file system is now ready! We just need to mount it:
# mmmount nas1
# mmlsdisk nas1
disk driver sector failure holds holds storage
name type size group metadata data status avail pool
------------ -------- ------ -------- -------- ----- ------- ------ -------
mmc nsd 512 1 yes yes ready up system
# df -h /nas1
Filesystem Size Used Avail Use% Mounted on
nas1 29G 1,4G 28G 5% /nas1
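GPFS also has its own space report, which breaks usage down per pool and per disk:
# mmdf nas1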
Major changes like adding or removing disks or nodes can be done online via the command line. More user-oriented actions like adding an NFS or SMB export, creating snapshots, or setting file management policies can also be done using the GUI.
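For example, a quick snapshot of the whole filesystem from the command line (the snapshot name is just an example):
# mmcrsnapshot nas1 snap1
# mmlssnapshot nas1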
Stopping and starting the cluster is done using the following commands:
# mmshutdown -a
# mmstartup -a
The next blog will show the creation of an NFS share.