Create an ECE (Erasure Code Edition) cluster using KVM

By GUANG LEI LI posted Tue December 15, 2020 10:53 AM

  

Note: running an ECE cluster on KVM nodes is for test purposes only and is not officially supported.

Requirements:

  1. Each KVM node needs more than 10G of RAM. My configuration uses 16G of RAM.
  2. A minimum of 4 ECE nodes (4+2p). Such a cluster can tolerate only one node failure; if any ECE node goes down, it enters the critical rebuild phase, which significantly impacts performance.
  3. A minimum of 12 pdisks in total in the cluster, each larger than 15G. For example, with 4+2p and 4 nodes, GNR metadata takes up about 180G. I have 12x20GB pdisks, but the file system ends up with only 49G of total space.
  4. Disk drives must be presented as SCSI pass-through devices in the virtual machine.
    Each drive used in a recovery group must have a WWID that is unique in the cluster. You can check this with the ls -l /dev/disk/by-id or lsscsi -i command on the virtual machine.
    In KVM, you need to specify a cluster-wide unique "Serial Number" so that a unique WWID can be generated, and you also need to select "writethrough" as the cache mode; see the sketch below.


And here is the output from my cluster; the last column is the WWID, which is generated from the serial number.
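For example, here is a minimal libvirt sketch for attaching one test disk with a unique serial and writethrough caching. The image path, serial string, and target device are placeholders, and it assumes the guest already has a SCSI (virtio-scsi) controller:

# create a 20G raw image for one pdisk (path and size are examples)
qemu-img create -f raw /var/lib/libvirt/images/kvm_ece_1_d01.img 20G

# disk definition with a cluster-wide unique serial and writethrough cache
cat > /tmp/ece_d01.xml <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='writethrough'/>
  <source file='/var/lib/libvirt/images/kvm_ece_1_d01.img'/>
  <target dev='sdb' bus='scsi'/>
  <serial>kvm-ece-1-d01</serial>
</disk>
EOF

# attach it persistently to the kvm_ece_1 domain
virsh attach-device kvm_ece_1 /tmp/ece_d01.xml --config

Inside the guest, ls -l /dev/disk/by-id should then show an ID derived from that serial.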

Here are the installation steps:

1> Install the GPFS RPMs and these ECE RPMs:
     gpfs.gnr, gpfs.gnr.base, gpfs.gnr.support-scaleout
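
For example, from a directory containing the downloaded packages, the install could look like the following; the exact set of base GPFS packages depends on your edition and level, and the directory path is illustrative:

cd /root/gpfs_rpms
dnf install gpfs.base*.rpm gpfs.gpl*.rpm gpfs.gskit*.rpm gpfs.msg*.rpm \
            gpfs.license*.rpm gpfs.docs*.rpm \
            gpfs.gnr*.rpm gpfs.gnr.base*.rpm gpfs.gnr.support-scaleout*.rpm
# build the GPFS portability layer for the running kernel
/usr/lpp/mmfs/bin/mmbuildgpl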

 

2> Create the cluster: 
 # mmcrcluster -N crcluster.conf -r /usr/bin/ssh -R /usr/bin/scp -C kvm_ece
[root@kvm_ece_2 gpfs_rpms]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         kvm_ece.localdomain
  GPFS cluster id:           15607050950619519741
  GPFS UID domain:           kvm_ece.localdomain
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

Node Daemon node name       IP address       Admin node name        Designation
----------------------------------------------------------------------------------
1   kvm_ece_1.localdomain  192.168.122.101  kvm_ece_1.localdomain  quorum-manager
2   kvm_ece_2.localdomain  192.168.122.102  kvm_ece_2.localdomain  quorum-manager
3   kvm_ece_3.localdomain  192.168.122.103  kvm_ece_3.localdomain  quorum-manager
4   kvm_ece_4.localdomain  192.168.122.104  kvm_ece_4.localdomain  quorum
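
The crcluster.conf file passed to mmcrcluster above is a plain GPFS node descriptor file (NodeName:NodeDesignations). A node file matching the cluster shown here could look like this:

kvm_ece_1:quorum-manager
kvm_ece_2:quorum-manager
kvm_ece_3:quorum-manager
kvm_ece_4:quorum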

 

3> Creating the mmvdisk node class

 

mmvdisk nodeclass create --node-class ECE01 -N kvm_ece_1,kvm_ece_2,kvm_ece_3,kvm_ece_4
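
To double-check the members of the new node class, you can list it:

mmvdisk nodeclass list --node-class ECE01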

 

4> Start GPFS
 mmstartup -a
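
Then verify that the GPFS daemon is active on all nodes before continuing:

mmgetstate -a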

 

5> Verifying recovery group server disk topologies 
[root@kvm_ece_1 gpfs_rpms]# mmvdisk server list --node-class ECE01 --disk-topology
node                                      needs    matching
number server                           attention   metric   disk topology
------ -------------------------------- ---------  --------  -------------
     1  kvm_ece_1.localdomain             no          100/100  ECE 3 HDD
     2  kvm_ece_2.localdomain             no          100/100  ECE 3 HDD
     3  kvm_ece_3.localdomain             no          100/100  ECE 3 HDD
     4  kvm_ece_4.localdomain             no          100/100  ECE 3 HDD

 

6> Configuring recovery group servers 
[root@kvm_ece_1 ]# mmvdisk server configure --nc ECE01 --recycle one
mmvdisk: Checking resources for specified nodes.
mmvdisk: Node class 'ECE01' has a scale-out recovery group disk topology.
mmvdisk: Using 'default.scale-out' RG configuration for topology 'ECE 3 HDD'.
mmvdisk: Setting configuration for node class 'ECE01'.
mmvdisk: Node class 'ECE01' is now configured to be recovery group servers.
mmvdisk: Restarting GPFS daemon on node 'kvm_ece_1.localdomain'.
mmvdisk: Restarting GPFS daemon on node 'kvm_ece_4.localdomain'.
mmvdisk: Restarting GPFS daemon on node 'kvm_ece_2.localdomain'.
mmvdisk: Restarting GPFS daemon on node 'kvm_ece_3.localdomain'.
[root@kvm_ece_1 gpfs_rpms]#

 
If it reports the following errors:

mmvdisk: Slot location is missing from pdisk n001p004 device(s) //ece-1/dev/sdf of declustered array DA1 in recovery group rg01 with hardware type Unknown.
mmvdisk: Slot location is missing from pdisk n001p005 device(s) //ece-1/dev/sdc of declustered array DA1 in recovery group rg01 with hardware type Unknown.
mmvdisk: Slot location is missing from pdisk n001p006 device(s) //ece-1/dev/sde of declustered array DA1 in recovery group rg01 with hardware type Unknown.


Then you can run this command as a workaround (please do not change this configuration on any ECE system in production):
echo 999 | mmchconfig nsdRAIDStrictPdiskSlotLocation=0 -i


7> Creating recovery groups 

[root@kvm_ece_1 ~]# mmvdisk rg create --rg rg01 --nc ECE01



Starting with Spectrum Scale ECE 5.1.2, the log home size increased from 2G to 32G, so you need much bigger pdisks or recovery group creation will hang.
To work around this in a test setup, you can change the log home size back to 2G:

[root@ece-11 cst]# pwd
/usr/lpp/mmfs/data/cst
[root@ece-11 cst]# diff -u compSpec-scaleOut.stanza.ori compSpec-scaleOut.stanza
--- compSpec-scaleOut.stanza.ori 2021-12-12 21:15:17.450828233 -0500
+++ compSpec-scaleOut.stanza 2021-12-12 21:15:32.080710431 -0500
@@ -49,5 +49,5 @@
longTermEventLogSize=128m
shortTermEventLogSize=128m
fastWriteLogPct=75
- logHomeSize="root=2G user=32G"
+ logHomeSize="root=2G user=2G"

[root@ece-11 cst]#

Then copy the modified compSpec-scaleOut.stanza to the same path on all ECE nodes of the RG being created.
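
For example, a simple scp loop can push the edited file from the node where it was changed to the remaining ECE nodes; the node names below are the ones used in this cluster and should be adjusted to yours:

for node in kvm_ece_2 kvm_ece_3 kvm_ece_4; do
    scp /usr/lpp/mmfs/data/cst/compSpec-scaleOut.stanza ${node}:/usr/lpp/mmfs/data/cst/
done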


8> Define one or more vdisk sets, and create the vdisk sets:

I want a system pool with 20% of the total space and a data pool with 80% of the total space:

# mmvdisk vs define --vdisk-set ece_meta --rg rg01 --code 4+2p --block-size 4M --set-size 20% --nsd-usage metadataOnly
# mmvdisk vs define --vdisk-set ece_data --rg rg01 --code 4+2p --block-size 4M --set-size 80% --nsd-usage dataOnly --storage-pool datapool
# mmvdisk vs create --vdisk-set all

After both vdisk sets are created, you can verify them with mmvdisk.
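
For example, this lists the defined vdisk sets and shows whether they have been created:

mmvdisk vdiskset list --vdisk-set all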

If defining the vdisk sets fails with an error because the server pagepool is too small for the requested vdisk set size, you can increase the pagepool using the mmvdisk command. Each KVM node in my cluster has 16G of RAM and a 9G pagepool; the error went away after I increased the pagepool to 10G on each ECE node.
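
A sketch of that pagepool change, assuming the node class has already been configured with mmvdisk server configure (check the options available at your code level):

mmvdisk server configure --node-class ECE01 --pagepool 10G --update --recycle one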

 

9> Create the file system: 
# mmvdisk filesystem create --file-system fs01 --vdisk-set ece_meta,ece_data --mmcrfs -A yes -M 2 -m 2 -r 2 -R 2 -Q yes -T /fs01
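
Finally, mount the new file system on all nodes and check its capacity:

mmmount fs01 -a
df -h /fs01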


