You have been beyond helpful, and I appreciate your patience in answering all my questions. I now have a fully functioning /scratch volume again. Adding '-v no' worked, and the volume mounted without any issues. Consider this case closed.
I am so grateful.
Original Message:
Sent: Mon May 20, 2024 04:49 AM
From: José Pina Coelho
Subject: Help removing disk from NSD volume
Apparently the news that the filesystem was removed hasn't reached the disks... had something similar on a 16 node cluster 20 years ago (things moved slower).
Use '-v no', but first verify that those NSDs aren't associated with another filesystem (mmlsnsd).
Ref: https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=reference-mmcrfs-command
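For reference, the re-create with the verification check disabled would look something like this (a sketch based on the options from your earlier mmcrfs attempt; adjust the -m/-r values to match the settings you noted from mmlsfs):

[root@snode1 ~]# mmlsnsd
[root@snode1 ~]# mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch -v no

'-v no' tells mmcrfs to skip the check that each disk does not already belong to a file system, so only use it once mmlsnsd confirms those NSDs are truly free.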
------------------------------
José Pina Coelho
IT Specialist at Kyndryl
Original Message:
Sent: Fri May 17, 2024 04:49 PM
From: Joseph Koral
Subject: Help removing disk from NSD volume
When I tried to remove the file system, I did get an error. I then ran it again with -p, and I thought it had been removed because I could no longer see it. When I tried to re-create the file system, I received an error that the NSDs still belong to a file system. Also, we don't replicate the scratch volume because we want it to be faster than home and data.
Here is all of my output:
[root@snode1 ~]# mmdelfs gpfs_scratch
Too many disks are unavailable.
Some file system data are inaccessible at this time.
Check error log for additional information.
Too many disks are unavailable.
Some file system data are inaccessible at this time.
mmdelfs: tsdelfs failed.
mmdelfs: Command failed. Examine previous error messages to determine cause.
[root@snode1 ~]# mmdelfs gpfs_scratch -p
Too many disks are unavailable.
Some file system data are inaccessible at this time.
Check error log for additional information.
Too many disks are unavailable.
Some file system data are inaccessible at this time.
mmdelfs: Attention: Not all disks were marked as available.
mmdelfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@snode1 ~]# mmlsdisk gpfs_scratch
mmlsdisk: File system gpfs_scratch is not known to the GPFS cluster.
mmlsdisk: Command failed. Examine previous error messages to determine cause.
[root@snode1 ~]# mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 1 -M 3 -r 1 -R 3 -T /scratch
Disk gd3_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:37 2024.
Disk gd4_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:39 2024.
Disk gd5_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:43 2024.
Disk gd2_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:42 2024.
Disk gd1_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:40 2024.
Error accessing disks.
mmcrfs: tscrfs failed. Cannot create gpfs_scratch
mmcrfs: Command failed. Examine previous error messages to determine cause.
Is there a way to force the removal?
------------------------------
Joseph Koral
Original Message:
Sent: Fri May 17, 2024 05:17 AM
From: José Pina Coelho
Subject: Help removing disk from NSD volume
Before removing the filesystem, take note of the current values in the mmlsfs output, as some of them might be important for your case.
- mmcrfs scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch
- (M/R) -> Maximum metadata/data replicas. (Can't be changed after creation)
- (m/r) -> Desired number of metadata/data replicas. -m 2 will preserve the filesystem structure even in the case of a disk failure. You may choose -r 2 if you want to preserve the filesystem contents in the case of a disk failure, but it will consume twice the space.
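To capture the current replication settings before deleting, something like this should work (exact output format varies by release):

[root@snode1 ~]# mmlsfs gpfs_scratch -m -M -r -R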
------------------------------
José Pina Coelho
IT Specialist at Kyndryl
Original Message:
Sent: Thu May 16, 2024 04:28 PM
From: Joseph Koral
Subject: Help removing disk from NSD volume
Thank you for your response. It does not have two copies, as I am getting an error message stating "Too many disks are unavailable". Sounds like I should just delete the file system and re-create it without gd6_12, correct?
Do you know what the right command would be to create the new file system using the 5 disks that are available?
Thank you so much for your help so far.
------------------------------
Joseph Koral
Original Message:
Sent: Thu May 16, 2024 04:15 AM
From: José Pina Coelho
Subject: Help removing disk from NSD volume
First you need to make sure that the filesystem can be recovered with that disk down; "mmlsfs gpfs_scratch" will tell you if it has two copies of data and metadata.
If it does have two copies, "mmrestripefs gpfs_scratch -r" should reallocate copies of all the data on gd6_12 to other NSDs. After the restripe, "mmlsdisk gpfs_scratch" should show that disk as "empty" instead of "to be emptied".
If it doesn't have two copies, you'll lose the data in that volume.
Note 1: You did mention it being a scratch volume, so the alternative is to delete the filesystem (mmdelfs), then re-create it (mmcrfs).
Note 2: Even if you're not replicating the data, you should create the filesystem with metadata replication, that way if you lose an NSD, you just lose files, not the filesystem.
PS: From the mmdeldisk man page:
- If the disk is permanently damaged and the file system is not replicated, or if the mmdeldisk command repeatedly fails, see the GPFS: Problem Determination Guide and search for Disk media failure.
------------------------------
José Pina Coelho
IT Specialist at Kyndryl
Original Message:
Sent: Wed May 15, 2024 03:20 PM
From: Joseph Koral
Subject: Help removing disk from NSD volume
I have a disk that is down on my scratch volume, and I would just like to remove it and then rebuild the volume from scratch. This cluster is a bit older, so I am just trying to get by until we purchase a new one. Below is what I am seeing on the scratch volume; it is currently not mounting anywhere. I would like to remove gd6_12. This system was built by someone who is no longer available, and I am fairly new to GPFS. Thank you in advance for any help you can provide.
[root@snode1 ~]# mmlsdisk gpfs_scratch
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
gd1_12 nsd 512 -1 Yes Yes ready up system
gd2_12 nsd 512 -1 Yes Yes ready up system
gd3_12 nsd 512 -1 Yes Yes ready up system
gd4_12 nsd 512 -1 Yes Yes ready up system
gd5_12 nsd 512 -1 Yes Yes ready up system
gd6_12 nsd 512 -1 Yes Yes to be emptied down system
Attention: Due to an earlier configuration change the file system
may contain data that is at risk of being lost.
------------------------------
Joseph Koral
------------------------------