IBM TechXchange Storage Scale (GPFS) Global User Group

  • 1.  Help removing disk from NSD volume

    Posted Wed May 15, 2024 05:20 PM

    I have a disk that is down on my scratch volume and I would just like to remove it and then build the volume again from scratch. This cluster is a bit older, so I am just trying to get by until we purchase a new one. Below is what I am seeing on the scratch volume and it is currently not mounting anywhere. I would like to remove gd6_12. This system was built by someone that is no longer available and I am fairly new using GPFS. Thank you in advance for any help you can provide.

    [root@snode1 ~]# mmlsdisk gpfs_scratch
    disk         driver   sector     failure holds    holds                            storage
    name         type       size       group metadata data  status        availability pool
    ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
    gd1_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd2_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd3_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd4_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd5_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd6_12       nsd         512          -1 Yes      Yes   to be emptied down         system       
    Attention: Due to an earlier configuration change the file system
    may contain data that is at risk of being lost.



    ------------------------------
    Joseph Koral
    ------------------------------


  • 2.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Thu May 16, 2024 04:16 AM
    Edited by José Pina Coelho Thu May 16, 2024 04:17 AM

    First you need to make sure that the filesystem can be recovered with that disk down. "mmlsfs gpfs_scratch" will tell you if it has two copies of data and metadata.

    If it does have two copies, "mmrestripefs -r gpfs_scratch" should reallocate copies of all the data on gd6_12 to other NSDs.  After the restripe, "mmlsdisk gpfs_scratch" should show that disk as "empty" instead of "to be emptied".

    If it doesn't have two copies, you'll lose the data in that volume.

    Note 1: You did mention it being a scratch volume, so the alternative is to delete the filesystem (mmdelfs), then re-create it with mmcrfs.

    Note 2: Even if you're not replicating the data, you should create the filesystem with metadata replication, that way if you lose an NSD, you just lose files, not the filesystem.

    PS: From the mmdeldisk man page: 

    • If the disk is permanently damaged and the file system is not replicated, or if the mmdeldisk command repeatedly fails, see the GPFS: Problem Determination Guide and search for Disk media failure.
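    For what it's worth, the check-then-restripe sequence above might look like this (a sketch assembled from the commands mentioned in this thread; verify the flags against the command reference for your Scale version):

    ```shell
    # Show the default metadata (-m) and data (-r) replica counts
    mmlsfs gpfs_scratch -m -r

    # If both are 2, restore replication by moving copies off the down disk
    mmrestripefs gpfs_scratch -r

    # Once mmlsdisk shows gd6_12 as "empty", it can be deleted
    mmdeldisk gpfs_scratch gd6_12
    ```
    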



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 3.  RE: Help removing disk from NSD volume

    Posted Thu May 16, 2024 04:29 PM

    Thank you for your response. It does not have two copies, as I am getting an error message stating "Too many disks are unavailable". Sounds like I should just delete the file system and re-create it without gd6_12, correct?

    Do you know what the right command would be to create the new file system using the 5 disks that are available?

    Thank you so much for your help so far.



    ------------------------------
    Joseph Koral
    ------------------------------



  • 4.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Fri May 17, 2024 05:17 AM

    Before removing the filesystem, take note of the current values on the mmlsfs output, as some of them might be important for your case.

    • mmcrfs scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch
    • (M/R) -> Maximum metadata/data replicas. (Can't be changed after creation)
    • (m/r) -> Desired number of metadata/data replicas.  -m2 will preserve the filesystem structure even in the case of a disk failure.  You may choose -r2 if you want to preserve filesystem contents in the case of a disk failure, but it will consume twice the space.
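    Putting the steps above together, the delete-and-recreate path might look something like this (a sketch only; the output file path and the mmmount step are my additions, and the device name should match whatever your cluster uses):

    ```shell
    # Record the current settings before deleting -- keep this output
    mmlsfs gpfs_scratch > /root/gpfs_scratch_settings.txt

    # Delete the old filesystem, then re-create it on the surviving NSDs
    mmdelfs gpfs_scratch
    mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" \
        -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch

    # Mount it on all nodes
    mmmount gpfs_scratch -a
    ```
    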


    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 5.  RE: Help removing disk from NSD volume

    Posted Fri May 17, 2024 04:49 PM
    Edited by Joseph Koral Fri May 17, 2024 04:50 PM

    When I tried to remove the file system, I got an error. Then I ran it with -p, and I think it removed it because I cannot see it anymore. I then tried to create the file system and received an error that the NSDs still belong to a file system. Also, we don't replicate the scratch volume because we want it to be faster than home and data.

    Here is all of my output:

    [root@snode1 ~]# mmdelfs gpfs_scratch
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    Check error log for additional information.
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    mmdelfs: tsdelfs failed.
    mmdelfs: Command failed. Examine previous error messages to determine cause.

    [root@snode1 ~]# mmdelfs gpfs_scratch -p
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    Check error log for additional information.
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    mmdelfs: Attention: Not all disks were marked as available.
    mmdelfs: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.

    [root@snode1 ~]# mmlsdisk gpfs_scratch
    mmlsdisk: File system gpfs_scratch is not known to the GPFS cluster.
    mmlsdisk: Command failed. Examine previous error messages to determine cause.

    [root@snode1 ~]# mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 1 -M 3 -r 1 -R 3 -T /scratch
    Disk gd3_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:37 2024.
    Disk gd4_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:39 2024.
    Disk gd5_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:43 2024.
    Disk gd2_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:42 2024.
    Disk gd1_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:40 2024.
    Error accessing disks.
    mmcrfs: tscrfs failed.  Cannot create gpfs_scratch
    mmcrfs: Command failed. Examine previous error messages to determine cause.

    Is there a way to force the removal?



    ------------------------------
    Joseph Koral
    ------------------------------



  • 6.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Mon May 20, 2024 04:50 AM

    Apparently the news that the filesystem was removed hasn't reached the disks...  had something similar on a 16 node cluster 20 years ago (things moved slower).

    Use '-v no', but first verify that those NSDs aren't associated with another filesystem (mmlsnsd).

    Ref: https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=reference-mmcrfs-command
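    Concretely, that might look like the following (a sketch reusing the mmcrfs invocation from earlier in the thread; the mmmount step is my addition):

    ```shell
    # Confirm the NSDs are listed as "(free disk)" rather than
    # belonging to another filesystem
    mmlsnsd

    # Re-create, telling GPFS to skip the stale-descriptor check (-v no)
    mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" \
        -A yes -m 1 -M 3 -r 1 -R 3 -T /scratch -v no

    # Mount it on all nodes
    mmmount gpfs_scratch -a
    ```
    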



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 7.  RE: Help removing disk from NSD volume

    Posted Wed May 22, 2024 07:06 PM

    Jose,

    You have been beyond helpful and I appreciate you being patient with me and answering all my questions. I now have a fully functioning /scratch volume again. Adding '-v no' worked and the volume mounted without any issues. Consider this case closed.

    I am so grateful.

    Thanks, Joe Koral



    ------------------------------
    Joseph Koral
    ------------------------------