IBM TechXchange Storage Scale (GPFS) Global User Group

  • 1.  Help removing disk from NSD volume

    Posted Wed May 15, 2024 05:20 PM

    I have a disk that is down on my scratch volume and I would just like to remove it and then build the volume again from scratch. This cluster is a bit older, so I am just trying to get by until we purchase a new one. Below is what I am seeing on the scratch volume and it is currently not mounting anywhere. I would like to remove gd6_12. This system was built by someone that is no longer available and I am fairly new using GPFS. Thank you in advance for any help you can provide.

    [root@snode1 ~]# mmlsdisk gpfs_scratch
    disk         driver   sector     failure holds    holds                            storage
    name         type       size       group metadata data  status        availability pool
    ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
    gd1_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd2_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd3_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd4_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd5_12       nsd         512          -1 Yes      Yes   ready         up           system       
    gd6_12       nsd         512          -1 Yes      Yes   to be emptied down         system       
    Attention: Due to an earlier configuration change the file system
    may contain data that is at risk of being lost.



    ------------------------------
    Joseph Koral
    ------------------------------


  • 2.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Thu May 16, 2024 04:16 AM
    Edited by José Pina Coelho Thu May 16, 2024 04:17 AM

    First you need to make sure that the filesystem can be recovered with that disk down. "mmlsfs gpfs_scratch" will tell you if it has two copies of data and metadata.

    If it does have two copies, "mmrestripefs -r gpfs_scratch" should reallocate copies of all the data on gd6_12 to other NSDs.  After the restripe, "mmlsdisk gpfs_scratch" should show that disk as "empty" instead of "to be emptied".

    If it doesn't have two copies, you'll lose the data in that volume.

    Note 1: You did mention it being a scratch volume, so the alternative is to delete the filesystem (mmdelfs), then re-create it with mmcrfs.

    Note 2: Even if you're not replicating the data, you should create the filesystem with metadata replication, that way if you lose an NSD, you just lose files, not the filesystem.

    PS: From the mmdeldisk man page: 

    • If the disk is permanently damaged and the file system is not replicated, or if the mmdeldisk command repeatedly fails, see the GPFS: Problem Determination Guide and search for Disk media failure.
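    For what it's worth, the check-then-restripe sequence above might look like this (a sketch assembled from the commands mentioned in this thread; verify the flags against the command reference for your Scale version):

    ```shell
    # Show the default metadata (-m) and data (-r) replica counts
    mmlsfs gpfs_scratch -m -r

    # If both are 2, restore replication by moving copies off the down disk
    mmrestripefs gpfs_scratch -r

    # Once mmlsdisk shows gd6_12 as "empty", it can be deleted
    mmdeldisk gpfs_scratch gd6_12
    ```
    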



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 3.  RE: Help removing disk from NSD volume

    Posted Thu May 16, 2024 04:29 PM

    Thank you for your response. It does not have two copies, as I am getting an error message stating "Too many disks are unavailable". Sounds like I should just delete the file system and re-create it without gd6_12, correct?

    Do you know what the right command would be to create the new file system using the 5 disks that are available?

    Thank you so much for your help so far.



    ------------------------------
    Joseph Koral
    ------------------------------



  • 4.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Fri May 17, 2024 05:17 AM

    Before removing the filesystem, take note of the current values on the mmlsfs output, as some of them might be important for your case.

    • mmcrfs scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch
    • (M/R) -> Maximum metadata/data replicas. (Can't be changed after creation)
    • (m/r) -> Desired number of metadata/data replicas.  -m2 will preserve the filesystem structure even in the case of a disk failure.  You may choose -r2 if you want to preserve filesystem contents in the case of a disk failure, but it will consume twice the space.
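    Putting the steps above together, the delete-and-recreate path might look something like this (a sketch only; the output file path and the mmmount step are my additions, and the device name should match whatever your cluster uses):

    ```shell
    # Record the current settings before deleting -- keep this output
    mmlsfs gpfs_scratch > /root/gpfs_scratch_settings.txt

    # Delete the old filesystem, then re-create it on the surviving NSDs
    mmdelfs gpfs_scratch
    mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" \
        -A yes -m 2 -M 3 -r 1 -R 3 -T /scratch

    # Mount it on all nodes
    mmmount gpfs_scratch -a
    ```
    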


    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 5.  RE: Help removing disk from NSD volume

    Posted Fri May 17, 2024 04:49 PM
    Edited by Joseph Koral Fri May 17, 2024 04:50 PM

    When I tried to remove the file system, I got an error. Then I ran it with -p, and I think it removed it because I cannot see it anymore. I then tried to create the file system and received an error that the NSDs still belong to a file system. Also, we don't replicate the scratch volume because we want it to be faster than home and data.

    Here is all of my output:

    [root@snode1 ~]# mmdelfs gpfs_scratch
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    Check error log for additional information.
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    mmdelfs: tsdelfs failed.
    mmdelfs: Command failed. Examine previous error messages to determine cause.

    [root@snode1 ~]# mmdelfs gpfs_scratch -p
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    Check error log for additional information.
    Too many disks are unavailable.
    Some file system data are inaccessible at this time.
    mmdelfs: Attention: Not all disks were marked as available.
    mmdelfs: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.

    [root@snode1 ~]# mmlsdisk gpfs_scratch
    mmlsdisk: File system gpfs_scratch is not known to the GPFS cluster.
    mmlsdisk: Command failed. Examine previous error messages to determine cause.

    [root@snode1 ~]# mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" -A yes -m 1 -M 3 -r 1 -R 3 -T /scratch
    Disk gd3_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:37 2024.
    Disk gd4_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:39 2024.
    Disk gd5_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:43 2024.
    Disk gd2_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:42 2024.
    Disk gd1_12 may still belong to file system . Created on node 192.168.160.23, Thu Mar 21 22:15:40 2024.
    Error accessing disks.
    mmcrfs: tscrfs failed.  Cannot create gpfs_scratch
    mmcrfs: Command failed. Examine previous error messages to determine cause.

    Is there a way to force the removal?



    ------------------------------
    Joseph Koral
    ------------------------------



  • 6.  RE: Help removing disk from NSD volume

    IBM Champion
    Posted Mon May 20, 2024 04:50 AM

    Apparently the news that the filesystem was removed hasn't reached the disks...  had something similar on a 16 node cluster 20 years ago (things moved slower).

    Use '-v no', but first verify that those NSDs aren't associated with another filesystem (mmlsnsd).

    Ref: https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=reference-mmcrfs-command
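    Concretely, that might look like the following (a sketch reusing the mmcrfs invocation from earlier in the thread; the mmmount step is my addition):

    ```shell
    # Confirm the NSDs are listed as "(free disk)" rather than
    # belonging to another filesystem
    mmlsnsd

    # Re-create, telling GPFS to skip the stale-descriptor check (-v no)
    mmcrfs gpfs_scratch "gd1_12;gd2_12;gd3_12;gd4_12;gd5_12" \
        -A yes -m 1 -M 3 -r 1 -R 3 -T /scratch -v no

    # Mount it on all nodes
    mmmount gpfs_scratch -a
    ```
    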



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 7.  RE: Help removing disk from NSD volume

    Posted Wed May 22, 2024 07:06 PM

    Jose,

    You have been beyond helpful and I appreciate you being patient with me and answering all my questions. I now have a fully functioning /scratch volume again. Adding '-v no' worked and the volume mounted without any issues. Consider this case closed.

    I am so grateful.

    Thanks, Joe Koral



    ------------------------------
    Joseph Koral
    ------------------------------