Resolving Filesystem Mount Issues in IBM Storage Scale when using HSM

By Smita Gargote

  

IBM Storage Protect for Space Management handles recall and migration operations. A common issue arises when IBM Storage Scale encounters mount/unmount failures during dsmrecall or dsmmigrate operations. This can render filesystems inaccessible, leading to critical service disruptions.  

 
🧩Architecture Snapshot 

 

This integration model involves: 

  • IBM Storage Scale for high-performance file systems 

  • IBM Storage Protect for Space Management for intelligent data movement  

Together, they enable a scalable, cost-effective, and policy-driven storage solution. 


The diagram above shows the architecture of IBM Storage Protect for Space Management.  

 

 

🔧Prerequisites: 

 

  • Ensure that the user has administrative access to the Storage Scale nodes (the cluster nodes can be listed with the command shown below) 

  • Back up any critical data before restarting services 
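
If the cluster layout is not already known, the nodes can be listed with the mmlscluster command before starting (run as root on any Storage Scale node; the output varies by environment): 

[root@testnode ~]# mmlscluster 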

🧠Why This Matters: 

The dsmmigrate and dsmrecall processes are the backbone of space management: 

  • dsmmigrate moves files from local file systems to tape via LTFS. 

  • dsmrecall brings them back when needed. 

 

These services rely on the health of the GPFS (General Parallel File System) daemons. A misstep during restart can disrupt recall/migrate operations, impacting performance and availability. 
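
For context, both utilities operate on explicit file names or file lists. A minimal illustration, using purely hypothetical paths, might look like this: 

[root@testnode ~]# dsmmigrate /gpfs/fs1/data/largefile.dat 

[root@testnode ~]# dsmrecall /gpfs/fs1/data/largefile.dat 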

Procedure to Refresh Daemons: 

 

  1. Check active nodes in the cluster 
    Use the following commands to identify nodes running HSM daemons: 

dsmmigfs query -detail  

dsmmigfs query -failover 

 

  2. Verify running processes 

Look for dsmwatchd, dsmrecalld, and related processes using the ps -ef command. 

 

Example: 

[root@testnode ~]# ps -ef | grep -i dsm 

root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 

root 123456 1 0 04:22 ? 00:00:00 dsmrecalld 

root 234567 123456  0 04:22 ? 00:00:00 dsmrecalld 

root 345678 123456  0 04:22 ? 00:00:00 dsmrecalld 

 

  3. Stop recall processes if needed 

 

# dsmq              (display the recall queue) 

# dsmrm recall-id   (remove the specified recall request from the queue) 

# dsmkilld          (stop the recall daemons) 

 

  • Allow the dsmmigrate and dsmreconcile processes to complete before proceeding (see the check below). 
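
One simple way to confirm that no migration or reconcile operation is still in flight is to check the process list again (this is only a sketch; your environment may rely on other monitoring): 

[root@testnode ~]# ps -ef | grep -E 'dsmmigrate|dsmreconcile' | grep -v grep 

If this returns no lines, it should be safe to proceed with the GPFS shutdown. 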

 

  4. Shut down GPFS safely 

 

Run mmshutdown and confirm the daemon shutdown with mmgetstate -a. 

 

Example: 

[root@testnode ~]#mmshutdown 

 

The expected output is similar to the following: 

Shutting down the following quorum nodes will cause the cluster to lose quorum: 

testnode.xxx.ibm.com    

(The node name shown here is from the IBM lab test environment.) 

 

Do you wish to continue [yes/no]: yes 

 

Tue Jun  3 04:20:30 MST 2025: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems 

Tue Jun  3 04:20:35 MST 2025: 6027-1344 mmshutdown: Shutting down GPFS daemons 

Tue Jun  3 04:20:42 MST 2025: 6027-1345 mmshutdown: Finished 

 

[root@testnode ~]#mmgetstate -a 

 

Refer to the IBM Storage Scale documentation for details on the expected results. 
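
For illustration only, on the single-node lab cluster used above, mmgetstate -a after a successful shutdown would be expected to report the node in the down state: 

 Node number  Node name        GPFS state 
------------------------------------------------ 
       1      testnode-ibm     down 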

 

  5. Observe daemon behaviour 
    Post-shutdown, only dsmwatchd should remain active; the others exit gracefully. 

 

Example: 

[root@testnode ~]# ps -ef | grep -i dsm 

root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 

 

If the dsmrecalld or dsmmigrate daemons are still running, run the following command: 

 

[root@testnode ~]#dsmmigfs stop 

 

It is also recommended to stop dsmwatchd using the following command: 

 

[root@testnode ~]#systemctl stop hsm 
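
If desired, the service state can then be verified (hsm is the same service unit referenced in the command above): 

[root@testnode ~]# systemctl status hsm 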

 

 

  6. Restart GPFS 
    Use mmstartup to start the daemons and validate with mmgetstate -a. 

 

Example: 

[root@testnode ~]#mmstartup 

Tue Jun  3 04:20:52 MST 2025: 6027-1642 mmstartup: Starting GPFS ... 

 

Tip: For multiple nodes in the cluster, the command can be run in the following format: 

mmstartup -N node_name1,node_name2 

 

[root@testnode ~]#mmgetstate -a 

 

 Node number  Node name        GPFS state 
------------------------------------------------ 
       1      testnode-ibm     active 

 

Refer to the IBM Storage Scale documentation for details on the expected results. 

 

  7. Confirm HSM daemon recovery 
    After a few minutes, ps -ef should show all expected HSM processes back online. 

Example: 

[root@testnode ~]# ps -ef | grep -i dsm 

root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 

root 123456 1 0 05:00 ? 00:00:00 dsmrecalld 

root 234567 123456  0 05:00 ? 00:00:00 dsmrecalld 

root 345678 123456  0 05:00 ? 00:00:00 dsmrecalld 

 

If dsmwatchd is not running, use the following command to start it: 

 

[root@testnode ~]#systemctl start hsm 

 

Start the recall and migration processes using the following command: 

 

[root@testnode ~]#dsmmigfs start 

 

  8. Validate Filesystem Health 
    Wait a few minutes and confirm that all HSM services are running, using the ps -ef command. Then ensure that all local and remote GPFS mounts are accessible and functioning (a mount check is shown after the example below). 

 

Example: 

[root@testnode ~]#dsmmigfs query -detail -node=all 

 

The output may be similar to the following: 

GPFS Node Name: testbox 

GPFS Node ID: 1 

GPFS Status: active 

HSM Status: active 

Recall Daemon Session ID: 12F1F00C 

Mount Disposition: YES 

Ping Recall Daemon: YES 

Watch Daemon Session ID: 0 
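
As a final check, mount status across the cluster can be confirmed with mmlsmount; the following form lists every mounted file system and the nodes on which it is mounted: 

[root@testnode ~]# mmlsmount all -L 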

📝Conclusion:

Following this daemon refresh procedure ensures that all filesystems remain accessible for recall, migration, and reconciliation operations in Storage Scale environments. This workaround is especially useful after unexpected failures or when the recommended shutdown sequence has not been followed. 

 

🔍 For more technical details, refer to the IBM Documentation on Space Management. 

Overview of the space management client - IBM Documentation 

Contributors: Smita Gargote, Bharat Vyas and Rohit Phiske 

Acknowledgment: Special thanks to Nilesh Bhosale and Ravi Parikh for reviewing this blog and providing guidance. 
