Refreshing HSM Services on IBM Storage Scale: A Best Practice Guide

By Smita Gargote posted yesterday

  

In the world of enterprise storage, ensuring seamless integration and optimal performance across components is critical. One such integration is between IBM Storage Scale and IBM Storage Protect for Space Management, a powerful duo that enables efficient data tiering and lifecycle management. 

In this blog, we revisit the best practices for restarting the HSM (Hierarchical Storage Management) services in a clean manner. Whether you are performing an upgrade or addressing a break-fix scenario, following a structured approach ensures system stability and data integrity. 

🧩 Architecture Snapshot

This integration model involves: 

  • IBM Storage Scale for high-performance file systems 
  • IBM Storage Protect for Space Management for intelligent data movement

Together, they enable a scalable, cost-effective, and policy-driven storage solution. 


Figure: Architecture of IBM Storage Protect for Space Management.
Figure source: https://www.ibm.com/support/pages/system/files/inline-files/IBM_Spectrum_Protect_HSM_Scale_Configuration_Guide_0.pdf

🧠 Why This Matters

The dsmmigrate and dsmrecall processes are the backbone of space management:

  • dsmmigrate moves files from local file systems to tape via LTFS 
  • dsmrecall brings them back when needed

These services rely on the health of the GPFS (General Parallel File System) daemons. A misstep during restart can disrupt recall/migrate operations, impacting performance and availability. 
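
For illustration, here is the lifecycle on a single file. The path /gpfs/fs1/bigfile.dat is a hypothetical example; use a file on your own HSM-managed file system. dsmmigrate migrates the file and leaves a stub behind, dsmls reports its current state (resident, premigrated, or migrated), and dsmrecall brings the data back when the file is needed.

Example:
# dsmmigrate /gpfs/fs1/bigfile.dat
# dsmls /gpfs/fs1/bigfile.dat
# dsmrecall /gpfs/fs1/bigfile.dat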

🔧 Prerequisites 

  • Ensure you have administrative access to the Storage Scale nodes
  • Back up any critical data before restarting services
  • Ensure there are no active recall or migration operations (a quick check is sketched below)
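
A quick way to confirm the last point is to look for in-flight dsmmigrate or dsmrecall client processes before touching anything. The snippet below is a minimal sketch under that assumption; it only catches command-line migrate/recall invocations, so treat it as a starting point rather than a complete safety check.

#!/bin/bash
# Pre-check sketch: look for in-flight migrate/recall commands before restarting HSM.
# The trailing space in the pattern avoids matching the dsmrecalld daemon itself.
if pgrep -f 'dsmmigrate|dsmrecall ' > /dev/null; then
    echo "Active migrate/recall processes found; do not restart HSM yet."
    exit 1
fi
# List the space-managed file systems for a final visual check.
dsmmigfs query -detail
echo "No active migrate/recall commands detected."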

Procedure   

1. Check active nodes in the cluster 


Use the following commands to identify nodes running HSM daemons: 
dsmmigfs query -detail  
dsmmigfs query -failover 
 
2. Verify running processes   
Look for dsmwatchd, dsmrecalld, and related processes using the "ps -ef" command. 
 
Example: 
# ps -ef | grep -i dsm 
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 
root 123456 1 0 04:22 ? 00:00:00 dsmrecalld 
root 234567 123456  0 04:22 ? 00:00:00 dsmrecalld 
root 345678 123456  0 04:22 ? 00:00:00 dsmrecalld 
 

3. Shut down GPFS safely 
Run the "mmshutdown" command and confirm the daemons are down with the "mmgetstate -a" command. 
 
Example: 
# mmshutdown 
Shutting down the following quorum nodes will cause the cluster to lose quorum: 
testnode.xxx.ibm.com    
 
Do you wish to continue [yes/no]: yes 
 
Tue Jun  3 04:20:30 MST 2025: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems 
Tue Jun  3 04:20:35 MST 2025: 6027-1344 mmshutdown: Shutting down GPFS daemons 
Tue Jun  3 04:20:42 MST 2025: 6027-1345 mmshutdown: Finished 
 
Tip: For multiple nodes in the cluster, the command may be used in the format:  
mmshutdown -N node_name1,node_name2 
 
The "mmgetstate -a" output will show the GPFS state as down. Refer to the Storage Scale documentation for more information. 
 
Example: 
# mmgetstate -a

 Node number  Node name     GPFS state 
------------------------------------------ 
          11  testnode-ibm  down 
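
If this step is scripted, a small polling loop avoids guessing when the daemons are actually down. The sketch below assumes the whole cluster (or a single-node cluster, as in the example above) is being shut down, that the GPFS state is the last column of the "mmgetstate -a" output, and that the standard /usr/lpp/mmfs/bin path is in use; adjust the timeout for your environment.

#!/bin/bash
# Sketch: wait until no node reports an "active" GPFS state after mmshutdown.
TIMEOUT=120
elapsed=0
while /usr/lpp/mmfs/bin/mmgetstate -a | awk '{print $NF}' | grep -qw active; do
    if [ "$elapsed" -ge "$TIMEOUT" ]; then
        echo "GPFS still reports active nodes after ${TIMEOUT}s; investigate before continuing."
        exit 1
    fi
    sleep 10
    elapsed=$((elapsed + 10))
done
echo "No node reports an active GPFS state."

An analogous loop that waits for every node to reach the active state can be used after "mmstartup" in step 5.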
 

4. Observe daemon behaviour 


Post-shutdown, only dsmwatchd should remain active; the other daemons exit gracefully. 
 
Example: 
# ps -ef | grep -i dsm 
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 
 
5. Restart GPFS


Use "mmstartup" command , to start the daemons and validate with "mmgetstate -a". 
 
Example: 
# mmstartup 
Tue Jun  3 04:20:52 MST 2025: 6027-1642 mmstartup: Starting GPFS ... 
 
Tip: For multiple nodes in the cluster, the command may be used in the format:  
mmstartup -N node_name1,node_name2 
 
# mmgetstate -a 
 Node number  Node name     GPFS state 
------------------------------------------ 
          11  testnode-ibm  active 
 
6. Confirm HSM daemon recovery 


After a few minutes, "ps -ef" should show all expected HSM processes back online. 
 
Example: 
# ps -ef | grep -i dsm 
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach 
root 123456 1 0 05:00 ? 00:00:00 dsmrecalld 
root 234567 123456 0 05:00 ? 00:00:00 dsmrecalld 
root 345678 123456 0 05:00 ? 00:00:00 dsmrecalld 
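
To avoid re-running "ps -ef" by hand, the sketch below polls until both dsmwatchd and dsmrecalld are visible again. It assumes those process names, as shown in the output above; the timeout is an arbitrary example value.

#!/bin/bash
# Sketch: wait for the HSM daemons to return after the GPFS restart.
TIMEOUT=300
elapsed=0
until pgrep -x dsmwatchd > /dev/null && pgrep -x dsmrecalld > /dev/null; do
    if [ "$elapsed" -ge "$TIMEOUT" ]; then
        echo "HSM daemons did not return within ${TIMEOUT}s; investigate before resuming workload."
        exit 1
    fi
    sleep 15
    elapsed=$((elapsed + 15))
done
echo "dsmwatchd and dsmrecalld are running; HSM recovery looks complete."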
 
7. Validate file system health 


Ensure all local and remote mounts are accessible and functioning. 
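
A minimal validation sketch, assuming /gpfs/fs1 is the HSM-managed mount point (substitute your own): it lists where the GPFS file systems are mounted, confirms space management is active again, and runs a simple metadata operation against the mount.

#!/bin/bash
# Sketch: basic post-restart health checks for a space-managed file system.
FS=/gpfs/fs1    # hypothetical mount point; replace with your own

# Show which nodes have the GPFS file systems mounted.
/usr/lpp/mmfs/bin/mmlsmount all -L

# Confirm space management is active on the file system again.
dsmmigfs query -detail

# A simple metadata operation verifies the mount is usable.
ls -ld "$FS" && df -h "$FS"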

📝 Conclusion

Following this structured restart procedure ensures that the HSM services remain resilient and responsive. It’s a small step that goes a long way in maintaining operational excellence. 
 
🔍 For more technical details, refer to the IBM Documentation on Space Management: 
Overview of the space management client - IBM Documentation 

Contributors: Rohit Phiske, Smita Gargote and Bharat Vyas 
Acknowledgment: Special thanks to Nilesh Bhosale and Ravi Parikh for reviewing this blog and providing guidance.  
