In the world of enterprise storage, ensuring seamless integration and optimal performance across components is critical. One such integration is between IBM Storage Scale and IBM Storage Protect for Space Management - a powerful duo enabling efficient data tiering and lifecycle management.
In this blog, we revisit best practices for cleanly restarting the HSM (Hierarchical Storage Management) services. Whether you are performing an upgrade or addressing a break-fix scenario, a structured approach ensures system stability and data integrity.
🧩 Architecture Snapshot
This integration model involves:
- IBM Storage Scale for high-performance file systems
- IBM Storage Protect for Space Management for intelligent data movement
Together, they enable a scalable, cost-effective, and policy-driven storage solution.

Figure: Architecture of IBM Storage Protect for Space Management with IBM Storage Scale.
Figure source: https://www.ibm.com/support/pages/system/files/inline-files/IBM_Spectrum_Protect_HSM_Scale_Configuration_Guide_0.pdf
🧠 Why This Matters
The dsmmigrate and dsmrecall processes are the backbone of space management:
- dsmmigrate moves files from local file systems to tape via LTFS
- dsmrecall brings them back when needed
These services rely on the health of the GPFS (General Parallel File System) daemons. A misstep during restart can disrupt recall/migrate operations, impacting performance and availability.
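To make the two roles concrete, here is a minimal sketch of the day-to-day commands. The file path /gpfs/fs1/bigfile.dat is a placeholder; dsmls is the HSM client command that reports a file's residency state.
# dsmmigrate /gpfs/fs1/bigfile.dat   (migrate the file to the Storage Protect server, leaving a stub on disk)
# dsmrecall /gpfs/fs1/bigfile.dat    (recall the file back to the local file system)
# dsmls /gpfs/fs1/bigfile.dat        (check whether the file is resident, premigrated, or migrated)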
🔧 Prerequisites
- Ensure you have administrative (root) access to the Storage Scale nodes
- Back up any critical data before restarting services
- Ensure there are no active recall or migration operations (a quick check follows below)
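One hedged way to confirm the last point before proceeding: dsmq lists queued recall requests on platforms where the HSM client provides it, and the ps filter catches in-flight migrate/recall processes.
# dsmq
# ps -ef | egrep 'dsmmigrate|dsmrecall' | grep -v grep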
✅ Procedure
1. Check active nodes in the cluster
Use the following commands to identify nodes running HSM daemons:
dsmmigfs query -detail
dsmmigfs query -failover
2. Verify running processes
Look for dsmwatchd, dsmrecalld, and related processes using the "ps -ef" command.
Example:
# ps -ef | grep -i dsm
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach
root 123456 1 0 04:22 ? 00:00:00 dsmrecalld
root 234567 123456 0 04:22 ? 00:00:00 dsmrecalld
root 345678 123456 0 04:22 ? 00:00:00 dsmrecalld
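If the cluster has several HSM nodes, the same check can be run everywhere at once. This is a sketch assuming the mmdsh utility shipped with Storage Scale is usable in your environment and all nodes are reachable:
# mmdsh -N all "ps -ef | grep -i dsm | grep -v grep"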
3. Shut down GPFS safely
Run the "mmshutdown" command and confirm the daemons are down with the "mmgetstate -a" command.
Example:
# mmshutdown
Shutting down the following quorum nodes will cause the cluster to lose quorum:
testnode.xxx.ibm.com
Do you wish to continue [yes/no]: yes
Tue Jun 3 04:20:30 MST 2025: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
Tue Jun 3 04:20:35 MST 2025: 6027-1344 mmshutdown: Shutting down GPFS daemons
Tue Jun 3 04:20:42 MST 2025: 6027-1345 mmshutdown: Finished
Tip: For multiple nodes in the cluster, the command accepts a comma-separated node list:
mmshutdown -N node_name1,node_name2
The "mmgetstate -a" output will show the inactive state. Refer to the Storage Scale documentation for more information.
Example:
# mmgetstate -a
Node number Node name GPFS state
----------------------------------------------------
11 testnode-ibm down
4. Observe daemon behaviour
Post-shutdown, only dsmwatchd should remain active; the other HSM daemons exit gracefully.
Example:
# ps -ef | grep -i dsm
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach
5. Restart GPFS
Use "mmstartup" command , to start the daemons and validate with "mmgetstate -a".
Example:
# mmstartup
Tue Jun 3 04:20:52 MST 2025: 6027-1642 mmstartup: Starting GPFS ...
Tip: For multiple nodes in the cluster, the command accepts a comma-separated node list:
mmstartup -N node_name1,node_name2
# mmgetstate -a
Node number Node name GPFS state
----------------------------------------------------
11 testnode-ibm active
6. Confirm HSM daemon recovery
After a few minutes, "ps -ef" should show all expected HSM processes back online.
Example:
# ps -ef | grep -i dsm
root 1234 1 0 Jan01 ? 01:09:42 /opt/tivoli/tsm/client/hsm/bin/dsmwatchd nodetach
root 123456 1 0 05:00 ? 00:00:00 dsmrecalld
root 234567 123456 0 05:00 ? 00:00:00 dsmrecalld
root 345678 123456 0 05:00 ? 00:00:00 dsmrecalld
7. Validate file system health
Ensure all local and remote mounts are accessible and functioning, as sketched below.
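A minimal validation pass could look like the following. The mount point /gpfs/fs1 is a placeholder; mmlsmount and mmhealth are standard Storage Scale commands, and re-running dsmmigfs query confirms that HSM is managing the file system again.
# mmlsmount all -L
# mmhealth node show
# df -h /gpfs/fs1
# dsmmigfs query -detail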
📝 Conclusion
Following this structured restart procedure ensures that the HSM services remain resilient and responsive. It’s a small step that goes a long way in maintaining operational excellence.
🔍 For more technical details, refer to the IBM Documentation on Space Management:
Overview of the space management client - IBM Documentation
Contributors: Rohit Phiske, Smita Gargote and Bharat Vyas
Acknowledgment: Special thanks to Nilesh Bhosale and Ravi Parikh for reviewing this blog and guiding its writing.