File and Object Storage

Software-defined storage for building a global AI, HPC and analytics data platform

View Only

Back to Blog List

Extension of the problem determination capabilities in IBM Spectrum Scale 4.2.2

By Archive User posted Tue January 10, 2017 09:48 AM

In IBM Spectrum Scale v 4.2.2 and later, the problem determination framework (available via the mmhealth command) was further extended to support in total 22 different system components (compared to 15 in v 4.2.1). It automatically performs over 150 different checks of the components functionality, generating over 500 different events for it.

Furthermore, the mmhealth command was extended with a subcommand cluster, allowing you to see the state of your whole cluster at a glance! If any monitored component on any IBM Spectrum Scale cluster node fails one of the checks, you can now find it out immediately, without having to execute (and keep in mind) hundreds of corresponding commands. This allows you to make sure, your cluster is working fine and if not, that you are informed about the corresponding issues.

[root@ch-41 ~]# mmhealth cluster show

Component           Total         Failed       Degraded        Healthy          Other
------------------------------------------------------------------------------------------
NODE                    5              0              1              4              0
GPFS                    5              0              0              5              0
NETWORK                 5              0              0              5              0
FILESYSTEM              1              0              0              1              0
DISK                    2              0              0              2              0
CES                     2              1              0              1              0
PERFMON                 3              0              0              3              0

The “drilling down” process was extended by a cluster stage, enabling you to get from an abstract and compact representation of the whole cluster to a detailed analysis of the relevant component on a relevant node just in a few keystrokes:

[root@ch-41 ~]# mmhealth cluster show ces

Component                Node                     Status            Reasons
------------------------------------------------------------------------------------------
CES                      ch-41.localnet.com       HEALTHY           -
CES                      ch-42.localnet.com       FAILED            ces_network_ips_down, nfs_in_grace, nfsd_down

The node view of the mmhealth command was further enhanced in the IBM Spectrum Scale 4.2.2 to show you when the last state changing event occurred. This can be extremely useful to pinpoint the reason of the corresponding problem:

[root@ch-41 ~]# mmhealth node show -N ch-42

Node name:      ch-42.localnet.com
Node status:    DEGRADED
Status Change:  10 min. ago

Component      Status        Status Change     Reasons
------------------------------------------------------------------------------------------
GPFS           HEALTHY       7 days ago        -
NETWORK        HEALTHY       8 days ago        -
FILESYSTEM     HEALTHY       7 days ago        -
DISK           HEALTHY       7 days ago        -
CES            FAILED        9 min. ago        nfsd_down, ces_network_ips_down, nfs_in_grace
PERFMON        HEALTHY       8 days ago        -

The event subcommand, also introduced in the IBM Spectrum Scale 4.2.2, provides you on demand with a detailed description of any detected problem and suggestions about the ways to fix them:

[root@ch-41 ~]# mmhealth event show nfsd_down
Event Name:              nfsd_down
Event ID:                999167
Description:             Checks for a NFS service process
Cause:                   The NFS server process was not detected
User Action:             Check the health state of the NFS server and restart, if necessary. The process might hang or is in a defunct state
Severity:                ERROR
State:                   FAILED

To learn more about using IBM Spectrum Scale’s problem determination framework take a look at:

Monitoring system health by using the mmhealth command in the IBM Knowledge Center

#Softwaredefinedstorage
#IBMSpectrumStorage
#IBMSpectrumScale

2 comments

1 view

Permalink

https://community.ibm.com/community/user/blogs/archive-user/2017/01/10/extension-of-the-problem-determination-capabilities-in-ibm-spectrum-scale-422

Comments

Archive User

Wed January 25, 2017 08:33 AM

Thanks for reviewing, Doris!

Archive User

Mon January 23, 2017 03:02 PM

Thanks Pavel for the quick highlights of this new function!

File and Object Storage

File and Object Storage

Extension of the problem determination capabilities in IBM Spectrum Scale 4.2.2

By Archive User posted Tue January 10, 2017 09:48 AM

Permalink

Comments

Additional
Resources

Office

Quick Links

File and Object Storage

File and Object Storage

Extension of the problem determination capabilities in IBM Spectrum Scale 4.2.2

By Archive User posted Tue January 10, 2017 09:48 AM

Permalink

Comments

Additional Resources

Office

Quick Links

Additional
Resources