IBM Spectrum Scale is a clustered file system and, like any other clustered system, depends heavily on the health of the underlying network to operate efficiently. Any network issue hampers the proper operation of the cluster and thereby affects the applications running on it and the clients connected to it.
IBM Spectrum Scale version 4.2.2 introduces a new command called mmnetverify, which can be used to detect common network issues in an existing cluster, or network configuration issues before the cluster is configured on a group of nodes. This blog is based on an internal presentation by William Brown from Spectrum Scale development.
mmnetverify is not yet a replacement for other network performance tools like nsdperf, which is available with Spectrum Scale. nsdperf helps measure network throughput by generating traffic patterns that mimic the way NSD clients and servers communicate. Unlike nsdperf, mmnetverify does not support running throughput tests over RDMA, running many-to-many node throughput tests to provide the overall cluster throughput, or reporting node resource statistics. Also, all the checks are performed within the cluster boundary, so it cannot be used to run tests against external entities like protocol/object clients or authentication servers.
mmnetverify can be used to diagnose the following kinds of networking issues in an existing cluster, and this article goes into more detail on the usage of this command.
- Basic network connectivity issues
- Time synchronization
- Remote shell execution
- Name resolution
- Bandwidth between nodes
- Packet drops between nodes
Command usage:
mmnetverify [Operation[ Operation...]]
[-N {Node[,Node...] | all}]
[--target-nodes {Node[,Node...] | all}]
[--configuration-file File] [--log-file File]
[--verbose] [--min-bandwidth Number]
Updates in Spectrum Scale 4.2.3:
Two new options were added: [--max-threads N] [--ces-override]
There are two modes of operation for the mmnetverify command:
- Configuration file (pre-cluster creation)
- Existing cluster
Configuration file
This mode is used to verify the network correctness of a group of nodes before the Spectrum Scale cluster is configured. It can also be used when one does not want to use the current Spectrum Scale configuration values.
The --configuration-file option specifies the path of the file, which describes the nodes of the cluster and the corresponding configuration values. The format of the file is as follows. Only the node parameter needs to be specified; the remaining parameters fall back to their default values if omitted. At least one node entry for the local node must be included in the configuration file.
Updates in Spectrum Scale 4.2.3:
Specifying this option automatically sets the --ces-override switch.
#
# node - name of node
# Required
#
# node <daemon_hostname> [<admin_hostname>]
#
# rshPath - path to the remote shell command to use
# Optional - defaults to /usr/bin/ssh
#
# rshPath <path_to_remote_shell_command>
#
# rcpPath - path to the remote copy command to use
# Optional - defaults to /usr/bin/scp
#
# rcpPath <path_to_remote_copy_command>
#
# tscTcpPort - port used to communicate with GPFS daemon
# Optional - defaults to 1191
#
# tscTcpPort <port_used_by_GPFS_daemon>
#
# mmsdrservPort - port used to communicate with GPFS mmsdrsrv daemon
# Optional - defaults to 1191
#
# mmsdrservPort <port_used_by_GPFS_mmsdrsrv_daemon>
#
# tscCmdPortRange - range of default OS ephemeral ports
# Optional - defaults to OS ephemeral ports
#
# tscCmdPortRange <min>-<max>
For a detailed explanation of the above parameters, please refer to the mmnetverify man page at http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_mmnetverify.htm
Existing cluster
This mode is used to verify the correctness of an already existing Spectrum Scale cluster using the information obtained from the cluster configuration file /var/mmfs/gen/mmsdrfs.
Node roles
The mmnetverify command uses the concept of node roles, specifically local node and target node.
Local node is the node from which the checks done by mmnetverify are initiated. Checks can be initiated from one or more nodes. Local nodes are specified with the -N option as a comma-separated list of nodes. The default is to run only on the node where the mmnetverify command is executed.
Target node is the node against which the test is run. Target nodes are specified with the --target-nodes (-T) option as a comma-separated list of node names. The default is to run the checks against all the nodes in the cluster or those defined in the configuration file.
Updates in Spectrum Scale 4.2.3:
Local nodes and target nodes now also support node classes.
If more than one node is specified with the -N option, the tests now run in parallel from these nodes. The new --max-threads option defines on how many nodes the tests can run in parallel; the default is 32 and the valid range is 1-64.
If sudo wrappers are enabled on the cluster, mmnetverify can be run under sudo on the administration node as the admin user. When sudo wrappers are used, the -N option is not supported.
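The node-role options can be combined in a single invocation. The sketch below is a small dry-run helper and is not part of Spectrum Scale: the function name build_mmnetverify_cmd and its argument order are made up for illustration. It only prints the command line and checks that the --max-threads value falls within the 1-64 range mentioned above (an option that exists only from 4.2.3 onward).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Spectrum Scale): prints an
# mmnetverify command line instead of running it.
# Usage: build_mmnetverify_cmd OPERATION LOCAL_NODES TARGET_NODES [MAX_THREADS]
build_mmnetverify_cmd() {
    op=$1
    local_nodes=$2        # comma-separated list or "all", passed to -N
    target_nodes=$3       # comma-separated list or "all", passed to --target-nodes
    max_threads=${4:-32}  # 32 is the default stated for 4.2.3

    # Reject anything that is not a plain non-negative integer.
    case $max_threads in
        ''|*[!0-9]*) echo "max-threads must be an integer" >&2; return 1 ;;
    esac
    # Enforce the valid range 1-64.
    if [ "$max_threads" -lt 1 ] || [ "$max_threads" -gt 64 ]; then
        echo "max-threads must be between 1 and 64" >&2
        return 1
    fi

    echo "mmnetverify $op -N $local_nodes --target-nodes $target_nodes --max-threads $max_threads"
}

# Dry run: ping dfnode3 from dfnode1 and dfnode2, at most 8 nodes in parallel.
build_mmnetverify_cmd ping dfnode1,dfnode2 dfnode3 8
```

Printing the command rather than executing it makes it easy to review the node lists before starting checks from several nodes at once. On clusters that use sudo wrappers, the -N part would have to be dropped, since that option is not supported there.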
Operations
Operations are the checks done by the mmnetverify command, and they can be directed against the local node, a specific set of nodes, or all the nodes in the cluster.
Local operations
These operations run only on the local node and do not involve any target node. The only local operation supported is "interface", which checks whether the IP addresses for the node's daemon and admin interfaces are enabled on the local node.
Node operations
These operations are initiated from the local node (or the nodes specified with -N) and always involve one or more target nodes in the cluster. Configuration parameters such as node names and the remote shell and copy commands, as defined in the cluster configuration file mmsdrfs, are used for the various checks detailed below.
The following node operations are supported, with a brief explanation of each:
Operation | Checks performed | Details |
resolution | Hostname resolution | The target node's daemon and admin host names can be resolved to the same IP address on the local node, and a reverse lookup of the target node's IP address returns the daemon host name. |
ping | Network connectivity via ping | Local node can ping the target node's daemon and admin interfaces. |
shell | Passwordless remote shell execution | Local node can execute passwordless remote commands on the target node. |
copy | Passwordless remote copy | Local node can execute a passwordless remote copy to the target node. |
time | Time synchronization | Checks time synchronization between the local and target nodes. |
daemon-port | GPFS daemon port connectivity | Target node can establish a TCP connection to mmfsd on the local node. |
sdrserv-port | GPFS configuration port connectivity | Target node can establish a TCP connection to the mmsdrservPort port on the local node. |
tsccmd-port | Random tsccmd port range test | Target node can establish TCP connections to random ports in the range defined by tscCmdPortRange on the local node. |
data-small | Data exchange check for 100-byte packets | Target node can exchange a series of 100 B data packets with the local node. |
data-medium | Data exchange check for 16 KB packets | Target node can exchange a series of 16 KB data packets with the local node. |
data-large | Data exchange check for 64 MB packets | Target node can exchange a series of 64 MB data packets with the local node. |
bandwidth-node | Network bandwidth, one-to-one | Measures bandwidth while the target node sends a large amount of data to the local node. |
flood-node | Datagram flood, one-to-one | Target node floods the local node with UDP datagrams. Records packet loss. |
Updates in Spectrum Scale 4.2.3: The following new node operations were added.
protocol-ctdb | CTDB port connectivity | Target node can establish connectivity to the CTDB port on the local node. The test is skipped on nodes that are not CES enabled; this can be overridden with the --ces-override option or a configuration file. |
protocol-object | Object protocol connectivity | Target node can establish connectivity to the ports used by object services on the local node. The test is skipped on nodes that are not CES enabled; this can be overridden with the --ces-override option or a configuration file. |
Cluster operations
These operations are initiated from the local node and involve all the other nodes in the cluster.
The following cluster operations are supported:
Operation | Checks performed | Details |
bandwidth-cluster | Network bandwidth, many-to-one | All target nodes send a large amount of data to the local node in parallel, and the bandwidth is measured on each target node. |
flood-cluster | Datagram flood, many-to-one | All target nodes flood the local node with UDP datagrams in parallel, recording any packet loss. |
Updates in Spectrum Scale 4.2.3: The following new cluster operation was added.
gnr-bandwidth | Overall cluster bandwidth | Reports the sum of the bandwidth from all the target nodes to the local node. |
Multiple operations can also be combined using shortcuts, as follows:
Shortcut | Checks performed |
connectivity | resolution, ping, shell, copy |
port | daemon-port, sdrserv-port, tsccmd-port |
data | data-small, data-medium, data-large |
bandwidth | bandwidth-node, bandwidth-cluster |
flood | flood-node, flood-cluster |
all | all operations except flood-node and flood-cluster |
default | If no operations provided, all operations except data-large, flood-node and flood-cluster |
Updates in Spectrum Scale 4.2.3: The following new shortcut was added.
protocol | protocol-ctdb, protocol-object |
Example:
$ mmnetverify port shell flood
is the same as:
$ mmnetverify daemon-port sdrserv-port tsccmd-port shell flood-node flood-cluster
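To make the shortcut expansion concrete, here is a minimal sketch that maps shortcut names to the individual operations. mmnetverify performs this expansion itself; the function name expand_ops is made up for this illustration, and the bandwidth shortcut is assumed to expand to bandwidth-node and bandwidth-cluster, matching the operation names listed earlier. The "all" and "default" shortcuts are left out because they are defined by exclusion.

```shell
#!/bin/sh
# Illustrative only: expands shortcut names into individual operations.
# expand_ops is a made-up name; mmnetverify does this internally.
expand_ops() {
    result=""
    for op in "$@"; do
        case $op in
            connectivity) expanded="resolution ping shell copy" ;;
            port)         expanded="daemon-port sdrserv-port tsccmd-port" ;;
            data)         expanded="data-small data-medium data-large" ;;
            bandwidth)    expanded="bandwidth-node bandwidth-cluster" ;;  # assumed
            flood)        expanded="flood-node flood-cluster" ;;
            protocol)     expanded="protocol-ctdb protocol-object" ;;     # 4.2.3 and later
            *)            expanded=$op ;;  # not a shortcut: pass through unchanged
        esac
        # Append with a single space separator (no leading space).
        result="${result:+$result }$expanded"
    done
    echo "$result"
}

# Same arguments as the example above.
expand_ops port shell flood
```

Run with the arguments from the example, this prints the same operation list as the expanded command line shown above.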
Logging options
The mmnetverify command provides options for logging the command output to a file and for displaying more detailed output messages.
Option | Description |
--log-file File | Specifies the path of the file in which to log the output of the checks. If not specified, all messages are displayed on the console only. |
--verbose | Displays more detailed output for each check. |
Min-bandwidth option
The --min-bandwidth option specifies the minimum acceptable bandwidth for the data throughput (bandwidth) check operations. If the throughput between any two nodes is below the specified value, this is reported as an issue in the command output.
Examples on analyzing configuration problems with the mmnetverify command
As previously mentioned, one of the uses of the mmnetverify command is to check the correctness of a group of nodes before configuring the Spectrum Scale cluster. The necessary parameters are provided via a configuration file. This mode can also be used to specify custom parameters that differ from those in the current cluster configuration.
The following example checks whether all the nodes from which one wants to build a cluster are set up properly with respect to connectivity, ports, and time synchronization.
$ mmnetverify --configuration-file mm.config connectivity port time -N all
Note : Above output is trimmed for brevity
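The contents of mm.config are not shown in this post. As a rough sketch, a minimal file could be generated as follows. The node names dfnode1, dfnode2, and dfnode3 are placeholders borrowed from the other examples, and the key names follow the file format described earlier. Only the node lines are required; the remaining lines simply restate the defaults and could be omitted.

```shell
#!/bin/sh
# Sketch: write a minimal mmnetverify configuration file.
# Node names are placeholders; replace them with the daemon host names
# of the nodes that will form the cluster.
CONFIG_FILE=${CONFIG_FILE:-mm.config}

{
    # One "node" line per node; the local node must be included.
    for n in dfnode1 dfnode2 dfnode3; do
        echo "node $n"
    done
    # Optional settings, written here with their stated defaults.
    echo "rshPath /usr/bin/ssh"
    echo "rcpPath /usr/bin/scp"
    echo "tscTcpPort 1191"
} > "$CONFIG_FILE"

# Show the generated file for review.
cat "$CONFIG_FILE"
```

The resulting file can then be passed to mmnetverify with the --configuration-file option, as in the command above.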
The following example checks whether port connectivity is set up correctly before a cluster is configured. Port 1191 has been blocked on dfnode2, and the command reports the issue as expected.
$ mmnetverify --configuration-file mm.config port -N all
A few examples on analyzing network issues with the mmnetverify command on an existing cluster
1) The following command checks for basic network connectivity problems between one node and another node of the cluster:
$ mmnetverify ping -N dfnode2 -T dfnode3
2) The following command checks for network and daemon port connectivity problems between the local node and all other nodes of the cluster:
$ mmnetverify connectivity port
3) The following command checks for network connectivity problems between every pair of nodes in the cluster:
$ mmnetverify connectivity -N all -T all
4) The following command checks for problems with large data transfers between two nodes in the cluster:
$ mmnetverify data-large -N dfnode2 -T dfnode3 --verbose
5) The following command checks the throughput from all the cluster nodes to the local node on which the command is run, and reports if the throughput is below the value specified with --min-bandwidth:
$ mmnetverify bandwidth-cluster -T all --min-bandwidth 800M --verbose
6) The following command checks for UDP datagram packet loss between the local node and all other nodes of the cluster:
$ mmnetverify flood-cluster --verbose
7) In the following example, port 1191 is blocked on node dfnode2 and a port check is run. As expected, the checks for daemon-port and sdrserv-port fail for dfnode2.
$ mmnetverify connectivity port -T dfnode1
Additional examples based on update to Spectrum Scale 4.2.3 :
8) The following command checks the CTDB and all the object services ports connectivity from all nodes to the CES nodes in the cluster.
9) Check the overall cluster bandwidth from all the target nodes to the local node.
Tools like mmnetverify provide a powerful and convenient way to debug common network issues in an existing Spectrum Scale cluster and to check pre-cluster readiness.
Edit 1: Updated with changes included in Spectrum Scale 4.2.3