File and Object Storage


Diagnosing network problems in IBM Spectrum Scale with mmnetverify

By RAVIKUMAR NADAR posted Fri February 24, 2017 08:54 AM

  
IBM Spectrum Scale is a clustered file system and, like any other clustered system, depends heavily on the health of the underlying network to operate efficiently. Any network issue can hamper the proper operation of the cluster, and thereby impact the applications running on the cluster or the clients connected to it.

IBM Spectrum Scale version 4.2.2 introduces a new command, mmnetverify, which can be used to detect common network issues in an existing cluster, as well as network configuration issues on a group of nodes before the cluster is configured. This blog is based on an internal presentation by William Brown from Spectrum Scale development.

mmnetverify is not yet a replacement for other network performance tools available with Spectrum Scale, such as nsdperf. nsdperf helps measure network throughput by generating traffic patterns that mimic the way NSD clients and servers communicate. Unlike nsdperf, mmnetverify does not support running throughput tests over RDMA, running many-to-many node throughput tests to measure overall cluster throughput, or reporting node resource statistics. Also, all the checks are performed within the cluster boundary, so mmnetverify cannot be used to run tests against external entities such as protocol/object clients or authentication servers.

mmnetverify can be used to diagnose the following kinds of network issues in an existing cluster. This article describes the usage of the command in more detail.

  • Basic network connectivity issues

  • Time synchronization

  • Remote shell execution

  • Name resolution

  • Bandwidth between nodes

  • Packet drops between nodes


Command usage -

mmnetverify [Operation[ Operation...]]
[-N {Node[,Node...] | all}]
[--target-nodes {Node[,Node...] | all}]
[--configuration-file File] [--log-file File]
[--verbose] [--min-bandwidth Number]


Updates in Spectrum Scale 4.2.3 :

Two new options were added: [--max-threads N] [--ces-override]


The mmnetverify command has two modes of operation:

  • Configuration file (pre-cluster creation)

  • Existing cluster 


Configuration file
This mode is used to verify the network correctness of a group of nodes before the Spectrum Scale cluster is configured. This mode can also be used when one does not want to use the current Spectrum Scale configuration values.

The --configuration-file option specifies the path of the configuration file, which describes the nodes of the cluster and the corresponding configuration values. The file format is shown below. Only the node parameter needs to be specified; the remaining parameters revert to their default values if they are not specified. The file must include at least one node entry for the local node.

Updates in Spectrum Scale 4.2.3 :

Specifying this option automatically sets the --ces-override switch.



#
# node – name of node
# Required
#
node <daemon_hostname> [<admin_hostname>]

#
# rshPath – path to the remote shell command to use
# Optional – defaults to /usr/bin/ssh
#
# rshPath <path_to_remote_shell_command>

#
# rcpPath – path to the remote copy command to use
# Optional – defaults to /usr/bin/scp
#
# rcpPath <path_to_remote_copy_command>

#
# tscTcpPort – port used to communicate with the GPFS daemon
# Optional – defaults to 1191
#
# tscTcpPort <port_used_by_GPFS_daemon>

#
# mmsdrservPort – port used to communicate with the GPFS mmsdrserv daemon
# Optional – defaults to 1191
#
# mmsdrservPort <port_used_by_GPFS_mmsdrserv_daemon>

#
# tscCmdPortRange – range of ports used by GPFS administration commands
# Optional – defaults to the OS ephemeral ports
#
# tscCmdPortRange <min>-<max>

For a detailed explanation of the above parameters, refer to the mmnetverify man page at http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_mmnetverify.htm
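As an illustration of the format above, the following minimal sketch writes a configuration file for the three nodes used in this article's examples (dfnode1, dfnode2, dfnode3). The "-admin" host names are hypothetical placeholders; the admin host name is optional in the node line and can be left out. The optional parameters are written out with the default values listed above, so leaving them out would give the same result.

```shell
#!/bin/sh
# Sketch: write a pre-cluster configuration file for mmnetverify.
# dfnode1-dfnode3 are the node names used in this article's examples;
# the "-admin" host names are hypothetical placeholders.
CONFIG_FILE="${1:-mm.config}"

cat > "$CONFIG_FILE" <<'EOF'
# Required: one "node" line per node, including the local node.
node dfnode1 dfnode1-admin
node dfnode2 dfnode2-admin
node dfnode3 dfnode3-admin

# Optional: written out here with their documented default values.
rshPath /usr/bin/ssh
rcpPath /usr/bin/scp
tscTcpPort 1191
EOF

echo "Wrote $CONFIG_FILE"
```

The resulting file is what gets passed with --configuration-file, as in the pre-cluster examples later in this article.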

Existing cluster
This mode is used to verify the correctness of an already existing Spectrum Scale cluster, using the information obtained from the cluster configuration file /var/mmfs/gen/mmsdrfs.

Node roles
The mmnetverify command uses the concept of node roles, specifically the local node and the target node.

The local node is the node from which one or more of the checks done by mmnetverify are initiated. Checks can be initiated from one or more nodes. Local nodes are specified with the -N option as a comma-separated list of nodes. The default is to run only on the node where the mmnetverify command is executed.

The target node is the node against which a check is run. Target nodes are specified with the --target-nodes (-T) option as a comma-separated list of node names. The default is to run the checks against all the nodes in the cluster, or against all the nodes defined in the configuration file.

Updates in Spectrum Scale 4.2.3 :

The local node (-N) and target node (--target-nodes) options now also accept node classes.


If more than one node is specified with the -N option, the tests now run in parallel from these nodes. The new --max-threads option defines on how many nodes the tests can run in parallel; the default is 32 and the valid range is 1-64.


If sudo wrappers are enabled on the cluster, mmnetverify can be run under sudo on the administration node as the admin user. The -N option is not supported when sudo wrappers are used.


Operations
Operations are the checks done by the mmnetverify command; they can be directed against the local node, a specific set of nodes, or all the nodes in the cluster.

Local operations
These operations run only on the local node and do not involve any target node. The only supported local operation is “interface”, which checks that the IP addresses for the node's daemon and admin interfaces are enabled on the local node.

Node operations
These operations are initiated from the local node (or the nodes specified by -N) and always involve one or more target nodes in the cluster. The configuration parameters defined in the cluster configuration file mmsdrfs, such as node names and the remote shell and copy commands, are used for the checks detailed below.

The following node operations are supported, each listed with the check it performs and a brief explanation:

  • resolution (hostname resolution): The target node's daemon and admin host names can be resolved to the same IP address on the local node, and a reverse lookup of the target node's IP address returns the daemon host name.

  • ping (network connectivity via ping): The local node can ping the target node's daemon and admin interfaces.

  • shell (passwordless remote shell execution): The local node can execute passwordless remote commands on the target node.

  • copy (passwordless remote copy): The local node can perform a passwordless remote copy to the target node.

  • time (time synchronization): Checks time synchronization between the local node and the target node.

  • daemon-port (GPFS daemon port connectivity): The target node can establish a TCP connection to mmfsd on the local node.

  • sdrserv-port (GPFS configuration port connectivity): The target node can establish a TCP connection to the mmsdrservPort port on the local node.

  • tsccmd-port (random tsccmd port range test): The target node can establish TCP connections to random ports in the tscCmdPortRange range on the local node.

  • data-small (data exchange check for 100-byte packets): The target node can exchange a series of 100 B data packets with the local node.

  • data-medium (data exchange check for 16 KB packets): The target node can exchange a series of 16 KB data packets with the local node.

  • data-large (data exchange check for 64 MB packets): The target node can exchange a series of 64 MB data packets with the local node.

  • bandwidth-node (network bandwidth, one-to-one): Checks the bandwidth when the target node exchanges large amounts of data with the local node.

  • flood-node (datagram flood, one-to-one): The target node floods the local node with UDP datagrams, and any packet loss is recorded.


Updates in Spectrum Scale 4.2.3 : The following new node operations were added:

  • protocol-ctdb (CTDB port connectivity): The target node can establish connectivity to the CTDB port on the local node. The test is skipped on nodes that are not CES-enabled; this can be overridden with the --ces-override option or by using a configuration file.

  • protocol-object (object protocol connectivity): The target node can establish connectivity to the ports used by the object services on the local node. The test is skipped on nodes that are not CES-enabled; this can be overridden with the --ces-override option or by using a configuration file.

Cluster operations

These operations are initiated from the local node and involve all the other nodes in the cluster at the same time.
The following cluster operations are supported:

  • bandwidth-cluster (network bandwidth, many-to-one): All target nodes exchange large amounts of data with the local node in parallel, and the bandwidth is measured for each target node.

  • flood-cluster (datagram flood, many-to-one): All target nodes flood the local node with UDP datagrams in parallel, and any packet loss is recorded.

Updates in Spectrum Scale 4.2.3 : The following new cluster operation was added:

  • gnr-bandwidth (overall cluster bandwidth): Reports the sum of the bandwidth from all the target nodes to the local node.

Multiple operations can also be combined using the following shortcuts:

  • connectivity: resolution, ping, shell, copy

  • port: daemon-port, sdrserv-port, tsccmd-port

  • data: data-small, data-medium, data-large

  • bandwidth: bandwidth-node, bandwidth-cluster

  • flood: flood-node, flood-cluster

  • all: all operations except flood-node and flood-cluster

  • default (no operations specified): all operations except data-large, flood-node, and flood-cluster

Updates in Spectrum Scale 4.2.3 : The following new shortcut was added:

  • protocol: protocol-ctdb, protocol-object

Example

$mmnetverify port shell flood

is the same as :

$mmnetverify daemon-port sdrserv-port tsccmd-port shell flood-node flood-cluster
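To make the expansion concrete, the following is a small hypothetical shell function (expand_ops is not part of Spectrum Scale; mmnetverify performs this expansion itself) that mirrors the shortcut list above. It is only a sketch of the mapping and deliberately does not handle "all" or the default set.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spectrum Scale): expands mmnetverify
# shortcut names into the individual operations they stand for.
# "all" and the default set are deliberately not handled here.
expand_ops() {
  result=""
  for op in "$@"; do
    case "$op" in
      connectivity) ops="resolution ping shell copy" ;;
      port)         ops="daemon-port sdrserv-port tsccmd-port" ;;
      data)         ops="data-small data-medium data-large" ;;
      bandwidth)    ops="bandwidth-node bandwidth-cluster" ;;
      flood)        ops="flood-node flood-cluster" ;;
      protocol)     ops="protocol-ctdb protocol-object" ;;  # 4.2.3 and later
      *)            ops="$op" ;;                          # already a single operation
    esac
    # Append with a separating space only when result is non-empty.
    result="${result:+$result }$ops"
  done
  printf '%s\n' "$result"
}

# Reproduces the equivalence in the example above.
expand_ops port shell flood
```

Running it prints the same operation list as the second command in the example.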

Logging options

The mmnetverify command provides options to log the command output to a file and to display more detailed output messages:

  • --log-file File: Specifies the path of the file in which to log the output of the checks. If it is not specified, all messages are displayed only on the console.

  • --verbose: Displays more details about each check.

Min-bandwidth option
The --min-bandwidth option specifies the minimum bandwidth for the data throughput (bandwidth) check operations. If the throughput between any two nodes is below the specified value, it is reported as an issue in the command output.
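The logging and threshold options can be combined with any operation. The following is a minimal sketch, assuming the default installation directory /usr/lpp/mmfs/bin; the log path is a hypothetical choice and 800M is the threshold used in example 5 below. On a machine without Spectrum Scale it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: run the many-to-one bandwidth check with a minimum-bandwidth
# threshold and keep a log of the results.
MMFS_BIN=/usr/lpp/mmfs/bin                              # assumed install directory
LOG_FILE="${LOG_FILE:-/tmp/mmnetverify-bandwidth.log}"  # hypothetical log path
MIN_BW="${MIN_BW:-800M}"                                # threshold from example 5

# Build the argument list once so the real run and the dry run match.
set -- bandwidth-cluster -T all --min-bandwidth "$MIN_BW" --log-file "$LOG_FILE"

if [ -x "$MMFS_BIN/mmnetverify" ]; then
  "$MMFS_BIN/mmnetverify" "$@"
else
  # Dry run on machines without Spectrum Scale: show the command instead.
  echo "mmnetverify $*"
fi
```

Any node pair whose throughput falls below MIN_BW is then reported as an issue both on the console and in the log file.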

Examples on analyzing configuration problems with mmnetverify command

As mentioned previously, one use of the mmnetverify command is to check the correctness of a group of nodes before the Spectrum Scale cluster is configured. The necessary parameters are provided through a configuration file. This mode can also be used to specify custom parameters that differ from the values in the current cluster configuration.
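Because the configuration-file mode requires an entry for the local node, a quick pre-flight check on the file can save a failed run. The function below is a hypothetical helper (config_has_local_node is not part of Spectrum Scale); it assumes that host names can be compared by their short form, ignoring any domain suffix.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Spectrum Scale): succeed only if
# the given mmnetverify configuration file has a "node" line naming this host.
# Assumption: host names are compared by short name (domain suffix ignored).
config_has_local_node() {
  cfg="$1"
  me=$(uname -n | cut -d. -f1)
  awk -v me="$me" '
    $1 == "node" {
      # Field 2 is the daemon host name, field 3 the optional admin host name.
      for (i = 2; i <= 3 && i <= NF; i++) {
        name = $i
        sub(/\..*/, "", name)   # strip any domain suffix
        if (name == me) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$cfg"
}

# Usage (mm.config is the file name used in this article's examples):
# config_has_local_node mm.config || echo "mm.config has no entry for this node"
```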

The following example checks whether all the nodes that you want to build a cluster from are set up properly with respect to connectivity, ports, and time synchronization:

$mmnetverify --configuration-file mm.config connectivity port time -N all

[Picture1 and Picture2: output of the command above]

Note: The output above is trimmed for brevity.

The following example checks whether port connectivity is set up correctly before a cluster is configured. Port 1191 has been blocked on dfnode2, and the command reports the issue as expected:

$mmnetverify --configuration-file mm.config port -N all

[Picture3: output of the command above]

Examples of analyzing network issues with the mmnetverify command on an existing cluster

1) The following command checks for basic network connectivity problems between one node and another node of the cluster:

$mmnetverify ping -N dfnode2 -T dfnode3

[Picture4: output of the command above]

2) The following command checks for network and daemon port connectivity problems between the local node and all other nodes of the cluster:

$mmnetverify connectivity port

[Picture5: output of the command above]

3) The following command checks for network connectivity problems between every pair of nodes in the cluster:

$mmnetverify connectivity -N all -T all

[Picture6: output of the command above]

4) The following command checks for problems with large data transfers between two nodes in the cluster:

$mmnetverify data-large -N dfnode2 -T dfnode3 --verbose

[Picture7: output of the command above]

5) The following command checks the throughput from all the cluster nodes to the local node on which the command is run, and reports whether the throughput is below the value specified with --min-bandwidth:

$mmnetverify bandwidth-cluster -T all --min-bandwidth 800M --verbose

[Picture8: output of the command above]

6) The following command checks for UDP datagram packet loss between the local node and all other nodes of the cluster:

$mmnetverify flood-cluster --verbose

[Picture9: output of the command above]

7) In the following example, port 1191 is blocked on node dfnode2 and a connectivity and port check is run; as expected, the daemon-port and sdrserv-port checks fail for dfnode2.

$mmnetverify connectivity port -T dfnode1

[Picture10: output of the command above]

Additional examples based on the updates in Spectrum Scale 4.2.3:

8) The following command checks the connectivity of the CTDB port and of all the object services ports from all nodes to the CES nodes in the cluster.

9) The following command checks the overall cluster bandwidth from all the target nodes to the local node.

Tools like mmnetverify provide a powerful and convenient way to debug common network issues in an existing Spectrum Scale cluster and to check pre-cluster readiness.

 

Edit 1: Updated with changes included in Spectrum Scale 4.2.3

 
#IBMSpectrumScale
#network
#Softwaredefinedstorage
#debugging