How to configure IBM Spectrum Symphony to support a mixed MTU cluster

By Archive User posted Wed December 21, 2016 03:52 PM

  

Originally posted by: Leo Lin @Symphony


 

An IBM Platform Symphony cluster requires a homogeneous MTU (maximum transmission unit) environment. This requirement comes from the UDP communication used by a base IBM Platform Symphony daemon, the LIM. The LIM daemon on the master host communicates with the LIMs on all other hosts in the cluster over UDP, and this communication fails if the MTU setting on any other host differs from that of the master host. Affected hosts can appear in the unavailable state, or in the OK state but without load information, and therefore cannot run workload.
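Before changing any configuration, it can help to confirm that an MTU mismatch is really present. The following is a minimal sketch, assuming Linux hosts with the iputils version of ping (the -M do option forbids fragmentation). The function names are hypothetical and are not part of IBM Platform Symphony. The payload size is the MTU minus 28 bytes: 20 for the IPv4 header and 8 for the ICMP header.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Read the MTU configured on a local interface (Linux sysfs).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Print a ping command for manual use. With -M do the ping fails
# if any hop on the path has an MTU smaller than the one tested.
# usage: probe_command <host> <mtu>
probe_command() {
    echo "ping -c 3 -M do -s $(icmp_payload_for_mtu "$2") $1"
}

probe_command master-01-jumbo 9000
```

If the printed ping command succeeds from a host toward the master, the path between them carries packets of that MTU size without fragmentation.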

The best solution for such an environment is to set the MTU consistently over all hosts. However, for some scenarios, it is difficult to achieve this setting in a short amount of time. For those cases, as a workaround, you can configure your master host and master candidate hosts to support a multi-home environment.

Figure 1 depicts one such configuration. The following discussion assumes two MTU sizes, 9000 and 1500, but the same configuration should also work for other MTU sizes.

Figure 1


Ensure the following settings:

  1.  Create two subnets so that each subnet contains only hosts with the same MTU size.
  2.  Add a switch or router so that hosts from different subnets can communicate.
  3.  Add an additional NIC (network interface card) to the master and to the master candidate, so that each can join both subnets with the corresponding MTU size set.
  4.  Set the route table on the master and master candidate so that all traffic to subnet 9000 goes through the MTU9000 NIC and all traffic to subnet 1500 goes through the MTU1500 NIC.
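As an illustration of settings 3 and 4, the sketch below prints (but does not run) the Linux iproute2 commands for the two NICs of a master host, using the example addresses that appear later in this article. The interface names eth0 and eth1 and the function name are assumptions; substitute the values from your own environment, and review the printed commands before running any of them as root.

```shell
#!/bin/sh
# Hypothetical interface names; substitute the real ones on your hosts.
NIC_1500=eth0
NIC_9000=eth1

# Print (do not run) the iproute2 commands for one NIC.
# usage: nic_commands <iface> <mtu> <ip/prefix> <subnet/prefix>
nic_commands() {
    ip_only=${3%/*}   # strip the prefix length from the address
    echo "ip link set dev $1 mtu $2"
    echo "ip addr replace $3 dev $1"
    echo "ip route replace $4 dev $1 src $ip_only"
}

# Netmask 255.255.0.0 corresponds to a /16 prefix.
nic_commands "$NIC_1500" 1500 172.29.11.41/16 172.29.0.0/16
nic_commands "$NIC_9000" 9000 172.31.43.41/16 172.31.0.0/16
```

Once the settings are applied, running "ip route get" with an address from either subnet reports which NIC the kernel selects for that destination.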

 

In some network infrastructures, a host with two NICs probably has two host names as well. Two host names are handy when a client wants to control which NIC it connects to by specifying the corresponding name. IBM Platform Symphony, however, relies on a unique host name to identify each host. To provide that unique name without affecting the infrastructure's view of the host, IBM Platform Symphony lets you configure the $EGO_CONFDIR/hosts file, which has the same format as the /etc/hosts file.

To configure IBM Platform Symphony to support a mixed MTU cluster, configure the $EGO_CONFDIR/hosts file:

 

  1.  Shut down your IBM Platform Symphony cluster:

# soamcontrol app disable all

# egosh service stop all

# egosh ego shutdown all

 

  2.  For hosts with two host names, find the one suitable to be used by IBM Platform Symphony by running:

# /bin/hostname

We call this host name the official name.

Here is an example of a master host with two names (master-01 and master-01-jumbo), with the following details:

master-01: MTU1500 IP=172.29.11.41

master-01-jumbo: MTU9000 IP=172.31.43.41

subnet 1500: 172.29.0.0 netmask=255.255.0.0

subnet 9000: 172.31.0.0 netmask=255.255.0.0

 

Run the hostname command to determine the official name:

# hostname

master-01

 

We choose master-01 as the official name.

 

  3.  For the master host and master candidate hosts, create the $EGO_CONFDIR/hosts file so that both IP addresses of the master host map to the official master host name, and both IP addresses of the master candidate map to the official master candidate host name.

Here is an example, assuming the official name for master is master-01 and master candidate is master-02:

# cat $EGO_CONFDIR/hosts

172.29.11.41  master-01    master-01.example.ibm.com
172.31.43.41  master-01    master-01.example.ibm.com
172.29.11.42  master-02    master-02.example.ibm.com
172.31.43.42  master-02    master-02.example.ibm.com
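A quick way to check such a file is to count how many distinct IP addresses it maps to each official name. On the master and master candidate, the expected count is two (one per NIC). The sketch below is illustrative only: count_ips is a hypothetical helper, and the file is written to a temporary location rather than to $EGO_CONFDIR.

```shell
#!/bin/sh
# Count the distinct IP addresses that a hosts-format file maps to a name.
# usage: count_ips <hosts-file> <official-name>
count_ips() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            # field 1 is the IP address; the remaining fields are names
            for (i = 2; i <= NF; i++)
                if ($i == name) { seen[$1] = 1; break }
        }
        END { n = 0; for (ip in seen) n++; print n }
    ' "$1"
}

# Demonstration against a temporary copy of the example file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
172.29.11.41  master-01    master-01.example.ibm.com
172.31.43.41  master-01    master-01.example.ibm.com
172.29.11.42  master-02    master-02.example.ibm.com
172.31.43.42  master-02    master-02.example.ibm.com
EOF
count_ips "$tmp" master-01
count_ips "$tmp" master-02
rm -f "$tmp"
```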

  4.  For all management hosts that are neither the master host nor a master candidate, create the $EGO_CONFDIR/hosts file so that the official names of the master and master candidate resolve to the desired IP address (NIC).

 

For example, for any management hosts in subnet 9000, the $EGO_CONFDIR/hosts file should be configured as follows:

# cat $EGO_CONFDIR/hosts

172.31.43.41  master-01    master-01.example.ibm.com
172.31.43.42  master-02    master-02.example.ibm.com

 

Conversely, for any management hosts in subnet 1500, the $EGO_CONFDIR/hosts file should be configured as follows:

# cat $EGO_CONFDIR/hosts

172.29.11.41  master-01    master-01.example.ibm.com
172.29.11.42  master-02    master-02.example.ibm.com
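Because the two variants differ only in the address prefix, the file contents can be generated from the subnet a host belongs to. The following is a sketch under the assumptions of this example (host names master-01 and master-02, host parts .41 and .42 on both subnets). The emit_hosts function is hypothetical and only prints to standard output; redirect the output to the hosts file once it looks right.

```shell
#!/bin/sh
# Print hosts-file entries for the master and master candidate,
# using the addresses that belong to the given subnet.
# usage: emit_hosts <1500|9000>
emit_hosts() {
    case "$1" in
        1500) prefix=172.29.11 ;;
        9000) prefix=172.31.43 ;;
        *) echo "unknown subnet: $1" >&2; return 1 ;;
    esac
    printf '%s.41  master-01    master-01.example.ibm.com\n' "$prefix"
    printf '%s.42  master-02    master-02.example.ibm.com\n' "$prefix"
}

emit_hosts 9000
```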

 

Note: In a typical cluster, $EGO_CONFDIR is on a shared file system for the master host, master candidates, and management hosts. To allow them to be configured differently, follow these steps:

 

  1.  On each of the master host, master candidate, and management hosts, create an $EGO_LOCAL_CONFDIR/hosts file instead of an $EGO_CONFDIR/hosts file.
  2.  Create a symbolic link at $EGO_CONFDIR/hosts that points to $EGO_LOCAL_CONFDIR/hosts. To do this, on one of the above-mentioned hosts, run:

# ln -s $EGO_LOCAL_CONFDIR/hosts $EGO_CONFDIR/hosts
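The direction of the link matters: the first argument to ln -s is the target (the local file) and the second is the link name (the path in the shared directory). The sketch below demonstrates this in a scratch directory so that it is safe to run anywhere. The directory names and the link_hosts function are stand-ins, not the real $EGO_CONFDIR or $EGO_LOCAL_CONFDIR.

```shell
#!/bin/sh
# Create <shared-confdir>/hosts as a symbolic link to <local-confdir>/hosts.
# usage: link_hosts <shared-confdir> <local-confdir>
link_hosts() {
    # -s: symbolic link; target first, link name second
    ln -s "$2/hosts" "$1/hosts"
}

# Demonstration with stand-in directories.
scratch=$(mktemp -d)
mkdir -p "$scratch/shared" "$scratch/local"
echo "172.31.43.41  master-01    master-01.example.ibm.com" > "$scratch/local/hosts"
link_hosts "$scratch/shared" "$scratch/local"
cat "$scratch/shared/hosts"    # reads through the link to the local file
rm -rf "$scratch"
```

Each host that opens the shared path then reads its own local file, provided that $EGO_LOCAL_CONFDIR is the same path on every host.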

 

  5.  For all compute hosts (as well as client hosts if applicable), create the $EGO_CONFDIR/hosts file so that all compute hosts can successfully resolve the official name of the master and master candidates, as well as map to the desired IP address (NIC).

For example, for any hosts in subnet 9000, as well as any client hosts that want to connect to the master on the NIC of MTU9000, the $EGO_CONFDIR/hosts file should be configured as follows:

# cat $EGO_CONFDIR/hosts

172.31.43.41  master-01    master-01.example.ibm.com
172.31.43.42  master-02    master-02.example.ibm.com

 

Conversely, for any hosts in subnet 1500, or any client hosts that want to connect to the master on the NIC of MTU1500, the $EGO_CONFDIR/hosts file should be configured as follows:

# cat $EGO_CONFDIR/hosts

172.29.11.41  master-01    master-01.example.ibm.com
172.29.11.42  master-02    master-02.example.ibm.com
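A simple sanity check for these files is to confirm that the address chosen for the master falls inside the subnet of the host that uses the file. The sketch below does this arithmetic in plain POSIX shell, assuming IPv4 addresses and a shell with 64-bit arithmetic. The function names are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if <ip> belongs to the subnet given by <network> and <netmask>.
# usage: in_subnet <ip> <network> <netmask>
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# A host in subnet 9000 should map master-01 to the MTU9000 address.
if in_subnet 172.31.43.41 172.31.0.0 255.255.0.0; then
    echo "172.31.43.41 is in subnet 9000"
fi
```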

 

  6.  Start the IBM Platform Symphony cluster so that the new configurations take effect:

# egosh ego start all

