
Virtual Ethernet Software Multi Queue

By lokesh kambadur posted Thu December 19, 2024 06:31 AM

  


Introduction:

In IBM AIX, the existing virtual ethernet driver does not scale in terms of bandwidth and latency with evolved physical network adapters running at 100G and 200G speeds.

In the latest IBM AIX release 7.3.3.0 TL and VIOS 4.1.1.0, IBM introduced the software multi queue feature in the virtual ethernet driver, optimized to cope with the higher bandwidth and low latency requirements of today's enterprise systems. With the introduction of software multi queue for the virtual ethernet driver, the best possible configurations show a 2X gain in throughput and transactions per second (TPS).

The software multi queue for receive is controlled by a new ODM attribute, queues_rx, and the software multi queue for transmit is controlled by another new ODM attribute, queues_tx. By default, the number of receive queues is set to 0 and the number of transmit queues is set to 12. Users can revert to the legacy virtual ethernet driver by setting the queues_rx=0 and queues_tx=0 ODM attributes.
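For example, reverting to the legacy single-queue behavior is a single chdev call (a minimal sketch; entX is a placeholder for your virtual ethernet adapter, which must be in the closed state as noted in the tuning steps below):

# chdev -l entX -a queues_rx=0 -a queues_tx=0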

An optimized receive buffer pool policy is introduced in the virtual ethernet driver, controlled by the new ODM attribute rx_pool_policy. It can be set to either “new” or “legacy”. The default setting is “new”, which means the driver uses an optimal way of dividing the receive buffers into pools. This policy boosts virtual ethernet performance by 5-10%.
With the “new” policy, the receive buffers are divided into three pools, whereas with the “legacy” policy, the receive buffers are divided into five pools. Customers can switch between the legacy and new receive pool managers by setting the rx_pool_policy attribute appropriately.
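Switching between the two policies is likewise a chdev call on the adapter (a sketch under the same assumptions; entX is a placeholder and the adapter must be closed first):

# chdev -l entX -a rx_pool_policy=legacy
# chdev -l entX -a rx_pool_policy=new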

In the case of VIOS, SEA threads are disabled when all underlying virtual ethernet adapters have both receive and transmit software multi queue enabled, i.e., queues_rx and queues_tx are set to non-zero for all the virtual ethernet adapters within the SEA.

With SEA threads disabled in the above case, communication to outside systems through the VIOS gives better performance in terms of bandwidth and TPS.
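One way to inspect this on the VIOS is to look at the SEA and its underlying adapters (a sketch; entSEA and entX are placeholders, and thread and virt_adapters are the standard SEA attributes, so your output may differ):

# lsattr -El entSEA | grep -E 'thread|virt_adapters'
# lsattr -El entX | grep queues     (repeat for each adapter listed in virt_adapters)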

   How to enable:

To know whether multi queue virtual ethernet is supported on AIX and VIOS, grep for the below ODM attributes using the lsattr command.

Ex:

# lsattr -El entX | grep queues

queues_rx       0               Requested number of Receive Queues                   True

queues_tx       12              Requested number of Transmit Queues                  True

#

To know whether the receive pool policy is supported, grep for the rx_pool_policy ODM attribute using the lsattr command.

  Ex:

# lsattr -El ent1 | grep rx_pool_policy
rx_pool_policy  new            Receive Buffer Pools Policy                          True


#

By default, on IBM AIX and VIOS, the virtual ethernet adapter driver operates with queues_rx=0 and queues_tx=12.

The queues_rx and queues_tx values must be set in increments of 4. Customers can enable multi queue starting with a value of 4 for both queues_rx and queues_tx.
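The values accepted on a given system can be listed with lsattr -R (a sketch; entX is a placeholder, and the reported ranges may vary by AIX/VIOS level):

# lsattr -Rl entX -a queues_rx
# lsattr -Rl entX -a queues_tx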

To get better performance while considering CPU utilization, the optimal values to tune are queues_rx=4 and queues_tx=12, using the chdev command:

a. chdev -l entX -a queues_rx=4

b. chdev -l entX -a queues_tx=12

Note: To run chdev on a virtual ethernet device, the device must be in the closed state (ifconfig enX down detach); then run chdev on the queue parameter to tune it.
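Putting the steps together, a complete tuning sequence might look like the following (a sketch; enX/entX are placeholders, and re-attaching the interface with chdev -l enX -a state=up assumes its TCP/IP configuration is stored in the ODM):

# ifconfig enX down detach
# chdev -l entX -a queues_rx=4 -a queues_tx=12
# chdev -l enX -a state=up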

The maximum value of queues_rx is 32 for receive path tuning, and the maximum value of queues_tx is 64 for the transmit path.

What makes the difference:
              Software Multi Queue vs. the existing virtual ethernet.

Software multi queue virtual ethernet gives a >80% gain in TCP stream bandwidth while running traffic on multiple sockets (28 sockets) and a >82% gain in TPS for TCP RR while running traffic on multiple sockets (150 sockets) for within-the-CEC communication in our lab environment.

For outside-the-box/CEC communication (communication through the VIOS SEA), the bandwidth gain is >150% for TCP stream traffic and the TPS gain is >100% for TCP RR traffic, with traffic running on multiple sockets (28 sockets for TCP stream and 150 sockets for TCP RR).

There is a caveat: with increasing queues, CPU utilization does not scale linearly with the bandwidth gain. CPU utilization is ~2.5x while bandwidth is 2x when increasing the queues. The same applies to TPS, which is also not linear.

The gain in TCP stream bandwidth with multiple RX queues leads to higher CPU utilization.

Hence, it is recommended that while deciding on the number of RX queues, users consider whether they want increased bandwidth or optimal CPU utilization.

The two diagrams below depict the bandwidth gain with respect to CPU utilization, which is not linear.


[Diagrams: bandwidth gain vs. CPU utilization. Upper row X-axis: transmit queues (0, 12, 18, 20). Lower row X-axis: receive queues (0, 4, 6, 12).]

About the authors:

Srikanth Kondapaneni (E-Mail : srikanko@in.ibm.com)   AIX NDD Development

Kiran Anumalasetty     (E-Mail : akiran@in.ibm.com)        AIX NDD Development
