Authored by: Xiaohan Qin
(This blog is written in response to a customer’s request for performance tuning of 10Ge on POWER8. Most of the defaults and tuning recommendations are applicable to POWER7 as well.)
Typically network performance tuning involves setting network stack, interface, and NIC device parameters in addition to provisioning CPU and memory appropriately. In the case of virtual networking, the back-end device settings as well as virtual IO server resources are also relevant.
Table 1 displays key network interface tuning parameters for 10Ge and their default values in the AIX environment. Most default values are sufficient to support 10Ge throughput, with a few exceptions noted in the “Remarks” column.
| Parameters | Component | Default | Remarks |
| --- | --- | --- | --- |
| rfc1323 | stack | 1 | Default based on adapter speed and MTU; see Table 2 |
| tcp_sendspace | stack | 262144 | Default based on adapter speed and MTU; see Table 2 |
| tcp_recvspace | stack | 262144 | Default based on adapter speed and MTU; see Table 2 |
| udp_sendspace | stack | 9216 | Consider increasing to 64K if the workloads stress the UDP protocol. |
| udp_recvspace | stack | 42080 | Consider increasing to 640K if the workloads stress the UDP protocol. |
| mtu | interface | 1500 | Consider setting to 9000 and enabling jumbo_frame. Not defaulted to 9000 because it requires the same setting on the other nodes and switches on the LAN. |
| jumbo_frame | device | no | Consider changing to yes. Requires the other nodes and switches on the LAN to support the configuration. |
| flow_ctrl | device | yes | |
| chksum_offload | device | yes | |
| large_send | device | yes | |
| large_receive | device | yes | |
Table 1: Network Interface Tuning Parameters for 10Ge (v)NIC
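To see what is currently in effect, the stack (“no”) tunables and the interface and device attributes can be listed as follows (en0 and ent0 are example device names; substitute your own):
no -a | grep -E "tcp_|udp_|rfc"
lsattr -El en0
lsattr -El ent0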
Note that the “interface” attributes/parameters can be changed as follows:
chdev -l enX -a <attribute>=<value>
Similarly, the “device” attributes/parameters can be changed as below:
chdev -l entX -a <attribute>=<value>
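For example, assuming the physical adapter is ent0 with interface en0, jumbo frames and a 9000-byte MTU could be enabled roughly as follows. The exact attribute name (jumbo_frame vs. jumbo_frames) varies by adapter driver, and device attribute changes generally require the interface to be detached first or the -P flag to defer the change until the next boot:
chdev -l ent0 -a jumbo_frame=yes -P
chdev -l en0 -a mtu=9000 -P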
More on TCP/UDP send and receive space
For the TCP protocol, AIX by default does not use the system-wide “no” attributes for rfc1323, tcp_sendspace, and tcp_recvspace. Instead it uses interface-specific network options (ISNO) so that these parameters can be configured according to the underlying adapter. Table 2 shows the AIX default values for the three TCP parameters. With rfc1323 on, 256K of TCP send and receive space is sufficient to support 10Ge adapters.
| Adapter/MTU | tcp_sendspace | tcp_recvspace | rfc1323 |
| --- | --- | --- | --- |
| 1G/1500 | 131072 | 65536 | 0 |
| 1G/9000 | 262144 | 131072 | 1 |
| 10G/1500 | 262144 | 262144 | 1 |
| 10G/9000 | 262144 | 262144 | 1 |
Table 2: AIX defaults for TCP send and receive space and rfc1323
The ISNOs are part of the network interface attributes. The output below displays the ISNOs of interest. If any of the ISNO attributes is not set, the interface configuration takes the default from Table 2. The chdev command (chdev -l enX -a <attribute>=<value>) can be used to change the settings if necessary.
# lsattr -El en0|grep -E "rfc|space"
rfc1323 Enable/Disable TCP RFC 1323 Window Scaling True
tcp_recvspace Set Socket Buffer Space for Receiving True
tcp_sendspace Set Socket Buffer Space for Sending True
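If the ISNO values need to be set explicitly, for example to pin the 10Ge defaults on en0 (an example interface name), a single chdev invocation can set all three:
chdev -l en0 -a rfc1323=1 -a tcp_sendspace=262144 -a tcp_recvspace=262144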
Currently, there are no ISNO attributes for UDP. In other words, the UDP protocol still relies on the global “no” attributes udp_sendspace and udp_recvspace, whose defaults may be too low for 10Ge adapters. They should be increased significantly (cf. Table 1) if the workloads stress the UDP protocol.
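For example, the UDP buffers can be raised to the values suggested in Table 1 with the no command; the -p option makes the change persistent across reboots:
no -p -o udp_sendspace=65536 -o udp_recvspace=655360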
Additional 10Ge performance parameters
Table 3 lists additional tuning parameters common to many 10Ge NIC adapters. The exact attribute names and their defaults may vary from device to device. The out-of-box defaults have been tuned in the development labs to achieve the peak throughput of the adapters. No adjustment is needed except perhaps for very strenuous workloads, whose tuning might require the assistance of performance engineers.
| Attributes | Remarks |
| --- | --- |
| ipv6 offload | IPv6 offloads are configured independently from the IPv4 offloads |
| num of tx queues | The number of transmit queues |
| sz of tx queue | The size of the transmit queues |
| num of rx queues | The number of receive queues |
| sz of rx queue | The size of the receive queues |
| sz of sw tx queue | The size of the software transmit queue |
| intr_cnt | Interrupt coalescing counter |
| intr_time | Interrupt coalescing timer (microseconds) |
Table 3: Additional 10Ge tuning parameters
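Because the exact attribute names differ from driver to driver, the quickest way to see what a particular adapter exposes is to list its attributes and look for the queue and interrupt coalescing entries; ent0 below is only an example:
lsattr -El ent0 | grep -Ei "queue|intr"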
Virtual Ethernet (VETH) backed by SEA
The VETH configuration attributes are unique and quite different from the attributes of physical NICs (compare Table 4 and Table 1).
First of all, VETH lacks several attributes that are common on physical NICs, namely jumbo_frame, largesend, and large_receive. The reason is that AIX VETH is capable of sending and receiving “super” packets of up to ~64K with no configuration required. As a result, it is possible to set the MTU over VETH as large as 60K.
That said, we caution against such practice because it is more likely to cause confusion and problems than to bring benefits. For traffic that is routed outside the server, large packets must be segmented or fragmented. Instead of sending large packets under a huge MTU, the client VETH is better off employing largesend. The difference is that “largesend packets” convey the MSS (Maximum Segment Size). Upon receiving “largesend packets”, SEA passes that information to the physical NIC for TCP segmentation offload. Without the MSS, SEA would have to perform IP fragmentation in software (slower). And because fragmented IP packets can be a security risk, they are blocked by firewall rules in some environments. In reality, with path MTU discovery (on by default in AIX), the MSS negotiated between two endpoints across servers reflects the network path MTU rather than the huge MTU set on the interface.
How does one enable largesend over VETH when the device itself does not have the attribute? The largesend configuration over VETH is done via the network interface attribute mtu_bypass. In AIX 7.1, the default for mtu_bypass is “off”. In AIX 7.2, the default has been changed to “on” (see Table 4).
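For example, on AIX 7.1 the current setting can be checked and largesend turned on for a VETH interface (en0 assumed here) as follows:
lsattr -El en0 -a mtu_bypass
chdev -l en0 -a mtu_bypass=on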
Secondly, AIX VETH supports multiple receive buffer sizes, which is uncommon for physical NICs. The device attributes include a set of receive buffer tuning parameters. Most of the time, the default values work fine. However, for heavy workloads, particularly workloads consisting predominantly of small messages, the defaults for the tiny and small buffers prove inadequate. We recommend increasing them to their maximums (see Table 4 and the example that follows it).
Another frequently asked question is: how fast is the VETH device? A number of sources have cited VETH as equivalent to 1Ge; that is not accurate. The bandwidth of VETH is workload dependent. For streaming workloads, VETH can easily achieve 20G or even higher. For transactional workloads, the VETH rate is less impressive, performing a little better than 1Ge. The AIX network interface configuration treats VETH as 10Ge when setting the defaults for TCP send and receive space and rfc1323.
| Parameters | Component | Default | Remarks |
| --- | --- | --- | --- |
| rfc1323 | stack | 1 | |
| tcp_sendspace | stack | 262144 | |
| tcp_recvspace | stack | 262144 | |
| udp_sendspace | stack | 9216 | Consider increasing to 64K if the workloads stress the UDP protocol. |
| udp_recvspace | stack | 42080 | Consider increasing to 640K if the workloads stress the UDP protocol. |
| mtu | interface | 1500 | Not defaulted to 9000 because it requires the same setting on the other nodes on the LAN. |
| mtu_bypass | interface | on (AIX 7.2), off (AIX 7.1) | Applies to VETH only. Default is on in AIX 7.2; off in AIX 7.1 (needs to be turned on). |
| chksum_offload | device | yes | |
| max_buf_huge / min_buf_huge | device | 64 / 24 | |
| max_buf_large / min_buf_large | device | 64 / 24 | |
| max_buf_medium / min_buf_medium | device | 256 / 128 | |
| max_buf_small / min_buf_small | device | 2048 / 512 | Recommend 4096 for both. |
| max_buf_tiny / min_buf_tiny | device | 2048 / 512 | Recommend 4096 for both. |
Table 4: Tuning parameters for VETH
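Following the buffer recommendation above, a hedged example of raising the tiny and small buffers on the VETH device (ent0 assumed) is shown below; because the adapter is typically in use, the -P flag records the change for the next reboot:
chdev -l ent0 -a max_buf_tiny=4096 -a min_buf_tiny=4096 -a max_buf_small=4096 -a min_buf_small=4096 -P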
SEA configuration
As mentioned at the beginning, for VETH to perform well, the SEA adapter must be configured accordingly. Table 5 lists the SEA attributes that impact VETH throughput and their defaults in recent versions of VIOS (2.2.4.0 and later). Please see the “Remarks” column for tuning recommendations.
| Parameters | Default | Remarks |
| --- | --- | --- |
| jumbo_frame | no | Consider changing to yes for high performance. Requires the other nodes and switches on the LAN to support the configuration. |
| largesend | 1 | |
| large_receive | 0 | Enable large_receive. It has been disabled by default because previously Linux and IBM i could not handle large_receive packets. This has since been resolved in VIOS, Linux, and IBM i, and we plan to enable large_receive by default in the near future. |
| thread | 1 | |
| nthreads | 7 | |
| realin_threads | 0 | Number of threads dedicated to processing packets received on the physical NIC. “0” means packets received on the virtual and physical sides share the threads. Note realin_threads < nthreads. |
| queue_size | 8192 | |
Table 5: Performance tuning parameters for SEA
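As an example, on the VIOS (as padmin, with ent5 assumed to be the SEA device), the current SEA settings can be inspected and large_receive enabled roughly as follows; note that the accepted value syntax (yes/no vs. 1/0) can vary with the VIOS level, so check the lsdev output first:
lsdev -dev ent5 -attr
chdev -dev ent5 -attr large_receive=yes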
For the configuration parameters of the physical NIC under SEA, please refer to the “device” attributes in Table 1. Likewise, for the configuration parameters of the VETH under SEA, please refer to the “device” attributes in Table 4.
SEA is a dual-function device: on one hand it serves as a bridge for client LPARs; on the other, it can be used as a network interface device for the VIOS partition. For the bridging function, client LPAR traffic does not involve the VIOS networking stack. Hence VIOS/SEA network stack or interface parameters such as TCP send and receive space and MTU are irrelevant to client VETH performance. If the SEA is used as a NIC for the VIOS partition, the network stack and interface parameters over SEA are determined based on the physical adapter underneath the SEA (cf. Table 1 and Table 2).
SR-IOV backed vNIC
Although the SR-IOV backed vNIC adapter is a virtual device and capable of LPM, configuration-wise it resembles a physical NIC more than a VETH (use Table 1 and Table 2). The recommendation is to turn on largesend, large_receive, and jumbo_frame if possible. Note that to enable jumbo_frame on a vNIC, one must enable jumbo_frame on the physical port of the SR-IOV adapter, which can be done through the HMC GUI. This requirement is similar to enabling jumbo_frame for an HEA logical port (on POWER7 systems).
CPU and memory resources
The rule of thumb for CPU sizing for 10Ge adapters is 1-2 processor cores per 10Ge of throughput, depending on the processor: more for POWER7 (~2) and less for POWER8 (~1.5).
As for memory, the bulk of device memory consumption comes from the packet buffers used for Tx/Rx. If jumbo_frame is not enabled, the packet buffer size is 2 KB; otherwise, it is 16 KB. For a device configured with 3 Rx queues of 1K elements, 2 Tx queues of 1K elements, a software Tx queue of 8K elements, and jumbo_frame not enabled, the memory footprint can be as large as ~26 MB, calculated as ~(2*1K + 3*1K + 8K)*2 KB. If jumbo_frame is enabled, the memory footprint may increase eightfold.
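Worked out in full, that is (2*1024 + 3*1024 + 8192) buffers * 2 KB per buffer ≈ 13,312 * 2 KB ≈ 26 MB; with 16 KB jumbo buffers the same queue configuration would consume roughly 208 MB.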