
Lessons from the field #4: High Impact AIX Network Tuning

By Brent Daniel posted Wed April 28, 2021 12:01 PM

As the App Platform SWAT team, our focus is on WebSphere Application Server traditional and Liberty; however, even we've been surprised by how often we need to get into network tuning. General symptoms such as timeout errors and performance problems may be with the application, application server, or database, but they may also be with the underlying network's performance. In the same way that application server administrators need to be concerned about Java garbage collection and operating system CPU tuning, we highly recommend that administrators become more comfortable with basic investigations of the network and operating system network stack that they run on. TCP network traffic in particular is a critical aspect of the application server environment. One would hope that network and operating system administration teams would proactively monitor and fix such network issues; but, from what we've seen, they're sometimes not looking.

This is the first part in a series on high impact network tuning, and we start with AIX. Properly tuning the network depends on many factors, such as the system hardware and the load on the system, so there is no one-size-fits-all solution. However, we have found some common AIX network symptoms and tunings useful in resolving performance problems, and we cover them here.

We will cover four high impact symptoms and common resolutions:

TCP retransmissions

If an operating system's network stack sends a TCP packet and doesn't receive an acknowledgment within a certain amount of time, it retransmits the packet. This may be due to network congestion, packet loss, network device saturation, VIOS saturation, operating system network buffer saturation, and many other factors. Retransmission is part of how TCP provides delivery guarantees to the application, and those guarantees are one of the core reasons TCP is generally used rather than UDP.

However, one of the downsides of this is that the application doesn't know about TCP retransmits. All the application knows is that it asked the network to send some data, and it's up to the operating system and its TCP implementation to guarantee the data was sent (or throw a timeout or error). From the point of view of the user and administrator, you'll often just see an increase in the response time or a timeout error. At least with the latter, there's some obvious symptom, but with the former, it's pretty hard to notice unless you have very deep response time introspection.

The good news is that there's a simple heuristic to monitor for TCP retransmissions: in most modern, internal (LAN) networks, a healthy network should not have any TCP retransmissions. If it does, you've likely got a problem. Therefore, you simply need to use a tool like `netstat` to watch for retransmissions. For example, periodically run the following command and monitor for increases in the values:

$ netstat -s -p tcp | grep retrans
1583979 data packets (9088131222 bytes) retransmitted
15007 path MTU discovery terminations due to retransmits
185201 retransmit timeouts
34466 fast retransmits
344489 newreno retransmits
7 times avoided false fast retransmits
0 TCP checksum offload disabled during retransmit
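
Because what matters is whether the counters increase, it can help to script the comparison. The following is a minimal sketch, assuming the `netstat -s -p tcp` output format shown above; the function names `retrans_count` and `retrans_delta` and the 60-second interval are illustrative choices, not part of any AIX tooling.

```shell
#!/bin/sh
# Print the "data packets ... retransmitted" counter from `netstat -s -p tcp`
# output supplied on stdin. Prints 0 if the line is not present.
retrans_count() {
    awk '/data packets .* retransmitted/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

# Print the increase between two counter samples (before, after).
retrans_delta() {
    echo $(( $2 - $1 ))
}

# Example usage on AIX (hypothetical 60-second sampling interval):
#   before=$(netstat -s -p tcp | retrans_count)
#   sleep 60
#   after=$(netstat -s -p tcp | retrans_count)
#   echo "retransmitted data packets: $(retrans_delta "$before" "$after")"
```

A delta that stays at 0 across samples matches the healthy-LAN heuristic above; a steadily growing delta is the signal to investigate.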

If you observe retransmissions, engage your network team and AIX support (if needed) to review whether the retransmissions are true retransmissions and to investigate the cause(s). One common cause is saturation of the AIX OS TCP buffers, in which case you may consider testing tuning such as the following using the `no` command. Each setting appears twice: `no -o` changes the current value and `no -r -o` sets the value used after the next reboot. For some tuning, you may need to reboot the node.

no -o tcp_sendspace=524176
no -r -o tcp_sendspace=524176
no -o tcp_recvspace=524176
no -r -o tcp_recvspace=524176
no -o sb_max=1048352
no -r -o sb_max=1048352
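
Note that the example values set `sb_max` to exactly twice the socket buffer sizes (2 × 524176 = 1048352). The sketch below checks that relationship for any set of values. The "at least twice the largest buffer" threshold is treated here as a rule of thumb and is an assumption of this sketch, as is the function name `sb_max_ok`.

```shell
#!/bin/sh
# Succeed (exit status 0) when sb_max is at least twice the larger of
# tcp_sendspace and tcp_recvspace. The 2x factor is an assumed rule of thumb.
# Arguments: tcp_sendspace tcp_recvspace sb_max
sb_max_ok() {
    largest=$1
    if [ "$2" -gt "$largest" ]; then
        largest=$2
    fi
    [ "$3" -ge $(( largest * 2 )) ]
}

# Example usage with the values from the commands above:
#   sb_max_ok 524176 524176 1048352 && echo "sb_max looks consistent"
```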

For details, see https://www.ibm.com/docs/en/aix/7.2?topic=tuning-tcp-streaming-workload

Hypervisor send & receive failures

Hypervisor send & receive failures record various types of errors sending and receiving TCP packets, which may include TCP retransmissions and other issues. As with TCP retransmissions, they should generally be 0 and are relatively easy to monitor using `netstat`:

$ netstat -v | grep "Hypervisor.*Failure"
Hypervisor Send Failures: 0
Hypervisor Receive Failures: 14616351
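
On a system with several virtual adapters, `netstat -v` prints these lines once per adapter. A minimal sketch for totaling them follows, assuming the line format shown above; `hypervisor_failures` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum the "Hypervisor Send Failures" and "Hypervisor Receive Failures"
# counters across all adapters in `netstat -v` output read from stdin.
# Prints "<send total> <receive total>".
hypervisor_failures() {
    awk -F': *' '
        /Hypervisor Send Failures/    { send += $2 }
        /Hypervisor Receive Failures/ { recv += $2 }
        END { printf "%d %d\n", send, recv }'
}

# Example usage on AIX:
#   netstat -v | hypervisor_failures
```

Run it periodically and compare the totals; as with retransmissions, it is growth in the counters that indicates an ongoing problem.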

We have seen issues with these send & receive failures quite frequently, and in some cases there was a large performance impact. The general solution is to increase the Virtual Ethernet Adapter (VEA) buffers to their maximum values, as there is little downside other than increased memory usage. Use the `chdev` command to change the buffer sizes. Changes to the buffers require rebooting the node.

First, review the maximum value for each parameter. For example:

$ lsattr -R -l ent0 -a max_buf_small
512...4096 (+1)

Then, set to that maximum value. For example:

$ chdev -P -l ent0 -a max_buf_small=4096

Perform this for the following: min_buf_tiny, max_buf_tiny, min_buf_small, max_buf_small, min_buf_medium, max_buf_medium, min_buf_large, max_buf_large, min_buf_huge, and max_buf_huge.
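
Repeating the two steps for all ten attributes is tedious, so a loop can generate the commands. This is a sketch only: it assumes `lsattr -R` reports a range in the `512...4096 (+1)` form shown above, the helper names `range_max` and `print_buffer_chdev` are made up for illustration, and it prints the `chdev` commands for review rather than running them.

```shell
#!/bin/sh
# Extract the upper bound from range output such as "512...4096 (+1)".
range_max() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*\.\.\.\([0-9][0-9]*\).*/\1/p'
}

# Print (but do not run) one chdev command per buffer attribute of the given
# adapter, using the maximum that lsattr reports for that attribute.
# The max_* attribute is listed before min_* as a precaution so that a
# minimum is never raised above a not-yet-raised maximum.
print_buffer_chdev() {
    adapter=$1
    for size in tiny small medium large huge; do
        for bound in max min; do
            attr="${bound}_buf_${size}"
            max=$(lsattr -R -l "$adapter" -a "$attr" | range_max)
            if [ -n "$max" ]; then
                echo "chdev -P -l $adapter -a $attr=$max"
            fi
        done
    done
}

# Example usage on AIX:
#   print_buffer_chdev ent0
```

Review the printed commands before executing them, and remember that the changes take effect only after the reboot.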

For details, see https://www.ibm.com/support/pages/causes-hypervisor-send-and-receive-failures

Network interrupt processing CPU bottleneck

Enabling dog threads allows incoming packets to be processed in parallel by multiple CPUs. On an SMP system, a single CPU can become a bottleneck for receiving incoming packets from a fast adapter, possibly leading to dropped packets. Dog threads will resolve this bottleneck, but will increase latency in situations where the load is too light to take advantage of parallel processing. They will also lead to increased CPU utilization because a packet will have to be queued to a thread and the thread will have to be dispatched.

When deciding whether to use dog threads, consider the following guidelines:
  • More CPUs than adapters need to be installed. Typically, at least two times more CPUs than adapters are recommended.
  • Systems with faster CPUs benefit less. Machines with slower CPU speed may be helped the most.
  • This feature is most likely to enhance performance when there is high input packet rate. It will enhance performance more on MTU 1500 compared to MTU 9000 (jumbo frames) on Gigabit as the packet rate will be higher on small MTU networks.
  • The dog threads run best when they find more work on their queue and do not have to go back to sleep (waiting for input). This saves the overhead of the driver waking up the thread and the system dispatching the thread.
  • The dog threads can also reduce the amount of time a specific CPU spends with interrupts masked. This can release a CPU to resume typical user-level work sooner.
  • The dog threads can also reduce performance by about 10 percent if the packet rate is not fast enough to allow the thread to keep running. The 10 percent is an average amount of increased CPU overhead needed to schedule and dispatch the threads.

Dog threads can be enabled using the command `ifconfig <interface> thread`. For example:

ifconfig en0 thread

Additionally, you can set the number of CPUs that will be used for dog threads using the `ndogthreads` setting. For example, the following command will allow one CPU to be used for threads:

no -o ndogthreads=1

When testing with threads, it's advisable to start with a low number and increase it as needed. Setting `ndogthreads` to 0 will use all available CPUs up to a maximum of 256.

With dog threads enabled, you can check to see the processing that the threads are doing using `netstat -s`. For example:

$ netstat -s | grep hread
352 packets processed by threads
0 packets dropped by threads

PAUSE frames

If Ethernet flow control is enabled, in general, a healthy network should show no increase in PAUSE frames (e.g. from network switches). Monitor the `XOFF` counters (`PAUSE ON` frames). For example:

$ netstat -v | grep -i xoff
Number of XOFF packets transmitted: 0
Number of XOFF packets received: 0
Number of XOFF packets transmitted: 0
Number of XOFF packets received: 0
Number of XOFF packets transmitted: 0
Number of XOFF packets received: 0
Number of XOFF packets transmitted: 0
Number of XOFF packets received: 0
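
Since the counters are reported per adapter, a small helper can total them so that successive samples are easy to compare. This is a minimal sketch assuming the line format shown above; `xoff_totals` is a hypothetical name.

```shell
#!/bin/sh
# Sum the XOFF counters across all adapters in `netstat -v` output read from
# stdin. Prints "<transmitted total> <received total>".
xoff_totals() {
    awk -F': *' '
        /XOFF packets transmitted/ { tx += $2 }
        /XOFF packets received/    { rx += $2 }
        END { printf "%d %d\n", tx, rx }'
}

# Example usage on AIX (run periodically and compare with the previous values):
#   netstat -v | xoff_totals
```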

For more advanced network tuning, review the following resources:

See our team's previous post in the Lessons from the field series: OpenShift Live Container Debugging
