Authored by: Xiaohan Qin
TCP Large Send Offload is a technique that offloads TCP segmentation (typically done in software at the TCP layer) to a NIC adapter to increase transmit throughput and lower CPU overhead. For streaming workloads, TCP Large Send Offload typically boosts performance by as much as three or four times. AIX has had a proprietary virtual Ethernet Large Send solution for many years. PowerVM 2.2.5 adds new capabilities to accelerate workloads with TCP Large Send Offload, which are especially helpful for IBM i and Linux workloads. The main benefits of the new Large Send solution are twofold: 1) it enables Large Send offload for Linux and IBM i, and 2) it allows Large Send between different OS types, e.g. AIX and Linux, or Linux and IBM i.
PowerVM's virtual Ethernet Large Send Offload has unique architectural properties because the virtual Ethernet interface resides in the client LPAR, while its corresponding physical NIC device is owned by VIOS. For Large Send packets to be segmented by the NIC in VIOS, the client LPAR TCP/IP stack (the sender of the packets) needs to convey two facts:
- Whether a packet needs to be segmented and is destined to another LPAR that can handle large packets.
- If the packet needs to be segmented, the Maximum Segment Size (MSS) to use.
For packets destined to another LPAR within the managed system, connected to the same virtual switch and on the same VLAN, the receiver must be able to handle large packets; otherwise the packets may be dropped.
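To make the two facts concrete, here is a minimal sketch of what segmentation by MSS amounts to. The `TxPacket` descriptor and its field names are hypothetical, chosen only for illustration; they are not the PowerVM or AIX data structures, and TCP/IP header replication is deliberately ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxPacket:
    """Hypothetical descriptor for a packet handed down by the sender's stack."""
    payload: bytes
    large_send: bool = False  # fact 1: the packet needs to be segmented
    mss: int = 0              # fact 2: segment size to use when large_send is set


def segment(pkt: TxPacket) -> List[bytes]:
    """Split the payload into chunks of at most pkt.mss bytes."""
    if not pkt.large_send:
        return [pkt.payload]  # ordinary packet, sent as-is
    if pkt.mss <= 0:
        raise ValueError("a Large Send packet must carry a positive MSS")
    return [pkt.payload[i:i + pkt.mss]
            for i in range(0, len(pkt.payload), pkt.mss)]


# One 64 KiB buffer with a typical Ethernet MSS of 1460 bytes.
pkt = TxPacket(payload=bytes(64 * 1024), large_send=True, mss=1460)
chunks = segment(pkt)
print(len(chunks), len(chunks[-1]))  # 45 1296
```

A single 64 KiB send therefore replaces 45 separate trips through the sender's stack, which is where the CPU savings come from.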
AIX has had an innovative solution to support virtual Ethernet Large Send Offload for many years. There are two components to the existing AIX TCP virtual Ethernet Large Send Offload.
- A mechanism to discover whether the receiver is capable of handling large packets.
  - The AIX network stack employs a proprietary TCP protocol extension during the connection's 3-way handshake to determine the receiver's capability.
- A method to convey Large Send packets and their MSS.
  - AIX uses the checksum field in the TCP header (which is overwritten during packet segmentation anyway) to store the MSS.
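The checksum-field technique can be illustrated with a short sketch. The checksum occupies bytes 16 and 17 of the standard TCP header; the helper names below are hypothetical, and storing the MSS as a 16-bit value in network byte order is an assumption, since the text does not describe the exact AIX encoding.

```python
import struct

TCP_CHECKSUM_OFFSET = 16  # checksum is the 16-bit field at bytes 16-17
TCP_MIN_HEADER_LEN = 20


def stash_mss_in_checksum(tcp_header: bytes, mss: int) -> bytes:
    """Return a copy of the header with the MSS written into the checksum field."""
    if len(tcp_header) < TCP_MIN_HEADER_LEN:
        raise ValueError("TCP header is shorter than 20 bytes")
    if not 0 < mss <= 0xFFFF:
        raise ValueError("MSS must fit in 16 bits")
    hdr = bytearray(tcp_header)
    # Assumption: big-endian (network order) encoding of the MSS.
    struct.pack_into("!H", hdr, TCP_CHECKSUM_OFFSET, mss)
    return bytes(hdr)


def read_mss_from_checksum(tcp_header: bytes) -> int:
    """Recover the MSS that the sender placed in the checksum field."""
    return struct.unpack_from("!H", tcp_header, TCP_CHECKSUM_OFFSET)[0]


# Minimal 20-byte header: ports, seq, ack, data offset, flags, window,
# checksum (zero), urgent pointer.
header = struct.pack("!HHIIBBHHH", 49152, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
marked = stash_mss_in_checksum(header, 1460)
print(read_mss_from_checksum(marked))  # 1460
```

Reusing the field costs nothing because each segment produced later gets a freshly computed checksum.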
However, the existing AIX Large Send solution met resistance from the Linux community due to the proprietary nature of the TCP protocol extension; hence, for a long time POWER Linux lacked a Large Send solution. With the rising importance of the POWER Linux platform, PowerVM engineers now provide an alternative Large Send solution, which is described below.
PowerVM 2.2.5 introduces the Power Hypervisor (PHYP) assisted platform TCP Large Send Offload solution. This new Large Send technique removes the dependency on the proprietary TCP protocol extension currently available only for AIX. Instead, it requires a virtual Ethernet endpoint to register with PHYP whether it is capable of handling large packets. Furthermore, Power Architecture interfaces have been defined to carry the Large Send flag and MSS from one LPAR to another.
The new Large Send solution requires the following minimum firmware and OS levels:
- POWER System Firmware 840.10
- VIOS 126.96.36.199
- AIX 7.2 TL1 or 7.1 TL4 SP3
- IBM i 7.1 TR10 or IBM i 7.2 TR3
- POWER Linux
  - RedHat 6.8, 7.2
  - SLES 11 SP4, 12 SP1
  - Ubuntu 14.04.4, 15.10, 16.04
(Note that if you upgrade AIX to one of the above levels (7.2 TL1 or 7.1 TL4 SP3), please ensure that your system firmware is at 840.10 or later. Due to a defect in firmware 840.00, Large Send packets are dropped, resulting in significantly degraded performance.)
When a client LPAR OS is PHYP Large Send capable and initiates the transfer of a PHYP Large Send packet, and the destination LPAR is also PHYP Large Send capable, PHYP delivers the Large Send packet unsegmented. If the destination happens to be the trunk adapter in VIOS, the MSS is extracted from the Large Send packet and passed on to the physical NIC for hardware TCP segmentation offload.
If the destination virtual Ethernet interface is not capable of handling PHYP Large Send packets, PHYP segments the packets based on the MSS provided by the sender. Having the hypervisor perform the TCP segmentation is still an advantage, because the transmission overhead is lower than having the client LPAR perform the segmentation.
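The three delivery cases described above can be sketched as a small decision function. This is an illustrative model only: `Destination`, `phyp_deliver`, and the action strings are hypothetical names, not hypervisor interfaces, and header handling is omitted.

```python
from enum import Enum
from typing import List, Tuple


class Destination(Enum):
    CAPABLE_LPAR = "capable_lpar"  # registered as PHYP Large Send capable
    VIOS_TRUNK = "vios_trunk"      # trunk adapter bridging to a physical NIC
    NOT_CAPABLE = "not_capable"    # endpoint cannot accept large packets


def phyp_deliver(payload: bytes, mss: int,
                 dest: Destination) -> Tuple[str, List[bytes]]:
    """Return (action, frames) for one Large Send packet."""
    if mss <= 0:
        raise ValueError("a Large Send packet must carry a positive MSS")
    if dest is Destination.CAPABLE_LPAR:
        # Receiver accepts large packets: hand over a single frame.
        return "deliver_unsegmented", [payload]
    if dest is Destination.VIOS_TRUNK:
        # The MSS travels with the packet so the physical NIC can segment.
        return "offload_to_nic", [payload]
    # Fallback: the hypervisor segments using the sender's MSS.
    frames = [payload[i:i + mss] for i in range(0, len(payload), mss)]
    return "phyp_segmented", frames


action, frames = phyp_deliver(bytes(4000), 1460, Destination.NOT_CAPABLE)
print(action, [len(f) for f in frames])  # phyp_segmented [1460, 1460, 1080]
```

In all three cases the sending LPAR makes only one pass through its own stack, which is why even the fallback path is cheaper than segmenting in the client.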
The major advantage of PHYP assisted Large Send is its ability to accommodate all OS types (AIX, IBM i, Linux) in a similar manner. Previously, AIX enabled Large Send only if the connection partner was another AIX LPAR or was outside the managed system. In other words, AIX would not turn on Large Send if the other LPAR was running IBM i or Linux. When all the client LPARs meet the PHYP Large Send requirements, AIX can exploit Large Send even if the receiving LPARs run Linux or IBM i, and vice versa.
Now that AIX supports two flavors of Large Send, how does the user choose which type to use? At the network interface layer, there is one Large Send control (the mtu_bypass attribute). When mtu_bypass is on, the AIX TCP stack first tries to establish legacy Large Send for the connection. If that fails, and if the PHYP level supports the new Large Send, AIX then uses the new PHYP Large Send method. Selecting the Large Send method dynamically avoids performance degradation when the sender is an AIX LPAR at a new level that is PHYP Large Send enabled, but the receiver is an AIX LPAR at an older code level that is not PHYP Large Send capable. In short, the choice of which Large Send to use is handled automatically, so no explicit user action is required.
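The selection order can be summarized in a few lines. This is a sketch of the logic as described, not AIX source code: the function and parameter names are hypothetical, and returning `DISABLED` when neither method is available is an assumption.

```python
from enum import Enum


class LargeSendMode(Enum):
    LEGACY_AIX = "legacy"    # proprietary TCP handshake extension
    PHYP = "phyp"            # hypervisor assisted Large Send
    DISABLED = "disabled"    # ordinary MSS-sized packets


def choose_large_send(mtu_bypass: bool,
                      legacy_handshake_ok: bool,
                      phyp_supported: bool) -> LargeSendMode:
    """Pick the Large Send method for one TCP connection."""
    if not mtu_bypass:
        return LargeSendMode.DISABLED
    if legacy_handshake_ok:
        # Tried first, so older AIX peers keep their existing behavior.
        return LargeSendMode.LEGACY_AIX
    if phyp_supported:
        return LargeSendMode.PHYP
    # Assumption: with neither method available, Large Send stays off.
    return LargeSendMode.DISABLED


# New AIX sender talking to a Linux LPAR on a PHYP capable system.
print(choose_large_send(True, False, True))  # LargeSendMode.PHYP
```

Trying the legacy method first is what protects connections to older AIX receivers that never registered with PHYP.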