Co-authored by @Charls Chacko
Fixing network latency with a dedicated network adapter
This is part 3 of our blog series aimed at providing tips on how to optimize performance of your IBM Sterling OMS. In each blog, we address common issues that affect performance, best practices for addressing them, and real-world performance issues customers have faced and how they were resolved.
In this blog, we’re going to discuss how adding a dedicated network adapter can fix network latency issues with your Sterling OMS and improve performance.
Performance tests show inconsistent results (response time/throughput) across all components/transactions.
In this example, a customer ran multiple performance tests with the same load/data and got different results, irrespective of the day/time the tests were run. The customer then took the following steps:
- Investigated each layer of the solution, starting with DB/SQL performance, then the application (CPU, memory, threads, etc.) and JMS (configuration, load, etc.).
- Ran multiple tests and observed that, whenever test results were poor, the ping statistics from the DB AIX server to all app and agent servers showed higher response times.
- Made the following attempts to fix the issue, but it persisted:
  - Replaced the SFP module in DC7-agg18 port 1/25
  - Repeated tests with DB Entitled Capacity (EC) set to 14
  - Switched to the standby database server
  - Used Dataguard VRF to reduce traffic on the firewall
- After further investigation, suspected that the high ping response times were caused by a network adapter shared across multiple servers, which was choking the network.
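The ping check described in the steps above can be automated so that latency readings are captured alongside each test run. Below is a minimal Python sketch that extracts the average round-trip time from the summary line of saved ping output and flags hosts above a threshold. The host names, the threshold, and the helper names (`avg_rtt_ms`, `flag_slow_hosts`) are hypothetical, and the summary-line format is an assumption that may need adjusting for your platform.

```python
import re

# Matches the summary line that ping prints at the end of a run, e.g.
#   "round-trip min/avg/max = 0/0/1 ms"
#   "rtt min/avg/max/mdev = 0.045/0.050/0.060/0.005 ms"
# The exact format is an assumption; check the output on your own servers.
SUMMARY = re.compile(r"min/avg/max(?:/\w+)?\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)")


def avg_rtt_ms(ping_output):
    """Return the average round-trip time (ms) from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(2))


def flag_slow_hosts(outputs, threshold_ms):
    """Return the hosts whose average RTT exceeds threshold_ms, sorted by name."""
    return sorted(
        host for host, out in outputs.items() if avg_rtt_ms(out) > threshold_ms
    )


if __name__ == "__main__":
    # Hypothetical host names and sample ping summaries.
    samples = {
        "app01": "round-trip min/avg/max = 0/0/1 ms",
        "agent01": "rtt min/avg/max/mdev = 1.2/8.5/40.1/9.3 ms",
    }
    print(flag_slow_hosts(samples, threshold_ms=2.0))
```

Recording these readings for every run makes it easier to see whether poor test results coincide with elevated network latency, which is the pattern that pointed to the shared adapter in this case.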
Introducing a dedicated network adapter (SR-IOV) for the DB AIX server, along with Dataguard VRF and an increase of the DB CPU Entitled Capacity to 14 cores, gave improved and stable results.
The table below shows the network statistics pattern at different stages of performance tests:
Introducing a dedicated network adapter with higher bandwidth provided the following benefits:
- Avoided network slowness during high traffic
- Delivered 5-15% improvements in production for most of the OMS components (services/APIs), as shown in the table below
- Stabilized response times for WebService calls in production: the number of calls with higher response times (the red and orange counts in the graph below) was reduced and flattened. This improved the customer’s experience.
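To track these benefits on your own system, response times can be grouped into colour bands and compared before and after a change. The Python sketch below is one way to do that. The band thresholds (`ORANGE_MS`, `RED_MS`) and function names are hypothetical, since the cut-offs behind the red and orange counts are not stated here.

```python
from collections import Counter

# Hypothetical thresholds in milliseconds; choose values that match your SLAs.
ORANGE_MS = 500.0
RED_MS = 1000.0


def bucket(response_ms):
    """Classify a single response time into a colour band."""
    if response_ms >= RED_MS:
        return "red"
    if response_ms >= ORANGE_MS:
        return "orange"
    return "green"


def bucket_counts(response_times_ms):
    """Count how many calls fall into each colour band."""
    return Counter(bucket(t) for t in response_times_ms)


def pct_improvement(before_ms, after_ms):
    """Percentage reduction in response time from before_ms to after_ms."""
    return (before_ms - after_ms) * 100.0 / before_ms


if __name__ == "__main__":
    # Illustrative sample values only, not measurements from the customer.
    print(bucket_counts([120, 640, 1500, 300]))
    print(pct_improvement(200.0, 180.0))
```

Comparing the red and orange counts per time window before and after the adapter change shows whether the slow-call tail has actually flattened, rather than relying on averages alone.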
For more details on this particular scenario, please contact me or Charls Chacko (firstname.lastname@example.org), and stay tuned for the next blog in this series.
In case you missed our previous performance insights, you can access them below.
Part 1: How to mitigate the risk of a production application server outage
Part 2: How to avoid redundant organization configuration