As systems and workloads grow more complex, we on the AIX team believe that built-in auto-tuning capabilities will play a crucial role in maintaining optimal performance. By dynamically adjusting system resources based on workload behavior, auto-tuning ensures efficient operation with minimal manual intervention. With this in mind, we have recently been enhancing the Active System Optimizer (ASO), a user-space daemon that improves workload performance by automatically tuning the system, with new approaches that adapt to workload characteristics. This blog walks you through some of the recent ASO enhancements and their potential workload benefits.
Auto 16MB Data/Text Page Promotion
The latest version of ASO features a new lightweight profiling mechanism that identifies hot shared-memory pages and hot text segments within the system. These hot pages, with a base size of 4KB or 64KB, are then automatically promoted to 16MB pages, which helps reduce Translation Lookaside Buffer (TLB) faults and improves workload performance.
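ASO's actual profiler is internal to AIX, but conceptually, hot-page selection can be thought of as counting sampled accesses per base page and nominating the most frequently touched pages for promotion. The following sketch illustrates the idea only; the function name and threshold are hypothetical, not ASO internals:

```python
from collections import Counter

PROMOTION_THRESHOLD = 100  # illustrative: minimum sampled accesses to qualify

def select_hot_pages(sampled_accesses, threshold=PROMOTION_THRESHOLD):
    """Given a stream of sampled page numbers, return the pages touched
    often enough to be candidates for promotion to a larger page size."""
    counts = Counter(sampled_accesses)
    return {page for page, hits in counts.items() if hits >= threshold}

# Example: page 7 is touched far more often than its neighbors, so only
# page 7 becomes a promotion candidate.
samples = [7] * 150 + [8] * 20 + [9] * 5
assert select_hot_pages(samples) == {7}
```

In practice the real mechanism must also weigh promotion cost (coalescing 4KB or 64KB pages into a contiguous 16MB region) against the expected TLB benefit, which a simple counter does not capture.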
A TLB (Translation Lookaside Buffer) is a small, fast cache in the CPU that stores recent virtual-to-physical page translations, speeding up memory access on systems that use virtual memory. When the CPU cannot find an entry for a virtual page in the TLB, a TLB fault occurs, and the CPU must walk the page table to populate the entry before execution can resume. These TLB faults stall the CPU and carry small performance penalties that add up across a workload. Promoting a page to 16MB increases each TLB entry's coverage from a 4KB or 64KB virtual address range to a 16MB virtual address space, which reduces the probability of TLB faults.
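A back-of-the-envelope calculation shows why larger pages help. The TLB entry count below is a hypothetical example (real TLB sizes vary by processor generation), but the page sizes are the AIX base and large page sizes discussed above:

```python
# TLB reach: the total virtual address range a TLB can map at once.
KB, MB = 1024, 1024 * 1024

def tlb_reach(entries, page_size_bytes):
    """Virtual address range covered when every TLB entry maps one page."""
    return entries * page_size_bytes

ENTRIES = 1024  # hypothetical TLB entry count, for illustration only
for name, size in [("4KB", 4 * KB), ("64KB", 64 * KB), ("16MB", 16 * MB)]:
    print(f"{name} pages: {tlb_reach(ENTRIES, size) // MB} MB of reach")
```

With 1024 entries, 4KB pages cover only 4MB of address space, 64KB pages cover 64MB, and 16MB pages cover 16GB, so a workload with a large hot footprint takes far fewer TLB faults once its hot pages are promoted.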
In our internal testing, the TPC-E benchmark showed approximately a 5% performance improvement with the hot-page promotion feature enabled. This highlights the effectiveness of automatic memory tuning on real-world workloads.
Affinity-Aware Placement – Bringing Threads Closer for Performance
Efficient thread and process placement is critical for maximizing workload performance, especially on large systems with multiple Scheduler Resource Allocation Domains (SRADs). The latest version of ASO introduces a new strategy that identifies groups of cooperating threads and places them within a smaller CPU affinity domain to improve locality. Additionally, based on the workload's memory access patterns, memory pages are migrated to the same affinity domain to further improve workload performance.
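The placement logic itself is internal to ASO, but the core idea can be sketched as a connected-components pass over a thread cooperation graph, followed by assigning each group to one affinity domain. Everything below (function names, the round-robin assignment policy) is an illustrative assumption, not the actual ASO algorithm:

```python
from collections import defaultdict

def cluster_threads(edges):
    """Group thread IDs into clusters of cooperating threads, where an
    edge (a, b) means the two threads share data or communicate."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative DFS collecting one connected component
            t = stack.pop()
            if t in group:
                continue
            group.add(t)
            stack.extend(graph[t] - group)
        seen |= group
        clusters.append(group)
    return clusters

def assign_domains(clusters, num_domains):
    """Round-robin each whole cluster onto one affinity domain, so
    cooperating threads always land in the same domain."""
    return {tid: i % num_domains
            for i, group in enumerate(clusters) for tid in group}

# Threads 1-2-3 cooperate with each other; threads 4-5 cooperate separately.
clusters = cluster_threads([(1, 2), (2, 3), (4, 5)])
placement = assign_domains(clusters, num_domains=2)
assert placement[1] == placement[2] == placement[3]
assert placement[4] == placement[5]
```

Keeping a cluster inside one domain improves cache and memory locality; migrating the cluster's hot pages to that domain's local memory, as the paragraph above describes, completes the picture.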
We evaluated this feature using the SAP-SD and DT7 benchmarks in our internal testing. SAP-SD showed up to 11% improvement in the SAPS/core metric, and DT7 showed up to 15% improvement in transactions per second (TPS). This again highlights the effectiveness of ASO's thread placement strategies on real-world workloads.
The performance benefits observed from these two auto-tuning capabilities indicate that we are on the right track. These results validate our strategy of building self-tuning systems that adapt to workload characteristics. Looking ahead, we remain committed to advancing this approach, because the future of performance optimization is self-tuning.
Technical References
https://community.ibm.com/community/user/blogs/yasser-sait/2025/02/14/16mb-mpss
https://community.ibm.com/community/user/blogs/kaushal-kumar/2024/12/18/aix-16mb-text-page-promotion