Maintaining system uptime is a top priority for enterprises running mission-critical workloads on AIX. The Live Kernel Update (LKU) feature has long been a cornerstone of zero-downtime maintenance. With AIX 7.3 TL4, IBM introduces significant improvements to LKU, making it faster, more efficient, and more secure.
Why Live Kernel Update Matters
LKU allows administrators to apply kernel updates without rebooting the system, ensuring continuous availability. This capability minimizes disruptions and helps organizations meet stringent SLAs for uptime.
Overall LKU Time Improvements
Optimized Page Migration
One of the most time-consuming aspects of LKU is page migration, which transfers memory pages from the original LPAR to a surrogate LPAR. Traditionally, applications could slow down during this window because of page faults. The latest enhancements address these challenges by:
- Increasing Parallelism: Multiple memory pages can now be migrated simultaneously, reducing overall migration time.
- Reducing Bottlenecks: Optimized data transfer mechanisms ensure smoother and faster migration.
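The idea of migrating multiple pages at once can be illustrated with a small sketch. This is not how AIX implements page migration; it is a minimal model, assuming a flat byte buffer stands in for LPAR memory and a thread pool stands in for the parallel transfer. The names `PAGE_SIZE`, `split_pages`, `migrate_page`, and `migrate_parallel` are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # illustrative page size in bytes, not an AIX constant


def split_pages(memory: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Split a byte buffer into fixed-size pages (the last may be shorter)."""
    return [memory[i:i + page_size] for i in range(0, len(memory), page_size)]


def migrate_page(page: bytes) -> bytes:
    """Stand-in for transferring one page to the surrogate; here it only copies."""
    return bytes(page)


def migrate_parallel(memory: bytes, workers: int = 4) -> bytes:
    """Migrate all pages using a pool of workers, preserving page order."""
    pages = split_pages(memory)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even if pages finish out of order.
        migrated = list(pool.map(migrate_page, pages))
    return b"".join(migrated)
```

In this toy model the worker count is the knob that corresponds to "increasing parallelism"; in a real transfer the gain would depend on where the bottleneck sits, which the sketch does not attempt to model.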
Parallelized Mirrored Copy Creation
Creating a mirrored copy of the root volume group is critical but often time-intensive. The new design parallelizes this process, significantly reducing the overall Live Update duration.
LKU Blackout Time Enhancements
The blackout period, the brief phase when the original LPAR is paused, has been optimized to minimize its impact:
- Optimized Disk and Volume Group Handling: Overhead during blackout is reduced, enabling efficient handling of large numbers of disks and volume groups.
- Improved Creation of NFS-Based Paging Spaces: NFS-based paging spaces, which facilitate memory migration, are now created using an optimized approach.
New Features in LKU
AIX 7.3 TL4 introduces new capabilities that enhance flexibility and security:
- Support for Read-Only Access to /proc Files: Processes can now keep psinfo, map, status, or fd files open in read-only mode during LKU without causing the LKU operation to fail.
- Support for Encrypted Physical Volumes: LKU now fully supports LPARs using encrypted physical volumes, ensuring compliance without compromising uptime.
- Intelligent Management of Rapidly Spawning Processes: A new mechanism maintains system stability when applications create processes at a high rate. During the checkpoint operation, all fork operations are intelligently managed to ensure consistency and prevent disruptions.
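To make the read-only /proc case concrete, the sketch below shows one way a monitoring process might open such a file with read access only. It is an illustration, not an IBM-provided interface: the helper `open_proc_readonly`, the `READ_ONLY_SAFE` set, and the `proc_root` parameter are hypothetical names, and the only facts taken from the list above are the four file names.

```python
import os

# File names the list above describes as safe to hold open read-only during LKU.
READ_ONLY_SAFE = {"psinfo", "map", "status", "fd"}


def open_proc_readonly(pid: int, name: str, proc_root: str = "/proc") -> int:
    """Open <proc_root>/<pid>/<name> read-only and return the file descriptor."""
    if name not in READ_ONLY_SAFE:
        raise ValueError(f"{name!r} is not one of the listed read-only files")
    path = os.path.join(proc_root, str(pid), name)
    # O_RDONLY requests read access only; no write flag is ever passed.
    return os.open(path, os.O_RDONLY)
```

A caller would use it as `fd = open_proc_readonly(os.getpid(), "status")` and release it with `os.close(fd)`. The `proc_root` parameter exists only so the helper can be exercised against a scratch directory.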
Scaling the LKU Time Estimator
The LKU time estimator assesses the LPAR, taking into account the running workload and the system’s configuration and state, and generates estimates for the LPAR blackout time and the total LKU time.
The estimator has been enhanced with AI and scaled to cover a wider variety of situations, so that its estimates are accurate enough to support efficient and effective LKU planning.
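As a planning aid, the two estimated figures can be checked against site limits before scheduling an update. The sketch below is one possible way to do that; it does not reproduce the estimator or its output format. The `LkuEstimate` class, the `fits_plan` function, and the 20 percent safety margin are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LkuEstimate:
    """Hypothetical container for the two figures the estimator reports."""
    blackout_seconds: float
    total_seconds: float


def fits_plan(est: LkuEstimate, max_blackout: float, window: float,
              margin: float = 0.2) -> bool:
    """Return True if both estimates, padded by a safety margin, fit the limits."""
    padded_blackout = est.blackout_seconds * (1 + margin)
    padded_total = est.total_seconds * (1 + margin)
    return padded_blackout <= max_blackout and padded_total <= window
```

For example, `fits_plan(LkuEstimate(10, 1200), max_blackout=15, window=1800)` checks a 10-second blackout estimate and a 20-minute total estimate against a 15-second tolerance and a 30-minute window.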
Why These Enhancements Matter
These improvements make LKU faster, more reliable, and better suited for modern enterprise environments. Organizations can now perform kernel updates with minimal disruption, even in complex setups with large storage and memory configurations.
Authors:
Vinod Boddukuri
Darmoju Deekshitha
Abhishek Paliwal
Venkateshwar Yerravalli
Sridhar Arra
Barenya Nandy
Gururaj Mujumdar