Power Global

Power11 and... Zero Planned Downtime (ZPD)

By Artur Studzian posted 14 days ago

  

1. Why Zero Planned Downtime?

Common Causes & Frequency of Infrastructure Maintenance:

  • Firmware and IO updates (4x/year)
  • Operating System updates (2x/year)
  • Individual patches (many/year)
  • New Firmware & OS Releases (1x/year)
  • Repair / Services (as needed)
  • Critical Updates due to CVEs (as needed, becoming more frequent)
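
As a rough illustration of how these add up, the sketch below tallies only the items in the list above that have a fixed yearly frequency. The dictionary and function names are made up for this example. Patches, repairs, and CVE-driven updates are left out because the list gives no number for them, so the result is only a lower bound.

```python
# Fixed-frequency maintenance events per year, taken from the list above.
# Items listed as "many/year" or "as needed" are deliberately excluded,
# so the sum is a lower bound, not an estimate of the real total.
SCHEDULED_EVENTS_PER_YEAR = {
    "firmware_and_io_updates": 4,
    "operating_system_updates": 2,
    "new_firmware_and_os_releases": 1,
}


def minimum_maintenance_events(events: dict[str, int]) -> int:
    """Return the lower bound of maintenance events per year."""
    return sum(events.values())


if __name__ == "__main__":
    total = minimum_maintenance_events(SCHEDULED_EVENTS_PER_YEAR)
    print(f"At least {total} planned maintenance events per year")
```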

2. What does "Zero Planned Downtime" mean?

Zero Planned Downtime with Power11 means you can perform necessary system maintenance, updates, and upgrades without ever taking your critical applications offline. Through advanced technologies like live updates, rolling upgrades, and autonomous patching, this capability eliminates the impact of scheduled service interruptions, ensuring continuous operations and maximum staff productivity.

3. Zero Planned Downtime (High Availability and reliability in Power systems)

  • Making use of the spare core within the processor to avoid a planned outage for repair
  • Providing a Serviceability System (a Power server) dedicated to a group of Power servers, with the capacity, configuration, and memory to support LPM (Live Partition Mobility) and LKU/LLU (Live Kernel Update / Live Library Update)
  • HMC-orchestrated updates for a one-touch, non-disruptive update: the HMC evacuates the target system (LPM), updates the infrastructure (system FW, adapter FW, VIOS, and OS), and brings the workloads back (LPM)
  • Virtualization management with a failed BMC (baseboard management controller): keeps the management console functional to preserve non-disruptive repair options (e.g. LPM)
  • Memory PMIC (power management integrated circuit) recovery through DDR4 DDIMM EEPROM redundancy
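
The evacuate, update, and return flow described for the HMC-orchestrated update can be pictured with a small simulation. This is only a sketch of the idea, not the HMC's implementation or API: the `Server` class, `migrate_all`, and `one_touch_update` are hypothetical names, the "LPM" here is just moving strings between lists, and applying the components in the order they are listed in the bullet is an assumption.

```python
from dataclasses import dataclass, field

# Components as named in the bullet above. Applying them in this
# order is an assumption of this sketch.
UPDATE_COMPONENTS = ["system FW", "adapter FW", "VIOS", "OS"]


@dataclass
class Server:
    """Hypothetical stand-in for a Power server managed by the HMC."""
    name: str
    partitions: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)


def migrate_all(source: Server, destination: Server, log: list[str]) -> list[str]:
    """Move every partition from source to destination (stands in for LPM)."""
    moved = list(source.partitions)
    for lpar in moved:
        source.partitions.remove(lpar)
        destination.partitions.append(lpar)
        log.append(f"LPM {lpar}: {source.name} -> {destination.name}")
    return moved


def one_touch_update(target: Server, spare: Server) -> list[str]:
    """Evacuate the target, update it, then bring the workloads back."""
    log: list[str] = []
    moved = migrate_all(target, spare, log)
    for component in UPDATE_COMPONENTS:
        target.updated.append(component)
        log.append(f"update {component} on {target.name}")
    # Return only the partitions that originally came from the target,
    # leaving anything else on the spare system untouched.
    for lpar in moved:
        spare.partitions.remove(lpar)
        target.partitions.append(lpar)
        log.append(f"LPM {lpar}: {spare.name} -> {target.name}")
    return log


if __name__ == "__main__":
    prod = Server("prod", ["lpar1", "lpar2"])
    service = Server("serviceability")
    for line in one_touch_update(prod, service):
        print(line)
```

In this sketch the second server plays the role of the Serviceability System: it only needs enough room to host the evacuated partitions while the target is being updated.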

4. Zero Planned Downtime for system maintenance (Automated Power Platform Update)

  • End-to-end automation for platform updates, including automated movement of workloads, with zero application downtime from a single update flow
  • Ability to update System FW, VIOS (update and upgrade), and IO adapters
  • Can be triggered directly from the HMC
  • Supported for both concurrent and disruptive updates
  • Validation for LPM and VIOS redundancy (VIOS maintenance readiness check)
  • Ability to automatically migrate partitions and return as part of the update process
  • Option to choose to return to the source or leave in the target system
  • Option to evacuate all LPARs or a chosen subset of LPARs, in a chosen order
  • Option to choose the order of updates
  • Option to either download only or download and update
  • Support for different sources of updates
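
To make the options above more concrete, here is a minimal sketch of how such a set of choices could be expressed and turned into an ordered list of steps. The `UpdatePlan` fields and the `build_steps` function are hypothetical and only mirror the options listed above; they do not correspond to actual HMC settings or commands.

```python
from dataclasses import dataclass


@dataclass
class UpdatePlan:
    """Hypothetical container for the options listed above."""
    partitions: list[str]          # LPARs to evacuate, in the chosen order
    update_order: list[str]        # components to update, in the chosen order
    download_only: bool = False    # download only, or download and update
    return_to_source: bool = True  # bring LPARs back, or leave them on the target


def build_steps(plan: UpdatePlan) -> list[str]:
    """Expand a plan into the ordered steps a single update flow would run."""
    steps = [f"download {component}" for component in plan.update_order]
    if plan.download_only:
        return steps
    steps += [f"migrate {lpar} to target" for lpar in plan.partitions]
    steps += [f"update {component}" for component in plan.update_order]
    if plan.return_to_source:
        steps += [f"return {lpar} to source" for lpar in plan.partitions]
    return steps


if __name__ == "__main__":
    plan = UpdatePlan(
        partitions=["db1", "app1"],
        update_order=["System FW", "VIOS"],
    )
    for step in build_steps(plan):
        print(step)
```

An empty `partitions` list models an update where nothing needs to be evacuated; the sketch does not try to decide that on its own.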

5. Red Hat Ansible and IBM Terraform can both provide automation for updates. How is the Zero Planned Downtime capability in Power11 different?

The Power11 Zero Planned Downtime feature is part of the platform and does not require Ansible or Terraform. Zero Planned Downtime automations are built into the HMC, require significantly fewer steps from the admin than the manual process needed on earlier Power generations, and take most of the decision making out of the process. The feature also enables affinity-based partition movement when a disruptive system update requires it.

6. The LPM feature of Power servers already takes care of migrating LPARs. How does Zero Planned Downtime help a customer in such a use case?

Automated Platform Maintenance can optionally use LPM, which helps avoid downtime in use cases that require a reboot (e.g. HW maintenance or non-concurrent system FW updates). Beyond automating LPM, Automated Platform Maintenance brings a lot of additional value: automated display of available patch levels for platform components, automatic download of all selected patches, validation that the environment is ready for maintenance, application of the patches in the best-practice sequence, and more.
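
The readiness validation mentioned above can be thought of as a set of checks that must all pass before any patch is applied. The sketch below is an illustration only: the `Partition` fields and the two checks are assumptions based on the LPM and VIOS redundancy validation listed in section 4, not the actual checks the HMC performs.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical view of an LPAR, as a readiness check might see it."""
    name: str
    lpm_capable: bool  # assumed flag: the partition can be moved with LPM
    vios_count: int    # assumed count of VIOS instances serving its I/O


def readiness_issues(partitions: list[Partition]) -> list[str]:
    """Return the findings that would block maintenance; empty means ready."""
    issues = []
    for partition in partitions:
        if not partition.lpm_capable:
            issues.append(f"{partition.name}: not ready for LPM")
        if partition.vios_count < 2:
            issues.append(f"{partition.name}: no redundant VIOS")
    return issues


if __name__ == "__main__":
    lpars = [Partition("db1", True, 2), Partition("app1", False, 1)]
    problems = readiness_issues(lpars)
    if problems:
        print("Not ready for maintenance:")
        for problem in problems:
            print(" -", problem)
    else:
        print("Environment is ready for maintenance")
```

Collecting all findings instead of stopping at the first one lets the admin fix everything in one pass before starting the update.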

7. Automated platform and OS maintenance with zero application downtime feature matrix

8. Disclaimer Zero Planned Downtime

Based upon IBM internal testing of system upgrade scenarios; many (i.e. VIOS, hot plug adapters, I/O adapter FW, and concurrent system firmware updates) can be done in-place, while some (i.e. non-concurrent system FW and HW maintenance) may require Live Partition Mobility (LPM) support.
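
Read as a lookup, the disclaimer splits maintenance types into two groups. The sketch below encodes exactly the types named in the disclaimer; the set names and the `maintenance_path` function are hypothetical, and anything not named there is rejected instead of guessed.

```python
# Maintenance types exactly as named in the disclaimer above.
IN_PLACE = {
    "VIOS",
    "hot plug adapters",
    "I/O adapter FW",
    "concurrent system FW",
}
MAY_REQUIRE_LPM = {
    "non-concurrent system FW",
    "HW maintenance",
}


def maintenance_path(kind: str) -> str:
    """Say whether a maintenance type is done in-place or may need LPM."""
    if kind in IN_PLACE:
        return "in-place"
    if kind in MAY_REQUIRE_LPM:
        return "may require LPM"
    # The disclaimer does not cover other types, so do not guess.
    raise ValueError(f"unknown maintenance type: {kind}")


if __name__ == "__main__":
    for kind in ["VIOS", "non-concurrent system FW"]:
        print(f"{kind}: {maintenance_path(kind)}")
```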

PS.

Maybe also soon for non-disruptive OS updates? ;)
