From Firmware to Linux: Mapping Sockets, Cores, and Threads on IBM Power via the Device Tree

Authors: Manvanthara Puttashankar, Praveen Kumar Pandey, and Srikar Dronamraju

Introduction

IBM Power Systems are purpose-built for enterprise workloads that demand high performance, scalability, and reliability. Whether running SAP HANA, AI/ML pipelines, or traditional databases, optimizing performance often begins with understanding the hardware topology: sockets, cores, and threads. Linux provides several tools (lscpu, top, numactl, lparstat, and lstopo) that expose CPU information, but they show an interpreted, summarized view. For precision and validation, especially in partitioned (LPAR) or virtualized environments, you must look deeper.

That’s where the device tree plays an important role. The device tree is a firmware-provided data structure that describes hardware to the operating system. On IBM Power, it acts as the authoritative source for CPU topology, making it invaluable for:

  • Verifying resource allocation across sockets, cores, and threads
  • Debugging topology inconsistencies in LPAR environments
  • NUMA-aware scheduling and capacity planning
  • Cross-checking firmware vs. Linux kernel understanding of the system

This blog provides a step-by-step guide to inspecting the device tree on IBM Power, with examples, interpretations, and best practices.

Output of Widely Used Tools

(Screenshots of lstopo, lscpu, and numactl output appeared here in the original post.)

Why Use the Device Tree?

On IBM Power Systems, the Device Tree acts as the kernel’s “ground truth” for hardware configuration. System utilities like lscpu and /proc/cpuinfo rely on this data to report CPU and system details. Specifically:

  • lscpu fetches its information from sysfs, which is populated by the kernel based on the Device Tree.
  • /proc/cpuinfo also derives its data from the kernel, which in turn references the Device Tree.

Because of this dependency chain, direct inspection of the Device Tree provides the most accurate and authoritative view of the system’s hardware layout and capabilities.
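As a quick illustration, here is a minimal sketch of reading properties straight out of /proc/device-tree. Properties are raw bytes; numeric properties are stored as big-endian 32-bit cells. The PowerPC,POWER11@0 node name is illustrative; actual node names vary by system.

```python
import struct

def read_cells(path):
    """Read a device tree property as a list of big-endian 32-bit cells."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f">{len(data) // 4}I", data))

def read_string(path):
    """Read a device tree property as a NUL-terminated string."""
    with open(path, "rb") as f:
        return f.read().rstrip(b"\x00").decode()

# Illustrative node name; directories under /proc/device-tree/cpus/
# vary by processor generation and configuration.
node = "/proc/device-tree/cpus/PowerPC,POWER11@0"
print(read_string(node + "/device_type"))  # expected: "cpu"
```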

Beyond this accuracy, the Device Tree provides Linux with:

- Hardware Agnostic Booting
   • The kernel does not need to be recompiled for every hardware variation.
   • The Device Tree abstracts sockets, cores, and threads into a standard description, making Linux portable across Power platforms.

- Accurate CPU Topology Awareness
   • Linux relies on the Device Tree to understand the physical vs logical CPU layout.
   • Scheduler decisions (which thread runs on which core) depend on this topology.
   • Without the DT, Linux might treat all CPUs as flat entities, ignoring cache or socket boundaries.

- Performance Optimization
   • On Power, NUMA (Non-Uniform Memory Access) domains and associativity are defined in the Device Tree.
   • This allows Linux to place memory and tasks closer to the right CPU, avoiding expensive cross-socket memory fetches.

- Scalability for Large Systems
   • IBM Power servers can scale to hundreds of threads. The Device Tree ensures Linux scales gracefully by exposing a structured hierarchy.

- Consistency Across Firmware and OS
   • The Device Tree is used by the firmware, bootloader, and kernel, ensuring there is no mismatch in how the hardware configuration is understood.

- Debugging and Diagnostics
   • Administrators can read /proc/device-tree to verify sockets, cores, and threads directly from user space.
   • This provides a ground truth for debugging performance issues or verifying firmware updates.

Functional Breakdown

1. Verifying Core(s) per Socket

Each core is represented by a directory under /proc/device-tree/cpus/.
Example:

(A screenshot of the /proc/device-tree/cpus/ directory listing appeared here.)

Each PowerPC,POWER11@X directory corresponds to a core. In this example, there are 14 cores.
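A minimal sketch of this count (the PowerPC, prefix matches the core node names shown above):

```python
import os

CPUS = "/proc/device-tree/cpus"

# Each PowerPC,POWERxx@X directory under /cpus is one core of the partition.
cores = sorted(d for d in os.listdir(CPUS) if d.startswith("PowerPC,"))
for core in cores:
    print(core)
print(f"Total cores: {len(cores)}")
```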

2. Verifying Thread(s) per Core

Inside each core directory, the ibm,ppc-interrupt-server#s property lists the thread IDs.
Example:

(A screenshot of the ibm,ppc-interrupt-server#s property contents appeared here.)

This means 8 hardware threads are available per core.
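A sketch that decodes the property for every core; the thread IDs are stored as big-endian 32-bit cells:

```python
import os
import struct

CPUS = "/proc/device-tree/cpus"

def read_cells(path):
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f">{len(data) // 4}I", data))

for core in sorted(os.listdir(CPUS)):
    prop = os.path.join(CPUS, core, "ibm,ppc-interrupt-server#s")
    if not os.path.exists(prop):
        continue  # skip entries that are not core nodes
    threads = read_cells(prop)
    print(f"{core}: {len(threads)} threads, IDs {threads}")
```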

3. Verifying Socket(s)

Sockets are represented indirectly through the ibm,current-associativity-domains property.

(A screenshot of the property contents appeared here.)

The fields are interpreted as follows (a decoding sketch follows the list):

- The first field indicates how many fields (entities) follow in the property.
- The second field is the tertiary domain index (most likely never used and may always be 1).
- The third field is for secondary domains, or planars.
- The fourth field is the primary domain index.
- The penultimate field is for the core groups (only on Power10).
- The last field is the cores.
- Multiply the second and third values to get the total number of sockets (e.g., 1 × 2 = 2).
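A sketch that decodes those fields. The location under the rtas node is an assumption based on PAPR conventions on typical PowerVM LPARs:

```python
import struct

# Location is an assumption based on PAPR conventions (the rtas node).
PROP = "/proc/device-tree/rtas/ibm,current-associativity-domains"

with open(PROP, "rb") as f:
    data = f.read()
cells = struct.unpack(f">{len(data) // 4}I", data)

count, fields = cells[0], cells[1:]
print(f"{count} fields: {fields}")
# Per the layout above: fields[0] is the tertiary index, fields[1] the
# secondary domain (planars), and fields[-1] the cores. Sockets are the
# product of the property's second and third values.
print(f"Sockets: {fields[0] * fields[1]}")
```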

Alternatively, inspect ibm,associativity inside each core directory:

- Second value → Socket ID
- Third value → Planar ID

ibm,associativity is available only for dedicated cores. Shared cores use kernel-level methods not exposed via the device tree.
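A sketch of that per-core check; cores that do not expose the property (shared cores) are skipped:

```python
import os
import struct

CPUS = "/proc/device-tree/cpus"

def read_cells(path):
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f">{len(data) // 4}I", data))

for core in sorted(os.listdir(CPUS)):
    prop = os.path.join(CPUS, core, "ibm,associativity")
    if not os.path.exists(prop):
        continue  # shared cores do not expose ibm,associativity
    cells = read_cells(prop)
    # cells[0] is the entry count; the second value is the socket ID,
    # the third the planar ID, per the interpretation above.
    print(f"{core}: socket {cells[1]}, planar {cells[2]}")
```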

Associativity Properties at a Glance

• ibm,associativity - Values for each entity (core → socket → planar → group).
• ibm,current-associativity-domains - The maximum values currently in use across the same class of hardware layout.
• ibm,max-associativity-domains - The maximum number of associativity domains supported by the system's hypervisor firmware. These values are consistent across all hardware platforms running the same firmware version, which makes the property crucial for Live Partition Mobility (LPM): it sets the maximum associativity limits for migration and ensures compatibility between source and destination systems.
• ibm,thread-groups - Helps identify which threads share cache; a parsing sketch follows. On Power10/Power11, only 4 threads per core share the L2/L3 cache.
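A sketch that walks ibm,thread-groups for one core. The record layout assumed here is the one the Linux kernel parses in arch/powerpc/kernel/smp.c (repeated records of property ID, group count, threads per group, then the thread IDs; property ID 2 denotes threads sharing an L2 cache), and the node path is illustrative:

```python
import struct

# Illustrative node name; adjust to a core directory present on your system.
PROP = "/proc/device-tree/cpus/PowerPC,POWER11@0/ibm,thread-groups"

with open(PROP, "rb") as f:
    data = f.read()
cells = list(struct.unpack(f">{len(data) // 4}I", data))

i = 0
while i + 3 <= len(cells):
    # Each record: [property-id, nr-groups, threads-per-group, thread-ids...]
    prop_id, nr_groups, per_group = cells[i:i + 3]
    i += 3
    for g in range(nr_groups):
        group, i = cells[i:i + per_group], i + per_group
        print(f"property {prop_id}, group {g}: threads {group}")
```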


Visualizing the Topology

Here’s a diagram showing 1 socket, 2 cores, and 8 threads per core.

(Topology diagram appeared here.)

Best Practices for Verification

- Cross-check outputs - Always compare device tree values with lscpu and lparstat.
- Automate for scale - Write shell/Python scripts to parse ibm,associativity on large systems (see the sketch after this list).
- Provisioning checks - Validate device tree integrity during system provisioning.
- NUMA mapping - Use associativity data for workload placement in NUMA-sensitive workloads (for example, databases, HPC).
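As a starting point for such automation, here is a sketch that combines the per-core checks above into one summary. Cores without ibm,associativity (shared cores) are grouped separately:

```python
#!/usr/bin/env python3
"""Summarize sockets, cores, and threads from /proc/device-tree (a sketch)."""
import os
import struct
from collections import defaultdict

CPUS = "/proc/device-tree/cpus"

def read_cells(path):
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f">{len(data) // 4}I", data))

sockets = defaultdict(list)

for core in sorted(os.listdir(CPUS)):
    servers = os.path.join(CPUS, core, "ibm,ppc-interrupt-server#s")
    assoc = os.path.join(CPUS, core, "ibm,associativity")
    if not os.path.exists(servers):
        continue  # not a core node
    threads = len(read_cells(servers))
    socket_id = read_cells(assoc)[1] if os.path.exists(assoc) else "shared"
    sockets[socket_id].append((core, threads))

for socket_id, cores in sorted(sockets.items(), key=lambda kv: str(kv[0])):
    print(f"Socket {socket_id}: {len(cores)} cores")
    for core, threads in cores:
        print(f"  {core}: {threads} threads")
```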

Troubleshooting Tips

- Incorrect lscpu output - Check /proc/device-tree directly.
- Shared cores in LPAR - Rely on kernel logs or virtualization tools.
- Firmware upgrades - After microcode or firmware changes, reverify associativity mappings.
- LPM migrations - To determine whether a partition supports LPM, check for the presence of the ibm,migratable-partition property, which indicates LPM capability (a quick check is sketched below). Additionally, use ibm,max-associativity-domains to verify that the destination system can accommodate the required associativity mappings, ensuring compatibility during migration.
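A quick existence check for the first property; its location in the device tree root is an assumption based on PAPR conventions:

```python
import os

# ibm,migratable-partition in the device tree root indicates LPM capability
# (root-node location is an assumption based on PAPR conventions).
if os.path.exists("/proc/device-tree/ibm,migratable-partition"):
    print("Partition advertises LPM capability")
else:
    print("ibm,migratable-partition not present; LPM not advertised")
```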
 
Conclusion

Mapping sockets, cores, and threads is not just a diagnostic exercise; it is foundational for performance optimization, workload placement, and capacity planning on IBM Power systems. By leveraging the device tree, administrators and performance engineers gain firmware-accurate insights into CPU topology, enabling more informed decisions than relying solely on higher-level Linux tools.

As systems evolve with POWER11 and beyond, the device tree remains the trusted lens through which Linux understands its hardware, bridging firmware and the OS.
