LSF: 20 Years of GPU Support

By Bill McMillan posted Tue March 03, 2026 06:48 AM

  

It has been twenty years since NVIDIA introduced CUDA and opened the door to using GPUs as general-purpose parallel processors. Long before GPUs became the backbone of AI, NVIDIA was already a major adopter of LSF (then Platform LSF) and remains one of the largest users today, relying on LSF to support the design and verification of GPUs and other silicon devices.


A Little History...

GPUs began life as fixed-function hardware for rendering pixels and textures. Over the next two decades, they quietly evolved into massively parallel compute engines as researchers realized that graphics pipelines map naturally onto scientific, engineering, and machine learning workloads. That shift fundamentally changed how we design compute clusters. GPUs are no longer optional accelerators you bolt onto a node; they are first-class resources that require scheduling, telemetry, isolation, and careful packing.

This history matters because it explains the two pressures every administrator faces today:

  • GPUs are powerful but scarce. A single modern accelerator may have tens or hundreds of gigabytes of memory and thousands of compute cores. These devices are expensive, and idle time is costly.
  • Workloads are wildly diverse. Tiny inference tasks, medium-sized analytics jobs, and massive distributed training runs all want the same hardware, but with very different shapes, durations, and resource footprints.

Good GPU orchestration is about resolving those tensions—giving each workload what it needs while keeping the hardware busy.

Over the past twenty years, LSF’s GPU model has evolved from simple device counts to rich, multi-dimensional resources. LSF blends scheduler logic, OS-level enforcement, and vendor telemetry to deliver predictable behaviour in multi-user environments.

[Figure: ecosystem]


GPU Discovery: Zero-Touch Hardware Integration

Before we can use GPUs, the scheduler must know that they exist and what their capabilities are. Originally, the Administrator had to add this information to LSF manually. This was not only tedious but also highly error prone, and users often could not use new GPUs until the Administrator had time to do the work. Auto-discovery is even more important with cloud resources, where server instances can be created dynamically in response to workloads.

Today GPUs, like CPUs, are auto-discovered: the moment a host joins the cluster, its GPUs (vendor, model, driver, NUMA association, and MIG capability) become available for scheduling. This eliminates manual work, reduces misconfiguration risk, and shortens time to value for new hardware.

In this demonstration cluster, there are four hosts and 13 GPUs available:

$ lshosts -gpu
HOST_NAME   gpu_id       gpu_model   gpu_driver   gpu_factor      numa_id       vendor          mig
ma1gpu04         0         TeslaT4    590.48.01          7.5            0       Nvidia            N
                 1         TeslaT4    590.48.01          7.5            1       Nvidia            N
ma1gpu03         0       NVIDIAA40    590.48.01          8.6            1       Nvidia            N
ma1gpu05         0         TeslaT4    590.48.01          7.5            0       Nvidia            N
                 1         TeslaT4    590.48.01          7.5            1       Nvidia            N
ma1gpu06         0 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
                 1 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
                 2 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
                 3 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
                 4 NVIDIAH10080GBH    570.86.10          9.0            1       Nvidia            N
                 5 NVIDIAH10080GBH    570.86.10          9.0            1       Nvidia            N
                 6 NVIDIAH10080GBH    570.86.10          9.0            1       Nvidia            N
                 7 NVIDIAH10080GBH    570.86.10          9.0            1       Nvidia            N
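
Because this output is plain text, it is easy to turn into an inventory for reporting. The following is a minimal Python sketch, not an LSF interface: it assumes the column layout printed above and that indented rows belong to the host named on the previous unindented row. The function name parse_lshosts_gpu and the embedded excerpt are illustrative only.

```python
from collections import defaultdict

# Excerpt of the `lshosts -gpu` output shown above (same columns, fewer rows).
SAMPLE = """\
HOST_NAME   gpu_id       gpu_model   gpu_driver   gpu_factor      numa_id       vendor          mig
ma1gpu04         0         TeslaT4    590.48.01          7.5            0       Nvidia            N
                 1         TeslaT4    590.48.01          7.5            1       Nvidia            N
ma1gpu03         0       NVIDIAA40    590.48.01          8.6            1       Nvidia            N
ma1gpu06         0 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
                 1 NVIDIAH10080GBH    570.86.10          9.0            0       Nvidia            N
"""


def parse_lshosts_gpu(text):
    """Group GPU rows by host; indented rows continue the previous host."""
    lines = text.splitlines()
    columns = lines[0].split()[1:]           # every column except HOST_NAME
    inventory = defaultdict(list)
    host = None
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():            # a new host starts in column 0
            host, fields = fields[0], fields[1:]
        inventory[host].append(dict(zip(columns, fields)))
    return dict(inventory)


inventory = parse_lshosts_gpu(SAMPLE)
for host, gpus in inventory.items():
    models = sorted({g["gpu_model"] for g in gpus})
    print(f"{host}: {len(gpus)} GPU(s), models={models}")
```

In practice the text would come from running the command on a cluster node rather than from an embedded string.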

Static discovery provides the foundation. The next step is dynamic awareness.


Telemetry: Turning GPUs into Measurable, Observable Resources

To schedule GPUs effectively, LSF integrates with NVIDIA DCGM and nvidia-smi to collect:

  • Utilization (SM + memory)
  • Temperature and power draw
  • Health status
  • ECC state
  • Free vs. reserved memory
  • Device mode (exclusive/shared)

Commands like lsload -gpu and lsload -gpuload expose real-time conditions, helping the scheduler:

  • Avoid unhealthy or overheating GPUs
  • Place jobs based on actual free memory
  • Detect inefficient or runaway workloads
  • Improve packing and eliminate silent fragmentation

For example:

$ lsload -gpu
HOST_NAME   status ngpus gpu_shared_avg_mut gpu_shared_avg_ut ngpus_physical
ma1gpu04        ok     2                 0%                0%              2
ma1gpu03        ok     1                 0%                0%              1
ma1gpu05        ok     2                 0%                0%              2
ma1gpu06        ok     8                 0%                0%              8

$ lsload -gpuload
HOST_NAME gpuid   gpu_model gpu_mode gpu_temp gpu_ecc gpu_ut gpu_mut gpu_power gpu_mtotal gpu_mused gpu_pstate gpu_status gpu_error
ma1gpu04  0         TeslaT4        3      68C       0   100%     49%     68469        15G      600M          0         ok  -
          1         TeslaT4        3      55C       0   100%     61%     68438        15G      600M          0         ok  -
ma1gpu03  0       NVIDIAA40        3      84C       0   100%     62%    299058      44.9G      894M          0    warning  GPU_0:_Detected_Power_error
ma1gpu05  0         TeslaT4        3      71C       0   100%     47%     66775        15G      600M          0         ok  -
          1         TeslaT4        3      29C       0     0%      0%     13862        15G      450M          8         ok  -
ma1gpu06  0 NVIDIAH10080GBH        3      35C       0     0%      0%     76795      79.6G      469M          0         ok  -
          1 NVIDIAH10080GBH        3      31C       0     0%      0%     77539      79.6G      469M          0         ok  -
          2 NVIDIAH10080GBH        3      50C       0     0%      0%     79882      79.6G      469M          0         ok  -
          3 NVIDIAH10080GBH        3      33C       0     0%      0%     76005      79.6G      469M          0         ok  -
          4 NVIDIAH10080GBH        3      32C       0     0%      0%     76871      79.6G      469M          0         ok  -
          5 NVIDIAH10080GBH        3      28C       0     0%      0%     73853      79.6G      469M          0         ok  -
          6 NVIDIAH10080GBH        0      32C       0     0%      0%     75965      79.6G      469M          0         ok  -
          7 NVIDIAH10080GBH        0      30C       0     0%      0%     75103      79.6G      469M          0         ok  -

We can see that the majority of GPUs are currently in mode 3, which is exclusive use, but two H100 GPUs are in mode 0, which is shared use. We can also see that the A40 GPU has a hardware warning, but the card is still usable. This also illustrates that a cluster does not need to be homogeneous: in addition to mixing different CPU versions, you can mix different GPUs from multiple vendors and LSF will manage them appropriately. Likewise, a job does not need to be homogeneous either and can request different GPUs based on the application requirements.
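
The same observations can be pulled out of the output programmatically. This is a hedged Python sketch over an excerpt of the lsload -gpuload listing above; it assumes the columns shown there, that gpu_error is always a single whitespace-free token, and that mode 0 means shared as described in the text. The helper name parse_gpuload is made up for illustration.

```python
# Excerpt of the `lsload -gpuload` output above (same columns, fewer rows).
SAMPLE = """\
HOST_NAME gpuid   gpu_model gpu_mode gpu_temp gpu_ecc gpu_ut gpu_mut gpu_power gpu_mtotal gpu_mused gpu_pstate gpu_status gpu_error
ma1gpu04  0         TeslaT4        3      68C       0   100%     49%     68469        15G      600M          0         ok  -
ma1gpu03  0       NVIDIAA40        3      84C       0   100%     62%    299058      44.9G      894M          0    warning  GPU_0:_Detected_Power_error
ma1gpu06  0 NVIDIAH10080GBH        3      35C       0     0%      0%     76795      79.6G      469M          0         ok  -
          6 NVIDIAH10080GBH        0      32C       0     0%      0%     75965      79.6G      469M          0         ok  -
          7 NVIDIAH10080GBH        0      30C       0     0%      0%     75103      79.6G      469M          0         ok  -
"""


def parse_gpuload(text):
    """Return one dict per GPU row, carrying the host name onto indented rows."""
    lines = text.splitlines()
    columns = lines[0].split()[1:]           # every column except HOST_NAME
    rows, host = [], None
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():            # a new host starts in column 0
            host, fields = fields[0], fields[1:]
        row = dict(zip(columns, fields))
        row["host"] = host
        rows.append(row)
    return rows


rows = parse_gpuload(SAMPLE)

# GPUs whose status is anything other than "ok", with the reported error.
needs_attention = [
    (r["host"], r["gpuid"], r["gpu_error"]) for r in rows if r["gpu_status"] != "ok"
]
# GPUs currently in mode 0 (shared use).
shared = [(r["host"], r["gpuid"]) for r in rows if r["gpu_mode"] == "0"]
# The hottest GPU, parsing values such as "84C".
hottest = max(rows, key=lambda r: int(r["gpu_temp"].rstrip("C")))

print("needs attention:", needs_attention)
print("shared mode:", shared)
print("hottest:", hottest["host"], hottest["gpuid"], hottest["gpu_temp"])
```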

Telemetry turns scheduling from guesswork into data-driven decision making: GPU metrics feed into fundamental scheduling concepts such as Fairshare and ordering.


Exclusive vs. Shared GPU Allocation: A Unified Model

Originally, GPUs were treated as exclusive-use devices: a job either used a GPU or it didn’t, and it never shared the GPU with anything else. Over the years this has become more nuanced, and today LSF supports multiple ways to allocate GPUs depending on workload size, isolation needs, and throughput goals.


Exclusive Mode

Exclusive mode dedicates a full GPU—or MIG partition—to one job. This is ideal for:


MIG Integration (Static and Dynamic)

If a GPU supports NVIDIA Multi-Instance GPU (MIG):

  • Static MIG scheduling: LSF places jobs onto existing MIG instances.
  • Dynamic MIG mode: LSF creates and destroys MIG partitions automatically to match incoming workload shapes.

Dynamic MIG prevents the classic utilization traps where a GPU is technically “busy” but unusable because its existing partition layout is mismatched with demand. Users can request MIG instances be created based on the memory required, or the number of slices needed.
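
To make the idea of sizing a partition by memory or by slices concrete, here is a small Python sketch that picks the smallest MIG profile satisfying a request. It illustrates the selection idea only and is not LSF's algorithm. The table is an assumed subset of the MIG profiles NVIDIA documents for an 80 GB H100, and pick_profile is a hypothetical helper.

```python
from typing import Optional

# (profile name, compute slices, memory in GB), ordered smallest first.
# Assumption: a subset of the MIG profiles available on an 80 GB H100.
PROFILES = [
    ("1g.10gb", 1, 10),
    ("2g.20gb", 2, 20),
    ("3g.40gb", 3, 40),
    ("4g.40gb", 4, 40),
    ("7g.80gb", 7, 80),
]


def pick_profile(min_mem_gb: float = 0, min_slices: int = 0) -> Optional[str]:
    """Return the smallest profile meeting both minimums, or None if none fits."""
    for name, slices, mem_gb in PROFILES:
        if mem_gb >= min_mem_gb and slices >= min_slices:
            return name
    return None


print(pick_profile(min_mem_gb=14))    # request sized by memory
print(pick_profile(min_slices=3))     # request sized by compute slices
print(pick_profile(min_mem_gb=100))   # larger than any single partition
```

A request that no profile can satisfy returns None, which in a real scheduler would correspond to the job staying pending or being rejected.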

For organizations, MIG means predictability, isolation, and high multi-tenant density without performance interference.


Shared Mode: Maximizing Throughput for Small or Bursty Jobs

Shared mode time-slices the GPU across multiple jobs. This is valuable for:

  • Inference, e.g. vLLM (running vLLM with LSF will be the topic of a future blog)
  • Micro-batch analytics
  • Lightweight ML experiments
  • GPU-accelerated utilities that don’t saturate a device


MPS for Concurrent Execution

Beyond simple time-slicing, LSF supports:

  • Single-user MPS – higher concurrency for one user
  • Multi-user MPS – controlled sharing between multiple users, with isolation

MPS allows multiple CUDA processes to run concurrently inside a shared GPU, improving utilization for workloads that underuse compute resources.


Automatic Mode Switching

In many other schedulers, these configuration changes must be made by the Administrator. With LSF, users request the mode they need, and the scheduler automatically flips GPUs between exclusive, shared, MIG and MPS use without administrator intervention.


Fractional GPUs via Memory Reservations

Instead of limiting concurrency or relying solely on hardware partitioning, LSF allows memory-based fractional GPUs. Using “gmem=”, jobs reserve a slice of GPU memory. LSF ensures that:

  • The sum of all reservations never exceeds physical capacity
  • Fragmentation is minimized
  • Small jobs can pack tightly on large GPUs

This approach provides many of the benefits of fractional GPUs without requiring MIG. It also enables clever extensions. For example, an esub wrapper can translate a fractional request such as “bsub -a 'gpuf(0.1)' a.out” into an appropriate memory slice (e.g. bsub -gpu "gmem=16GB" a.out).
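
The arithmetic such a wrapper would perform is simple. The sketch below is a hypothetical Python helper, not the actual esub: it assumes the wrapper knows the memory size of the target GPU in MB, and that a bare number after gmem= is interpreted in MB, as the gmem=7168 example later in this post suggests.

```python
def gpuf_to_gmem(fraction: float, gpu_total_mb: int) -> str:
    """Translate a fractional GPU request into a gmem= reservation string.

    fraction     -- share of one GPU, in the range (0, 1]
    gpu_total_mb -- memory of the reference GPU in MB (an assumption; a real
                    wrapper would need a site policy for which GPU size to use)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    gmem_mb = max(1, round(fraction * gpu_total_mb))
    return f"gmem={gmem_mb}"


# A quarter of a 16 GB (16384 MB) GPU.
print(gpuf_to_gmem(0.25, 16384))
```

The returned string is what the wrapper would place inside the -gpu option of the rewritten bsub command.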

To illustrate this, we’ll submit six jobs, each requesting 14 GB of GPU memory, and restrict the choice of nodes to just ma1gpu03 and ma1gpu04.

[Image: bhosts -gpu output for the six jobs]

Five jobs start: three on ma1gpu03, which has a single A40, and two on ma1gpu04, which has two T4s. The sixth job remains pending because no GPU has enough free memory to satisfy its reservation. LSF’s bhosts -gpu output makes this visible by showing memory used (MUSED) and memory reserved (MRSV) per GPU.
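
The outcome follows directly from the memory sizes reported earlier (44.9G for the A40, 15G for each T4). The Python sketch below reproduces the arithmetic with a simple first-fit loop. It is an assumption-laden illustration, not LSF's placement logic: it ignores memory already in use and any ordering policy the scheduler applies.

```python
JOB_GMEM_GB = 14.0

# Total memory per GPU, taken from the lsload -gpuload output earlier.
gpus = [
    {"host": "ma1gpu03", "gpu": 0, "total_gb": 44.9, "reserved_gb": 0.0},
    {"host": "ma1gpu04", "gpu": 0, "total_gb": 15.0, "reserved_gb": 0.0},
    {"host": "ma1gpu04", "gpu": 1, "total_gb": 15.0, "reserved_gb": 0.0},
]


def place(job_id, gmem_gb, gpus):
    """Reserve memory on the first GPU with room; return None if none fits."""
    for g in gpus:
        if g["reserved_gb"] + gmem_gb <= g["total_gb"]:
            g["reserved_gb"] += gmem_gb
            return (job_id, g["host"], g["gpu"])
    return None


placements, pending = [], []
for job_id in range(1, 7):                   # six identical jobs
    result = place(job_id, JOB_GMEM_GB, gpus)
    if result is None:
        pending.append(job_id)
    else:
        placements.append(result)

for job_id, host, gpu in placements:
    print(f"job {job_id} -> {host} GPU {gpu}")
print("pending:", pending)
```

Three 14 GB reservations fit in the A40 (42 GB of 44.9 GB), one fits on each T4, and the sixth has nowhere to go.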


Device Isolation: Enforcing Correct and Safe GPU Usage

CUDA’s CUDA_VISIBLE_DEVICES helps restrict jobs to assigned GPUs—but users can override environment variables. LSF therefore enforces GPU isolation using Linux cgroups to restrict access to GPU device nodes. This prevents:

  • Unauthorized GPU use
  • Cross-job interference
  • Security issues in multi-tenant environments

Cgroups form the backbone of safe, reliable GPU sharing. For enterprises, this eliminates noisy-neighbour effects, a common cause of unpredictable SLAs.
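
The reason an environment variable is not enough is easy to demonstrate. In this small Python sketch, a parent process plays the scheduler and sets CUDA_VISIBLE_DEVICES for a child, which plays the job and simply overwrites it. No GPU or CUDA library is involved. It only shows that the variable is advisory, which is why enforcement has to happen at the device level.

```python
import os
import subprocess
import sys

# The "scheduler" assigns GPU 0 by setting the variable in the job environment.
job_env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

# The "job" overwrites the variable before doing any GPU work.
job_code = (
    "import os\n"
    "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'\n"
    "print(os.environ['CUDA_VISIBLE_DEVICES'])\n"
)

result = subprocess.run(
    [sys.executable, "-c", job_code],
    env=job_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_job = result.stdout.strip()
print("assigned:", job_env["CUDA_VISIBLE_DEVICES"], "| job now claims:", seen_by_job)
```

With cgroup-based device restrictions in place, the job could still change the variable, but attempts to open GPU device nodes it was not assigned would be refused by the kernel.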


Runtime GPU Usage Telemetry: Ensuring Jobs Use What They Ask For

Allocating GPUs is not enough. Organizations need to validate that jobs use GPUs efficiently. LSF records GPU utilization, memory footprint, and energy consumption at runtime.

With bjobs -gpu, administrators and users can inspect:

  • SM utilization
  • Memory bandwidth usage
  • Power and energy consumed
  • Peak and average memory use
  • Efficiency metrics

This data helps organizations:

  • Identify inefficient code
  • Tune models for higher throughput
  • Justify GPU budgets with real consumption metrics
  • Enforce chargeback or showback models
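
As an illustration of how sampled telemetry becomes a chargeback figure, the following Python sketch integrates power samples into energy with the trapezoidal rule and averages SM utilization. It is not how LSF computes its numbers. The sample format (seconds, watts, percent) and the values themselves are invented for the example.

```python
def summarize(samples):
    """Summarize time-ordered (seconds, power_watts, sm_util_percent) samples."""
    energy_j = 0.0
    for (t0, p0, _), (t1, p1, _) in zip(samples, samples[1:]):
        # Trapezoidal rule: mean power over the interval times its length.
        energy_j += (p0 + p1) / 2 * (t1 - t0)
    avg_util = sum(util for _, _, util in samples) / len(samples)
    return {
        "energy_j": energy_j,
        "energy_wh": energy_j / 3600,
        "avg_sm_util": avg_util,
    }


# Invented samples taken every 10 seconds.
samples = [(0, 60.0, 90), (10, 70.0, 100), (20, 70.0, 100), (30, 60.0, 90)]
summary = summarize(samples)
print(summary)
```

Multiplying the energy figure by a site-specific rate would give a simple showback cost per job.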

For shared-mode, MIG and MPS jobs, NVIDIA telemetry is more limited, but LSF still captures everything the NVIDIA APIs expose.

Let’s look at an example. We’ll submit a job that requires one GPU in exclusive mode:

$ bsub -gpu "num=1:mode=exclusive:gmem=7168" gpujob 20 10 600 2048

We can use “bjobs -gpu” to see the detailed GPU telemetry that was collected, which is persisted in the accounting record. We can clearly see what the user requested, which GPU(s) were assigned to the job, and how well the job used them. In this case, the job has run to completion, so we also see a summary of its energy usage.

[Image: bjobs -gpu output for the completed job]

For jobs running in shared mode, the NVIDIA telemetry APIs do not provide utilization or energy use metrics.


Bringing It All Together

Maximizing GPU utilization is a balancing act between isolation and flexibility. LSF combines hardware-level, scheduler-level, and runtime-level controls:

  • MIG provides hard isolation and performance guarantees.
  • gmem + MPS provide flexibility and high concurrency.
  • Cgroups enforce isolation and prevent interference.
  • DCGM telemetry enables intelligent placement and continuous optimization.
  • Automatic mode switching eliminates admin bottlenecks.

The result is an orchestration layer that keeps GPUs busy with the right workloads.

GPUs and accelerators will continue to evolve, and LSF will continue evolving with them to maximize utilization while protecting performance.
