IBM Spectrum LSF is constantly evolving to meet the needs of HPC environments as they grow in scale and complexity. IBM Spectrum LSF 10.1 Fix Pack 13 is now available. It includes new features in hybrid cloud, containerization, ease of use and more.
Let’s walk through how these features can help enhance your current cluster.
Resource connector enhancements
Spot instance users rejoice! LSF Fix Pack 13 can now automatically select the lowest-priced spot instance template that can run your workload. This highly sought-after feature can be configured with the spot instance strategy that suits you best: distribute evenly across spot pools or, by default, use the spot pools with optimal capacity. It also means no more waiting for a specific spot instance price to drop below the threshold: LSF automatically tries the next available spot instance template that is below your threshold price. With this enhancement, you can save money and time through increased work throughput and price optimization.
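To make the selection idea concrete, here is a minimal Python sketch of "pick the cheapest template that can run the workload and is below the threshold price". It is an illustration only, not LSF's implementation: the SpotTemplate class, its fields, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpotTemplate:
    """Hypothetical stand-in for a resource connector template."""
    name: str
    spot_price: float        # current price per hour (made-up numbers)
    max_price: float         # threshold price you are willing to pay
    can_run_workload: bool   # whether the template fits the job's needs


def pick_template(templates: List[SpotTemplate]) -> Optional[SpotTemplate]:
    """Return the cheapest eligible template, or None if none qualifies."""
    eligible = [
        t for t in templates
        if t.can_run_workload and t.spot_price < t.max_price
    ]
    # min() with default=None handles the "nothing is cheap enough" case.
    return min(eligible, key=lambda t: t.spot_price, default=None)


templates = [
    SpotTemplate("small", spot_price=0.12, max_price=0.10, can_run_workload=True),
    SpotTemplate("medium", spot_price=0.18, max_price=0.20, can_run_workload=True),
    SpotTemplate("large", spot_price=0.15, max_price=0.30, can_run_workload=False),
    SpotTemplate("xlarge", spot_price=0.25, max_price=0.30, can_run_workload=True),
]

choice = pick_template(templates)
print(choice.name if choice else "no template below threshold")
```

In this made-up data, "small" is over its threshold and "large" cannot run the workload, so the sketch falls through to "medium", the cheapest of the remaining candidates.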
To unlock even more savings, you can now create optimization rules that let LSF intelligently upgrade your cluster’s demand allocation. The rules define when LSF tries to optimize the demand allocation in cases where it would be beneficial to allocate one virtual machine (VM) in place of multiple other VMs. This gives you the configuration flexibility to optimize for cost, reliability, or resources.
In Fix Pack 13, LSF now supports Podman 3.3.1. The Podman integration allows LSF to run jobs in Podman OCI containers on demand.
Fix Pack 13 provides support for Apptainer containerized workloads. The CONTAINER parameter in LSF application profiles and batch queues now supports the apptainer keyword and continues to support the singularity keyword for Singularity containerized workloads.
You can now configure whether the tmp directory is mounted in the containers of Docker jobs. By default, the tmp directory is mounted.
Job scheduling and execution
GPU resizable jobs are here! Two new environment variables, LSB_RESIZE_GPU and LSB_RESIZE_TIME, allow GPU resource allocation to be resized when you use resizable jobs. Additionally, allocated GPU information is now displayed in the JOB_RESIZE_NOTIFY_START record of the lsb.events file.
The bkill command now includes the -d option. The bkill -d command kills jobs in the running (RUN), user suspended (USUSP), or system suspended (SSUSP) state and marks them as DONE, so you no longer need to mark jobs as DONE manually.
Gone are the days of killing each job in RUN state, one by one. Starting in LSF Fix Pack 13, you can kill all jobs of a certain state. The bkill -stat command takes an argument of RUN, PEND, or SUSP and kills all applicable jobs with the specified job state.
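If you drive LSF from scripts, a small wrapper can guard against typos in the state name. The Python sketch below only builds the argument list and never runs it. The helper name bkill_stat_command is hypothetical, and the exact invocation (letter case of the state, any additional arguments) is an assumption based on the description above, so check it against the bkill reference before use.

```python
import shlex
from typing import List

# Job states named in the text above.
VALID_STATES = ("RUN", "PEND", "SUSP")


def bkill_stat_command(state: str) -> List[str]:
    """Build (but do not execute) a bkill -stat argument list."""
    normalized = state.strip().upper()
    if normalized not in VALID_STATES:
        raise ValueError(f"unsupported job state: {state!r}")
    return ["bkill", "-stat", normalized]


cmd = bkill_stat_command("pend")
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run on a host with LSF installed would be the natural next step, once you have confirmed the syntax for your installation.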
Previously, only job slots could be considered for fairshare scheduling. In Fix Pack 13, fairshare can be configured to evaluate usage based on the number of jobs.
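Why does the choice of metric matter? The toy Python sketch below counts the same set of running jobs two ways. It is not LSF's fairshare priority formula, just an illustration with made-up users and slot counts of how slot-based and job-based accounting can rank users differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (user, slots used by the job); purely illustrative data.
running_jobs: List[Tuple[str, int]] = [
    ("alice", 16), ("alice", 16),
    ("bob", 1), ("bob", 1), ("bob", 1), ("bob", 1), ("bob", 1),
]


def usage_by(jobs: List[Tuple[str, int]], metric: str) -> Dict[str, int]:
    """Total usage per user, counted in slots or in jobs."""
    totals: Dict[str, int] = defaultdict(int)
    for user, slots in jobs:
        totals[user] += slots if metric == "slots" else 1
    return dict(totals)


slot_usage = usage_by(running_jobs, "slots")
job_usage = usage_by(running_jobs, "jobs")
print("by slots:", slot_usage)
print("by jobs: ", job_usage)
```

Counted by slots, alice is the heavier user (32 versus 5). Counted by jobs, bob is (5 versus 2). Which view is fairer depends on your workload mix.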
Job groups can now be deleted based on idle time. The bgdel -d command now takes an idle time argument that determines which job groups to delete. No more job groups with long idle times!
Running a job and realizing that it needs more memory and swap than the configured cgroup (control group) limits allow? Now you can modify a running job’s cgroup memory and swap limits. You no longer need to resubmit the job with new limits; simply run the bmod -M or bmod -v command to change them.
We’re pleased to announce LSF Fix Pack 13 can now handle global resources and global limits. All MultiCluster connected clusters can define resources and limits to be shared across these clusters. Best of all, you don’t need to change how you submit your job requirements: you simply configure your clusters to use global resources and limits. Global resources can even be configured for complete resources or even distribution policies!
You can now prevent the bwait command from running within a job. If you previously encountered low slot usage because bwait was blocking slots, you can now configure your cluster by setting the new LSB_BWAIT_IN_JOBS=N parameter in the lsf.conf file.
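The selection logic behind idle-based cleanup can be pictured with a short Python sketch. Everything here is hypothetical (the job group names, the timestamps, and the 24-hour threshold), and it only models the idea of "delete what has been idle too long", not how bgdel works internally.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def idle_job_groups(last_activity: Dict[str, datetime],
                    now: datetime,
                    max_idle: timedelta) -> List[str]:
    """Return job groups whose idle time exceeds max_idle, sorted by name."""
    return sorted(
        group for group, last_seen in last_activity.items()
        if now - last_seen > max_idle
    )


# A fixed "now" keeps the example reproducible.
now = datetime(2022, 6, 1, 12, 0, 0)
last_activity = {
    "/projects/render": now - timedelta(hours=30),
    "/projects/sim": now - timedelta(minutes=20),
    "/projects/archive": now - timedelta(days=14),
}

stale = idle_job_groups(last_activity, now, max_idle=timedelta(hours=24))
print(stale)
```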
Command output formatting
The bqueues -o command is now extended to include many more fields. More fields mean less time spent working out how to process the human-readable output and more time spent automating where it matters. This feature extends to the JSON output as well.
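As a rough idea of what automating against JSON output can look like, the Python sketch below parses a sample payload and extracts pending job counts per queue. The payload is hand-written for illustration: its overall shape and the field names (QUEUE_NAME, STATUS, PEND, RUN) are assumptions, so compare them with what your own bqueues JSON output actually contains.

```python
import json
from typing import Dict

# Hand-written sample; the structure and field names are assumptions.
sample_payload = """
{
  "COMMAND": "bqueues",
  "QUEUES": 2,
  "RECORDS": [
    {"QUEUE_NAME": "normal", "STATUS": "Open:Active", "PEND": "12", "RUN": "40"},
    {"QUEUE_NAME": "night", "STATUS": "Open:Inact", "PEND": "3", "RUN": "0"}
  ]
}
"""


def pending_by_queue(payload: str) -> Dict[str, int]:
    """Map each queue name to its number of pending jobs."""
    data = json.loads(payload)
    return {rec["QUEUE_NAME"]: int(rec["PEND"]) for rec in data["RECORDS"]}


pending = pending_by_queue(sample_payload)
print(pending)
```

Because the fields arrive as named keys rather than column positions, a script like this keeps working when you add or reorder fields in the -o format string.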
Welcome blimits to the -o family. The blimits command in Fix Pack 13 is now able to output specific fields and values, just like bqueues -o and bjobs -o.
More information is always useful. CPU and memory usage details are added to the bjobs, bhist, and bacct commands. These commands can now show CPU efficiency, CPU peak usage, and memory efficiency values.
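For readers new to these metrics, the Python sketch below uses common definitions: CPU efficiency as CPU time divided by run time times the number of CPUs, and memory efficiency as peak memory divided by requested memory. These definitions are an assumption for illustration; the exact formulas LSF uses may differ, so treat the release notes as the reference.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, num_cpus: int) -> float:
    """CPU time as a percentage of the CPU time that was available."""
    if run_time_s <= 0 or num_cpus <= 0:
        return 0.0
    return 100.0 * cpu_time_s / (run_time_s * num_cpus)


def memory_efficiency(peak_mem_mb: float, requested_mem_mb: float) -> float:
    """Peak memory as a percentage of the memory that was requested."""
    if requested_mem_mb <= 0:
        return 0.0
    return 100.0 * peak_mem_mb / requested_mem_mb


# A one-hour job on 2 CPUs that used 5400 CPU-seconds,
# and peaked at 3072 MB against a 4096 MB request (made-up numbers).
cpu_eff = cpu_efficiency(cpu_time_s=5400, run_time_s=3600, num_cpus=2)
mem_eff = memory_efficiency(peak_mem_mb=3072, requested_mem_mb=4096)
print(f"CPU efficiency: {cpu_eff:.1f}%")
print(f"Memory efficiency: {mem_eff:.1f}%")
```

Low values on either metric usually suggest the job asked for more than it needed, which is exactly the kind of signal these new fields make easier to spot.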
The hardware locality (hwloc) library is updated to 2.6 with Fix Pack 13. This provides topology support for the latest server platforms.
If you prefer a particular host for dispatching jobs in a host group, you can specify the preference in the lsb.hosts file. LSF Fix Pack 13 can be configured to accept host preferences in the host group.
Existing commands that support using a hostname as a filter now support host groups! The battr, bresume, brsvs, lshosts, and lsload commands are now all extended to allow filtering based on host groups.
MultiCluster jobs can now specify the number of jobs or tasks that can be configured at the receive-job queue.
LSF resource connector and LSF operator plug-ins are now open source! You can contribute and modify the plug-ins to fit your cloud compute and containerization needs. Just follow the links at https://github.com/IBMSpectrumComputing/cloud-provider-plugins and https://github.com/IBMSpectrumComputing/lsf-operator for the cloud provider plug-ins and LSF operator, respectively.
Lastly, in Fix Pack 13, LSF supports more operating systems and architectures: RHEL 8.5, RHEL 8.6, and RHEL 9 on x64 and POWER; and IBM AIX 7.x and Linux on IBM POWER 10. On top of all these options, we now support the LSF Data Manager, LSF License Scheduler, and LSF resource connector features on Linux on ARM (aarch64) systems, opening up more options to expand your clusters!
Give it a try!
Try these new and exciting features today! Download IBM Spectrum LSF 10.1 on Passport Advantage and then apply Fix Pack 13 from IBM Fix Central. To learn details about what’s included, refer to the Fix Pack 13 release notes: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=wn-whats-new-in-lsf-101-fix-pack-13
Talk to us
Log in to this community page to comment on this blog post. We look forward to hearing from you on the new features, and what you would like to see in future releases.