LSF Fix Pack 14: better hybrid cloud and more

By Gábor Samu posted Thu July 06, 2023 05:10 PM

  

For over three decades, IBM Spectrum LSF has evolved to meet the HPC scheduling needs of the most demanding environments. Organizations are turning to hybrid cloud in response to increasing demand for compute, and LSF capabilities such as multicluster and resource connector are crucial for meeting that demand. IBM has released IBM Spectrum LSF 10.1 Fix Pack 14 to further extend LSF in supporting the operation and use of hybrid HPC cloud environments. Let’s take a closer look at these capabilities:

Global Job ID: Previously, when operating an LSF multicluster environment, each cluster assigned its own independent job IDs to submitted jobs. Fix Pack 14 allows the configuration of a global job ID that is unique across all the clusters in a group. This greatly simplifies identifying jobs that are forwarded from one LSF cluster to another. Furthermore, the common job ID simplifies consolidated accounting across clusters, as there are no longer separate local and remote job IDs to consider.
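To illustrate why a common job ID simplifies consolidated accounting, here is a minimal sketch. The record layout (cluster, job_id, cpu_seconds) and the function name consolidate are assumptions made for illustration only; they are not the format of LSF's accounting files.

```python
from collections import defaultdict

def consolidate(records):
    """Sum CPU seconds per job across clusters, keyed by the shared job ID."""
    totals = defaultdict(float)
    for rec in records:
        # With a global job ID, the ID alone identifies the job, so no
        # local-to-remote ID mapping is needed before grouping.
        totals[rec["job_id"]] += rec["cpu_seconds"]
    return dict(totals)

# Hypothetical records: job 1001 was submitted on cluster_a and forwarded to cluster_b.
records = [
    {"cluster": "cluster_a", "job_id": 1001, "cpu_seconds": 2.0},
    {"cluster": "cluster_b", "job_id": 1001, "cpu_seconds": 340.0},
    {"cluster": "cluster_b", "job_id": 1002, "cpu_seconds": 75.5},
]
print(consolidate(records))
```

With independent per-cluster IDs, the same grouping would first need a table that maps each local job ID to its remote counterpart.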

Global Policies: Limits and fair share settings can now be shared or synchronized across multiple clusters, ensuring that resources are consumed in a consistent way.

Enhanced IBM Cloud support: This release includes performance enhancements to the resource connector for IBM Cloud.

Automatic cgroup creation: Using the LSF affinity[] resource requirement is no longer mandatory for the enforcement of control groups. When affinity[] is not specified at LSF job submission time, LSF will automatically append -R "affinity[core=N]" to the submitted job.
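The behavior described above can be modeled roughly as follows. This is a sketch of the idea, not LSF's implementation: the function name with_default_affinity and the default of one core are assumptions, and the affinity[core=N] string simply mirrors the notation used in this post (consult the LSF documentation for the exact syntax LSF applies).

```python
def with_default_affinity(res_req, cores=1):
    """Return res_req with an affinity[] section appended when none is present."""
    if "affinity[" in res_req:
        # The submitter already asked for affinity, so leave the string untouched.
        return res_req
    default = f"affinity[core={cores}]"
    return f"{res_req} {default}".strip()

print(with_default_affinity("rusage[mem=4096]"))
print(with_default_affinity("affinity[core=4]"))
```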

Rate limiter: Protects the LSF mbatchd daemon against excessive load by automatically throttling queries, submissions, and other requests, with the ability to block specific users or hosts running excessive queries. Decreasing unintended load on the LSF scheduler can help to improve scheduling performance.
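For readers unfamiliar with request throttling, the general idea can be sketched with a per-user token bucket plus a block list. This is a generic technique chosen for illustration; the post does not say which algorithm LSF uses, and the class name RequestThrottle and its parameters are hypothetical.

```python
import time

class RequestThrottle:
    """Per-user token bucket with a block list (illustrative, not LSF's algorithm)."""

    def __init__(self, rate, burst, blocked=(), clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # maximum tokens a user can accumulate
        self.blocked = set(blocked)
        self.clock = clock
        self.buckets = {}           # user -> (tokens, time of last update)

    def allow(self, user):
        """Return True if the request may proceed, False if it is throttled."""
        if user in self.blocked:
            return False
        now = self.clock()
        tokens, last = self.buckets.get(user, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[user] = (tokens - 1, now)
            return True
        self.buckets[user] = (tokens, now)
        return False
```

A user who sends requests faster than the refill rate is refused once the burst allowance is used up, while other users are unaffected.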

Consumable resource enhancement: The LSF resource map now supports assigning non-numeric values to a defined resource. Take, as an example, a use case where a server has four FPGA devices: device0, device1, device2, device3. It’s now possible to assign device0, device1, device2, device3 to the defined FPGA resource. When submitting a job, users can request a specific device unit, rather than simply specifying the number of devices needed.
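The difference between requesting a count and requesting a named unit can be sketched as below. This models the concept only: the class NamedResourcePool and its methods are hypothetical and do not reflect LSF configuration or submission syntax.

```python
class NamedResourcePool:
    """A consumable resource whose units have names instead of just a count."""

    def __init__(self, units):
        self.free = list(units)
        self.in_use = {}  # unit name -> job holding it

    def allocate(self, job, unit=None):
        """Reserve the named unit, or the first free unit when unit is None.

        Returns the unit name, or None if the request cannot be satisfied.
        """
        if unit is None:
            if not self.free:
                return None
            unit = self.free[0]
        elif unit not in self.free:
            return None
        self.free.remove(unit)
        self.in_use[unit] = job
        return unit

    def release(self, unit):
        """Return a unit to the pool once its job has finished."""
        if unit in self.in_use:
            del self.in_use[unit]
            self.free.append(unit)

fpga = NamedResourcePool(["device0", "device1", "device2", "device3"])
print(fpga.allocate("jobA", "device2"))  # request a specific unit by name
print(fpga.allocate("jobB"))             # any free unit will do
```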

Updated systemd support: The LSF systemd configuration will now automatically restart the LSF lim and sbatchd daemons upon any failure.

You can find more detail about these and other capabilities in the release notes for IBM Spectrum LSF 10.1 Fix Pack 14.


Comments

Wed July 26, 2023 02:44 PM

Learn more about LSF 10.1 Fix Pack 14 in the scheduled webinar sessions (Aug 10/Aug 15, 2023).

Mon July 10, 2023 01:41 PM

Looking forward to testing and implementing this, Gabor!