There is no escaping the fact that COVID-19 has changed how we do business. Some industries have been hit harder than others, while some have seen the demand for compute explode, particularly in health care and life sciences.
With the challenges in procurement, the supply chain and facilities, many organizations have turned to the cloud to meet these needs. Intersect360’s 2020 market forecast indicates a significant rise in cloud usage for HPC, while on-premises deployments are depressed. However, it is not all cloudy: there is a ray of sunshine, with the long-term CAGR for HPC being largely unaffected.
To get onto the cloud quickly, you ideally want to be able to take your existing workloads, workflows and scripts and be able to run them “as-is” – lift and shift if you will. IBM Spectrum LSF has a wide range of capabilities to help you accomplish this:
- LSF MultiCluster forwards work from the on-premises cluster to the cloud cluster based on pre-defined policies – pick the right workload to forward, and when.
- LSF Resource Connector autoscales the cloud cluster based on workload policies. The Resource Connector allows the cluster to rapidly scale from 0 to thousands of instances and back down again, based on policy. Because it is workload-policy aware, it considers more factors than just how many jobs are waiting, such as workload priorities and instance time to live. Clients have reported significant cost savings with the Resource Connector compared to scaling on “waiting jobs” alone. It also provides a consistent interface to the underlying cloud-specific APIs to manage instance creation and tear-down.
- LSF DataManager provides intelligent pre-staging of data to the cloud. Of course, if your data is already there, great, problem solved. But if it is mainly on premises, you want to ensure that any required data is there before you instantiate cloud resources and start incurring costs.
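As a rough sketch of how the forwarding piece fits together, a send queue on the on-premises cluster names a receive queue on the cloud cluster in lsb.queues. The queue and cluster names below are hypothetical:

```
Begin Queue
QUEUE_NAME   = cloud_burst
PRIORITY     = 40
# Forward jobs that match this queue's policies to the
# receive queue on the cloud-side cluster (hypothetical names)
SNDJOBS_TO   = cloud_recv@cloud_cluster
DESCRIPTION  = Burst eligible workload to the cloud
End Queue
```

On the cloud cluster, a matching queue would list the on-premises cluster in RCVJOBS_FROM. A job submitted with `bsub -q cloud_burst` is then eligible for forwarding, and with LSF Data Manager configured, a `-data` requirement on the submission lets the required files be staged before the job dispatches.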
So, lift and shift, and we are done? No, not that simple.
We need a Plumber
There is a lot of plumbing involved in setting up a hybrid environment, and it varies from provider to provider. How to set up basics like a VPN/VPC between your on-premises environment and the cloud differs for each one. How will user authentication be performed? Will you have a single namespace covering both environments, or do you need to set up user mapping? A fair bit of this plumbing has to be done before you can burst to the cloud. Luckily, for LSF, there is guidance from IBM Cloud, AWS and Azure. While some clients do this themselves, others enlist professional services to do the plumbing for them.
What is my Cloud composed of?
If you are talking about the weather, there are basically four types of clouds: cumulus, cirrus, stratus and nimbus. Though many would simplify this down to rain cloud and not rain cloud!
But when it comes to choosing which instances to use, it is a little more challenging. Each cloud provider has a bewildering plethora of instance types with different costs and capabilities. In HPC, we are generally interested in “compute optimized” instance types, so that does narrow it down – a little. But if your workload is memory hungry, a memory-optimized instance may be better.
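To make this concrete, here is a sketch of what two Resource Connector templates for AWS might look like in awsprov_templates.json, one compute-optimized and one memory-optimized. This assumes the AWS provider plug-in is enabled; the template names, AMI, subnet, security group and instance choices are illustrative placeholders, not recommendations:

```json
{
  "templates": [
    {
      "templateId": "compute-opt",
      "maxNumber": 200,
      "imageId": "ami-00000000000000000",
      "subnetId": "subnet-00000000",
      "securityGroupIds": ["sg-00000000"],
      "vmType": "c5.18xlarge",
      "attributes": {
        "type":  ["String",  "X86_64"],
        "ncpus": ["Numeric", "36"],
        "mem":   ["Numeric", "147456"]
      }
    },
    {
      "templateId": "memory-opt",
      "maxNumber": 50,
      "imageId": "ami-00000000000000000",
      "subnetId": "subnet-00000000",
      "securityGroupIds": ["sg-00000000"],
      "vmType": "r5.8xlarge",
      "attributes": {
        "type":  ["String",  "X86_64"],
        "ncpus": ["Numeric", "16"],
        "mem":   ["Numeric", "262144"]
      }
    }
  ]
}
```

The attributes advertise each template’s resources to the scheduler, so memory-hungry jobs can drive creation of the memory-optimized instances while the rest land on the compute-optimized template. The instance mix then follows the workload rather than a fixed guess.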
But do we want to size an instance for each job, or have multiple jobs share an instance? In a previous blog I talked about the LSF Simulator and how it can be used for what-if analysis. We can leverage the Simulator to examine how a given workload will run on different instance mixes and quantities, allowing the Administrator to make an informed decision on which instance types should be used.
There is no denying that getting HPC workloads onto the cloud is more involved than moving a web server, but the challenges are not insurmountable. LSF provides many capabilities to help you get onto the cloud, and to boldly go where others have gone before! (Sorry, it is Star Trek day today!)