
Going FaaSter on my LSF Cluster

By Bill McMillan posted Tue September 08, 2020 11:14 AM

  

Function as a service (FaaS) is a popular model for cloud computation. You can develop your application (typically in Python) in your local environment, and when the application needs more compute power, just offload those functions to the cloud and don’t worry about the underlying infrastructure. There is no need to set up a cluster, a scheduler, or anything else; it’s all done automagically by the cloud provider. How the provider accomplishes this varies from provider to provider, but as a user you don’t need to care – you only pay for the computation.

 

But what do you do if you already have an on-premises compute grid, cluster, private cloud, or whatever you want to call it, and you have users wanting to develop FaaS-like applications?

 

On the face of it, a simple approach would be to have these users use the cloud, while those running traditional batch work use the on-premises cluster.

 

There are two potential issues with this approach:

  • If both sets of users need the same data, you end up with a complex data-synchronization problem and incur additional egress costs.
  • If the on-premises environment has spare compute capacity, why are you paying for additional resources?


But what if you could support FaaS within the existing LSF environment? We do have an existing Python API for IBM Spectrum LSF, and you can use that API from any Python program or from a Jupyter notebook. However, that API interacts directly with the existing cluster, so the user has to be aware of all the cluster concepts – not really FaaS.

  

To support a FaaS-like experience, we have created an additional Python API that allows FaaS and traditional workloads to be mixed in the same cluster. The LSF FaaS API connects to LSF Application Center using its RESTful API, and each function call transparently creates an underlying LSF job to execute the function asynchronously. The user does not need to be aware of LSF or any LSF concepts. The package is available on GitHub (https://github.com/IBMSpectrumComputing/lsf-faas).
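Under the hood the pattern is straightforward: serialize the function and its arguments, ship them to the Application Center REST server, run them as a normal LSF job, and hand back an opaque id the caller can redeem later. The sketch below illustrates that round trip in plain Python; submit_function, the /faas/jobs path, and the jobId field are hypothetical placeholders for illustration, not the actual implementation.

import pickle
import requests

def submit_function(base_url, token, func, *args):
    # Serialize the function and its arguments into a single payload.
    payload = pickle.dumps((func, args))
    # Hand the payload to the REST server, which creates an ordinary LSF
    # job to run it asynchronously ('/faas/jobs' is a placeholder path).
    resp = requests.post(f"{base_url}/faas/jobs",
                         headers={"Authorization": token},
                         data=payload)
    resp.raise_for_status()
    # Return an opaque handle; the caller never sees any LSF job concepts.
    return resp.json()["jobId"]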

 

To use this interface with IPython, the flow is as follows:

  

  1. Install the package in your IPython environment:

$ git clone https://github.com/IBMSpectrumComputing/lsf-faas

$ cd lsf-faas/

$ scripts/setup.sh -c src/lsf_faas

 

  2. Authenticate your session against the LSF REST API:

$ ipython

In [1]: lsf.logon(username="your-name", password="your-password", host="your-rest-server")

Out[1]: True
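Since logon returns a boolean (as Out[1] above shows) rather than raising on failure, a defensive session setup might check the result before continuing:

if not lsf.logon(username="your-name", password="your-password", host="your-rest-server"):
    raise RuntimeError("LSF logon failed: check credentials and the REST server host")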

 

  3. Define your local Python functions:

In [2]: import pandas
   ...: import numpy
   ...: from sklearn import linear_model

In [3]: def regression(file):
   ...:     df = pandas.read_csv(file)
   ...:     cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
   ...:
   ...:     msk = numpy.random.rand(len(df)) < 0.8
   ...:     train = cdf[msk]
   ...:
   ...:     regr = linear_model.LinearRegression()
   ...:     train_x = numpy.asanyarray(train[['ENGINESIZE']])
   ...:     train_y = numpy.asanyarray(train[['CO2EMISSIONS']])
   ...:     regr.fit(train_x, train_y)
   ...:
   ...:     return regr
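Nothing in regression is LSF-specific, so you can sanity-check it locally on a small file before offloading it (coef_ and intercept_ are standard scikit-learn attributes):

model = regression('./FuelConsumption.csv')  # runs locally, not on the cluster
print(model.coef_, model.intercept_)         # fitted slope and intercept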

 

  4. Submit your function to the LSF FaaS service for asynchronous execution. The API will automatically transfer the specified data files to the execution environment.

 

In [4]: id = lsf.sub(regression, './FuelConsumption.csv', files="./FuelConsumption.csv")
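Because sub returns immediately with an id, you can fan out several asynchronous calls before collecting any results. A minimal sketch, assuming each call takes the same form as above (the second filename is hypothetical):

ids = [lsf.sub(regression, f, files=f)
       for f in ['./FuelConsumption.csv', './FuelConsumption-2019.csv']]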

 

  5. Retrieve your results:

In [5]: result = lsf.get(id)
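Once the job has finished, the object you get back is the fitted LinearRegression from the remote run, so you can use it locally like any scikit-learn model:

In [6]: result.predict([[3.5]])   # predicted CO2 emissions for a 3.5L engine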

 

This simple API allows you to add FaaS workloads to your existing LSF environment. We’d love to hear your feedback.

 

I’d like to thank Yong Wang and Xun Pan from our Xi’an development lab for creating this interface.

#LSF
#SpectrumComputingGroup
#Spectrum-LSF