IBM Verify

Running untrusted code in SaaS

By Vivek Shankar posted Wed July 20, 2022 08:53 PM

Consider this:

You have a SaaS service that needs to offer several customization options that require some type of scripting
The "code" runs on your service
The "code" can make API calls out of the boundary of your system
You offer a free trial that allows anyone in the world to use and potentially exploit this capability
You use shared compute resources to drive down cost

This is a performance and security nightmare for a multi-tenant environment. I grappled with this a few years back with IBM Security Verify SaaS (then IBM Cloud Identity). As with any other problem, the first thing I did was search online, and I came across this great article on the challenges of running user-supplied (i.e. untrusted) JavaScript. That article does a great job of listing the challenges, so I won't repeat them here.

Can we use JavaScript?

JavaScript (with Node.js) is probably the most widely used language today, but it is extremely difficult to sandbox and to stop untrusted code from:

Accessing the process space and messing with it
Peeking into other processes and hijacking compute resources
Executing tight loops and I/O exploits
Printing out system credentials

And so on...

But couldn't we isolate using containers? Yes, this is possible and is the approach used with popular compute engines like AWS Lambda, IBM Cloud Functions and others. The diagram below illustrates one way to implement this.

This requires the following:

A massive compute farm with pre-warmed containers, to avoid paying the cost of initializing the container process (usually Node.js) on every request
A method to assign a container to a tenant based on the request, to isolate execution of scripts
A method to remove the assignment if the container has been idle for a certain period, to recover resources
A rootless container whose environment variables contain nothing that resembles a credential or anything else sensitive
A lockdown of the HTTP and other targets that scripts can call out to
A process for keeping up with fixes for the latest vulnerabilities, particularly the ones that allow unprivileged processes to break out of the container boundary

How did I solve this problem? Introducing CEL

Google's Common Expression Language (CEL) allows single-line scripting by implementing expression syntax checking and evaluation. It was originally designed to represent policy rules in Firebase and other Google products, but we were able to reuse this extensible expression language to execute user-supplied code in a variety of use cases. Here are a few:

Computing attribute claim values for an OAuth authorization grant
Building an account payload that is provisioned into an external system
Authoring access control policies using scripted rules
Transforming request and response payloads for webhook API calls

This language worked very well from a performance and security perspective. As the introduction to the language says:

The Common Expression Language (CEL) is a non-Turing complete language designed for simplicity, speed, safety, and portability.

However, while we did add several utilities, data types and functions to supplement the default syntax, this became cumbersome very quickly for more complex expressions. Here is a relatively simple example:


user.getManager() != null ? (user.getManager().firstName + " " + user.getManager().lastName) : ""

Introducing multi-line expressions - CEL++

Rather than invent a completely new language, we chose to represent multi-line statements in a YAML document. Here is an example:


statements: 
- context: "manager := user.getManager()" 
- if: 
    match: context.manager != null 
    block: 
      - return: context.manager.firstName + " " + context.manager.lastName 
- return: ""

Each YAML property value is a CEL expression. This extension offers several capabilities:

Variable assignment
Conditional checks and nested conditionals
Statement blocks and local variables
Return statements to exit the evaluation

The full syntax document can be accessed here.
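To illustrate how assignment, conditionals, blocks and early return fit together, here is a rough Go sketch of a statement evaluator. It is not the Verify implementation. Plain Go closures stand in for compiled CEL expressions, the statement list is built by hand rather than parsed from YAML, and every type and function name (`Vars`, `Expr`, `Statement`, `Run`, `managerName`) is hypothetical. It also does not model block-scoped local variables.

```go
package main

import "fmt"

// Vars holds the inputs to an evaluation; Context plays the role of the
// "context" object that the YAML example assigns into.
type Vars struct {
	User    map[string]any
	Context map[string]any
}

// Expr is a stand-in for a compiled CEL expression (assumption).
type Expr func(v *Vars) any

// Assign models `context: "name := expr"`.
type Assign struct {
	Name  string
	Value Expr
}

// If models an `if` entry with a `match` expression and a nested `block`.
type If struct {
	Match Expr
	Block []Statement
}

// Statement is one list entry; exactly one field is expected to be set.
type Statement struct {
	Assign *Assign
	If     *If
	Return Expr
}

// Run evaluates statements in order. The bool reports whether a return
// statement was reached, which lets a nested block end the whole evaluation.
func Run(stmts []Statement, v *Vars) (any, bool) {
	for _, s := range stmts {
		switch {
		case s.Assign != nil:
			v.Context[s.Assign.Name] = s.Assign.Value(v)
		case s.If != nil:
			if matched, _ := s.If.Match(v).(bool); matched {
				if out, done := Run(s.If.Block, v); done {
					return out, true
				}
			}
		case s.Return != nil:
			return s.Return(v), true
		}
	}
	return nil, false
}

// managerName mirrors the YAML example above, statement by statement.
func managerName() []Statement {
	return []Statement{
		{Assign: &Assign{Name: "manager", Value: func(v *Vars) any { return v.User["manager"] }}},
		{If: &If{
			Match: func(v *Vars) any { return v.Context["manager"] != nil },
			Block: []Statement{{Return: func(v *Vars) any {
				m := v.Context["manager"].(map[string]string)
				return m["firstName"] + " " + m["lastName"]
			}}},
		}},
		{Return: func(v *Vars) any { return "" }},
	}
}

func main() {
	// Sample data invented for illustration.
	manager := map[string]string{"firstName": "Ada", "lastName": "Lovelace"}
	withManager := &Vars{User: map[string]any{"manager": manager}, Context: map[string]any{}}
	noManager := &Vars{User: map[string]any{}, Context: map[string]any{}}

	out1, _ := Run(managerName(), withManager)
	out2, _ := Run(managerName(), noManager)
	fmt.Printf("%q %q\n", out1, out2)
}
```

Because the evaluator only walks a finite list and has no loop construct, every statement is visited at most once, so it terminates as long as each individual expression does.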

Advantages to this approach

Compute resources are shared and don't require any complicated tenant isolation process
CEL provides support for most operators and functions that you would expect in an expression language
The expression is guaranteed to finish executing
Using CEL in a YAML-based language simplifies development, while maintaining the advantages

The architecture looks a lot simpler, as you can see here.

Things to consider with CEL

While this article is focused on how Verify used CEL, the following can be applied to any Expression Language.

CEL is extensible, so you might be tempted to add a number of utilities and functions. Be wary of what you add: you might end up introducing Turing-complete characteristics, or a function that is guaranteed to complete but does so without any regard for resource consumption.
Limit the number of instructions allowed and/or ensure that the expression can be terminated when it exceeds a maximum computation time. We use Go in IBM Security Verify for this engine and leverage the Go context timeout to terminate the program.
If you implement utilities like HTTP clients, limit the memory consumption and make liberal use of timeouts.
Build layers and controls - rate limits, quarantine measures for badly behaving scripts, size of the expression, etc.

Conclusion

Running untrusted code in a multi-tenant SaaS environment is hard. While a compute farm can be built to execute common languages such as JavaScript, Python, Groovy and others, it comes with significant challenges around runtime compute resource utilization, tenant allocation, and managing the blast radius in the event of exploits, to name a few. IBM Security Verify extends the Google Common Expression Language to solve this problem, along with several layers of controls on top of the computation environment.

Permalink

https://community.ibm.com/community/user/blogs/vivek-shankar1/2022/07/20/running-untrusted-code-in-saas
