Introduction
In this blog, we discuss rate limiting of REST APIs and explore the pros and cons of several options. There are many ways to implement a rate limit on your REST APIs - from using the built-in features of API Gateways all the way to developing a bespoke solution unique to your situation. There is no single best solution - it depends on many factors. Our focus is mainly on scenarios where a readily available OOTB solution does not exist.
The idea behind rate limiting is simple - block requests if they are made more frequently than a certain limit. In a cloud native deployment, an API request traverses many hops before reaching the destination service. As such, rate limiting can be implemented at any of those different hops. Some of the options are:
- API Gateway. This is the best place to implement rate limiting if you're using an API Gateway provided by your cloud platform, or if you've built your own custom API Gateway using, for example, Spring Cloud Gateway.
- K8s ingress. If you are operating your own K8s cluster, an ingress may be a suitable place to intercept and rate limit a request.
- Sidecar container. In case the API request is directly handled by a container running your service, rate limiting can be implemented in a sidecar container running along with the service.
- In-band rate limiting. The above options are so-called out-of-band options, where rate limiting is handled outside of the application running the business logic. When there is no central point at which your unique rate-limiting requirements can be met, you can code the rate-limiting logic within the service itself. This may be the most flexible option, but it carries the computational burden associated with rate limiting, however small that may be.
The first two options are well covered in their respective documentation, included in the references. In this blog, we'll start with an in-band solution. We'll then use a sidecar container running Envoy Proxy, which addresses some of the limitations of the in-band solution. Finally, we'll deploy our solution on IBM Code Engine.
Rate Limiting with Redisson
Rate limiting is the process of restricting how often clients are allowed to make requests. Irrespective of how it is implemented, most solutions share the following ingredients:
- Create a bucket with a specified limit
- Use the bucket to decide to allow or rate limit a request
The first step is to create a bucket with a key that identifies the set of request attributes on which we want to enforce a rate limit, and then to set the request limits. For example, for a URL such as https://insights.ibm.com/gui/11eec8fb-6ad8-1be8-8cc9-13d8f21beecb - where the UUID refers to a tenant id of IBM Storage Insights - we might create a bucket with the tenant UUID as the key and a limit of, say, 1000 requests per minute for any tenant.
The next step is to enforce that limit efficiently when a request is received by the application. The most common algorithm for enforcing a rate limit is the token bucket. A simple approximation of it can be built on Redis:
- Initialize a key that will expire after the desired time window, with the desired rate limit. For example, key = tenant UUID, time window = 1 min, and limit = 1000.
- Every time a request is received for the key, check that the remaining count is not zero. If it is not, allow the request and decrement the count. Otherwise, return HTTP status 429 Too Many Requests.
- Expire the key after 1 min.
- Repeat the first three steps for the next minute.
To use Redis, we can use one of the Redis client libraries - for example, Redisson - which simplifies the implementation of the above steps. Below, we illustrate these steps using the Redisson client in Java.
Add Redisson as a dependency in the Maven project pom.xml file:
<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>[3.12,4.0)</version>
</dependency>
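Before wiring Redisson into a filter, here is a minimal sketch of the counter-based steps above, using Redisson's RAtomicLong (which maps to a plain Redis counter). It assumes an already-initialized RedissonClient; the filter below instead uses Redisson's higher-level RRateLimiter, which implements a true token bucket.
import java.util.concurrent.TimeUnit;

import org.redisson.api.RAtomicLong;
import org.redisson.api.RedissonClient;

public class FixedWindowSketch {

    // Sketch of the steps above: increment a counter keyed by tenant UUID,
    // start the expiry clock on first use, and compare against the limit.
    static boolean allowRequest(RedissonClient client, String tenantUUID, long limit) {
        RAtomicLong counter = client.getAtomicLong("ratelimit:" + tenantUUID);
        long count = counter.incrementAndGet();  // creates the key on first use
        if (count == 1) {
            counter.expire(1, TimeUnit.MINUTES); // expire the key after 1 min
        }
        return count <= limit;                   // false => respond with HTTP 429
    }
}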
Add a filter that will intercept the HTTP request before it is serviced by the controllers. Here is how to write such a filter:
- Initialize the filter in the init() method. In this step, get access to the Redisson client and read the rate limit values from the configuration.
- Implement the rate limit logic in the filter's doFilter() method. Extract the key from the HTTP request. Check whether a rate limiter for the key is available in the internal cache; if yes, use it to evaluate the rate limit, else initialize a new Redisson RRateLimiter object for the key. Try to consume a token from the rate limiter. If it succeeds, allow the request to go ahead, else return the HTTP 429 status code along with appropriate headers.
package none.rks.myapp;

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;

public class RateLimitingFilter implements Filter {

    private static final String APPLICATION_JSON = "application/json";
    // Use a concurrent map: doFilter() is called from many request threads
    private static final Map<String, RRateLimiter> rateLimiterCache = new ConcurrentHashMap<>();
    private static RedissonClient redissonClient;
    private int rateLimitMinute = 1000; // read it from a configuration

    /**
     * Initialize the Filter
     */
    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Initialize RedissonClient
        redissonClient = getRedissonClient();
        // Get the rate limit values from config
        rateLimitMinute = getRateLimitConfig();
    }

    /**
     * Process the HTTP Request
     */
    @Override
    public void doFilter(ServletRequest pReq, ServletResponse pResp, FilterChain pChain)
            throws IOException, ServletException {
        HttpServletRequest mReq = (HttpServletRequest) pReq;
        // Extract the key from the HTTP request
        String appKey = getTenantUUID(mReq);
        // Check if a rate limiter for the key is already in the local cache
        RRateLimiter rateLimiter;
        if (rateLimiterCache.containsKey(appKey)) {
            rateLimiter = rateLimiterCache.get(appKey);
        } else {
            // Create a new Redisson RateLimiter for the key
            rateLimiter = redissonClient.getRateLimiter(appKey);
            rateLimiter.trySetRate(RateType.PER_CLIENT, rateLimitMinute, 1, RateIntervalUnit.MINUTES);
            rateLimiterCache.put(appKey, rateLimiter);
        }
        // Rate limit the request
        HttpServletResponse httpResponse = (HttpServletResponse) pResp;
        if (rateLimiter.tryAcquire(1)) {
            // The limit is not exceeded
            httpResponse.setHeader("X-Rate-Limit-Remaining", "" + rateLimiter.availablePermits());
            pChain.doFilter(pReq, pResp);
        } else {
            // The limit is exceeded; remainTimeToLive() returns milliseconds
            httpResponse.setStatus(429);
            httpResponse.setHeader("X-Rate-Limit-Retry-After-Seconds",
                    "" + TimeUnit.MILLISECONDS.toSeconds(rateLimiter.remainTimeToLive()));
            httpResponse.setContentType(APPLICATION_JSON);
            String mContent = "{\"metadata\":{\"rc\":\"error\",\"message\":\"Too many requests\"}}";
            httpResponse.getWriter().append(mContent);
        }
    }
}
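The helper methods getRedissonClient(), getRateLimitConfig(), and getTenantUUID() are elided above. As a rough sketch only - assuming a single Redis server on localhost, a limit supplied as a system property, and the tenant UUID appearing after /tenants/ in the path - they could be added to the filter class like this:
// Sketch only: adapt the Redis address, configuration source, and
// path pattern to your environment.
private static RedissonClient getRedissonClient() {
    org.redisson.config.Config config = new org.redisson.config.Config();
    config.useSingleServer().setAddress("redis://127.0.0.1:6379");
    return org.redisson.Redisson.create(config);
}

private static int getRateLimitConfig() {
    // Replace with a lookup in your configuration system
    return Integer.getInteger("myapp.rateLimitPerMinute", 1000);
}

private static String getTenantUUID(HttpServletRequest req) {
    // Match the tenant UUID in paths like /myapp/v1/tenants/<uuid>/...
    java.util.regex.Matcher m = java.util.regex.Pattern
            .compile("/tenants/([a-f0-9-]{36})").matcher(req.getRequestURI());
    return m.find() ? m.group(1) : "unknown";
}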
Finally, register the RateLimitingFilter class as a filter in your application's web.xml, so that it is invoked before any other filter or listener.
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1">
    <display-name>My application</display-name>
    <filter>
        <filter-name>RateLimitingFilter</filter-name>
        <filter-class>none.rks.myapp.RateLimitingFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>RateLimitingFilter</filter-name>
        <url-pattern>/myapp/v1/*</url-pattern>
    </filter-mapping>
    .....
</web-app>
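To verify the filter locally, you can hit a mapped endpoint repeatedly and watch the status codes flip from 200 to 429 once the limit is exhausted. The path below is hypothetical; substitute one of your application's real endpoints:
% for i in {1..10}; do
    curl -s -o /dev/null -w "%{http_code}\n" \
      "http://localhost:8080/myapp/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3/volumes"
  done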
As the rate limiting logic lives in code maintained by the developers, this approach provides high flexibility in selecting keys and deciding how to handle requests. It has a few drawbacks, though:
- A rate limit configuration change requires a service restart
- The rate limiting logic is implemented in the application itself
- Redis is an external dependency
Let's try to address some of the drawbacks by externalizing the rate limiting logic in a sidecar container.
Rate Limiting with Envoy Proxy
Envoy proxy is an open-source, high-performance proxy designed for cloud-native applications. It is commonly used as a sidecar container within a microservices architecture, particularly in service meshes like Istio, Consul, and AWS App Mesh. Envoy provides advanced traffic management capabilities, including load balancing, service discovery, traffic routing, and observability, which makes it a key component in modern distributed systems.
Envoy proxy also implements the token bucket algorithm, and the configuration sections that define a token bucket are fairly straightforward.
The key used to rate limit a request is uniquely identified by "descriptors". The following configuration declares a descriptor that combines the tenant UUID extracted from the path with the HTTP method. The IBM Storage Insights REST APIs are of the form "/restapi/v1/tenants/3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0/<entity>". To use the tenant UUID as the key, we declare the descriptors as follows.
rate_limits:
- actions:
  - request_headers:
      descriptor_key: tenant_uuid  # Key
      header_name: "x-tenant-uuid" # tenant UUID extracted from the path
  - request_headers:
      descriptor_key: method       # Key
      header_name: ":method"       # Value from pseudo-header ":method", e.g. GET, PUT, POST, DELETE
The following Lua script extracts the tenant UUID from the path and adds it as the x-tenant-uuid header.
envoy.filters.http.lua:
  "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.LuaPerRoute
  source_code:
    inline_string: |
      function envoy_on_request(request_handle)
        -- Extract UUID from the path
        local path = request_handle:headers():get(":path")
        local uuid = string.match(path, "/restapi/v1/tenants/([a-f0-9-]{36})")
        if uuid then
          -- Add UUID as a custom header
          request_handle:headers():add("x-tenant-uuid", uuid)
        end
      end
The rate limits can be defined as shown below; per this configuration, 3 API calls are allowed in a time window of 5s. In this example, an HTTP GET request to "/restapi/v1/tenants/3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0/volumes" would result in a descriptor map of {"tenant_uuid": "3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0", "method": "GET"} for the request.
envoy.filters.http.local_ratelimit:
  "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
  stat_prefix: http_local_rate_limiter
  token_bucket:
    max_tokens: 3
    tokens_per_fill: 3
    fill_interval: 5s
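Note that the token_bucket above is the default bucket shared by all requests matching the route. To give a particular descriptor value its own budget, the local rate limit filter also accepts a descriptors list, whose entries must line up with the actions declared earlier. A sketch, using a hypothetical tenant UUID and limits:
descriptors:
- entries:
  - key: tenant_uuid
    value: 3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0
  - key: method
    value: GET
  token_bucket:
    max_tokens: 10
    tokens_per_fill: 10
    fill_interval: 5s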
The complete Envoy Proxy configuration file is available on GitHub.
Install Envoy (here via Homebrew on a Mac) and run it to check that the configuration file is valid.
% brew install envoy
% envoy -c envoy.yaml
...
[2024-11-20 14:45:18.818][19546225][info][main] [source/server/server.cc:978] starting main dispatch loop
[2024-11-20 14:45:18.819][19546225][info][runtime] [source/common/runtime/runtime_impl.cc:625] RTDS has finished initialization
[2024-11-20 14:45:18.819][19546225][info][upstream] [source/common/upstream/cluster_manager_impl.cc:249] cm init: all clusters initialized
[2024-11-20 14:45:18.819][19546225][info][main] [source/server/server.cc:958] all clusters initialized. initializing init manager
[2024-11-20 14:45:18.819][19546225][info][config] [source/common/listener_manager/listener_manager_impl.cc:930] all dependencies initialized. starting workers
Start the simple Go application available on GitHub and send some requests to check that Envoy is correctly intercepting them and applying the configured rate limits.
% go run main.go
Starting server on :8080
From another terminal, send requests in a loop.
for i in {1..20}; do
curl http://localhost:10000/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3
echo
done
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
local_rate_limited
local_rate_limited
Notice that requests are rate limited after the first 3 requests.
At this point, we have both applications - the main Go application and Envoy Proxy - ready to run in a Pod as containers. Let's containerize the Go application and push it to a container registry.
% podman build -t quay.io/singh_randhir/go-service:1.0 .
% podman push quay.io/singh_randhir/go-service:1.0
Create a K8s deployment with Envoy Proxy as a sidecar, as shown below in deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-service
  template:
    metadata:
      labels:
        app: go-service
    spec:
      containers:
      - name: go-service
        image: quay.io/singh_randhir/go-service:1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
      - name: envoy
        image: envoyproxy/envoy:v1.32-latest
        ports:
        - containerPort: 10000
        volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy/envoy.yaml
          subPath: envoy.yaml
      imagePullSecrets:
      - name: quay-registry-secret
      volumes:
      - name: envoy-config
        configMap:
          name: envoy-config
Create the deployment and expose it as a service with the following service.yaml.
apiVersion: v1
kind: Service
metadata:
  name: go-service
spec:
  selector:
    app: go-service
  ports:
  - name: http-port
    protocol: TCP
    port: 10000
    targetPort: 10000
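The deployment above mounts its Envoy configuration from a ConfigMap named envoy-config, which must exist before the Pod starts. Assuming the configuration file is envoy.yaml in the current directory, create the ConfigMap and apply both manifests:
% kubectl create configmap envoy-config --from-file=envoy.yaml
% kubectl apply -f deployment.yaml
% kubectl apply -f service.yaml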
Once the Pod is ready, test to check if Envoy Proxy is applying the rate limits.
# kubectl exec -i -t dnsutils -- bash -c 'for i in {1..20}; do curl 10.103.224.104:10000/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3; echo; done'
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
local_rate_limited
local_rate_limited
local_rate_limited
We have used Envoy Proxy as a sidecar to implement rate limits on the main application, both running in a Pod on K8s.
Next, let's see how to implement rate limiting on a containerized application running on a serverless platform like IBM Code Engine.
Rate Limiting on IBM Code Engine
IBM Code Engine is a fully managed, serverless platform designed to simplify the deployment, scaling, and management of applications and workloads in the cloud. To run an application on Code Engine, we just need a container image to create an application. Once the application is created, Code Engine will handle the deployment, scaling, and management of our app.
However, since IBM Code Engine does not natively support sidecars in the same way Kubernetes does, Envoy would need to operate as a reverse proxy in a separate application, and we would direct traffic from Envoy to the Go service using external routing (e.g., external load balancer or public URL).
Deploy the main application
We've created our Go service and pushed the image to the quay.io container registry. For Code Engine to be able to pull the image, we need to create a registry secret for authentication with quay.io.
Create a registry secret on IBM Code Engine using the CLI.
% ibmcloud ce secret create --name quay-io-secret --username <username> --password <password> --server quay.io --format registry
Creating registry secret 'quay-io-secret'...
OK
Create the main application.
% ibmcloud ce application create --name go-service --image quay.io/singh_randhir/go-service:1.0 --registry-secret quay-io-secret --port 8080
Creating application 'go-service'...
Configuration 'go-service' is waiting for a Revision to become ready.
Ingress has not yet been reconciled.
Waiting for load balancer to be ready.
Run 'ibmcloud ce application get -n go-service' to check the application status.
OK
https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
Verify that the main application handles the request sent directly to it.
% curl https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
At this point, we have the main application running on IBM Code Engine. Next, let's apply some rate limiting to it using Envoy Proxy.
Deploy Envoy Proxy
Since we have to run Envoy Proxy as a separate application on Code Engine, let's create a container image with the configuration file baked in. We also need to configure the cluster to which Envoy Proxy will forward requests.
Update the envoy.yaml configuration with the URL of the main application.
clusters:
- name: service_app_1
  type: LOGICAL_DNS
  connect_timeout: 500s
  dns_lookup_family: V4_ONLY
  load_assignment:
    cluster_name: service_app_1
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
              port_value: 443
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
Here we have updated envoy.yaml with the Code Engine URL of the main application deployed previously. Alternatively, the URL could be passed in as an environment variable.
Next, let's containerize Envoy Proxy so that it ships with envoy.yaml as the configuration it runs with.
FROM envoyproxy/envoy:v1.32-latest
COPY envoy-ce.yaml /etc/envoy/envoy.yaml
EXPOSE 10000
ENV GO_SERVICE_URL=https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
CMD ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml"]
Create a container image and push it to the registry.
% podman build --platform linux/amd64 -t quay.io/singh_randhir/envoy-proxy:latest . -f Dockerfile-envoy-ce
% podman push quay.io/singh_randhir/envoy-proxy:latest
Now, create a Code Engine application using the image.
% ibmcloud ce application create --name envoy-proxy --image quay.io/singh_randhir/envoy-proxy:latest --registry-secret quay-io-secret --env GO_SERVICE_URL=go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud --port 10000 --min-scale 1 --max-scale 1
Creating application 'envoy-proxy'...
Configuration 'envoy-proxy' is waiting for a Revision to become ready.
Ingress has not yet been reconciled.
Waiting for load balancer to be ready.
Run 'ibmcloud ce application get -n envoy-proxy' to check the application status.
OK
https://envoy-proxy.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
Here we're setting both the minimum and maximum number of instances to 1, ensuring that exactly one instance of Envoy Proxy is running at all times. This is the so-called Local Rate Limiting mode. If you want to deploy Envoy Proxy as a cluster of instances that share the rate limit state, consider deploying it in Global Rate Limiting mode. We'll cover Local Rate Limiting only in this blog.
Now that both applications are deployed on Code Engine, let's send some requests to the Envoy Proxy application to see if it rate limits them.
% for i in {1..20}; do
curl https://envoy-proxy.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud/restapi/v1/tenants/1234
echo
done
Hello 1234
Hello 1234
Hello 1234
Hello 1234
Hello 1234
Hello 1234
local_rate_limited
Hello 1234
local_rate_limited
Notice that Envoy Proxy is applying the configured rate limits to requests for the main application. Because the token bucket refills every 5 seconds, an occasional request succeeds between rate-limited ones.
Summary
Rate limiting is an important consideration when deploying an application with public APIs. We considered several options for applying rate limiting in different scenarios. There are OOTB solutions available for some platforms; however, there are situations where we need to develop a bespoke rate limiting solution. We started by creating a rate limiting solution in the main application itself using Redisson. Next, we externalized the rate limiting to Envoy Proxy, deployed on K8s as a sidecar container alongside the main application container inside a Pod. Finally, on a serverless platform, IBM Code Engine, we deployed Envoy Proxy as a reverse proxy to rate limit API requests sent to the main application, also deployed on Code Engine.
Acknowledgements
Thank you to the IBM Storage Insights team who supported and shaped the ideas on rate limiting REST APIs.
References
- GitHub - https://github.ibm.com/Singh-Randhir/rate-limiting-envoy
- Envoy Proxy rate limit - https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/rate_limit_filter
- IBM Code Engine - https://cloud.ibm.com/containers/serverless/overview