Rate limiting with Envoy Proxy

By Randhir Singh posted 2 days ago

  

Introduction

In this blog, we discuss rate limiting of REST APIs and explore the pros and cons of several options. There are many ways to implement a rate limit on your REST APIs, from the built-in capabilities of API Gateways all the way to a bespoke solution unique to your situation. There is no single best solution; it depends on many factors. Our focus is mainly on scenarios where a ready-made OOTB solution is not available.

The idea behind rate limiting is simple: block requests if they are made more frequently than a certain limit. In a cloud-native deployment, an API request traverses many hops before reaching the destination service, so rate limiting can be implemented at any of those hops. Some of the options are:

  1. API Gateway. This is the best place to implement rate limiting if you're using an API Gateway provided by your cloud platform, or if you've built your own using, for example, Spring Cloud Gateway.
  2. K8s ingress. If you operate your own K8s cluster, the ingress may be a suitable place to intercept and rate limit requests.
  3. Sidecar container. If the API request is handled directly by a container running your service, rate limiting can be implemented in a sidecar container running alongside the service.
  4. In-band rate limiting. The options above are so-called out-of-band options, where rate limiting is handled outside the application running the business logic. When there is no central point at which your specific rate limiting requirements can be met, you can code the rate limiting logic into the service itself. This may be the most flexible option, but it puts the computational burden of rate limiting on the service, however small that may be.

Options 1 and 2 are well covered by their respective documentation (see the references). In this blog, we'll start with an in-band solution. We'll then move the logic into a sidecar container running Envoy Proxy, which addresses some of the limitations of the in-band approach. Finally, we'll deploy our solution on IBM Code Engine.

Rate Limiting with Redisson

Rate limiting is the process of enforcing restrictions on how frequently clients are allowed to send requests. Irrespective of how it is implemented, most solutions share the following ingredients:

  1. Create a bucket with a specified limit
  2. Use the bucket to decide whether to allow or rate limit a request

The first step is to create a bucket with a key that identifies the set of request attributes on which we want to enforce the rate limit, and to set the request limits. For example, for the URL https://insights.ibm.com/gui/11eec8fb-6ad8-1be8-8cc9-13d8f21beecb, where the UUID refers to a tenant ID of IBM Storage Insights, we might create a bucket with the tenant UUID as the key and a limit of, say, 1000 requests per minute for any tenant.

The next step is to actually enforce that limit in an efficient manner when a request is received by the application. The most common algorithm for enforcing a rate limit is known as the token bucket. One simple way to implement it is to use Redis:

  1. Initialize a key that will expire after the desired time window, with the desired rate limit. For example, key = tenant UUID, time window = 1 min, and limit = 1000.
  2. Every time a request is received for the key, check whether the remaining limit is greater than zero. If it is, allow the request and decrement the limit. Otherwise, return HTTP response status 429 Too Many Requests.
  3. Expire the key after 1 min.
  4. Repeat steps 1-3 for the next minute (see the sketch below).
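
In raw Redis terms, these steps amount to a counter with a TTL. Strictly speaking, that is a fixed-window counter; the Redisson rate limiter used below implements a proper token bucket. A minimal sketch in redis-cli, with an illustrative key:

% redis-cli INCR ratelimit:1e1c8bc9-31da-4393-9b08-e933da47f4a3
(integer) 1
% redis-cli EXPIRE ratelimit:1e1c8bc9-31da-4393-9b08-e933da47f4a3 60
(integer) 1

The application allows a request while INCR returns a value of at most 1000 and responds with 429 once the counter exceeds the limit; the EXPIRE, set when INCR returns 1, resets the window after a minute.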

To talk to Redis, we can use one of the Redis client libraries, for example Redisson, which simplifies the implementation of the above steps. Below, we illustrate the steps using the Redisson client in Java.

Add Redisson as a dependency in a Maven project pom.xml file:

        <dependency>
            <groupId>org.redisson</groupId>
            <artifactId>redisson</artifactId>
            <version>[3.12,4.0)</version>
        </dependency>

Add a filter that will intercept the HTTP request before it is serviced by the Controllers. Here we illustrate how to write a filter:

  1. Initialize the filter in the init() method. In this step, get access to the Redisson client and read the rate limit values from the configuration.
  2. Implement the rate limit logic in the filter's doFilter() method. Extract the key from the HTTP request and check whether a rate limiter is available in the internal cache for the key. If yes, use it to evaluate the rate limit; otherwise, initialize a new Redisson RRateLimiter object for the key. Then try to consume a token from the rate limiter. If it succeeds, allow the request to proceed; otherwise, return HTTP status 429 along with appropriate headers.
package none.rks.myapp;

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.redisson.Redisson;
import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RateLimitingFilter implements Filter {
    private static final String APPLICATION_JSON = "application/json";

    private static RedissonClient redissonClient;
    // doFilter() runs concurrently, so the cache must be thread-safe
    private static final Map<String, RRateLimiter> rateLimiterCache = new ConcurrentHashMap<>();
    private int rateLimitMinute = 1000; // overwritten from configuration in init()

    /**
     * Initialize the Filter
     */
    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Initialize RedissonClient
        redissonClient = getRedissonClient();
        // Get the rate limit values from config
        rateLimitMinute = getRateLimitConfig();
    }

    /**
     * Process the HTTP Request
     */
    @Override
    public void doFilter(ServletRequest pReq, ServletResponse pResp, FilterChain pChain)
            throws IOException, ServletException {

        HttpServletRequest mReq = (HttpServletRequest) pReq;
        // Extract the key from the HTTP request
        String appKey = getTenantUUID(mReq);

        // Look up the Redisson rate limiter for the key, creating it on first use
        RRateLimiter rateLimiter = rateLimiterCache.computeIfAbsent(appKey, key -> {
            RRateLimiter limiter = redissonClient.getRateLimiter(key);
            limiter.trySetRate(RateType.PER_CLIENT, rateLimitMinute, 1, RateIntervalUnit.MINUTES);
            return limiter;
        });

        // Rate limit the request
        HttpServletResponse httpResponse = (HttpServletResponse) pResp;
        if (rateLimiter.tryAcquire(1)) {
            // The limit is not exceeded
            httpResponse.setHeader("X-Rate-Limit-Remaining", "" + rateLimiter.availablePermits());
            pChain.doFilter(pReq, pResp);
        } else {
            // The limit is exceeded; remainTimeToLive() reports milliseconds
            httpResponse.setStatus(429);
            httpResponse.setHeader("X-Rate-Limit-Retry-After-Seconds",
                    "" + TimeUnit.MILLISECONDS.toSeconds(rateLimiter.remainTimeToLive()));
            httpResponse.setContentType(APPLICATION_JSON);

            String mContent = "{\"metadata\":{\"rc\":\"error\",\"message\":\"Too many requests\"}}";

            httpResponse.getWriter().append(mContent);
        }
    }

    @Override
    public void destroy() {
        redissonClient.shutdown();
    }

    // Application-specific helpers, shown here with minimal placeholder implementations

    private RedissonClient getRedissonClient() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        return Redisson.create(config);
    }

    private int getRateLimitConfig() {
        return 1000; // e.g. read from a properties file or environment variable
    }

    private String getTenantUUID(HttpServletRequest req) {
        // e.g. parse the tenant UUID out of the request path
        return req.getRequestURI();
    }
}

Finally, register the RateLimitingFilter class as a filter in your application's web.xml, so that it is invoked before any other filter or listener.

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
	version="3.1">
	<display-name>My application</display-name>
	<filter>
		<filter-name>RateLimitingFilter</filter-name>
		<filter-class>none.rks.myapp.RateLimitingFilter</filter-class>
	</filter>
	<filter-mapping>
		<filter-name>RateLimitingFilter</filter-name>
		<url-pattern>/myapp/v1/*</url-pattern>
	</filter-mapping>
	.....
</web-app>
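
As an aside, on a Servlet 3.0+ container the filter could also be registered with the @WebFilter annotation instead of web.xml; a minimal sketch:

import javax.servlet.annotation.WebFilter;

@WebFilter(urlPatterns = "/myapp/v1/*")
public class RateLimitingFilter implements Filter {
    // implementation as above
}

We stick to web.xml here since it also controls the filter order, and the rate limiter should run before other filters.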

As the rate limiting logic lives in code maintained by the developers, this approach provides high flexibility in terms of selecting keys and deciding how to handle requests. It has a few drawbacks, though:

  1. A rate limit configuration change requires a service restart
  2. The rate limit logic is implemented in the application itself
  3. Redis is an external dependency

Let's try to address some of these drawbacks by externalizing the rate limiting logic into a sidecar container.

Rate Limiting with Envoy Proxy

Envoy proxy is an open-source, high-performance proxy designed for cloud-native applications. It is commonly used as a sidecar container within a microservices architecture, particularly in service meshes like Istio, Consul, and AWS App Mesh. Envoy provides advanced traffic management capabilities, including load balancing, service discovery, traffic routing, and observability, which makes it a key component in modern distributed systems.

Envoy proxy also implements the token bucket algorithm, and the configuration sections that define a token bucket are fairly straightforward.

The key that is used to rate limit a request is uniquely identified by "descriptors". The following configuration declares a descriptor that combines the tenant UUID extracted from the path with the HTTP method. The IBM Storage Insights REST APIs are of the form /restapi/v1/tenants/3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0/<entity>. To use the tenant UUID as part of the key, we declare the descriptors as follows.

                          rate_limits:
                            - actions:
                                - request_headers:
                                    descriptor_key: tenant_uuid  # Key
                                    header_name: "x-tenant-uuid" # tenant UUID extracted from the path
                                - request_headers:
                                    descriptor_key: method  # Key
                                    header_name: ":method"  # Value from pseudo-header ":method", e.g. GET, PUT, POST, DELETE.

The following Lua filter extracts the tenant UUID from the path and adds it as the x-tenant-uuid header referenced above.

                       envoy.filters.http.lua:
                            "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.LuaPerRoute
                            source_code:
                              inline_string: |
                                function envoy_on_request(request_handle)
                                  -- Extract UUID from the path
                                  local path = request_handle:headers():get(":path")
                                  local uuid = string.match(path, "/restapi/v1/tenants/([a-f0-9-]{36})")
                                  if uuid then
                                    -- Add UUID as a custom header
                                    request_handle:headers():add("x-tenant-uuid", uuid)
                                  end
                                end

With these descriptors, an HTTP GET request to /restapi/v1/tenants/3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0/volumes results in a descriptor map of {"tenant_uuid": "3ac0c8cb-69f0-4b3e-a5e1-790beaef4cc0", "method": "GET"}.

The rate limits themselves are defined as shown below. As per this configuration, 3 API calls are allowed in a time window of 5s.

                        envoy.filters.http.local_ratelimit:
                            "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                            stat_prefix: http_local_rate_limiter

                            token_bucket:
                              max_tokens: 3
                              tokens_per_fill: 3
                              fill_interval: 5s
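
Note that the token bucket alone is usually not sufficient: the local rate limit filter also has filter_enabled and filter_enforced runtime fractions that control the share of requests it checks and enforces, and these need to be set for the limits to take effect. A typical fragment, following the pattern in the Envoy documentation, looks like:

                            filter_enabled:
                              runtime_key: local_rate_limit_enabled
                              default_value:
                                numerator: 100
                                denominator: HUNDRED
                            filter_enforced:
                              runtime_key: local_rate_limit_enforced
                              default_value:
                                numerator: 100
                                denominator: HUNDRED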

The complete Envoy Proxy configuration file is available on GitHub.

Install Envoy on macOS and run it to check that the configuration file is valid.

% brew install envoy

% envoy -c envoy.yaml
...
[2024-11-20 14:45:18.818][19546225][info][main] [source/server/server.cc:978] starting main dispatch loop
[2024-11-20 14:45:18.819][19546225][info][runtime] [source/common/runtime/runtime_impl.cc:625] RTDS has finished initialization
[2024-11-20 14:45:18.819][19546225][info][upstream] [source/common/upstream/cluster_manager_impl.cc:249] cm init: all clusters initialized
[2024-11-20 14:45:18.819][19546225][info][main] [source/server/server.cc:958] all clusters initialized. initializing init manager
[2024-11-20 14:45:18.819][19546225][info][config] [source/common/listener_manager/listener_manager_impl.cc:930] all dependencies initialized. starting workers
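
Envoy can also check a configuration file without starting the proxy, using validate mode:

% envoy --mode validate -c envoy.yaml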

Start the simple Go application, also available on GitHub, and send some requests to check that Envoy is intercepting them and applying the configured rate limits.

% go run main.go
Starting server on :8080

From another terminal, send requests in a loop.

for i in {1..20}; do
  curl http://localhost:10000/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3
  echo 
done

Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
local_rate_limited
local_rate_limited

Notice that requests are rate limited after the first 3.

At this point, both applications (the main Go application and Envoy Proxy) are ready to run as containers in a Pod. Let's containerize the Go application and push it to a container registry.
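
The Dockerfile itself is not reproduced in this blog. A minimal multi-stage Dockerfile for the Go service might look like the following sketch (the image tags, and the assumption of a go.mod next to main.go, are illustrative):

FROM golang:1.23 AS build
WORKDIR /src
COPY . .
# Build a static binary so it can run on a minimal base image
RUN CGO_ENABLED=0 go build -o /go-service .

FROM gcr.io/distroless/static-debian12
COPY --from=build /go-service /go-service
EXPOSE 8080
ENTRYPOINT ["/go-service"]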

% podman build -t quay.io/singh_randhir/go-service:1.0 .
% podman push quay.io/singh_randhir/go-service:1.0

Create a K8s deployment with Envoy Proxy as a sidecar as shown below in deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-service
  template:
    metadata:
      labels:
        app: go-service
    spec:
      containers:
        - name: go-service
          image: quay.io/singh_randhir/go-service:1.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
        - name: envoy
          image: envoyproxy/envoy:v1.32-latest
          ports:
            - containerPort: 10000
          volumeMounts:
            - name: envoy-config
              mountPath: /etc/envoy/envoy.yaml
              subPath: envoy.yaml
      imagePullSecrets:
        - name: quay-registry-secret
      volumes:
        - name: envoy-config
          configMap:
            name: envoy-config
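
The Envoy sidecar mounts its configuration from a ConfigMap named envoy-config. Assuming the envoy.yaml validated earlier is in the current directory, create it with:

% kubectl create configmap envoy-config --from-file=envoy.yaml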

Create the deployment and expose it as a service with the following service.yaml.

apiVersion: v1
kind: Service
metadata:
  name: go-service
spec:
  selector:
    app: go-service
  ports:
    - name: http-port
      protocol: TCP
      port: 10000
      targetPort: 10000
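
Apply both manifests:

% kubectl apply -f deployment.yaml
% kubectl apply -f service.yaml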

Once the Pod is ready, check whether the Envoy Proxy sidecar applies the rate limits. Here we curl the service (ClusterIP 10.103.224.104 in this environment) from a dnsutils pod.

# kubectl exec -i -t dnsutils -- bash -c 'for i in {1..20}; do curl 10.103.224.104:10000/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3; echo; done'
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3
local_rate_limited
local_rate_limited
local_rate_limited

We have now used Envoy Proxy as a sidecar to enforce rate limits for the main application, with both running as containers in a single Pod on K8s.

Next, let's see how to implement rate limiting for a containerized application running on a serverless platform like IBM Code Engine.

Rate Limiting on IBM Code Engine

IBM Code Engine is a fully managed, serverless platform designed to simplify the deployment, scaling, and management of applications and workloads in the cloud. To run an application on Code Engine, we just need a container image to create an application. Once the application is created, Code Engine will handle the deployment, scaling, and management of our app.

However, since IBM Code Engine does not natively support sidecars in the same way Kubernetes does, Envoy would need to operate as a reverse proxy in a separate application, and we would direct traffic from Envoy to the Go service using external routing (e.g., external load balancer or public URL).

Deploy main application

We've created our Go service and pushed the image to the quay.io container registry. For Code Engine to be able to pull the image, we need to create a registry secret for authenticating with quay.io.

Create a registry secret on IBM Code Engine using CLI.

% ibmcloud ce secret create --name quay-io-secret  --username <username> --password <password> --server quay.io --format registry
Creating registry secret 'quay-io-secret'...
OK

Create the main application.

% ibmcloud ce application create  --name go-service  --image quay.io/singh_randhir/go-service:1.0  --registry-secret quay-io-secret  --port 8080
Creating application 'go-service'...
Configuration 'go-service' is waiting for a Revision to become ready.
Ingress has not yet been reconciled.
Waiting for load balancer to be ready.
Run 'ibmcloud ce application get -n go-service' to check the application status.
OK

https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud

Verify that the main application handles the request sent directly to it.

% curl https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud/restapi/v1/tenants/1e1c8bc9-31da-4393-9b08-e933da47f4a3       
Hello 1e1c8bc9-31da-4393-9b08-e933da47f4a3

At this point, we have the main application running on IBM Code Engine. Next, let's apply rate limiting to it using Envoy Proxy.

Deploy Envoy Proxy

Since we have to run Envoy Proxy as a separate application on Code Engine, let's create a container image with the configuration file baked into it. We also need to configure the cluster to which Envoy Proxy will forward requests.

Update the envoy.yaml configuration with the URL of the main application.

  clusters:                 
    - name: service_app_1       
      type: LOGICAL_DNS             
      connect_timeout: 500s         
      dns_lookup_family: V4_ONLY
      load_assignment:              
        cluster_name: service_app_1 
        endpoints:      
          - lb_endpoints:
              - endpoint: 
                  address:  
                    socket_address:
                      address: go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud
                      port_value: 443 
      transport_socket:       
        name: envoy.transport_sockets.tls
        typed_config:         
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          sni: go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud

Here we have hard-coded the Code Engine URL of the previously deployed main application into envoy.yaml. It could instead be supplied as an environment variable, but note that Envoy's static configuration does not expand environment variables on its own; the value would have to be substituted into the file at container startup, for example by an entrypoint script.

Next, let's build a container image for Envoy Proxy with envoy.yaml baked in as its configuration.

FROM envoyproxy/envoy:v1.32-latest

COPY envoy-ce.yaml /etc/envoy/envoy.yaml
EXPOSE 10000
# Informational only: Envoy's static configuration does not read this variable by itself (see above)
ENV GO_SERVICE_URL=https://go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud

CMD ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml"]

Build the container image and push it to the registry.

% podman build --platform linux/amd64 -t quay.io/singh_randhir/envoy-proxy:latest . -f Dockerfile-envoy-ce
% podman push quay.io/singh_randhir/envoy-proxy:latest

Now, create a Code Engine application using the image.

% ibmcloud ce application create  --name envoy-proxy  --image quay.io/singh_randhir/envoy-proxy:latest --registry-secret quay-io-secret   --env GO_SERVICE_URL=go-service.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud --port 10000 --min-scale 1 --max-scale 1
Creating application 'envoy-proxy'...
Configuration 'envoy-proxy' is waiting for a Revision to become ready.
Ingress has not yet been reconciled.
Waiting for load balancer to be ready.
Run 'ibmcloud ce application get -n envoy-proxy' to check the application status.
OK

https://envoy-proxy.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud

Here we set both the minimum and maximum number of instances to 1, ensuring that exactly one instance of Envoy Proxy is running at all times. This is the so-called Local Rate Limiting mode, where each Envoy instance keeps its own token buckets. If you want to run multiple Envoy instances that share rate limit state, consider Envoy's Global Rate Limiting mode, which relies on an external rate limit service. We cover only Local Rate Limiting in this blog.

Now that both applications are deployed on Code Engine, let's send some requests to the Envoy Proxy application to see whether it rate limits them.

% for i in {1..20}; do
  curl https://envoy-proxy.10ve6j8g4q0x.us-south.codeengine.appdomain.cloud/restapi/v1/tenants/1234
  echo
done
Hello 1234
Hello 1234
Hello 1234
Hello 1234
Hello 1234
Hello 1234
local_rate_limited
Hello 1234
local_rate_limited

Notice that Envoy Proxy applies the configured rate limits to requests forwarded to the main application. The occasional successful request after a 429 is the token bucket refilling: 3 tokens every 5 seconds.

Summary

Rate limiting is an important consideration when deploying an application with public APIs. We considered several options for applying rate limiting in different scenarios. OOTB solutions are available on some platforms, but there are situations where we need to develop a bespoke rate limiting solution. We started by creating a rate limiting solution inside the main application itself using Redisson. Next, we externalized the rate limiting to Envoy Proxy, deployed on K8s as a sidecar container next to the main application container inside a Pod. Finally, on a serverless platform like IBM Code Engine, we deployed Envoy Proxy as a reverse proxy to rate limit API requests sent to the main application, also deployed on Code Engine.

Acknowledgements

Thank you to the IBM Storage Insights team who supported and shaped the ideas on rate limiting REST APIs.

References

  1. GitHub - https://github.ibm.com/Singh-Randhir/rate-limiting-envoy
  2. Envoy Proxy rate limit - https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/rate_limit_filter
  3. IBM Code Engine - https://cloud.ibm.com/containers/serverless/overview