How Instana Uses AWS EKS

By Shannon Early posted Mon May 30, 2022 04:46 AM

  

[Republished from Instana.com]
[Blog written by Marcel Birkner]

Instana is the first and only Enterprise Observability solution designed specifically for the challenges of managing microservices and distributed, cloud-native applications. Our SaaS platform has to process and store large amounts of telemetry data from our customers. Each day we process about a petabyte of ingress data.

Instana processes a ton of data.
In this article we will show how we use AWS EKS to run all ingress and processing components for our SaaS platform.

High level architecture

At a high level, we have two types of regions: GlobalRegion and MultiRegion. The GlobalRegion runs all global components used for licensing, authentication and accounting. The MultiRegions run all processing components that drive our SaaS product. MultiRegions are spread across multiple continents and cloud providers to offer the best service and latency for our customers. The networks of all MultiRegions are completely isolated from one another, and each has its own VPC configuration. Each region has its own datastore clusters and handles the processing of a subset of our customers.

Once a region and its clusters reach a certain size, we spin up a new region and deploy all new customers to that region. This allows us to minimize the blast radius in case of failures and thus reduces risk for us as well as for our customers. Being able to easily create a completely new region as our customer base grows was one reason for using managed EKS clusters in our AWS regions.

Instana's high-level architecture, including two types of regions, GlobalRegion and MultiRegion.

 

GlobalRegion

Let us take a look at our GlobalRegion. It is our smallest Kubernetes cluster and runs cross-functional components to manage licenses, authentication, authorization and accounting across our customer base.


MultiRegion

All our high-level ingress and processing happens in our MultiRegions. Each of these regions runs about 2,000 to 3,000 processes. Most of our components are written in Java using the Dropwizard framework, or in JavaScript on Node.js. We spread our processing components across three nodegroups:

  • Acceptor NodeGroup for all ingress traffic
  • Core NodeGroup for shared processing components
  • Tenant Unit (TU) NodeGroup for tenant unit specific processing components

Using Kubernetes selectors we can easily separate our components and group them together by resource requirements.
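
As a minimal sketch of how a workload can be pinned to one of these nodegroups, the Deployment below uses a nodeSelector that matches the instanaGroup: tenantUnit node label from the nodegroup configuration shown in this article. The Deployment name, image and resource requests are hypothetical placeholders, not Instana's actual manifests.

```yaml
# Hypothetical example: name, image and resource values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-processor
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-processor
  template:
    metadata:
      labels:
        app: example-processor
    spec:
      # Schedule these pods only on nodes that carry this label,
      # i.e. nodes of the Tenant Unit nodegroup.
      nodeSelector:
        instanaGroup: tenantUnit
      containers:
        - name: example-processor
          image: registry.example.com/example-processor:1.0.0
          resources:
            # Requests tell the scheduler how much of a node the pod needs.
            requests:
              cpu: "2"
              memory: 8Gi
```

Because the selector refers to a label rather than to a node name, nodes can be added to or removed from the nodegroup without touching the workload definition.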

Here is what a typical MultiRegion looks like. We label each region with a color, which gives us a unique identifier that everyone inside Instana understands. Using AWS region names as identifiers would limit us to one MultiRegion per AWS region (for example, only one in us-west-2), a restriction we did not want.

Each MultiRegion has its own VPC with public and private subnets spread across three availability zones. The EKS nodegroups are configured to use only the private subnets, so none of our components are directly accessible from the internet.
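
The layout described above could be expressed in an eksctl cluster config roughly as follows. This is a sketch under assumptions: the cluster name, availability zone names, the acceptor and core nodegroup names, their label values and their instance types are all hypothetical. Only the tenant unit name, label and instance type are taken from the snippet shown in this article.

```yaml
# Hypothetical sketch of a MultiRegion cluster layout for eksctl.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-pink        # assumed cluster name
  region: us-west-2
# Spread the VPC subnets across three availability zones.
availabilityZones: [us-west-2a, us-west-2b, us-west-2c]
nodeGroups:
  - name: private-acceptor-0          # assumed name
    instanceType: c5.2xlarge          # assumed instance type
    privateNetworking: true           # nodes get private subnets only
    labels: {instanaGroup: acceptor}  # assumed label value
  - name: private-core-0              # assumed name
    instanceType: m5.2xlarge          # assumed instance type
    privateNetworking: true
    labels: {instanaGroup: core}      # assumed label value
  - name: private-tenantunit-0
    instanceType: r5.4xlarge
    privateNetworking: true
    labels: {instanaGroup: tenantUnit}
```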

We label each MultiRegion with a color.

Acceptor NodeGroup

Instana supports infrastructure monitoring, end-user-monitoring, distributed tracing and serverless monitoring. Therefore we operate different ingress endpoints that are accessible via AWS TCP or HTTPS load balancers. All of the ingress components run in a dedicated nodegroup, the Acceptor NodeGroup, which has a custom resource profile that best matches its workload.

Core NodeGroup

Besides the ingress components we have a pool of shared components that are used for different purposes. Some of these components process, transform and redistribute the ingress data. Other components store that data, in various formats, across several datastores that we use. And others serve our user interface.

TenantUnit (TU) NodeGroup

Last but not least, we have a dedicated nodegroup for tenant unit components. In this nodegroup, we run all the heavy-duty processing that comes with 1s metric resolution and tracing every call.

As an example, here is a snippet of the tenant unit nodegroup configuration. By default we enable the autoScaler, externalDNS and certManager policies for this nodegroup, which will be covered in a later section. If you are interested in the full example, you will find a full EKS cluster config in the appendix.

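- name: private-tenantunit-0
  instanceType: r5.4xlarge
  # Place the nodes in the private subnets only
  privateNetworking: true
  # Kubernetes node labels, usable in nodeSelector rules
  labels: {instanaGroup: tenantUnit, vendor: instana, zone: private}
  # Lower and upper bound for the number of nodes in this nodegroup
  minSize: 1
  maxSize: 100
  iam:
    # Attach the IAM policies these add-ons need to the node role
    withAddonPolicies:
      autoScaler: true
      externalDNS: true
      certManager: true
  # Tags applied to the AWS resources of this nodegroup
  tags:
    instanaGroup: tenantUnit
    vendor: instana
    zone: private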

Creating an EKS cluster via eksctl

Now that we have an understanding of the overall architecture, let’s take a look at how we approach our cluster setup. To follow the infrastructure-as-code paradigm, we use eksctl to create and maintain all our EKS clusters. The configuration is stored in a YAML file and can be used to manage the complete lifecycle of a cluster. We started with EKS Kubernetes version 1.15 and have used eksctl from the start, including for every upgrade since: from 1.15 to 1.16, from 1.16 to 1.17, and so on. At the moment we are running version 1.19 and will upgrade to 1.20 in the coming weeks.
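
To illustrate how the Kubernetes version fits into this file-based lifecycle, here is a minimal sketch of the top of an eksctl config. The cluster name is a hypothetical placeholder. The version is pinned in metadata.version, so an upgrade starts as a change to this one value in version control, followed by running eksctl's cluster upgrade against the same file.

```yaml
# Hypothetical sketch: the cluster name is an assumption.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-pink
  region: us-west-2
  # Kubernetes version of the control plane, quoted so that YAML
  # treats it as a string. An upgrade to 1.20 would change this value.
  version: "1.19"
```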

Install eksctl

To install eksctl with Homebrew, run the following commands and make sure you have at least version 0.33.0:

> brew tap weaveworks/tap
> brew install weaveworks/tap/eksctl
> source <(eksctl completion bash)
> eksctl version
0.33.0

Create KMS key

Before creating the EKS Kubernetes cluster, you must create a KMS (Key Management Service) key. EKS uses this key to encrypt and decrypt the Kubernetes secrets that the managed cluster stores in etcd.

Create the key in AWS Console and follow the wizard to create a symmetric key and define key usage permissions: https://us-west-2.console.aws.amazon.com/kms/home?region=us-west-2#/kms/keys


Create the KMS key in AWS Console and follow the wizard to define permissions.
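
Once the key exists, the cluster config has to reference it. A minimal sketch, assuming eksctl's secretsEncryption setting is used for this: the cluster name is hypothetical and the key ARN is a placeholder (with the AWS documentation example account ID), to be replaced with the ARN of the key you created.

```yaml
# Hypothetical sketch: cluster name and key ARN are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-pink
  region: us-west-2
secretsEncryption:
  # ARN of the symmetric KMS key used to encrypt Kubernetes secrets.
  keyARN: arn:aws:kms:us-west-2:111122223333:key/00000000-0000-0000-0000-000000000000
```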

Create EKS cluster from config.yaml

In the appendix of this post you can find a full example cluster config.yaml file that you can use to create an EKS cluster that matches the architecture diagram above.

> eksctl create cluster --config-file eks-pink-config.yaml


Once you execute the above command, eksctl will trigger CloudFormation to create all required resources.


Set up cluster-wide services

For all of our Kubernetes clusters we use a set of cluster-wide services that make our life in the SRE team easier when maintaining the cluster. Here are a few examples of services that are important to us:

  • external-dns to configure DNS entries in Route53
  • cluster-autoscaler to automatically grow or shrink the nodegroups when customers are deployed / undeployed or core components are scaled out
  • kube-dns-autoscaler to scale coredns pods to match the overall cluster size
  • instana-agent for full infrastructure and Kubernetes monitoring, and distributed tracing

 

external-dns

ExternalDNS regularly synchronizes exposed Kubernetes Services and Ingresses with DNS providers. It supports a wide range of standard DNS providers, such as AWS Route53. Like KubeDNS, it retrieves a list of resources (Services, Ingresses, etc.) from the Kubernetes API to determine a desired list of DNS records. Unlike KubeDNS, however, it is not a DNS server itself; it only configures other DNS providers accordingly, in our case AWS Route53.

external-dns allows us to automatically generate Route53 entries for our Kubernetes services.
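
As a sketch of what this looks like from the side of a workload, the Service below asks external-dns for a DNS record through the external-dns.alpha.kubernetes.io/hostname annotation. The Service name, hostname, selector and ports are hypothetical; the hostname has to belong to a hosted zone that external-dns is allowed to manage.

```yaml
# Hypothetical example: name, hostname, selector and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: example-acceptor
  annotations:
    # external-dns creates a record for this name that points at the
    # load balancer provisioned for the Service.
    external-dns.alpha.kubernetes.io/hostname: acceptor.example.com
spec:
  type: LoadBalancer
  selector:
    app: example-acceptor
  ports:
    - name: https
      port: 443
      targetPort: 8443
```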

An example config to install external-dns is in the appendix:

> kubectl apply -f external-dns-pink.yaml


You can check the logs for external-dns using:

> kubectl logs -f -l app=external-dns


Check the logs for external-dns.

cluster-autoscaler

The Kubernetes Cluster Autoscaler automatically adjusts the number of EC2 nodes in your cluster. It will launch new nodes into a node group when there are not enough resources left to launch a pod and makes sure to move pods and remove nodes when there are too many under-utilized resources.

Install cluster-autoscaler:

> kubectl apply -f cluster-autoscaler-autodiscover.yaml
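
The manifest applied above is not reproduced in this article. As a rough sketch of its most relevant part, this is what the container section of an auto-discovering cluster-autoscaler Deployment typically looks like on AWS. The image tag and the cluster name example-pink are assumptions; pick the autoscaler release that matches your Kubernetes minor version and substitute your own cluster name.

```yaml
# Excerpt of a pod template spec; image tag and cluster name are assumptions.
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.19.1
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # When several nodegroups could fit a pending pod, prefer the one
      # that leaves the least unused capacity.
      - --expander=least-waste
      # Manage every Auto Scaling group that carries both of these tags.
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/example-pink
```

With auto-discovery, the autoscaler stays within the minSize and maxSize bounds of each discovered Auto Scaling group, so the limits set in the nodegroup configuration still apply.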


You can check the logs for the autoscaler using:

> kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler


Check the logs for the autoscaler.

kube-dns-autoscaler

The Kubernetes cluster-proportional-autoscaler automatically adjusts the number of coredns pods as the cluster grows or shrinks. It scales on the number of nodes and CPU cores in the cluster, so DNS capacity keeps up as more pods are scheduled.

Install kube-dns-autoscaler:

> kubectl apply -f kube-dns-autoscaler.yaml
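
The scaling behaviour is driven by a ConfigMap. A minimal sketch in the autoscaler's linear mode follows; the parameter values are illustrative assumptions, not the values we run with. In linear mode the replica count is max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), bounded by min and max if they are set. With the values below, a cluster of 40 r5.4xlarge nodes (16 vCPUs each, 640 cores in total) gets max(ceil(640 / 256), ceil(40 / 16)) = max(3, 3) = 3 coredns replicas.

```yaml
# Hypothetical example: all parameter values are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # Parameters for the linear scaling mode, as a JSON document.
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "preventSinglePointFailure": true
    }
```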


You can check the logs for kube-dns-autoscaler using:

> kubectl -n kube-system logs -f deployment.apps/kube-dns-autoscaler

Instana Agent

Finally, as a monitoring company ourselves, the natural choice for us is to eat our own dog food and use Instana to monitor Instana. There are several ways to install the instana-agent on a Kubernetes cluster. We recommend installing the agent using the Helm chart, a YAML file (DaemonSet), or the Kubernetes Operator.

Currently, we use a DaemonSet for all EKS clusters, as described here: https://www.instana.com/docs/setup_and_manage/host_agent/on/kubernetes/#install-as-a-daemonset

A few minutes after deploying the Instana agents across the K8s cluster, we get full insight into all the components running in the cluster, as well as the K8s cluster itself. This includes infrastructure and component metrics, distributed traces, auto-profiling and alerting on built-in events.

In a few minutes we get full insight into all the components running in the cluster, as well as the K8s cluster.

Here is a screenshot of the EKS cluster dashboard for the test environments our developers use. This dashboard gathers all high-level metrics for the cluster and is the starting point for digging into further metrics for nodes, namespaces, deployments, daemonsets, statefulsets, services, pods and infrastructure.

Here is the EKS cluster dashboard for our test environments.

This is a map of all running containers grouped by namespace. Several useful views are available that help investigate pod distribution and resource usage across the whole EKS Kubernetes cluster.

This is a map of all running containers grouped by namespace.

Summary

So far we are happy with our decision to use managed EKS clusters for our AWS regions. With managed EKS, we do not have to spend time operating and maintaining the core cluster infrastructure ourselves, which leaves more time to focus on improving our platform and product. If there is one thing we would improve, it is the provisioning time: it can sometimes take a while until a cluster is fully up and running.

We have not had any production problems, and we hope it stays that way. Take a guided tour through our Play With environment to learn more about how Instana works.

Appendix

This appendix includes a couple of stripped-down config files we use to set up our EKS Kubernetes clusters. Make sure to use the latest versions, since the tools used in this article move fast and update their APIs regularly.

eksctl cluster eks-pink-config.yaml for pink MultiRegion

external-dns-pink.yaml

kube-dns-autoscaler.yaml

