Analytics Rollover and Retention in v10.0.5

By Geraint JONES posted Thu September 07, 2023 08:24 AM

  

Introduction

Since v10.0.5, API events sent to Analytics are written to a back-end OpenSearch and stored in indices.

An OpenSearch index is composed of shards. These shards are distributed across the nodes in the OpenSearch cluster. Each document in an index is stored in the shards of that index. An index has 2 types of shard - primary and replica. When a document is written to an index, it is first written to that index's primary shards, and then replicated to the replica shards. In API Connect v10.0.5, Analytics assigns 5 primary shards and 10 replica shards (2 per primary shard) to each API event index.
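To make the shard arithmetic concrete, here is a minimal sketch in Python. The shard counts come from the paragraph above; the function names and the figure of 30 indices are made up for illustration only.

```python
# Shard counts per API event index, as stated in the post for v10.0.5.
PRIMARY_SHARDS = 5
REPLICAS_PER_PRIMARY = 2


def shards_per_index(primaries: int = PRIMARY_SHARDS,
                     replicas_per_primary: int = REPLICAS_PER_PRIMARY) -> int:
    """Total shards (primary + replica) that one index contributes."""
    return primaries * (1 + replicas_per_primary)


def total_shards(index_count: int) -> int:
    """Total shards across a number of API event indices."""
    return index_count * shards_per_index()


if __name__ == "__main__":
    print(shards_per_index())   # 15
    # 30 indices is an illustrative figure, not a value from the product.
    print(total_shards(30))     # 450
```

Every additional index adds another 15 shards, which is why the number of indices kept around matters for overhead.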

Each index and shard has an overhead - storage, memory, and CPU. OpenSearch itself makes use of the JVM heap when managing shards. I won't go into detail here, but in general, having a small number of large shards uses fewer resources than a large number of small shards.

So as more API events arrive, the overhead increases.

To help with this, Analytics performs rollover and retention operations.

In this blog I explain what rollover and retention mean in IBM API Connect Analytics, and how Analytics has made use of OpenSearch Index State Management (ISM) to implement a policy that handles rollover and retention.

Rollover

Rollover is a process where the active index transitions to a new active index. When an API event index is rolled over, a new index is created and made the active index, while the “old” index becomes a read-only index, a repository of historic API events.

In order for Analytics to know which is the active index, it assigns the active index a write alias called “apic-api-w”. When Analytics writes API events to an index, it actually writes to this alias.

So when rolling over, this is what happens:

  • New index created.
  • New index assigned the write alias.
  • Write alias removed from old index.

Example

Before a rollover, the following API event index exists:
 
apic-api-2023.09.04-000001 <---- apic-api-w alias refers to this index.
 
After a rollover, the following API event indices exist:
 
apic-api-2023.09.04-000001
apic-api-2023.09.05-000002 <---- apic-api-w alias refers to this index.
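
The three rollover steps and the example above can be sketched as a toy in-memory model. This is not how Analytics or OpenSearch implements rollover; the ToyCluster class is hypothetical and only mirrors the bookkeeping described in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

WRITE_ALIAS = "apic-api-w"  # write alias name given in the post


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for the index and alias bookkeeping."""
    indices: List[str] = field(default_factory=list)
    aliases: Dict[str, str] = field(default_factory=dict)  # alias -> index

    def rollover(self, new_index: str) -> Optional[str]:
        """Create new_index, move the write alias to it, return the old index."""
        old_index = self.aliases.get(WRITE_ALIAS)
        self.indices.append(new_index)  # new index created
        # Re-pointing the single alias entry both assigns it to the new
        # index and removes it from the old one.
        self.aliases[WRITE_ALIAS] = new_index
        return old_index


cluster = ToyCluster(
    indices=["apic-api-2023.09.04-000001"],
    aliases={WRITE_ALIAS: "apic-api-2023.09.04-000001"},
)
rolled = cluster.rollover("apic-api-2023.09.05-000002")
print(rolled)                        # apic-api-2023.09.04-000001
print(cluster.aliases[WRITE_ALIAS])  # apic-api-2023.09.05-000002
```

The old index stays in the list of indices, so its events remain searchable; it simply no longer receives writes.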

Retention

Retention is a process where old rolled-over indices are deleted.

Rollover and Retention using ISM

The Analytics rollover and retention implementation relies on OpenSearch Index State Management (ISM). This is a powerful framework that provides a built-in capability to manage the lifecycle of indices.

Using ISM, Analytics has created an index policy that manages rollover and retention for all API event indices.

An index policy defines states, a series of actions per state, conditions that trigger those actions, transitions to other states, and the conditions that trigger those transitions.

The following state transition diagram illustrates the rollover and retention index policy.

This image is of a state transition diagram that illustrates the rollover and retention index policy. It shows how new API event indices are initially put into the Rollover state. In this state, ISM monitors index document count and index age. If either of these reaches a threshold value defined in the index policy settings, a rollover action is invoked. The index remains in the Rollover state, but now only its age is monitored. If this reaches another threshold defined in the index policy settings, the index is transitioned to the Delete state, where the delete action is invoked.

As you can see, the Analytics rollover and retention index policy has two states:

Rollover State (initial state)

As soon as an API event index is created, it enters the Rollover state. In this state, ISM monitors two index attributes - age and document count.

When either of those reaches or exceeds a threshold value defined in the index policy, ISM initiates the built-in rollover action, creating a fresh index and transferring the write alias.

The rolled-over index remains in the Rollover state, with its age becoming the sole focus of attention. When this reaches or exceeds another threshold value defined in the index policy, ISM transitions the index into the Delete state.

Delete State (final state)

In this state, ISM performs its final act on the index, triggering the built-in delete action, which deletes the index and frees up the disk space it occupied.
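
The decision logic of the two states can be sketched as a small Python function. This is a simplified model of the behaviour described above, not ISM's actual implementation; IndexInfo and next_step are hypothetical names, and the thresholds are passed in explicitly rather than read from a policy.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Hypothetical snapshot of the attributes the policy looks at."""
    age_days: float
    doc_count: int
    rolled_over: bool = False


def next_step(index: IndexInfo, rollover_age_days: float,
              rollover_doc_count: int, retention_age_days: float) -> str:
    """Return 'rollover', 'delete', or 'wait' for an index in the Rollover state."""
    if not index.rolled_over:
        # Before rollover, both age and document count are monitored.
        if (index.age_days >= rollover_age_days
                or index.doc_count >= rollover_doc_count):
            return "rollover"
        return "wait"
    # After rollover, only the age of the index is monitored.
    if index.age_days >= retention_age_days:
        return "delete"  # transition to the Delete state
    return "wait"


# Sample thresholds: 1 day or 25,000,000 documents to roll over, 30 days to delete.
print(next_step(IndexInfo(0.5, 25_000_000), 1, 25_000_000, 30))        # rollover
print(next_step(IndexInfo(30, 10, rolled_over=True), 1, 25_000_000, 30))  # delete
```

Note that an index never goes straight to deletion in this model: it must be rolled over first, after which its age alone decides when it moves to the Delete state.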

Rollover and Retention Settings

The settings which control rollover and retention are embedded in the rollover and retention policy.

For rollover, the settings are:

  • min_doc_count. The default value is 25000000. The index is rolled over when the number of documents in the index reaches or exceeds this number.
  • min_index_age. The default value is 1d (1 day). The index is rolled over when the age of the index reaches or exceeds this value.

For retention (delete), the settings are:

  • min_index_age. The default value is either 30d (30 days) for an n1 profile, or 90d (90 days) for an n3 profile. The index is deleted when the age of the index reaches or exceeds this value.
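
For readers who have not seen an ISM policy before, the following sketch builds a policy document of the general shape that OpenSearch ISM accepts, using the default values listed above. It is an illustration only: the policy that Analytics actually installs may differ, and the state names, the description, and the build_policy helper are my own assumptions.

```python
import json


def build_policy(retention_age: str = "30d") -> dict:
    """Build an illustrative ISM policy body (not the policy shipped by Analytics)."""
    return {
        "policy": {
            "description": "Illustrative rollover and retention policy",
            "default_state": "rollover",  # assumed state name
            "states": [
                {
                    "name": "rollover",
                    # Roll over on document count or age, whichever comes first.
                    "actions": [
                        {"rollover": {"min_doc_count": 25000000,
                                      "min_index_age": "1d"}}
                    ],
                    # Move to the delete state once the retention age is reached.
                    "transitions": [
                        {"state_name": "delete",
                         "conditions": {"min_index_age": retention_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
        }
    }


if __name__ == "__main__":
    # "90d" corresponds to the n3 profile default mentioned above.
    print(json.dumps(build_policy("90d"), indent=2))
```

Seeing both min_index_age values side by side makes the distinction clear: one sits inside the rollover action, the other inside the transition to the delete state.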

These settings can be changed using the Analytics API or the Analytics user interface.

Changing these values will potentially have a serious impact on your system’s storage and performance, and so should be planned carefully.

In my next blog I’ll go into more detail about what to consider when planning your deployment configuration.

A Final Word

The Analytics rollover and retention process, which uses OpenSearch ISM, ensures the efficient management of API event indices, with each index transitioning seamlessly from Rollover to Delete when the time is right.

For detailed information about OpenSearch Index State Management, see the OpenSearch documentation at https://opensearch.org/docs/latest/im-plugin/ism/index/.
