Some customers have advanced resiliency requirements and so have two data centers serving API Connect traffic. These are commonly in one of two patterns: cold standby or warm standby. In both scenarios it is common to have a separate gateway service in each data center. A gateway service can only be linked to a single analytics service, and it is best practice to co-locate the analytics service with the gateway it serves, which leads to having an analytics service in each data center.
These patterns can be seen described in the documentation here: https://www.ibm.com/docs/en/api-connect/10.0.8?topic=strategies-two-data-center-deployment-examples-diagrams
This gives great resiliency. Both gateway services are linked to the catalog and APIs are published to both gateway services. Each gateway service is completely independent, so if one data center goes offline the other can continue serving traffic and logging the events in its local analytics service.
The downside of this pattern is that there is no aggregation across analytics services - they too are completely independent. There is a service switcher in the API manager UI to let you switch which service you would like to view the data for.
New replication
New in API Connect 10.0.8 is the ability to replicate analytics data between services. This is not database-level replication like API Manager or Portal use. Instead, it uses offload to send data from one service’s ingestion to another. Each analytics service then stores the data for both gateway services, which allows an aggregated view of all traffic at once. It is still possible to drill down to the traffic for individual gateway services using the standard analytics filtering options on gateway_service_name.
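To make the aggregated view and the drill-down concrete, here is a small Python sketch. It is not the Analytics API, just an illustration of the idea on a few made-up records. The gateway names (gw-dc1, gw-dc2) and the api_name values are assumptions for the example; only gateway_service_name is taken from the text above.

```python
from collections import Counter

# Hypothetical, heavily trimmed event records. Real API events carry many
# more fields; only gateway_service_name matters for this illustration.
events = [
    {"gateway_service_name": "gw-dc1", "api_name": "orders"},
    {"gateway_service_name": "gw-dc1", "api_name": "payments"},
    {"gateway_service_name": "gw-dc2", "api_name": "orders"},
]


def calls_per_gateway(records):
    """Aggregated view: count events per originating gateway service."""
    return Counter(r["gateway_service_name"] for r in records)


def for_gateway(records, name):
    """Drill-down: keep only the events that came from one gateway service."""
    return [r for r in records if r["gateway_service_name"] == name]
```

With replication in place, either analytics service would hold all three records, so the aggregate and the per-gateway views can both be answered from one place.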
It is important to remember at this point that analytics handles far more data than API Manager and Portal combined, so this is not something to go into lightly. Each analytics service will likely need to be around twice the size it was when standalone, as each is now handling the traffic from both gateway services.
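As a rough way to reason about the sizing, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: the function names and the example rates are made up, and it ignores any overhead from the offload itself. It shows that "twice the size" holds when both gateways carry similar traffic, and that a lightly loaded service can see a much bigger relative jump.

```python
def ingest_rate_after_replication(local_rate, remote_rate):
    """Events per second a service ingests once it also receives the
    events replicated from the other data center."""
    return local_rate + remote_rate


def growth_factor(local_rate, remote_rate):
    """How many times larger the ingest rate becomes compared with
    running standalone. Requires local_rate > 0."""
    return ingest_rate_after_replication(local_rate, remote_rate) / local_rate
```

For example, growth_factor(500, 500) gives 2.0, whereas a standby service that normally sees 100 events per second next to a primary handling 900 would see growth_factor(100, 900), which is 10.0.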
Conditional offload in analytics ingestion means we can configure it to only send data to the remote analytics service if that data came from the local gateway service. Taking two analytics services, a7s1 and a7s2, this avoids creating a circle where data is replicated from a7s1 to a7s2 and then right back to a7s1 again. We only want a7s1 to send its own gateway's data to a7s2, and a7s2 to send its own gateway's data to a7s1.
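The loop-avoidance rule is easier to see in a toy model. The Python below is only a simulation of the idea, not the real offload configuration (that lives in the analytics service settings described in the IBM documentation). The class, the gateway names gw1 and gw2, and the event shape are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalyticsService:
    """Toy stand-in for an analytics service with conditional offload."""
    name: str
    local_gateway: str  # the gateway service linked to this analytics service
    stored: list = field(default_factory=list)
    remote: Optional["AnalyticsService"] = None

    def ingest(self, event: dict) -> None:
        # Every event that reaches ingestion is stored locally.
        self.stored.append(event)
        # Conditional offload: forward only events that originated at the
        # local gateway. Replicated events fail this check, so they are
        # never sent back to where they came from.
        if self.remote is not None and event["gateway_service_name"] == self.local_gateway:
            self.remote.ingest(event)


def build_pair():
    """Create two services that replicate to each other."""
    a7s1 = AnalyticsService("a7s1", local_gateway="gw1")
    a7s2 = AnalyticsService("a7s2", local_gateway="gw2")
    a7s1.remote = a7s2
    a7s2.remote = a7s1
    return a7s1, a7s2
```

If gw1 logs one event to a7s1 and gw2 logs one event to a7s2, both services end up storing exactly two events. Without the condition on gateway_service_name, each forwarded event would be forwarded again and the two services would bounce it back and forth indefinitely.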
Note: it is not possible to replicate traffic from v5c gateways. The legacy gateway does not include gateway_service_name in the data it sends to analytics, so the conditional offload cannot be applied.
The process to set this up is documented here: https://www.ibm.com/docs/en/api-connect/10.0.8?topic=aco-replicating-analytics-data-2dcdr-warm-standby-data-center. It essentially involves configuring the certificates for the remote service and then configuring conditional offload.
Because of the increased load this puts on analytics, it is likely to require a larger deployment profile or scaling out horizontally. Luckily analytics scales very well horizontally, as it allows both reads and writes to be distributed across multiple nodes. Horizontal scaling is only supported on Kubernetes and OpenShift, so it is not recommended to attempt this pattern on VMware.
It is important to note that analytics replication is only useful for services within the same APIC cloud. The data contains numerous provider org IDs, catalog IDs, etc. that are used for access control in the Analytics API/UI, so both data centers have to share the same names/IDs. If they don’t, it will happily let you replicate the data between services on disparate clouds, but there won’t be any way to access that data, as the analytics API uses the org/catalog/space IDs for access control.
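A simplified way to picture why data from a different cloud becomes unreachable is shown below. This is an assumption-laden sketch, not how the Analytics API is actually implemented: the field names org_id and catalog_id and the sample IDs are placeholders chosen for the example.

```python
def visible_events(stored, known_scopes):
    """Return only the events whose (org_id, catalog_id) pair exists in
    this cloud. Anything else is stored but can never be queried."""
    return [e for e in stored if (e["org_id"], e["catalog_id"]) in known_scopes]


# Scopes that exist in the local cloud (hypothetical IDs).
local_scopes = {("org-a", "cat-1")}

stored = [
    {"org_id": "org-a", "catalog_id": "cat-1", "api_name": "orders"},  # same cloud
    {"org_id": "org-x", "catalog_id": "cat-9", "api_name": "orders"},  # other cloud
]
```

Here visible_events(stored, local_scopes) returns only the first record. The second one still takes up storage, but no user in the local cloud has an org or catalog that matches it.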
If you’re using one of the two data center deployment patterns with API Connect then this is a new option available to you.
#IBMAPIConnect #APIConnect #Analytics