Sterling Data Exchange Container Deployment Patterns - AWS

By Connor McGoey posted Tue October 29, 2024 03:30 PM

  

Table of Contents

Introductory Notes

Acronyms

General Deployment Considerations

Traditional Pre-Container On-Premises Deployment

Generic OCP or K8s Deployment & Details

AWS MultiAZ Deployment Model with Multiple VPCs and Multiple Clusters

AWS MultiAZ Deployment Model with a Single VPC Containing Multiple OCP Clusters

AWS MultiAZ Deployment Model with a Single Cluster and Separate Namespaces

AWS OCP and K8s Options

Compute Infrastructure and Network Bandwidth

Fargate

AWS Storage Options

AWS Networking & Firewall Options

Database Support

Queue Support

Optional Add-Ons

K8s / OCP Configuration Best Practices

Introductory Notes

This blog is intended as a reference guide for deploying Sterling solutions on AWS. The deployment models shown are reference patterns only. Factors such as internal network/security requirements, the size and skills of the operations team, and others should be considered when choosing an exact deployment.

These examples refer to Red Hat OpenShift Container Platform (OCP), as that is IBM's preferred Kubernetes distribution. However, these reference patterns also work with Kubernetes.

The patterns should match known-good OCP deployments, with any modifications needed for Sterling-specific requirements. They are standard patterns adjusted for AWS-specific options and capabilities, which should be verified prior to deploying. These guidelines are based on AWS best practices and documented capabilities.

Note also that the goal of this blog is not to recommend one vendor or solution over another. Solutions you already favor, such as specific firewalls, should work fine; however, you should use officially supported databases.

Acronyms

Below are some acronyms you may find throughout this blog.

    • B2Bi/SFG: IBM Sterling Business-to-Business Integrator / IBM Sterling File Gateway
    • AC: IBM B2Bi/SFG Adapter Container
    • ASI: IBM B2Bi/SFG Application Server-Independent
    • REST: IBM B2Bi/SFG REST API Server
    • C:D: IBM Sterling Connect:Direct
    • SSP: IBM Sterling Secure Proxy
    • SSP PS: SSP Perimeter Server
    • B2Bi PS: B2Bi Perimeter Server
    • CM: SSP Configuration Manager
    • SEAS: IBM Sterling External Authentication Server
    • IDP: Identity Provider (e.g. LDAP, SAML, etc.)
    • MQ: IBM MQ Cluster
    • DB: Database Cluster
    • DMZ: Demilitarized Zone
    • OCP: Red Hat OpenShift Container Platform
    • K8s: Kubernetes

General Deployment Considerations

Regardless of whether cloud is being used, there are some deployment considerations to keep in mind. Some of these considerations are as follows:

    • Make sure to check the latest support information in the IBM Software Product Compatibility Report (SPCR) as we continue to improve our support statement
    • For production environments you should provision enough worker nodes to allow one node or Availability Zone (AZ) to be down and still have enough capacity to run the environment. This allows for node updates or outages without affecting production.
    • In a production deployment, a minimum of 3 AZs is recommended. Additionally, the maximum latency must be below 10 milliseconds to span AZs in a single cluster.
    • Though it is possible to put both production and non-production environments in the same cluster separated by namespaces, it is not recommended as a best practice. This is because you cannot test cluster-level changes without affecting production as well.

      • Combining non-production environments into a single cluster separated by namespace is fine.

    • Development and non-production environments can be less highly available for cost savings.

      • Ex: Single Zone, Multi-AZ with fewer components, or a single node per AZ

    • SSD Storage should be used for production workloads
    • It is important to note that some storage types cannot fail over between AZs. If this is the case for your storage type, you should plan to have enough nodes to handle outages or use different storage.
    • Node sizes will vary depending on workload. You typically want to avoid the cases of either many small nodes or only a few large nodes.
    • Some cloud providers limit network bandwidth on some node types so it is important to pay attention to the node types being used.

Traditional Pre-Container On-Premises Deployment

Below is a sample diagram of a pre-container on-premises deployment.

In this deployment, there is a DMZ, Secure Zone, and Infrastructure Layer, each of which is separated by a firewall and has its application components running in separate VMs.

Generic OCP or K8s Deployment & Details

Below is a sample diagram of a generic OCP/K8s deployment showing a Demilitarized Zone (DMZ) and Internal Zone.

Here, there are separate clusters in the DMZ and Internal Zones to further isolate environments (not required, but provides more separation if desired). Each component uses a Kubernetes Service to allow for load balancing and to surface a single IP address. These IPs can be internal to the cluster or external. Each individual component in the deployment can scale vertically via K8s CPU and memory limits. Most components can also scale horizontally; the exception is SSP, for which horizontal scaling is planned for a future release.

Rolling upgrades and "self-healing" are supported, where K8s will spin up another pod if one crashes. Databases, MQ, and LDAP can be running within the cluster or externally. File (Network File System) or Block Storage is external to the clusters and should have High Availability (HA). Additionally, there should be enough capacity such that the environment will function with one node offline. Finally, unlike most components, SSP has a service name and separate IP for each instance, as it cannot scale horizontally at this time.
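
To illustrate how a component is exposed and sized in this model, the sketch below shows a Kubernetes Service in front of a Deployment with CPU and memory requests/limits. It is a minimal sketch only; the component name, labels, port, image reference, and resource numbers are illustrative assumptions, not the values used by the Sterling certified container Helm charts.

```yaml
# Minimal sketch: a Service giving a single stable IP in front of a component,
# and a Deployment with CPU/memory requests and limits for vertical sizing.
# Names, labels, ports, image, and sizes are illustrative, not chart values.
apiVersion: v1
kind: Service
metadata:
  name: b2bi-asi
spec:
  type: ClusterIP            # or LoadBalancer to expose the IP outside the cluster
  selector:
    app: b2bi-asi
  ports:
    - port: 35000
      targetPort: 35000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: b2bi-asi
spec:
  replicas: 2                # horizontal scaling; SSP would stay at 1 today
  selector:
    matchLabels:
      app: b2bi-asi
  template:
    metadata:
      labels:
        app: b2bi-asi
    spec:
      containers:
        - name: asi
          image: registry.example.com/b2bi-asi:latest   # placeholder image reference
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              cpu: "4"
              memory: 16Gi
```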

AWS MultiAZ Deployment Model with Multiple VPCs and Multiple Clusters

Below is a sample diagram showing separated DMZ and Application clusters in their own VPC instances.

This deployment configuration provides the maximum separation between the DMZ and Application clusters, but also the maximum management overhead. Because of the App VPC subnet configuration, you would need either PrivateLink or some other solution to access it. Using PrivateLink would also prevent traffic from crossing the public internet between VPCs, which is ideal.

Each component has both load balancing and a service name to present a consistent IP. Also in this deployment, the Database, MQ, and LDAP are highly available (HA) and can fail over between availability zones. The identity provider (IDP), Database, and MQ could be either within the cluster or external deployments/services. File or Block storage is still external to the cluster and highly available. In this case, there should be enough spare capacity such that the environment functions with one Availability Zone (AZ) offline.

Finally, all components are active-active, meaning calls are load balanced between components. Note that this deployment should also have at least one node per availability zone to allow for intra-availability zone failover.
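
To spread a component's replicas across the availability zones (so that the loss of one AZ still leaves replicas serving traffic), a topology spread constraint can be added to the pod template. The fragment below is a minimal sketch; the app label is an illustrative assumption.

```yaml
# Fragment of a pod template spec: spread replicas evenly across AZs so the
# loss of one zone leaves the remaining replicas serving traffic.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone   # standard AZ node label
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: b2bi-asi                          # illustrative label
```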

This deployment pattern would be a good choice if you wish to have very high isolation, as it has the DMZ and Application tiers in separate clusters and VPCs. However, consider that having two clusters and two VPCs results in higher management overhead and more infrastructure cost. The choice between PrivateLink or communicating over the public internet may also mean additional overhead.

AWS MultiAZ Deployment Model with a Single VPC Containing Multiple OCP Clusters

Below is a sample OCP Deployment showing a DMZ and Internal Zone spanning 3 AZs in a single AWS Virtual Private Cloud (VPC).

This is similar to the previous pattern, except that in this deployment only a single VPC is used. The DMZ and Internal Zone clusters are still separate to isolate environments for those not comfortable with a flat K8s/OCP model. As per AWS best practices, the VPC contains a public subnet and multiple private subnets. The DMZ cluster is in a private subnet, which prevents exposing node/internal IPs to the public internet. Sterling SSP servers, however, are exposed externally via load balancers, and inbound connections are possible via NAT servers in a public subnet.

Each component has both load balancing and a service name to present a consistent IP. Also in this deployment, the Database, MQ, and LDAP are highly available (HA) and can fail over between availability zones. The identity provider (IDP), Database, and MQ could be either within the cluster or external deployments/services. File or Block storage is still external to the cluster and highly available. In this case, there should be enough spare capacity such that the environment functions with one Availability Zone (AZ) offline.

Finally, all components are active-active, meaning calls are load balanced between components. Note that this deployment should also have at least one node per availability zone to allow for intra-availability zone failover.

This deployment pattern provides relatively high isolation, since the DMZ is a separate cluster from the internal zone. The single VPC minimizes network traffic between the clusters and avoids possible traffic over the public internet. However, keep in mind that having two clusters still results in management overhead and more infrastructure cost, and sharing a VPC may not be enough isolation for some organizations.

AWS MultiAZ Deployment Model with a Single Cluster and Separate Namespaces

Below is a sample deployment model with a single cluster and a single private subnet per data center. The DMZ and Application pods are separated by namespaces within an OCP cluster.

In this model, there is a single cluster and a single private subnet per data center. Physical nodes span environments and namespaces, with the DMZ and Application pods separated within a single OCP cluster via K8s Namespaces / OCP Projects.

If you wish to have lower management overhead, this deployment may be a good choice. The low overhead comes from the DMZ and Applications sharing a single cluster and nodes, while separate namespaces still provide some level of network and management isolation. OpenShift SDN is available by default to provide namespace isolation (use Calico if using ROKS, as ROKS does not support SDN).

Keep in mind that this deployment requires Calico or another CNI to be deployed in K8s. There is low isolation as environments share clusters and nodes, so care must be taken with networking rules to ensure proper isolation.
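
As an illustration of the kind of networking rules involved, the sketch below uses Kubernetes NetworkPolicy to default-deny ingress in an application namespace and then allow traffic only from the DMZ namespace. The namespace names (app, dmz) are assumptions for the example, and enforcement depends on the CNI in use (OpenShift SDN in NetworkPolicy mode, Calico, etc.).

```yaml
# Sketch: default-deny ingress in the application namespace, then allow
# traffic only from pods in the DMZ namespace. Namespace names and the
# label used to select the DMZ namespace are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-dmz
  namespace: app
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dmz
  policyTypes:
    - Ingress
```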

This is a common pattern in companies with a separate team that provides infrastructure to application teams. In these cases it is likely that the application team will be given their own namespace(s).

AWS OCP and K8s Options

Below are some options and considerations for OCP and K8s on AWS:

    • OCP can be manually installed
    • ROSA is a Managed OpenShift Service on AWS
    • EKS is AWS's managed K8s service
      • We support EKS using EC2 infrastructure only at this time (a minimal eksctl sketch follows this list)
    • Refer to our Support Matrix to verify that the release you want to install is supported on the version of OCP or K8s you plan on using
      • Search for "Sterling" to find most components or "Transformation" to find ITX and ITXA
    • AWS EKS support for K8s versions is generally longer than the community and other vendors
      • Ensure that the version of K8s you want to use is supported both by AWS EKS and Sterling
    • Best practice for new environments is to start with the newest supported version to get the latest capabilities and patches and maximize the time before an upgrade is required
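
For EKS on EC2, a cluster spanning three AZs can be described declaratively with eksctl. The sketch below is a starting point only; the cluster name, region, Kubernetes version, and instance type are assumptions that should be checked against the Sterling support matrix and your own sizing requirements.

```yaml
# Minimal eksctl ClusterConfig sketch: EKS with EC2 managed nodes across 3 AZs.
# Name, region, version, and instance type are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sterling-prod           # placeholder cluster name
  region: us-east-1
  version: "1.29"               # verify against the Sterling support matrix
availabilityZones:
  - us-east-1a
  - us-east-1b
  - us-east-1c
managedNodeGroups:
  - name: workers
    instanceType: m6i.2xlarge   # x86 only; Graviton (ARM) is not supported
    minSize: 3                  # at least one node per AZ
    desiredCapacity: 3
    maxSize: 6
    volumeSize: 100
```

Creating the cluster from such a file is then a single command, for example `eksctl create cluster -f cluster.yaml`.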

Compute Infrastructure and Network Bandwidth

    • There are various node types available
      • For transformation heavy use cases, you would likely want to use "Compute Optimized" to minimize the number of cores required
      • For SFG/C:D heavy use cases, you likely don't need storage optimized nodes as you shouldn't use local storage with containers anyway. Focus on shared storage speed and network I/O
    • Sterling does not support ARM architecture, so Graviton processors cannot be used
    • Understanding Network Bandwidth is very important
    • Network Bandwidth varies with node sizes and may be reduced between regions
    • Some nodes use a credit system for bandwidth that can reduce the bandwidth considerably once credits are exhausted 
    • Ensure at least 1 Gbps of network bandwidth for production workloads in addition to CPU/memory requirements; more for I/O-heavy use cases like SFG or C:D

Fargate

The IBM Sterling suite of products does not support AWS Fargate at this time. For Fargate on Amazon EKS, AWS outlines a number of considerations, a few of which have led Sterling not to support it. These include:

    • "Privileged containers aren't supported on Fargate"
    • "Amazon EKS must periodically patch Fargate Pods to keep them secure. We attempt the updates in a way that reduces impact, but there are times when Pods must be deleted if they aren't successfully evicted. There are some actions you can take to minimize disruption. For more information, see Set actions for AWS Fargate OS patching events"
      • This may cause unnecessary outages

Some of the network considerations listed also give pause.

AWS Storage Options

AWS Container Storage 

Active File System Storage

      • You should define a default storage class for your cluster. The EKS default is currently gp2, but gp3 is better (a sample StorageClass follows this list)
      • Use SSD-based options such as gp3 or io1
      • Pay close attention to storage costs vs. performance, especially for I/O-intensive workloads like SFG or C:D
      • Supported EKS/AWS storage options for shared volumes are EFS or EBS. However, pay attention to the limitations of EBS
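
The following is a minimal sketch of a gp3 StorageClass backed by the AWS EBS CSI driver and marked as the cluster default; the class name and parameters are a starting point, not tuned values.

```yaml
# Sketch: a gp3 StorageClass using the AWS EBS CSI driver, set as the
# cluster default. Parameters shown are a starting point, not tuned values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```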

Elastic File System (EFS)

      • Requires the AWS EFS CSI Driver to be installed
      • Deployed via Operator in ROSA 4.10+
      • NFS Based
      • Allows ReadWriteMany access (multiple pods at once; see the sketch after this list)
      • Can fail over between AZs in a single region
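
A minimal sketch of dynamic EFS provisioning with a ReadWriteMany claim is shown below. The storage class name, claim name, and EFS file system ID are assumptions; the file system ID comes from your own EFS setup.

```yaml
# Sketch: a ReadWriteMany claim backed by EFS via the AWS EFS CSI driver.
# Class name, claim name, and file system ID are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder EFS file system ID
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: b2bi-resources-pvc             # illustrative claim name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 50Gi
```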

Elastic Block Store (EBS)

      • Default Storage used by EKS
      • Best practice to deploy CSI Driver to allow AWS to do full storage lifecycle management, but not required
      • Provides snapshot capability
      • Only supports ReadWriteOnce (can only be accessed by one pod at a time)
        • If the pod requires ReadWriteMany, you cannot use EBS
      • Will not fail over between AZs

AWS Object Storage

AWS Simple Storage Service (S3)

      • Starting with version 6.2.0, Sterling B2Bi supports Object Storage as an option for document payloads using Document Service
        • Available with Sterling B2Bi certified container deployments
        • S3 Adapter/Service support
      • S3 storage is meant to store bulk data that is read infrequently. It is not meant to be used as active file system storage
      • Various flavors of object storage are available with varying performance
      • The flavor depends on the use case, but you should probably start with the Standard storage class
      • Two main use cases in B2Bi/SFG:
        • Document Storage: Ability to store BPs and other large documents in S3 Storage vs the DB to reduce size/load on DB
        • Adapter Storage: Ability for SFG to send/receive files to/from S3 as a transfer mechanism.
          • Supports AWS S3 Storage as of B2Bi/SFG 6.1.2

AWS Networking & Firewall Options

Notes and Considerations

Sample Options

Database Support

You can either leverage a cloud database service or deploy a database directly in the cluster.

Note: it is not recommended to set up your own database in containers, as this requires expertise in Kubernetes, the database itself, and how the database works in containers.

AWS Cloud Database Support

Self Managed Databases

    • Databases listed at the IBM Software Support Link can be deployed in a VM running a supported OS or via containers
      • Search for "Sterling" and choose "IBM Sterling B2B Integrator"

Queue Support

AWS SQS & B2Bi SQSClient Adapter

    • AWS SQS is a fully managed queue service for messages
    • The Sterling B2Bi SQSClient Adapter connects to an AWS SQS queue and can fetch messages
      • Can be configured to execute business processes with the retrieved messages (a sample queue definition follows this list)
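
A queue for the adapter to connect to can be defined in CloudFormation. The sketch below is illustrative only; the queue name and timeouts are assumptions, and the adapter's own configuration (credentials, queue URL, business process mapping) is done within B2Bi.

```yaml
# Sketch: a CloudFormation definition of an SQS queue that the B2Bi SQSClient
# Adapter could be pointed at. Queue name and timing values are placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  B2biInboundQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: b2bi-inbound-messages
      VisibilityTimeout: 300
      MessageRetentionPeriod: 345600   # 4 days, in seconds
Outputs:
  QueueUrl:
    Value: !Ref B2biInboundQueue       # the queue URL the adapter connects to
```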

Optional Add-Ons

There are many options for configuring an AWS cloud deployment. We can't know or test all of them, but here are a few that may be worth investigating:

    • AWS PrivateLink: Connect your VPC to some AWS Services without going out to the public internet
    • Gateway Endpoints for AWS S3: Connect to S3 Storage from your VPC without going out to the public internet
      • This does not require PrivateLink
    • AWS Outposts: AWS infrastructure and services on-prem

K8s / OCP Configuration Best Practices

Sterling B2Bi/SFG

A properly configured termination grace period (terminationGracePeriodSeconds) allows a container's PreStop hooks to finish before the container is subsequently stopped.

As of release 6.1.2, IBM Sterling B2Bi/SFG supports a user-configurable termination grace period in container deployments. With a properly configured termination grace period, processes, BP cleanup, and other currently running workloads are given a long enough period to stop gracefully before the pod is killed.
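
In plain Kubernetes terms, the grace period and a PreStop hook appear in the pod template as shown below. This is a minimal sketch only; in a certified container deployment these values are set through the Helm chart rather than edited directly, and the image reference, timing, and PreStop command here are illustrative assumptions.

```yaml
# Sketch: fragment of a pod template showing a termination grace period and
# a PreStop hook. Image, timing, and command are placeholders.
spec:
  terminationGracePeriodSeconds: 600   # e.g. give BP cleanup up to 10 minutes
  containers:
    - name: asi
      image: registry.example.com/b2bi-asi:latest   # placeholder image reference
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 30"]  # placeholder for a graceful-stop script
```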

