Sterling Data Exchange Container Deployment Patterns - Microsoft Azure

By Connor McGoey

  


Table of Contents

Introductory Notes

Acronyms

General Deployment Considerations

Traditional Pre-Container On-Premises Deployment

Generic OCP or K8s Deployment & Details

Azure MultiAZ Deployment Model with Multiple VNETs and Multiple Clusters

Azure MultiAZ Deployment Model with a Single VNET Containing Multiple OCP Clusters

Azure MultiAZ Deployment Model with a Single Cluster and Separate Namespaces

Azure OCP and K8s Options

Compute Infrastructure and Network Bandwidth

Azure Storage Options

Azure Networking & Firewall Options

Azure Database Support

Optional Add-Ons

K8s / OCP Configuration Best Practices

Introductory Notes

This blog is intended as a reference guide for deploying Sterling solutions on Microsoft Azure. The patterns shown are reference patterns only. Factors such as internal network/security requirements, the size and capability of the operations team, and others should be considered when choosing an exact deployment.

These examples refer to OpenShift Container Platform (OCP), as that is the preferred Kubernetes deployment from IBM. However, these reference patterns also work with Kubernetes.

These patterns follow known-good OCP deployments, modified for Sterling-specific requirements and for Azure-specific options and capabilities, which should be verified prior to deploying. Additionally, these guidelines are based on best practices and documented capabilities from Azure.

It should also be noted that the goal of this blog is not to recommend one vendor or solution over another. If you favor particular solutions, such as specific firewalls, they should work fine. However, you should use officially supported databases.

Acronyms

Below are some acronyms you may find throughout this blog.

    • B2Bi/SFG: IBM Sterling Business-to-Business Integrator / IBM Sterling File Gateway
    • AC: IBM B2Bi/SFG Adapter Container
    • ASI: IBM B2Bi/SFG Application Server-Independent
    • REST: IBM B2Bi/SFG REST API Server
    • C:D: IBM Sterling Connect:Direct
    • SSP: IBM Sterling Secure Proxy
    • SSP PS: SSP Perimeter Server
    • B2Bi PS: B2Bi Perimeter Server
    • CM: SSP Configuration Manager
    • SEAS: IBM Sterling External Authentication Server
    • IDP: Identity Provider (e.g. LDAP, SAML, etc.)
    • MQ: IBM MQ Cluster
    • DB: Database Cluster
    • DMZ: Demilitarized Zone
    • OCP: Red Hat OpenShift Container Platform
    • K8s: Kubernetes
    • VNET: Virtual Network

General Deployment Considerations

Regardless of whether cloud is being used, there are some deployment considerations to keep in mind, including the following:

    • Make sure to check the latest support information in the IBM Software Product Compatibility Report (SPCR) as we continue to improve our support statement
    • For production environments you should provision enough worker nodes to allow one node or Availability Zone (AZ) to be down and still have enough capacity to run the environment. This allows for node updates or outages without affecting production.
    • In a production deployment, a minimum of 3 AZs is recommended. Additionally, the maximum latency between AZs must be below 10 milliseconds to span them with a single cluster (a minimal sketch of spreading pods across AZs follows this list).
    • Though it is possible to put both production and non-production environments in the same cluster separated by namespaces, it is not recommended as a best practice. This is because you cannot test cluster-level changes without affecting production as well.
      • Combining non-production environments into a single cluster separated by namespace is fine.
    • Development and non-production environments can be less highly available for cost savings.
      • Ex: Single Zone, Multi-AZ with fewer components, or a single node per AZ
    • SSD Storage should be used for production workloads
    • It is important to note that some storage types cannot fail over between AZs. If this is the case for your storage type, you should plan to have enough nodes to handle outages or use different storage.
    • Node sizes will vary depending on workload. You typically want to avoid the cases of either many small nodes or only a few large nodes.
    • Some cloud providers limit network bandwidth on some node types so it is important to pay attention to the node types being used.
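
As a minimal sketch of the node and AZ capacity points above, the following Deployment spreads replicas evenly across availability zones so that losing one zone or node still leaves running copies. The workload name, image, label values, and replica count are placeholders, not Sterling defaults.

    # Illustrative only: spread replicas evenly across availability zones so
    # that losing one AZ (or one node) leaves enough capacity to keep running.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-workload                # hypothetical name
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: example-workload
      template:
        metadata:
          labels:
            app: example-workload
        spec:
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone   # standard AZ label
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  app: example-workload
          containers:
            - name: app
              image: example-image:latest   # placeholder image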

Traditional Pre-Container On-Premises Deployment

Below is a sample diagram of a pre-container on-premises deployment.

In this deployment, there is a DMZ, Secure Zone, and Infrastructure Layer, each of which is separated by a firewall and has its application components running in separate VMs.

Generic OCP or K8s Deployment & Details

Below is a sample diagram of a generic OCP/K8s deployment showing a Demilitarized Zone (DMZ) and Internal Zone.

Here, there are separate clusters in the DMZ and Internal Zones to further isolate environments (not required, but this provides more separation if desired). Each component uses a Kubernetes service to allow for load balancing and to surface a single IP address. These IPs can be internal to the cluster or external. Each individual component in the deployment can scale vertically via K8s CPU and memory limits. Most components can also scale horizontally (the exception is SSP, although horizontal scaling is planned for a future release).

Rolling upgrades and "self-healing" are supported: K8s will spin up another pod if one crashes. Databases, MQ, and LDAP can run within the cluster or externally. File (Network File System) or block storage is external to the clusters and should be highly available (HA). Additionally, there should be enough capacity that the environment still functions with one node offline. Finally, unlike most components, SSP has a service name and separate IP for each instance, as it cannot scale horizontally at this time.
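
As a rough illustration of the points above, the following sketch shows a Kubernetes Service presenting one stable name/IP that load balances across replicas, with CPU/memory requests and limits providing vertical scaling. The names, port, image, and values are placeholders and are not the values used by the Sterling Helm charts.

    # Illustrative only: a Service gives the component's pods one stable
    # name/IP and load balances across replicas; resource requests/limits
    # bound each pod and are the lever for vertical scaling.
    apiVersion: v1
    kind: Service
    metadata:
      name: b2bi-asi                 # hypothetical service name
    spec:
      selector:
        app: b2bi-asi
      ports:
        - port: 35000                # placeholder port
          targetPort: 35000
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: b2bi-asi                 # hypothetical name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: b2bi-asi
      template:
        metadata:
          labels:
            app: b2bi-asi
        spec:
          containers:
            - name: asi
              image: b2bi-asi:placeholder   # placeholder image
              resources:
                requests:
                  cpu: "2"
                  memory: 4Gi
                limits:
                  cpu: "4"           # raise limits to scale vertically
                  memory: 8Gi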

Azure MultiAZ Deployment Model with Multiple VNETs and Multiple Clusters

Below is a sample diagram showing separated DMZ and Application clusters in their own VNET instances.

This deployment configuration provides the maximum separation between the DMZ and Application clusters, but also maximum management overhead. Azure supports VNET peering to cross connect two VNETs which would prevent traffic going over the public internet. However, you must take into account that peering would also directly connect the two VNETs together.

VNETs contain multiple private subnets, and Red Hat requires separate subnets for master and worker nodes in Azure.

Each component has both load balancing and a service name to present a consistent IP. Also in this deployment, Database, MQ, and LDAP are High Availability (HA) and can fail over between availability zones. The identity provider (IDP), Database, and MQ could be either within the cluster or external deployments/services. File or Block storage is still external to the cluster and Highly Available. In this case, there should be enough spare capacity such that the environment functions with one Availability Zone (AZ) offline. 

Finally, all components are active-active, meaning calls are load balanced between components. Note that this deployment should have at least one node per availability zone to allow for intra-availability-zone failover as well. Also, while optional, Azure provides a reference architecture for a hub-and-spoke model to give multiple VNETs access to shared services.

This deployment pattern is a good choice if you want very high isolation, as it puts the DMZ in a separate VNET and a separate cluster. However, running two clusters and two VNETs results in higher management overhead and more infrastructure costs, and the dual-VNET design requires either VNET peering or communication over the public internet.

Azure MultiAZ Deployment Model with a Single VNET Containing Multiple OCP Clusters

Below is a sample OCP deployment showing a DMZ and Internal Zone spanning 3 AZs in a single Azure Virtual Network (VNET).

This is similar to the previous pattern, except that only a single VNET is used. The DMZ and Internal Zone clusters are still separate, to isolate environments for those not comfortable with a flat K8s/OCP model. The DMZ cluster is in a private subnet, preventing node/internal IPs from being exposed to the public internet. However, the Sterling SSP servers are exposed externally via load balancers, and inbound connections are possible via NAT servers in a public subnet. A minimal sketch of such a load balancer Service follows.
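
The sketch below shows one way an SSP instance could be exposed through an Azure load balancer. The Service name, selector, and port are placeholders (not SSP chart defaults); the commented annotation is Azure's documented way to request an internal rather than public load balancer.

    # Illustrative only: a LoadBalancer Service gives the SSP instance a
    # stable load-balanced frontend. Uncomment the annotation to keep the
    # load balancer internal to the VNET instead of public.
    apiVersion: v1
    kind: Service
    metadata:
      name: ssp-engine-1             # hypothetical name; SSP needs one Service per instance
      # annotations:
      #   service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    spec:
      type: LoadBalancer
      selector:
        app: ssp-engine-1            # hypothetical pod label
      ports:
        - port: 9443                 # placeholder port
          targetPort: 9443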
As in the previous pattern, the VNET contains multiple private subnets (Red Hat requires separate subnets for master and worker nodes in Azure); each component has load balancing and a service name presenting a consistent IP; Database, MQ, LDAP, and the identity provider can run in the cluster or externally and are highly available across AZs; file or block storage remains external to the cluster and highly available; all components are active-active; and there should be enough spare capacity, with at least one node per AZ, for the environment to function with one Availability Zone (AZ) offline.

This deployment pattern provides relatively high isolation, since the DMZ is a separate cluster from the internal zone, while the single VNET minimizes cross-VNET traffic and the possibility of traffic over the public internet. However, keep in mind that running two clusters still adds management overhead and infrastructure cost, and sharing a VNET may not provide enough isolation for some.

Azure MultiAZ Deployment Model with a Single Cluster and Separate Namespaces

Below is a sample deployment model with a single cluster and a single private subnet per data center. The DMZ and Application pods are separated by namespaces within an OCP cluster.

In this model, there is a single cluster and a single private subnet per data center. Physical nodes span environments and namespaces, with the DMZ and Application pods separated within a single OCP cluster via K8s Namespaces / OCP Projects.

If you wish to have lower management overhead, this deployment may be a good choice. The low overhead comes from the fact that the DMZ and Applications share a single cluster and nodes. There is some level of network and management isolation because of the separate namespaces. OpenShift SDN is available by default to provide namespace isolation (Calico if using ROKS, as ROKS does not support SDN).

Keep in mind that this deployment requires Calico or another NetworkPolicy-capable CNI to be deployed in K8s. There is low isolation, as environments share clusters and nodes, so care must be taken with networking rules to ensure proper isolation (a minimal NetworkPolicy sketch appears at the end of this section).

This is a common pattern in companies where a separate team provides infrastructure to application teams. In these cases, it is likely that the application team will be given their own namespace(s).
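
As a hedged sketch of the namespace isolation described above, the following NetworkPolicy blocks ingress to an application namespace except from a DMZ namespace. It assumes a CNI that enforces NetworkPolicy (such as OpenShift SDN or Calico); the namespace names and labels are hypothetical, and recent Kubernetes versions apply the kubernetes.io/metadata.name label to namespaces automatically.

    # Illustrative only: deny ingress to the application namespace except
    # from pods in the DMZ namespace. Requires a CNI that enforces
    # NetworkPolicy.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-dmz-only          # hypothetical name
      namespace: sterling-app            # hypothetical application namespace
    spec:
      podSelector: {}                    # applies to all pods in the namespace
      policyTypes:
        - Ingress
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: sterling-dmz   # hypothetical DMZ namespace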

Azure OCP and K8s Options

Below are some options and considerations for OCP and K8s on Azure:

    • Options for a manual installation of OCP on Azure
    • Azure Red Hat OpenShift as a managed service
    • AKS is Azure's managed K8s service
    • Refer to our Support Matrix to confirm that the release you want to install is supported on the version of OCP or K8s you plan on using
      • Search for "Sterling" to find most components, or "Transformation" to find ITX and ITXA
    • AKS Kubernetes Support is generally closer to the community support schedule than AWS EKS
      • Ensure that the version of K8s you want to use is supported both by AKS and Sterling
    • Best practice for new environments is to start with the newest supported version to get the latest capability and patches and maximize time before an upgrade is required
    • If you are familiar with AWS, Azure provides an article on Azure for AWS Professionals that compares the main functions

Compute Infrastructure and Network Bandwidth

Below are some notes and considerations for Azure compute infrastructure and network bandwidth:

    • AKS has 4 preset configurations with "Production Standard" being the one recommended for most production workloads
      • Each preset configuration comes with default node types for the control plane and workloads, but can be changed prior to provisioning
    • Azure provides a description of the available node types; the right choice will vary depending on environment and use case
      • You may want to start with General Purpose or Compute Optimized nodes (a sketch of pinning I/O-intensive pods to a dedicated node pool follows this list)
    • AKS enables the cluster autoscaler by default
      • This can be easily disabled
      • According to the documentation, this component will scale up worker nodes if pod deployments are failing due to insufficient space and can also scale clusters back down again if pods are deleted
    • There are ARM processor options available, but B2Bi containers are not built for ARM architecture, so these are not supported
    • It is important to understand the network bandwidth available to your nodes
    • The Azure node type documentation provides more details on node sizes, as well as the number of network cards and the expected bandwidth of each node
    • Be aware of the network bandwidth costs which vary in single AZs, across AZs, or across regions
    • Azure provides a class of burstable nodes that use a credit system for CPU
      • Node performance can greatly diminish once credits are exhausted
      • Using these nodes for production is not recommended
    • Ensure at least 1 Gbps of network bandwidth for production workloads, in addition to CPU/memory requirements
      • More for I/O intensive use cases like SFG or C:D 
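
As an illustrative way to apply the node-type guidance above, the following sketch pins an I/O-intensive workload (for example SFG) to a dedicated node pool. On AKS, nodes in a pool carry an agentpool label with the pool name; the pool name, workload name, and image here are hypothetical, so verify the labels on your own nodes.

    # Illustrative only: keep I/O-intensive pods on a dedicated,
    # appropriately sized (e.g. compute-optimized) node pool.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sfg-asi                  # hypothetical name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: sfg-asi
      template:
        metadata:
          labels:
            app: sfg-asi
        spec:
          nodeSelector:
            agentpool: computeio     # hypothetical compute-optimized pool name
          containers:
            - name: asi
              image: sfg-asi:placeholder   # placeholder image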

Azure Storage Options

Azure Container Storage

Azure has several container storage options, along with documentation that covers best practices for storage and backups in AKS. It is important to pay close attention to the tradeoffs between storage costs and performance, especially for I/O intensive workloads such as SFG or C:D.

Note: The only supported AKS storage option as of the time of this blog's release is File Based Storage 

Azure Files

      • Has both a Standard and Premium option
        • Standard: backed by HDDs making this option not suitable for production
        • Premium: backed by SSDs and also supports NFS 4.1 which allows for ReadWriteMany (multiple pods can access the same storage at once)
      • Starting with K8s 1.21, Azure started using CSI drivers by default for all storage types. CSI drivers are plugins that provide more capability than the default K8s "in-tree" drivers
      • ReadWriteMany support
      • Azure has many storage redundancy options based on storage classes
        • Azure Files supports Zone Redundant Storage (ZRS), which replicates storage to multiple AZs in the same region, allowing pods to fail over between AZs
        • Ensure that the region you want to use supports Azure Files Premium ZRS storage (a minimal StorageClass sketch follows this list)
      • Ensure you understand the Azure File storage costs
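
As a hedged sketch of provisioning ReadWriteMany storage on Azure Files Premium with zone-redundant storage through the CSI driver, the following StorageClass and claim use hypothetical names; the skuName, protocol, and size values should be verified against current Azure documentation and your sizing needs.

    # Illustrative only: an Azure Files CSI StorageClass using premium,
    # zone-redundant NFS file shares, plus a ReadWriteMany claim that
    # several pods could mount at once.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: azurefile-premium-zrs        # hypothetical name
    provisioner: file.csi.azure.com
    parameters:
      skuName: Premium_ZRS               # premium, zone-redundant file shares
      protocol: nfs                      # NFS 4.1 shares on the premium tier
    reclaimPolicy: Retain
    allowVolumeExpansion: true
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: sterling-shared-data         # hypothetical name
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: azurefile-premium-zrs
      resources:
        requests:
          storage: 100Gi                 # placeholder size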

Below are a few more file-based options available to AKS (these are not currently supported):

Azure NetApp Files

      • NFS-based storage backed by NetApp hardware
      • Requires a Trident CSI driver to be installed and only available in certain regions
      • Likely a high-performance, higher-feature option available at a higher cost

Azure Ultra Disks

      • Advertised as ultra high speed, but only available in certain locations
      • It is not clear whether it can be leveraged from AKS

Below are some details on block storage, although it is not supported on AKS/Azure today according to our current documentation:

Azure Disks

      • Has a Standard (HDD) and Premium (SSD) option available
      • Only supports ReadWriteOnce (accessible by only one Pod at a time)
      • May fail over between AZs

Azure Object Storage

Azure Blob Storage (Object Storage)

      • Object storage is meant to store bulk data that is read infrequently and is not meant to be used as an active file system storage
      • Though there are three types of Blob storage available in Azure, Sterling only uses Block Blobs
      • There are two main use cases in B2Bi/SFG:
        • Document Storage: the ability to store Business Processes (BPs) in Blob storage as opposed to the database. This reduces the size of and load on the database
        • Adapter Storage: the ability for SFG to send/receive files to/from Blob as a transfer mechanism
      • Connect:Direct also supports Blob storage as of 6.2.0

Azure Networking & Firewall Options

Notes and Considerations

Sample Options

    • Network Security Groups (NSGs)
      • NSGs are included as part of Azure VNETs and can be applied at the subnet or network interface level
      • Virtual firewalls and NSGs can be used in conjunction with one another, but are not the same
    • Virtual Firewalls
      • Provides more capability and security than Security Groups or Network ACLs by securing outbound and inbound traffic

Azure Database Support

You can either leverage a cloud Database Service or deploy a Database directly in the cluster.

Note: Though they generally work, Sterling currently does not officially support any Azure cloud databases. This is due to long periods of dropped connections when Azure performs database maintenance.

Also, it is not recommended to set up your own database in containers, as this requires expertise in Kubernetes, the database, and how the database behaves in containers.

Self Managed Databases

Optional Add-Ons

There are many options for configuring an Azure cloud deployment. We can't know or test all of them, but here are a few that may be worth investigating:

    • Azure Private Link: connect your VNET to some Azure services without going out to the public internet

K8s / OCP Configuration Best Practices

Sterling B2Bi/SFG

A properly configured termination grace period (terminationGracePeriodSeconds) allows a container's preStop hooks to finish before the container is stopped.

As of release 6.1.2, IBM Sterling B2Bi/SFG supports a user-configurable termination grace period in container deployments. With a properly configured termination grace period, processes, BP cleanup, and other currently running workloads are given enough time to stop gracefully before the pod is killed.
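
The following minimal sketch shows the Kubernetes fields involved. The grace period value, image, and preStop command are placeholders rather than the B2Bi chart's actual hooks; in practice the product's Helm chart exposes its own setting for this.

    # Illustrative only: terminationGracePeriodSeconds gives the preStop hook
    # and in-flight work time to finish before the pod receives SIGKILL.
    apiVersion: v1
    kind: Pod
    metadata:
      name: b2bi-asi-example               # hypothetical name
    spec:
      terminationGracePeriodSeconds: 300   # placeholder value
      containers:
        - name: asi
          image: b2bi-asi:placeholder      # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "touch /tmp/shutdown-requested"]  # placeholder command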

