What you posted looks like a feasible path, although it clearly took quite a bit of legwork, and it will also require deploying at least a small web server to handle the traffic and present the customized page, plus the additional ingress controller.
Kudos for finding that out. I do think the approach is rather niche, but useful in case some customer insists on having something like that.
Interloc Solutions Inc., US.
Original Message:
Sent: Mon July 22, 2024 07:46 AM
From: Andrzej Więcław
Subject: Gracefully handle MAS Manage downtimes
Hi Julio,
you have inspired me with your suggestion concerning Ingress Operator adjustments, and I kept exploring the documentation. This way I found a feature called Ingress sharding, which caught my attention.
As per the documentation: By default, the Ingress Controller serves any route created in any namespace in the cluster. You can add additional Ingress Controllers to your cluster to optimize routing by creating shards, which are subsets of routes based on selected characteristics. To mark a route as a member of a shard, use labels in the route or namespace metadata field. The Ingress Controller uses selectors, also known as a selection expression, to select a subset of routes from the entire pool of routes to serve.
I have built my solution, which consists of the following configuration steps:
1. Create a new ingress controller (maintenance) handling all routes in a namespace (ref. .spec.namespaceSelector) or individual routes (ref. .spec.routeSelector) which are labeled e.g. type=maintenance.
   IMPORTANT: The new ingress controller's domain property (.spec.domain) should be set to the same value as the OCP cluster domain. This way we avoid a conflict between the default and maintenance ingress controllers, yet leave the default ingress controller as the higher-priority one when an ingress controller is selected to handle requests based on HTTP host/path.
2. Update the default ingress controller by excluding all routes which are to be handled by the maintenance ingress controller (ref. Sharding the default Ingress Controller).
3. Configure a new deployment and service which is meant to handle user traffic during the service window, when "maintenance mode" is on.
4. Generate clones of the MAS Core, MAS Manage, etc. routes and repoint .spec.to.name and .spec.port.targetPort to the service created in the previous step. We'll call those maintenance routes from now on.
5. Label the newly generated maintenance routes (or namespace) with type=maintenance.
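For steps 1 and 2, the IngressController definitions could look roughly like the sketch below. The cluster domain apps.example.com is a placeholder, and the selectors may need adjusting to your own labeling scheme:

```yaml
# Hypothetical sketch of step 1: the "maintenance" ingress controller.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: maintenance
  namespace: openshift-ingress-operator
spec:
  domain: apps.example.com        # same value as the default controller's domain
  routeSelector:
    matchLabels:
      type: maintenance
---
# Step 2: shard the default controller so it ignores the maintenance routes.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  routeSelector:
    matchExpressions:
    - key: type
      operator: NotIn
      values:
      - maintenance
```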
Once everything is in place:
- Enable maintenance mode by:
  - labeling the original MAS Core, MAS Manage, etc. routes (or namespace - ref. sharding using route or namespace labels) with type=maintenance,
  - removing the type=maintenance label from the maintenance routes (or namespace).
- Disable maintenance mode by doing the opposite of Enable.
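As a concrete illustration, the toggle boils down to swapping labels with oc. The route names (manage-route, manage-route-maintenance) and the namespace are placeholders for whatever your environment uses:

```shell
NS=mas-inst1-manage   # placeholder namespace

# Enable maintenance mode: shard the original route away from the default
# controller and let its maintenance clone take over the host.
oc label route manage-route -n "$NS" type=maintenance --overwrite
oc label route manage-route-maintenance -n "$NS" type-     # remove the label

# Disable maintenance mode: the exact opposite.
oc label route manage-route -n "$NS" type-
oc label route manage-route-maintenance -n "$NS" type=maintenance --overwrite
```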
When "maintenance mode" is enabled, the default ingress controller, which is configured to handle routes NOT labeled with type=maintenance, starts proxying incoming traffic using the maintenance routes, therefore ending with static content being served. Swapping the type=maintenance labels restores the default behaviour.
Technically speaking, in this solution the maintenance ingress controller instances (PODs controlled by the router-maintenance deployment, automatically created in the openshift-ingress namespace) are never used. Due to the nature of OCP configuration, it's the default ingress controller that will always handle the load (based on request host domain matching), therefore in order to reduce resource consumption (CPU+memory) and avoid port conflicts it's a good idea to:
- limit the ingress controller replicas (default: 2),
- change the ports from the standard ones (default: 80, 443, 1936) to some other, arbitrarily chosen values.
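Assuming a HostNetwork endpoint publishing strategy (typical for on-premise clusters; the port fields below do not apply to cloud LoadBalancerService setups), both tweaks could be applied roughly like this, with the port numbers being arbitrary examples:

```shell
oc patch ingresscontroller/maintenance -n openshift-ingress-operator \
  --type=merge -p '{
    "spec": {
      "replicas": 1,
      "endpointPublishingStrategy": {
        "type": "HostNetwork",
        "hostNetwork": {"httpPort": 8080, "httpsPort": 8443, "statsPort": 8936}
      }
    }
  }'
```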
So far I know that this solution works in cloud (AWS) and on-premise OCP deployments.
It's worth mentioning that all configuration steps described earlier (1-5) can be easily scripted. In fact, considering the number of routes to (re-)create and label, scripting seems to be the only reasonable option.
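For instance, the route cloning in step 4 could be sketched with oc and jq along these lines; the namespace, the maintenance-page service name and its target port are placeholders:

```shell
NS=mas-inst1-manage   # placeholder namespace
for r in $(oc get routes -n "$NS" -o name); do
  oc get "$r" -n "$NS" -o json | jq '
      .metadata = {name: (.metadata.name + "-maintenance"),
                   namespace: .metadata.namespace,
                   labels: {type: "maintenance"}}
    | .spec.to.name = "maintenance-page"
    | .spec.port.targetPort = "http"
    | del(.status)' \
  | oc apply -n "$NS" -f -
done
```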
Furthermore, as I have already done successfully, you may consider improving the "maintenance mode" deployment. Instead of being a simple "static page" server, you can make it a reverse proxy which conditionally (e.g. based on the IP of the incoming request) serves the original content, and only otherwise serves the static page. This way you can enable "maintenance mode" for most users while still allowing others (e.g. deployers and others involved in service window activities) to access MAS Core, MAS Manage, etc. normally.
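A minimal sketch of such a conditional reverse proxy in nginx could look like this; the allow-listed network, the listen port and the upstream host are assumptions:

```nginx
# Everyone outside the allow-list gets the maintenance page.
geo $maintenance {
    default      1;
    10.0.0.0/24  0;   # deployers / service-window staff (placeholder network)
}
server {
    listen 8080;
    location / {
        if ($maintenance) {
            return 503;
        }
        proxy_pass https://original-backend.example.com;  # placeholder upstream
    }
    error_page 503 @maintenance;
    location @maintenance {
        root /usr/share/nginx/html;
        rewrite ^ /maintenance.html break;
    }
}
```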
I will try to document complete solution, including code and scripting, and publish it somewhere. I'll link it in this thread when I'm ready.
Based on the rather limited feedback received, it's hard to judge whether this topic is so niche that no one really cares, or whether there simply are no other solutions out there.
I'm open to any feedback, especially concerning improvements.
------------------------------
Andrzej Więcław
Maximo Technical Consultant
AFRY
Wrocław, Poland
Original Message:
Sent: Tue July 16, 2024 03:45 AM
From: Andrzej Więcław
Subject: Gracefully handle MAS Manage downtimes
Hi Julio,
you're right, when the PODs are down we indeed get HTTP 503, rather than the HTTP 500 I mentioned.
I haven't yet explored the possibilities around HAProxy customization, but maybe this is the way...
Thank you!
------------------------------
Andrzej Więcław
Maximo Technical Consultant
AFRY
Wrocław, Poland
Original Message:
Sent: Mon July 15, 2024 09:50 AM
From: Julio Perera
Subject: Gracefully handle MAS Manage downtimes
Hi Andrzej:
Normally you should not get "Error 500" responses when all Pods are down... As Manage ingress is "managed" by a standard OpenShift Route, when all Pods are down you should get "Error 503 - Application is not available", with a somewhat descriptive message from OpenShift stating:
Application is not available
The application is currently not serving requests at this endpoint. It may not have been started or is still starting.
Possible reasons you are seeing this page:
- The host doesn't exist. Make sure the hostname was typed correctly and that a route matching this hostname exists.
- The host exists, but doesn't have a matching path. Check if the URL path was typed correctly and that the route was created using the desired path.
- Route and path matches, but all pods are down. Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
The above page is generic/standard but can be customized as per section "Customizing HAProxy error code response pages" on the URL: https://docs.openshift.com/container-platform/4.12/networking/ingress-operator.html
Unfortunately, it seems to be an "all or nothing" situation: the customized page will be shown for all possible Routes (including non-existing ones) that have no available serving Pods, and it will not be specific to Manage.
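For reference, the documented procedure boils down to a ConfigMap in openshift-config holding raw HTTP responses (the file names error-page-503.http and error-page-404.http are fixed by the feature; the ConfigMap name is your choice) and a patch on the default ingress controller:

```shell
oc create configmap custom-error-pages \
  --from-file=error-page-503.http \
  --from-file=error-page-404.http \
  -n openshift-config
oc patch ingresscontroller/default -n openshift-ingress-operator \
  --type=merge -p '{"spec":{"httpErrorCodePages":{"name":"custom-error-pages"}}}'
```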
Hope the above helps somewhat.
Regards,
Julio Perera
Maximo Technical Consultant
Interloc Solutions, US.
------------------------------
Julio Perera
Original Message:
Sent: Wed July 10, 2024 10:20 AM
From: Andrzej Więcław
Subject: Gracefully handle MAS Manage downtimes
Hi,
we're looking for a way to gracefully handle MAS Manage downtimes.
What I mean, in essence, is the ability to redirect user requests originally pointing to MAS Manage to some static web page when Manage is down (e.g. during the updatedb phase of the build process, before the server bundle POD becomes ready).
For simplicity let's assume that the control over Manage downtime detection is fully manual so that it's Manage deployer's responsibility to activate/deactivate the redirect.
With classic Maximo we were doing this with an IHS config-based conditional rewrite rule which intercepted users' requests using a "file exists" test. Whenever we wanted to activate the "maintenance page" redirect, we created a file to trigger the rewrite, and once done we simply deleted the file, restoring normal operations.
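A hypothetical reconstruction of that IHS rule, for illustration (the flag file path and the /maximo context root are placeholders):

```apache
RewriteEngine On
# The flag file is the on/off switch: create it to enable the redirect.
RewriteCond /opt/IBM/HTTPServer/conf/maintenance.flag -f
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule ^/maximo(/.*)?$ /maintenance.html [R=302,L]
```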
It was just one of the ways to achieve what we intended, sufficient for our needs. Of course there are other ways to do it, e.g. updating DNS records, throwing a reverse proxy in between end users and Maximo, etc. We consider these options to either unnecessarily increase application maintenance complexity (reverse proxy - yet another component to install and maintain) or add a resource dependency (privileged IT personnel available to update DNS records).
With MAS Manage we noticed that the Routes in the mas-<inst>-manage namespace, handling requests to server bundles, are managed by the operator, and changes to a Route's essential settings (e.g. spec.to.name, spec.port.targetPort) are overwritten with every MAS Manage operator reconciliation cycle.
Therefore we cannot mimic the approach we used in the classic deployment by simply updating the MAS Manage route(s) to point to some custom service, e.g. running nginx, which serves static content.
It could be that we're missing something or doing something wrong.
Do you have any suggestions how to achieve what we're aiming for?
Alternatively I would love to hear how else you're dealing with MAS Manage downtimes so that users see something more valuable than raw "HTTP 500 Service unavailable"? Any tips will be highly appreciated!
Thank you!
------------------------------
Andrzej Więcław
Maximo Technical Consultant
AFRY
Wrocław, Poland
------------------------------