Concurrency and cluster in wM IS

1. Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 12:55 PM

In another thread there was a discussion related to concurrent processing within wM IS.

Transformers in a map step.
Previous versions of SAG docs implied that the transformers can run concurrently. There are observers who indicate this never happens, and others who indicate they’ve seen it.

Relatedly, there was discussion about 2 behaviors related to IS cluster:

Execution of a given service moving from one node to another in the middle of execution.
There are observers who indicate this never happens, and others who indicate it might do that.

Processing of messages from UM where one IS instance gets the message from a durable subscriber topic and dispatches that to another IS instance in the cluster.
There is interest in learning how this would be configured as it seems counter to observed “UM client cluster” behavior where each node independently retrieves and processes messages from the queue for itself, not sending it to another node.

This thread was created to continue those conversations, if desired, rather than continuing to overload the original thread, which was about a completely different topic.

Side note: I cannot recall how to mark a topic as “this is a discussion so don’t pester me to mark a response as the ‘solution’”. Or if this needs to be in a different topic/forum area. I want to avoid being messaged/emailed and having the forums constantly prompting me to “mark it solved”. Any guidance about this would be appreciated.

#Integration-Server-and-ESB
#webMethods
#discussion
2. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 01:48 PM

reamon:

Execution of a given service moving from one node to another in the middle of execution.
There are observers who indicate this never happens, and others who indicate it might do that.

This happens in a stateful cluster. It uses a database to store the pipeline and Terracotta to store the session info. It basically splits the pipeline to run simultaneously on different Integration Servers.

reamon:

Processing of messages from UM where one IS instance gets the message from a durable subscriber topic and dispatches that to another IS instance in the cluster.

This was happening because of the acknowledgment mechanism of IS on UM. It used to retrieve another message even though there was only 1 active trigger on a stateful cluster of Integration Servers. This should not be the case anymore. When I had this issue, the latest version was 9.12, and as far as I remember it is now possible to receive 1 message at a time from 1 queue. I will add the documentation link if I find it.

Found a KB article about serial message processing. It’s not mine, but it is the same use case.
https://empower.softwareag.com/sl24sec/SecuredServices/KCFullTextASP/viewing/view.asp?prdfamily=webMethods&KEY=132242-11840677&DSN=PIVOTAL&DST=TCD

JMS serial triggers are not supported in an IS cluster. Clustered IS serial triggers are not supported with UM JMS. It is not part of the JMS 1.1 specification, and no other JMS provider supports it. Software AG did add support for this in v9.9 and up, but it was for native IS triggers only. This was done because Broker had a feature called Order By Publisher that can be used to ensure serial processing across multiple clients. We needed to enable this support for Broker replacement. This isn't clearly documented unfortunately. However note the documentation as stated below: "Using webMethods Integration Server to Build a Client for JMS", section "Consuming JMS Messages in Order with Multiple Consumers", has an outline only for Broker as the JMS provider. There isn't a way to do it with UM. To get serial processing in a cluster, the triggers will have to be native triggers and not JMS.

This should not be the case anymore, as IS should be able to process messages serially even in a stateful cluster.

reamon:

Transformers in a map step.
Previous versions of SAG docs implied that the transformers can run concurrently. There are observers who indicate this never happens, and others who indicate they’ve seen it.

This is the interesting part in my opinion. There is a common misconception that running services as transformers will make them execute in parallel. This is even a common job interview question. Some webMethods experts expect you to run services as transformers instead of map steps in order to increase parallelism.

Personally, I don’t think it is a good workaround even if it does execute services in parallel. A workaround is a workaround, and if there is a proper way of implementing a feature, the workaround should be avoided. Proper parallelism is implemented using the publish-and-wait service.
https://documentation.softwareag.com/webmethods/integration_server/pie10-15/webhelp/pie-webhelp/#page/pie-webhelp%2Fto-publishing_documents_8.html

Even if transformers execute in parallel, the calling thread will still block and wait until the last transformer finishes executing. In other words, if one service call takes 30 sec and another takes 5 sec, the caller occupies a thread for the full 30 sec while waiting for a response. Again, I am not implying that this works. I don’t know if transformers execute in parallel or not.
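To make the blocking point concrete, here is a minimal Python sketch (not Integration Server code). It does not show how IS runs transformers, which is exactly what this thread disputes; it only models a caller that fans out two calls and waits for both. The function names are hypothetical, and the 30 sec and 5 sec durations are scaled down by 100x.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_service(duration_s):
    """Hypothetical stand-in for a service call that takes duration_s seconds."""
    time.sleep(duration_s)
    return duration_s

def run_sequential(durations):
    """The caller invokes each service one after another."""
    start = time.monotonic()
    for d in durations:
        fake_service(d)
    return time.monotonic() - start

def run_parallel_and_wait(durations):
    """The caller fans out, then blocks until every call has returned."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        list(pool.map(fake_service, durations))
    return time.monotonic() - start

# 30 sec and 5 sec from the post, scaled down by 100x (assumption for the demo)
durations = [0.30, 0.05]
seq = run_sequential(durations)
par = run_parallel_and_wait(durations)
print(f"sequential: {seq:.2f}s, parallel but blocking: {par:.2f}s")
```

In the parallel case the caller is still tied up for roughly the duration of the slowest call, which is the difference from a publish-based design where the caller does not hold a thread per outstanding request.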

3. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 02:25 PM

Thanks for the additional info!

Engin SARLAK:

This happens in a stateful cluster. It uses database to store the pipeline and terracotta to store the session info. It basically splits the pipeline to run simultaneously on different integration servers.

As noted before, I’ve never seen this. I’ve never seen it described such that a service execution thread can move between nodes. As @Rupinder_Singh described for a long-running conversation or as @Percio_Castro1 for checkpointed services in the face of failures, execution may go to any node.

Jumping nodes in the middle of a service execution makes little sense to me. The context switch and the handoff, then presumably the activity MUST come back to the original node so that the caller can get a response from the machine it connected to for execution in the first place.

But I remain open to the possibility that this is possible given the right configuration. Would very much like to know what that configuration might be. Would need more details beyond “IS cluster” – IS clustering has a lot of common misconceptions too so if anyone has more details that would be great.

Engin SARLAK:

of the acknowledge mechanism of IS on UM

The differences in observations may be due to using UM via “native wM Messaging” vs JMS. I’ve never used the JMS connectivity with Broker nor UM. Complete pain compared to the simplicity of “native”. Fortunately, we’ve never had any need to connect to any other JMS provider, messaging is implemented entirely within our wM installation, so we never use JMS.

But still, your description stated that one IS node retrieved the message from UM and handed it off to another IS node. I’m not aware of any configuration where IS does that. Do you have more detail, beyond IS stateful cluster?

Engin SARLAK:

There is a common misconception about running services as transformers will make them execute in parallel. This is even a common job interview question. Some webMethods experts expect you to run services as transformers instead of map steps in order to increase parallelism.

IMO/IME, it is indeed a misconception. I have seen interview question templates that have this too. Based upon my observations, this is just incorrect.

Engin SARLAK:

I don’t know if transformers execute in parallel or not.

Appreciate the openness on this. I would offer that folks consider that there have been 3 people in the other thread that have been working with IS a very long time that unequivocally state that it does not do this. I don’t say this to mean longevity is proof but more that if this were the case, it would seem it would have been encountered at some point. Of course if anyone has docs/evidence that transformers in a MAP step are executed in parallel, there are many people that would be very interested.

4. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 02:47 PM

reamon:

Fortunately, we’ve never had any need to connect to any other JMS provider, messaging is implemented entirely within our wM installation, so we never use JMS.

I learned my lesson that time and promised myself never to use it unless there is a strong reason. We were using UM as the entry point to our integrations, and I was at the beginning of my career.

reamon:

But still, your description stated that one IS node retrieved the message from UM and handed it off to another IS node. I’m not aware of any configuration where IS does that. Do you have more detail, beyond IS stateful cluster?

It has been several years, I don’t work for that company anymore, and I don’t even live in that country, so other than the documentation I have no proof. I might squeeze a POC into my current work, but unfortunately not anytime soon.
https://documentation.softwareag.com/webmethods/integration_server/pie10-15/webhelp/pie-webhelp/#page/pie-webhelp%2Fto-clustering_overview_2.html%23

You know, (almost) everything is a web service in Integration Server, and as long as you have the pipeline you can resubmit or trigger any service payload at any time. Stateful clusters keep the session data in Terracotta and the pipeline information in the database, as happens when we enable auditing and save the pipeline on failure, always, etc. The cluster keeps the pipeline (basically the inputs of the service to be executed), and from there any Integration Server can pick up that pipeline and execute it. Again, that doesn’t necessarily mean the steps run in parallel. If the flow of the service is iterative, it won’t execute in parallel; rather, the pipeline will be executed on whichever node is available at the time.

This caused us a problem when consuming messages from UM. We had the trigger enabled on one node and disabled on the other, to limit the active triggers for that queue. But since any Integration Server can take over the message and continue processing, the node where we had disabled the trigger took over execution for that message, and the node with the enabled trigger consumed another message because it had no active thread for that queue at that moment.

I tried changing the acknowledgment mechanism to fix the issue, but at the time that required deep manual development. If the enabled trigger had waited to send the ack, or waited to consume another message before sending the ack, my problem would have been solved. I don’t remember which was the case; it might have been either. It certainly felt like too much engineering would be required in that architecture, and we were using Integration Server only as a message consumer back then, so we just disabled 1 node to fix the issue. It wasn’t an elegant solution, but it was the simplest and required the least work.

5. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 04:24 PM

We may have to decompose this topic into even smaller subtopics.

I hope I comprehend what is being discussed well enough not to send this topic off on an unrelated tangent, but here are my 2 cents when it comes to messaging:

As long as two (or more) Integration Servers have the same client prefix when connecting to the messaging provider (UM or Broker), it will be seen as a single client (whether stateful clusters are used or not). This essentially creates a single queue in the messaging provider for that given client. Two (or more) Integration Servers should never be able to pick up the same message at the same time because the message can only be dequeued once since it’s only one queue being shared. Of course, if one IS picks up the message and fails to acknowledge it within the specified ack window, that message will become available to be picked up by other IS’s with that same client ID. If I understand correctly, however, you’re saying that an IS, whose trigger was disabled, somehow had its trigger service executed with a message that was originally subscribed by that same trigger on a different IS, is that right?

When it comes to delivering messages to specific IS’s, it is possible by having a connection that has a different client prefix in each IS. You can then use pub.publish:deliver to deliver to a specific IS, but I have personally not seen a use case that made this option worthwhile. Similarly, you can broadcast a single message to all IS’s in a cluster via a publish as long as the IS’s use a connection with a different client prefix. Again, this is possible because a different client prefix results in separate queues being created in the messaging provider, and it’s not necessarily a function of a stateful IS cluster. The Process Engine used this approach via the PE_NONTRANSACTIONAL_ALIAS connection to broadcast messages to different Integration Servers in a stateless cluster.
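The client-prefix behavior described in the last two paragraphs can be modeled with a toy provider in Python. This is a sketch under stated assumptions, not the UM or Broker API: the `Provider` class, its method names, and the way the ack window expires are all hypothetical and only mirror the description above (one queue per client prefix, redelivery when a message is not acknowledged).

```python
from collections import deque

class Provider:
    """Toy messaging provider (hypothetical): one queue per client prefix."""

    def __init__(self):
        self.queues = {}    # client prefix -> pending messages
        self.unacked = {}   # client prefix -> {message: consumer holding it}

    def connect(self, prefix):
        # Servers that connect with the same prefix share one queue.
        self.queues.setdefault(prefix, deque())
        self.unacked.setdefault(prefix, {})

    def publish(self, message):
        # Every distinct client (prefix) receives its own copy.
        for queue in self.queues.values():
            queue.append(message)

    def receive(self, prefix, consumer):
        queue = self.queues[prefix]
        if not queue:
            return None
        message = queue.popleft()
        self.unacked[prefix][message] = consumer
        return message

    def ack(self, prefix, message):
        self.unacked[prefix].pop(message, None)

    def expire_ack_window(self, prefix):
        # Unacknowledged messages become available to the shared queue again.
        for message in list(self.unacked[prefix]):
            del self.unacked[prefix][message]
            self.queues[prefix].appendleft(message)

# Same prefix: two servers compete for a single queue.
shared = Provider()
shared.connect("cluster")                         # IS1
shared.connect("cluster")                         # IS2, same prefix
shared.publish("m1")
first = shared.receive("cluster", "IS1")          # IS1 dequeues the message
second = shared.receive("cluster", "IS2")         # nothing left for IS2
shared.expire_ack_window("cluster")               # IS1 never acknowledged
redelivered = shared.receive("cluster", "IS2")    # now IS2 can pick it up

# Different prefixes: each server has its own queue and gets its own copy.
separate = Provider()
separate.connect("is1")
separate.connect("is2")
separate.publish("m2")
copies = [separate.receive("is1", "IS1"), separate.receive("is2", "IS2")]
```

In this model nothing is ever handed from one server to another; a message only reaches a second server if the first one fails to acknowledge it, or if the servers are separate clients to begin with.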

When it comes to the behavior of stateful clusters:

I honestly haven’t heard of the feature that allows for different steps of a given service to be executed in different servers in the cluster. Yes, I’m aware that session objects are stored in the distributed Terracotta cache, allowing perhaps for a service to persist information into a session object that can be retrieved by another IS in the cluster (e.g. a shopping cart use case). I’m also aware of the “checkpoint restart” pattern that the documentation recommends but which does not come in handy in the real world very often (if ever). However, automatic storage of the pipeline into the database and automatic orchestration of the execution of different steps across different servers is news to me. Sounds interesting… but also very expensive in terms of performance. I would love to hear details so I can give it a try.

Percio

6. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 04:53 PM

Percio Castro:

If I understand correctly, however, you’re saying that an IS, whose trigger was disabled, somehow had its trigger service executed with a message that was originally subscribed by that same trigger on a different IS, is that right?

Yes. If you check the IS clustering guide, there are steps that you need to configure in order to have a stateful cluster. One of them is having a distributed cache to store session data, but this has nothing to do with the service payload transfer. It is just to keep expensive session data shared across the cluster. The other is configuring the JDBC pools so that the servers can keep their session data in the database. I believe the pools were ISCoreAudit and ISInternal, but only one of them might be enough. If you check the architecture of the stateful cluster, you will see there is a database component. This database is used as a layer between Integration Servers to share pipeline data. It is not that expensive, since IS already does the same thing with the internal DB. You only add a network layer so that load is distributed equally, an external RDBMS usually performs better than an internal DB, and if you are already saving the pipeline upon failure, the drawback is almost none.

[Image: 1762×794, 126 KB]

If you don’t configure the database or the Terracotta Server Array, you don’t have a stateful cluster. An Integration Server without the stateful cluster configuration is a stateless one. The part about the prefixes and such is also a requirement, but it is irrelevant here. The issue was caused because the active node didn’t have a process running in the background; hence, it assumed it had finished processing the message and consumed another one. The other node, which had the trigger disabled, was just running the payloads that it was able to run. So whenever the node with the inactive trigger took over the payload, the other one thought that it had finished processing and consumed another message. This was 8 years ago, and I believe this behavior of threads getting mixed together has been solved already. It has nothing to do with JMS or prefixes. It was a multi-threading problem.

Percio Castro:

but also very expensive in terms of performance.

This is not entirely true, since the internal database also consumes disk I/O, memory, and CPU time. Configuring external DBs is good for performance. The extra overhead caused by the network layer can be ignored, since load balancing is more efficient with stateful clusters.

7. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 05:04 PM

I guess we can agree to disagree on whether storing input pipelines for services is expensive; after all, “expensive” is a relative term that depends on pipeline size, physical resources, and how long your service spends doing other stuff. I will say, though, that if it weren’t expensive, it would be on by default and everyone would do it, but that’s not the case.

But I digress… let’s pretend for a second that it’s not expensive. I configure two IS servers to be in a stateful cluster with a shared database and a shared Terracotta cache. I then configure auditing so that the input pipeline for all services is saved. So far, so good. The piece I’m missing is: what do I do next to cause a single service execution to be shared across the two different servers?

Percio

8. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 05:10 PM

Using stateful clusters has benefits and drawbacks. It certainly isn’t easy to configure, that’s one thing. The benefit of having it is reliability. Assume there is a crash in production. Since it’s a production environment, there will be in-progress service runs when it crashes. If it is a stateful cluster, the request will be completed on another node, so no client will be affected and no data will be lost.

Percio Castro:

I will say though that if it wasn’t expensive, it would be on by default and everyone would do it, but that’s not the case.

This is not a valid argument. Half of the world drives on the opposite side of the road from the other half, but that doesn’t make driving on the left or on the right any more correct than the other. It is not the default because we need 2 extra components to implement it. Stateless clusters are much easier to configure.

From the documentation:

Failover Support for Stateful Clusters
Failover support enables recovery from system failures that occur during processing, making your applications more robust. For example, by having more than one Integration Server, you protect your application from failure in case one of the servers becomes unavailable. If the Integration Server to which the client is connected fails, the client automatically reconnects to another Integration Server in the cluster.
Note: Integration Server clustering provides failover capabilities to clients that implement the webMethods Context and TContext classes. Integration Server does not provide failover capabilities when a generic HTTP client, such as a web browser, is used.
You can use failover support with stateful clusters.

9. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 05:37 PM

Engin SARLAK:

Assume there is a crash in production. Since it’s a production environment, there will be in-progress service runs when it crashes. So if it is a stateful cluster, the request will be completed on another node and no client will be affected, no data will be lost.

I would love for this to be true but it isn’t. There’s nothing in a stateful cluster that causes the transaction to be automatically restarted in another node in the cluster, let alone restarted from where the other server left off. State information has to be maintained by the code along the way for this to be true (either in the session object or in some other object in the distributed cache or via some other shared component like a database) and the client has to retry the transaction to cause it to be transferred to the other server. If the client is a Java client using a TContext or Context class, the failover will be done automatically, but again, it happens in the form of a retry from the client. It’s not something that happens on the server.

The only automated retry that I’m aware of that does gracefully cause a transaction to move from one server to another in the event of a crash, for example, is if messaging is used. But then again, this is not a feature of the stateful cluster but a feature of the messaging infrastructure, given that if the message is not acknowledged by one IS, it will eventually be made available for pick up by a different IS using the same client prefix. Even in this scenario though, execution will restart from the beginning of the trigger service. There’s nothing that will cause the transaction to restart from the point of failure unless the code maintains and checks for state.
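As a rough illustration of the client-driven failover described above, here is a small Python sketch. It is not the Context/TContext API; the `Node` class and `invoke_with_failover` function are hypothetical names used only to show that the retry is issued by the client and that the service starts from its first step on the surviving node.

```python
class NodeDown(Exception):
    """Raised when a (simulated) server cannot be reached."""

class Node:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.started = 0   # how many times the service was started on this node

    def invoke(self, payload):
        if not self.up:
            raise NodeDown(self.name)
        self.started += 1
        # The service always runs from its first step; nothing is resumed.
        return f"{payload} handled by {self.name}"

def invoke_with_failover(nodes, payload):
    """Client-side retry: try each node in turn until one answers."""
    errors = []
    for node in nodes:
        try:
            return node.invoke(payload)
        except NodeDown as exc:
            errors.append(str(exc))
    raise RuntimeError(f"all nodes failed: {errors}")

is1 = Node("IS1", up=False)   # the node the client first connects to has crashed
is2 = Node("IS2")
result = invoke_with_failover([is1, is2], "order-42")
print(result)   # prints "order-42 handled by IS2"
```

The servers in this sketch never talk to each other. Any resume-from-where-it-stopped behavior would have to come from state that the service code itself saves and checks.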

Percio

10. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 05:57 PM

reamon:

It is akin to the checkpoint and resume behavior

This checkpoint is any step in the flow. After any step finishes, a new pipeline is created and saved into the database.

reamon:

It has been this way ever since I can remember.

This post is pretty old. Back then (I was still in college at the time) they were using Oracle Coherence as the cache layer, if my memory is accurate. After 20 years I wouldn’t assume everything is still the same. Sometimes I see workarounds that were implemented decades ago because of a product limitation, and I see people still implementing the same workaround even though that limitation is long gone. It’s like applying Windows 95 patches to Windows 11. There was no Universal Messaging or Terracotta back then, not even Windows 7.

Percio Castro:

I would love for this to be true but it isn’t. There’s nothing in a stateful cluster that causes the transaction to be automatically restarted in another node in the cluster, let alone restarted from where the other server left off.

This claims the opposite of the documentation, which has contained the same information for decades. Are you absolutely sure it wasn’t a bug or a configuration error? And if it is a bug (since you claim it doesn’t work as documented, it would have to be one), why has it not been fixed for so many years? I am pretty sure it continues processing if the configuration is done properly.

Below is the chart of what clustering does and what it doesn’t. If you claim it doesn’t do something from this chart, you should create a support ticket and ask them to fix it IMO.
https://documentation.softwareag.com/webmethods/integration_server/pie10-15/webhelp/pie-webhelp/#page/pie-webhelp%2Fto-clustering_overview_10.html%23

11. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 09:38 PM

Engin SARLAK:

This check point is any step in the flow. After any step finishes a new pipeline is created and saved into database.

I’ve seen nothing that indicates each step is check-pointed and that execution can jump back and forth between nodes during a single invocation. Multiple calls, within or not within a stateful conversation can certainly bounce between nodes as the LB will route to a node per its config (round robin, least busy, etc.). To my knowledge, IS itself does nothing explicit to manage which node executes a given request. But that is indeed what this thread is for – to fill any gaps I may have. I appreciate your continued input.

Engin SARLAK:

This post is pretty old. Back then they were using (I was still in college that time) Oracle Coherence as cache layer if my memory is accurate. After 20 years I wouldn’t assume everything is still the same.

A reasonable point. I am not making any assumption. I’ve not seen anything that indicates this is something that is done. The docs you’ve shared do not describe what you describe. There is nothing about storing the pipeline in the DB after every FLOW step. Nothing about each step being a checkpoint – only that the FLOW service must explicitly invoke a service to establish a checkpoint. In the doc link you shared in the response to @Percio_Castro1 there is one thing that sticks out for me:

The client automatically retries the service…

This is mentioned in most of the lines. And where it isn’t mentioned, it indicates manual restart of the service is needed. What is not emphasized here, but noted on another page, is that the only client type to which this is available is HTTP clients that use the Context/TContext classes. Not sure how many in the SAG wM universe actually use that. This caveat/limitation has been in the documentation since the beginning. They have never changed it.

Of course, if I’m missing or misreading what has been shared please chime in.

I recall when Terracotta replaced Coherence. I was hopeful about the possible additional capabilities. If memory serves, it was just a different implementation of the same capabilities, possibly simply to get away from Oracle. (Replacing Broker with UM was, IMO, a similar “lateral” move, save for the HA/clustering feature, but that’s a different topic.)

12. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 10:12 AM

Engin SARLAK:

This claims the opposite of the documentation

Engin,

I’ve gone through the documentation and nothing I have stated contradicts it. In fact, the link you provided as a response to my post reiterates what I’ve stated. The words “the client automatically retries/restarts” are used over and over. At the same time, I’ve looked for information that supports your claim, but I’m failing to find it.

One thing I think we can safely establish is that the documentation could be a bit clearer. Certain sections leave room for (mis)interpretation, which leads to this type of confusion. For example, the point that a couple of us have been trying to make, that automatic “failover” is a responsibility of the client, is just briefly touched upon in the section “Failover Support for Stateful Clusters”, and only in the form of a note. Given the title of that section, they could have gone into more detail.

The authors do elaborate a bit more on those two Java classes later on, though.

Now, regarding checkpointing after each step and automatic failover, it sounds like a really cool feature. However, let’s consider the daunting technical details of the implementation for two very simple use cases:

(1) A Flow service with a LOOP. Are the steps in the LOOP also checkpointed so execution can recover mid-LOOP? If so, is the pipeline for each step and each iteration of the LOOP stored in the database? Consider a LOOP that iterates hundreds or thousands of times and the impact that would have on a simple mapping service.

(2) A service that calls multiple adapter services related to a single LOCAL_TRANSACTION. What happens if a service begins to execute on one node, the JDBC transaction is initiated there, an adapter service is executed there, but then it fails over to another node? How could the transaction be continued from another node?

These are just two simple use cases that make the concept fairly impractical.
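The LOOP case can be put into rough numbers with a short Python sketch. Everything in it is an assumption for illustration: the sample pipeline, the step and iteration counts, and the use of `pickle` as a stand-in for however a pipeline would be serialized. It only counts how many writes a hypothetical checkpoint-after-every-step scheme would generate.

```python
import pickle

def checkpoint_cost(pipeline, steps_per_iteration, iterations):
    """Writes and bytes if every step in every iteration persisted the pipeline.

    Assumes the pipeline size stays constant, which understates the cost
    for services that add data as they go.
    """
    size = len(pickle.dumps(pipeline))
    writes = steps_per_iteration * iterations
    return writes, writes * size

# Hypothetical pipeline: an order with 50 line items
pipeline = {
    "orderId": 42,
    "lines": [{"sku": f"S{i}", "qty": i} for i in range(50)],
}
writes, total_bytes = checkpoint_cost(pipeline, steps_per_iteration=3, iterations=1000)
print(f"{writes} database writes, {total_bytes} bytes serialized")
```

With these made-up numbers, a LOOP of 1000 iterations over 3 steps already means 3000 serialize-and-store operations for one invocation of a simple mapping service.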

Having said all this, I completely agree with you that as engineers, we can take a break from talking about documentation and hypotheticals and we can put this thing to the test. If you will soon have a stateful cluster available that you can play with, I’ll wait for your results. Otherwise, I’d be happy to spin one up.

Thanks for your contributions here,
Percio

13. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 05:42 PM

Engin SARLAK:

If the Integration Server to which the client is connected fails, the client automatically reconnects to another Integration Server in the cluster.
…
…Integration Server clustering provides failover capabilities to clients that implement the webMethods Context and TContext classes.

As noted in the doc, failover occurs ONLY with HTTP clients that use the wM IS API. Which IME, is nothing – I’ve never seen anyone use the IS client libraries. No other clients support doing that and the server has no role in this failover. It is entirely up to the client using Context/TContext to see the error and resend, which will go to the other node. It has been this way ever since I can remember.

That description does not indicate that a service that is in the middle of executing can suddenly jump to running on another node. It is akin to the checkpoint and resume behavior @Percio_Castro1 describes.

14. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 08:44 PM

I can absolutely confirm that service inputs and outputs are not stored in the database for sharing across multiple servers. You can selectively save parts or all of the pipeline for audit logging, but that is also not recommended, as it’s a performance issue unless used selectively. And that feature is available irrespective of whether a cluster is present.

It’s not technically possible for every step in a service to serialize its input and output, as that would be terribly slow. And even if that weren’t so, keeping track of what executed where wouldn’t be so simple. Stateful clusters only allow for session state to be shared, which means that if data needs to be shared between multiple calls, for example a call to add to cart followed by another call to checkout, then they can get access to the same state even when the second call goes to a different server because of load balancing.
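The add-to-cart and checkout example can be sketched in a few lines of Python. This is only a model of the idea, not the IS session API: `SessionStore` stands in for the distributed cache, and the class and method names are hypothetical.

```python
class SessionStore:
    """Hypothetical stand-in for the session cache shared by the cluster."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        return self._sessions.setdefault(session_id, {})

class Server:
    def __init__(self, name, store):
        self.name = name
        self.store = store   # every server points at the same shared store

    def add_to_cart(self, session_id, item):
        self.store.get(session_id).setdefault("cart", []).append(item)

    def checkout(self, session_id):
        cart = self.store.get(session_id).get("cart", [])
        return {"server": self.name, "items": list(cart)}

store = SessionStore()
is1, is2 = Server("IS1", store), Server("IS2", store)
is1.add_to_cart("s-1", "book")   # first call lands on IS1
receipt = is2.checkout("s-1")    # the load balancer sends the next call to IS2
print(receipt)
```

Each call runs start to finish on one server. What moves between servers is the session data, not a half-executed service.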

Object serialization is one of the most expensive operations in Java. And the pipeline is a very complex object with multiple object references. Serialization of all that is so expensive that it’s never recommended. Serializing it to a database at every step would mean that the server would not even be usable.

I am making these arguments to just convince everybody. Otherwise, I know for sure that this doesn’t happen.

Rupinder

15. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Wed February 21, 2024 09:41 PM

Just a quick note to share my view that this exchange is awesome! I really appreciate the sharing of info and experiences from everyone!

I think we’re all after the same thing – correct understanding of the capabilities of wM IS and IS clustering.

#discussion
#webMethods
#Integration-Server-and-ESB
16. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 12:33 AM

In the case of a Process Engine cluster configured across multiple IS instances, BPM execution works with a subscription trigger and a transition trigger. The QoS setting (Optimize locally) determines whether a transition document is published to UM/Broker, and when it is, the next step of the BPM model could execute on a different IS that is part of the PE cluster.

In the process instance monitoring view, you can see which step executed on which IS.

This happens only because a pub/sub exchange through a messaging product takes place between those steps, not because IS does it by itself.

When a series of steps that are part of a flow service executes, it simply executes on the same server where the service was invoked.
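The difference between the two cases can be sketched in a few lines of Python. This is only a toy model, not Process Engine code: the `broker` deque stands in for UM/Broker, the round-robin delivery is an assumption made for illustration, and every name here is hypothetical.

```python
from collections import deque
from itertools import cycle

broker = deque()   # stands in for UM/Broker (assumption)
trace = []         # (step, node) pairs, similar to what process monitoring shows


def flow_service(node, steps):
    """A plain flow service: every step runs on the node that was invoked."""
    for step in steps:
        trace.append((step, node))


def run_process(steps, nodes):
    """Process-model style: a transition document is published between steps,
    and whichever node receives it runs the next step."""
    broker.append(0)            # document that starts the process
    receivers = cycle(nodes)    # simplistic round-robin delivery (assumption)
    while broker:
        index = broker.popleft()
        node = next(receivers)
        trace.append((steps[index], node))
        if index + 1 < len(steps):
            broker.append(index + 1)   # transition document for the next step


steps = ["receive", "approve", "notify"]

run_process(steps, ["IS-1", "IS-2"])
process_trace = list(trace)

trace.clear()
flow_service("IS-1", steps)
flow_trace = list(trace)

print(process_trace)
print(flow_trace)
```

In this model the node can change only at the points where a document passes through the broker; the plain flow service never leaves the node it started on.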

#Integration-Server-and-ESB
#webMethods
#discussion
17. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 08:44 AM

I don’t see anyone showing any proof that it doesn’t do what the documentation indicates. Everybody is referencing their past “experiences”. The problem with experience is that it is not documented and it is subjective. Engineers don’t rely on experiences, they rely on proofs of concept. It may very well be a configuration error that kept you from observing this node-switching behavior. I might be wrong as well, and so might the documentation, but I refuse to believe the documentation has been providing inaccurate information for decades. By now someone would have tried this feature, failed, created a ticket, and demanded that SAG fix it.

Since all of you strongly disagree with me, can you tell me when was the last time each of you used a stateful cluster and observed its behavior? I used stateful clusters with versions 9.6 and 10.5, and I am working on a POC for version 10.15 on a Kubernetes cluster. This is one of my test cases. I will update this thread once I have that test data available.

reamon:

The client automatically retries the service…

This is mentioned in most of the lines. And where it isn’t mentioned, it indicates manual restart of the service is needed.

If what that line meant was the client’s manual retries, it wouldn’t make sense, would it? If it relied on the client’s manual retries, F5 would be enough for that feature and there wouldn’t be any benefit of having a stateful cluster. Any operation that failed mid-execution can be retried anyway as long as you have an F5 as the cluster IP. F5 would know a server is offline and would deliver that request to another and it would always start from the beginning, not from mid level.

#Integration-Server-and-ESB
#discussion
#webMethods
18. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 10:50 AM

Engin SARLAK:

I don’t see anyone showing any proof that it doesn’t do what the documentation indicates.
…
I refuse to believe the documentation has been providing inaccurate information for decades

My POV is the documentation does not state the behavior you have described. Nowhere have I seen it state that it records the pipeline for every FLOW step*** and the IS nodes in cooperation somehow determine which node is going to run a step to move the FLOW service execution forward. Nor have I seen that IS itself will dispatch a message from a queue to another IS instance.

*** the PE/PRT “steps” as noted by @Senthilkumar_G are different, and use messaging to “hop” nodes – and I assume you’re not referring to this

Engin SARLAK:

Since all of you strongly disagree with me, can you tell me when was the last time each of you used a stateful cluster and observed its behavior

I have not used them in 10+, maybe 15+ years except for PE/PRT, where it is used because it is required for multi-node operation of PE/PRT. Other than that, I never had a need. But in the spirit of “engineers don’t rely on experiences” I’m still hoping to find documentation that supports your descriptions. Side note: “proof of concepts” are experiences, no?

Engin SARLAK:

If what that line meant was the client’s manual retries, it wouldn’t make sense, would it?

It does not say nor mean client manual retries. It means client automatic retries – but only if the client is using Context/TContext. It applies to nothing else. These, too, will start at the beginning on retry, unless the service being called has explicitly implemented checkpointing and has logic to pick up from the last checkpoint.

Engin SARLAK:

F5 would be enough for that feature and there wouldn’t be any benefit of having a stateful cluster

Stating the obvious and I know we all get this, but just to be clear: Stateful cluster is useful for state sharing, when there are multiple calls for a given interaction from a client. Call one goes to node 1, it stores the state in session or pub.storage, returns, call two, perhaps to a different service, goes to node 2 (or 3 or 4) and that node can read the state from the session or pub.storage, etc. The stateful cluster is for managing the multiple interactions of a client with the server cluster, not for the clustered IS nodes to bounce the execution around themselves.
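A minimal sketch of that multi-call interaction, in Python for brevity. The shared dict stands in for cluster-wide session state or pub.storage; the `Node` class, the method names, and the session ID are hypothetical and are not IS APIs.

```python
class Node:
    """One cluster node. Every node holds a reference to the same shared store."""

    def __init__(self, name, shared_store):
        self.name = name
        self.shared_store = shared_store  # stands in for session state / pub.storage

    def add_to_cart(self, session_id, item):
        # Call one: store the state where every node can reach it, then return.
        cart = self.shared_store.setdefault(session_id, [])
        cart.append(item)
        return f"{self.name}: added {item}"

    def checkout(self, session_id):
        # Call two: may land on a different node, which reads the same state.
        cart = self.shared_store.pop(session_id, [])
        return f"{self.name}: checked out {cart}"


shared_store = {}
node1 = Node("node1", shared_store)
node2 = Node("node2", shared_store)

first = node1.add_to_cart("session-42", "book")   # load balancer picks node 1
second = node2.checkout("session-42")             # load balancer picks node 2

print(first)
print(second)
```

Each call runs start to finish on a single node. Only the state between calls is shared, which is the distinction being drawn here.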

If memory serves, IS clustering did indeed try to provide load balancing when it first came out. That feature was removed long ago.

Engin SARLAK:

would deliver that request to another and it would always start from the beginning, not from mid level.

Correct. To continue mid-level requires the service be coded in a way to explicitly support checkpoints. If there is only 1 IS (which there never is for production, but just mentioning as a thought exercise) then the service can resume mid-level as needed without IS stateful cluster. If there are multiple nodes, IS stateful cluster allows the subsequent call to be routed to any node in the cluster, with the service using the shared Terracotta-stored state to know where to resume. But again, based upon my understanding of the documentation, this requires the service to be explicitly coded to support this. It is not automatic with the runtime tracking each step.

I think everyone is very open to learning that this behavior does exist. Just have not seen any docs that provide that so far.

#webMethods
#Integration-Server-and-ESB
#discussion
19. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 11:09 AM

By the way, also stating the obvious but I want to make sure we’re all on the same page: even when the “checkpoint restart” pattern is used, execution of the service still starts from the beginning of the top-level service. As far as I know, there’s nothing in the platform that allows for execution to start at an arbitrary step in the service. It’s just that the service implements a bunch of if…then…else… statements (or BRANCHes in our world) to determine where to start from. See diagram here: Reverb

Also, this pattern is not dependent on a stateful cluster. The “checkpoint” can be implemented in a number of different ways. Using pub.cache or pub.storage are two options, but a custom implementation using some other shared persistence layer would work just the same. The pattern is not even platform-dependent for that matter.
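To make the pattern concrete, here is a rough sketch in Python rather than Flow. The `checkpoints` dict stands in for pub.cache, pub.storage, or any other shared persistence layer, and the step names, `NodeCrash`, and `process_order` are all made up for illustration.

```python
checkpoints = {}   # stands in for pub.cache, pub.storage, or another shared store
executed = []      # records which steps actually ran, for illustration only


class NodeCrash(Exception):
    """Simulates a node dying in the middle of the service."""


def run_step(name, crash_on=None):
    if name == crash_on:
        raise NodeCrash(name)
    executed.append(name)


def process_order(order_id, crash_on=None):
    # Execution always enters at the top of the service. The checks below
    # play the role of BRANCH steps and skip work already checkpointed.
    last_done = checkpoints.get(order_id, 0)
    if last_done < 1:
        run_step("validate", crash_on)
        checkpoints[order_id] = 1
    if last_done < 2:
        run_step("reserve", crash_on)
        checkpoints[order_id] = 2
    if last_done < 3:
        run_step("ship", crash_on)
        checkpoints[order_id] = 3
    checkpoints.pop(order_id, None)   # finished: clear the checkpoint
    return "complete"


try:
    process_order("A1", crash_on="ship")   # first attempt dies before "ship"
except NodeCrash:
    pass

result = process_order("A1")               # retry, possibly on another node
print(executed, result)
```

The retry still begins at the first line of `process_order`; it only skips the steps whose checkpoints were already written, and nothing in the runtime does that skipping for the service.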

Last but not least, for what it’s worth, I never found this pattern to be useful in the real world. If a service is complex enough that it requires checkpointing multiple times in its execution, then in my experience, it should be refactored or redesigned.

Percio

#webMethods
#Integration-Server-and-ESB
#discussion
20. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 10:13 AM

@engin_arlak the problem with your ask is that you are asking people to prove a negative. I can confirm that I am not speaking from guesswork. I don’t know which documentation says that each step in FLOW is executed by serializing the pipeline to disk. And I have been doing webMethods for 25 years, some of them as the head of the webMethods product line. And I can confirm this was never the intended behavior or a side effect of anything planned. The only ways to have services restart on another server are by using the Process Engine, Messaging or Guaranteed Delivery. And none of those have the capability to restart in the middle of a service. The only way to do that is to custom code it using checkpoints, which are hardly ever used.

#Integration-Server-and-ESB
#webMethods
#discussion
21. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 12:05 PM

Rupinder Singh:

the problem with your ask is that you are asking people to prove a negative.

This is not like proving that something does not exist, like someone claiming there is an elephant living in a safe in the building and you trying to disprove it. It can be true or not, but in order to disprove it you need to check all the safes. This is not the case here. If you have a stateful cluster, all you need to do is run a load test and crash one of the nodes during the load test. This is not a philosophy question.

reamon:

My POV is the documentation does not state the behavior you have described. Nowhere have I seen it state that it records the pipeline for every FLOW step*** and the IS nodes in cooperation somehow determine which node is going to run a step to move the FLOW service execution forward. Nor have I seen that IS itself will dispatch a message from a queue to another IS instance.

I should have been clearer on that entry. It is not directly related to stateful clusters. What I meant by that was:
Before executing a step, Integration Server passes the pipeline data to that step.
The step executes with that pipeline data and creates another pipeline after executing. If the step involves multiple inner steps, the same process is applied there as well.
In short, every executed step generates a new pipeline. I don’t know how frequently Integration Server saves the pipeline. But stateful clusters will use the database to save the pipeline. It may be creating checkpoints for every step, or it may be creating random checkpoints. That can only be known by the developers themselves, unless they disclose that information somewhere, such as the documentation. Without that information, what I said was only speculation, a guess.

reamon:

Side note: “proof of concepts” are experiences, no?

Referencing past POC data is OK, but claiming something strongly just because you have more experience than someone else is not scientific. According to that logic, the oldest people should know everything. It’s certainly not the case; as people grow older they tend to reject learning new things and they want to keep doing what they have been doing the most. That’s why kids are really better with technology than most of us. This doesn’t mean that as people grow older they don’t learn anything or they learn less. Experience is important. But when there is a disagreement on a subject we need to stick with the facts. That’s how science makes progress.

reamon:

I have not used them in 10+, maybe 15+ years except for PE/PRT, where it is used because it is required for multi-node operation of PE/PRT. Other than that, I never had a need.

Exactly my point. How do you know it is still the same? Windows 95 runs on DOS. According to my past 30 years of Windows experience, can I claim Windows still runs on DOS? People are still disabling swap space because they think using the page file slows down execution. Maybe that was the case when we had 64 MB of RAM, and for servers it wasn’t necessary. It certainly is not the case anymore. The same thing applies to using nolock in every T-SQL select query, or running services as transformers. We should update our knowledge and beliefs frequently. I asked that question to find out when each of you last updated your knowledge on this subject.

My point is, if stateful clustering doesn’t do anything for reliability then, why implement it? What is the use case for it? Why add it to documentation? It can’t work without F5 anyway, what is the point having it according to you all? What do you think Devs were thinking when they implemented it?

I didn’t reply to the unrelated parts in order not to get off topic. I am not talking about the checkpoint pattern. Stateful clusters may or may not use that out of the box. I don’t have that information, so there is no need to go off topic here.

If I sound rude or something, please excuse me. English is not my primary language, and I certainly am not too polite even when speaking my primary language. That certainly is not intentional. Just wanted to clarify it just in case.

#Integration-Server-and-ESB
#discussion
#webMethods
22. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 12:58 PM

Engin SARLAK:

claiming something strongly just because you have more experience than someone else is not scientific

True. And I hope that nothing I’ve said indicates “been doing this a long time” as proof. I’ve tried to be careful about that but if I’ve missed in a place or two, I apologize. Documentation or evidence is by far preferred and is what I’m looking for…

Engin SARLAK:

Exactly my point. How do you know it is still the same?
…According to my past 30 years of windows experience can I claim windows still runs on DOS?

This assumes that we’ve only been stating “my experience from long ago indicates X.” I don’t think anyone has made that claim without disclaimer or other info. Rather, “have never seen it and the docs don’t indicate that it has changed.” My understanding, based upon past exposure and document reviews over the years, is that IS clustering has not appreciably changed in a long time. Even when it changed to use Terracotta. I may have missed something – thus this thread exists.

Engin SARLAK:

if stateful clustering doesn’t do anything for reliability then, why implement it?

To allow multiple nodes to handle multiple interactions from a client in a stateful way. E.g. an interaction that requires a client to make multiple calls to achieve a “transaction” and the server can save session/state for the cluster so that any node can do the right thing given the state. It certainly aids in reliability so that if one node goes down, other nodes can still handle the calls without losing any existing state. But based upon docs and my understanding, an IS server instance does not hand things off to another node for execution. And if an IS instance goes down mid-execution, no other node will pick up that work where it left off on its own.

The session/state management is the same as any application server – stuff things in the session object (or elsewhere) that is accessible to all nodes so that any node can process the first, second, third…nth call in a given logical transaction. Not for an IS that is in middle of running a service to pass it off to another node mid-execution.

You do not sound rude. And hopefully no one feels any of the exchange has been rude or confrontational. I am finding it very helpful. And respectful.

#webMethods
#Integration-Server-and-ESB
#discussion
23. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 12:59 PM

Engin SARLAK:

My point is, if stateful clustering doesn’t do anything for reliability then, why implement it? What is the use case for it? Why add it to documentation? It can’t work without F5 anyway, what is the point having it according to you all? What do you think Devs were thinking when they implemented it?

The short answer to your question is here: Reverb (EDIT: note that “failover support” here refers to the use of the client Context/TContext Java classes as described in that section which I linked in my prior post)

If you need any of the features for which the second column states “No”, then you need a stateful cluster. If you don’t, then a stateless cluster is preferred due to its simplicity.

The longer answer is that stateful clusters were required by many other features in the past (e.g. scheduled tasks, certain types of trigger joins, etc.), but slowly but surely, Software AG started removing some of those requirements because synchronization across the cluster opened the door to other problems, especially in the days prior to Coherence, when webMethods used its homegrown repo for managing state. I worked as a Professional Services consultant for Software AG from 2008 to 2014 and I recall several conversations on this topic back then as R&D started to make these changes. I’d be happy to dig up some older conversations if it will help provide more context, but in a nutshell, this is the reason.

Percio

#Integration-Server-and-ESB
#discussion
#webMethods
24. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 01:25 PM

Both of your last replies make sense, but I can’t claim they are true unless I see it myself. Like I said earlier, this will be one of my test cases for my upgrade project. I will also add symbolic steps using the debugLog service, probably 10 or more, and possibly add some delay between some of the steps.

I will be glad to test it as well if you have a specific test case in your mind.

For this query, I want to build a 2 node automatic scaling IS cluster, enable the pipeline to save upon failure and do a load test. After seeing it is not scaling up anymore, I plan to destroy all of the nodes forcefully except one.

For the test service, I plan to use a simple DB insert query. Let me know whether you think this will clarify things and whether you have a better or parallel test scenario. It will certainly be helpful whichever way it turns out. I will share my test results in this topic.

#discussion
#webMethods
#Integration-Server-and-ESB
25. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 01:39 PM

I’m looking forward to your results. As for a test, I think a simple service that does the following would be plenty:

1. Log an initial message (e.g. BEGIN)

2. Sleep

3. Log a final message (e.g. END)

Then with a simple 2-node stateful cluster, invoke the service from any client on node 1 and kill that node while the service is sleeping. If server-side automatic failover is a real thing, you should see the same service automatically execute on the other server without a client retry. If step checkpointing is also a thing, you won’t see the BEGIN log statement in the 2nd node, you will only see the END statement.

If you want to validate whether the pipeline is shared across, you could even generate a GUID in the very beginning of the service and log it in steps #1 and #3. If the GUID generated in node 1 also appears in the log statement in node 2 (assuming automatic failover happens), then you know that the pipeline was shared across the nodes.
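The test and the way to read its results can be sketched as follows. This is Python pseudo-infrastructure, not an IS service: `test_service` mirrors the BEGIN / sleep / END steps with a GUID, and `interpret` is a hypothetical helper that classifies what the two nodes' logs would mean. Log entries are modeled as `(message, guid)` tuples purely for illustration.

```python
import logging
import time
import uuid


def test_service(sleep_seconds=30.0, log=logging.getLogger("failover-test").info):
    """BEGIN, sleep, END, with one GUID logged in the first and last steps."""
    run_id = str(uuid.uuid4())
    log(f"BEGIN {run_id}")
    time.sleep(sleep_seconds)   # kill the node during this window
    log(f"END {run_id}")
    return run_id


def interpret(node1_log, node2_log):
    """Classify the outcome from (message, guid) entries seen on each node."""
    if not node2_log:
        return "no server-side failover observed"
    node1_guids = {guid for _, guid in node1_log}
    messages = [message for message, _ in node2_log]
    shared = any(guid in node1_guids for _, guid in node2_log)
    if "END" in messages and "BEGIN" not in messages:
        kind = "failover resumed mid-service"
    else:
        kind = "failover restarted from the top"
    return f"{kind}; pipeline {'shared' if shared else 'not shared'}"


# Three possible observations after killing node 1 while the service sleeps.
nothing = interpret([("BEGIN", "g1")], [])
resumed = interpret([("BEGIN", "g1")], [("END", "g1")])
restarted = interpret([("BEGIN", "g1")], [("BEGIN", "g2"), ("END", "g2")])

print(nothing)
print(resumed)
print(restarted)
```

Only the second observation would support step-level checkpointing with a shared pipeline. The first is what the other posters in this thread expect to see.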

Good luck,
Percio

#webMethods
#discussion
#Integration-Server-and-ESB
26. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 01:46 PM

This is much simpler than what I thought. I will test this first, but I need to build my environment for it, and unfortunately I am no Kubernetes expert, so it will take a while.

If this doesn’t work we can find out with my test case as well. Will keep this post updated in the future.

#discussion
#Integration-Server-and-ESB
#webMethods
27. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 01:58 PM

Engin SARLAK:

unfortunately I am no kubernetes expert so it will take a while.

I get it. For this test, it may be easier to just go with a simple Docker Compose file that uses the images for IS and Terracotta from the Software AG Container Registry to spin up a simple environment without the complications of Kubernetes. If that feels complicated too, then a simple bare metal install on your PC and then trashing it after the test may be the quickest path.

Percio

#webMethods
#Integration-Server-and-ESB
#discussion
28. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 03:28 PM

If I weren’t already building a POC environment and if I didn’t have a stateful test environment, I would do that. Stateful cluster configuration is not super easy, and I don’t want to waste my time. It needs a DB, a load balancer, etc. that I can’t build myself. I don’t have the Terracotta Helm charts yet; it could take less time if I already had them in hand. Software AG was supposed to publish them to the Helm repo 3 weeks ago, and they still haven’t done it.

#Integration-Server-and-ESB
#webMethods
#discussion
29. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 01:49 PM

Percio Castro:

If the GUID generated in node 1 also appears in the log statement in node 2 (assuming automatic failover happens), then you know that the pipeline was shared across the nodes.

If this occurs, I’m very interested in how the client would ever get the response. When node 1 goes away, the client will get disconnected and fail. There is no way I’m aware of for a client that sends an HTTP POST/GET to node 1 to get a response from node 2.

#discussion
#webMethods
#Integration-Server-and-ESB
30. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Thu February 22, 2024 03:22 PM

This is possible with async calls, but I don’t remember if IS actually supports async calls. As long as it uses a callback service, it is possible. Also, it doesn’t have to return a response; my service won’t be expecting a result, it will insert it into the DB.

#Integration-Server-and-ESB
#discussion
#webMethods
31. RE: Concurrency and cluster in wM IS

webMethods Community Member
Posted Fri March 22, 2024 05:39 AM

reamon:

Side note: I cannot recall how to mark a topic as “this is a discussion so don’t pester me to mark a response as the ‘solution’”. Or if this needs to be in a different topic/forum area. I want to avoid being messaged/emailed and having the forums constantly prompting me to “mark it solved”. Any guidance about this would be appreciated.

Hi Rob,

For that, please add the newly introduced tag #discussion to topics for which you don’t want to get these “mark as solved” reminders.

#Integration-Server-and-ESB
#webMethods
#discussion
