DataPower


Small LDAP Issues Cause Active Transaction Issue

  • 1.  Small LDAP Issues Cause Active Transaction Issue

    Posted Tue March 21, 2023 04:44 PM

    Hello,

    We are using DataPower on a VM with IDG.2018.4.1.24.

    Our LDAP connections for AAA go through a load balancer that fronts three domain controllers.  Recently, there have been issues with one of the domain controllers -- a brief outage of just a few seconds.  This seems to cause problems for any proxy transactions that are active at the time of the outage: all transactions for those proxies start to build up.  Eventually, consumers hang up, but the proxies don't even honor the timeouts we have set.  I looked at one transaction today that was active for 15+ minutes, and the proxy was set to time out at 120 seconds.  Today we had 17 proxies that were affected.  We have almost 900 proxies, and the others seemed to be processing fine during this time.

    Eventually, we have to reset the appliance.  When it comes back up, all is well.

    We contacted support and they advised fixpack IDG.2018.4.1.24.  They also said to lower our UserAgent timeout below the default -- we set it to 60, but have now lowered it to 15.  

    Does anyone have any other ideas?  Has anyone experienced this issue?  Any thoughts would be appreciated.

    Tracy Green

    Federated Insurance



    ------------------------------
    Tracy Green
    ------------------------------


  • 2.  RE: Small LDAP Issues Cause Active Transaction Issue

    IBM Champion
    Posted Wed March 22, 2023 09:22 PM

    Hi Tracy.  Yes, we experienced this exact same thing several years ago.  The cause turned out to be a combination of problems with DataPower firmware AND Windows Active Directory, plus the common practice of using a round-robin DNS for LDAP.  I cannot know for sure whether yours is the exact same issue because I don't know what kind of LDAP you have.  However, as stated, our problem was three-fold, and I do believe this was experienced on the 2018 firmware (we are well beyond that now).

    The three conditions causing the problem were:

    1. Something happening, I don't really remember, with the LDAP itself.  For us, it was a Windows AD problem that was eventually fixed by Microsoft.   At some point, the LDAP server stopped responding or maybe stalled due to some kind of exhaustion.
    2. Our LDAP servers (3) were actually behind a round-robin DNS of 3 IPs.   Though common, it prevented any kind of health check on the servers behind the DNS name.  If any of the LDAP servers went down or stalled, the DNS was still round-robin, feeding every nth (3rd for us) LDAP request to a bad server.
    3. Something in the DataPower firmware caused the appliance to develop some kind of affinity to the server that was not responding.  That is, at some point, the only server DataPower seemed to connect to was the bad LDAP server, and, of course, that caused everything to back up.
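
    To make condition 2 concrete, here is a minimal Python sketch of a blind round-robin rotation compared with one that skips members known to be unhealthy. The server names and the `healthy` map are hypothetical, and this only models the rotation, not actual DNS resolution or DataPower behavior.

```python
from itertools import cycle

def round_robin_failures(servers, healthy, requests):
    """Count requests that a blind rotation sends to unhealthy servers."""
    rotation = cycle(servers)
    failed = 0
    for _ in range(requests):
        target = next(rotation)
        if not healthy[target]:
            failed += 1
    return failed

def health_checked_failures(servers, healthy, requests):
    """Same rotation, but members marked unhealthy are removed first."""
    pool = [s for s in servers if healthy[s]]
    return round_robin_failures(pool, healthy, requests) if pool else requests

# Hypothetical domain controllers; dc2 is down or stalled.
servers = ["dc1", "dc2", "dc3"]
healthy = {"dc1": True, "dc2": False, "dc3": True}

blind = round_robin_failures(servers, healthy, 300)
checked = health_checked_failures(servers, healthy, 300)
print(blind, checked)
```

    With one of three members down, the blind rotation sends a third of the requests to the bad server, while the health-checked pool sends none. That is the gap a load balancer with a real health check is meant to close.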

    Recycling LDAP or DataPower solved it.  At first, we were convinced it was DataPower because every time we recycled it, the problem seemed to go away.  The problem might disappear for days or weeks, and then it happened again.  At some point, the LDAP servers were restarted one at a time without a DataPower restart, and it went away, again for an indeterminate amount of time.  It took 6 months to finally nail down that the issue was related to both DataPower and LDAP. 

    We haven't experienced the problem since, but it did take a firmware update and a Microsoft update.  The DNS configuration was unfortunate and an exacerbating factor, and some time later we put each of the LDAP servers into a DataPower LBG as well, bypassing the DNS config.  This gave us a proper health check, which the DNS round-robin configuration simply cannot provide.



    ------------------------------
    Joseph Morgan
    ------------------------------



  • 3.  RE: Small LDAP Issues Cause Active Transaction Issue

    Posted Thu March 23, 2023 12:14 PM

    Hi Joseph,

    This sounds exactly like what we are experiencing. Thank you so much for this explanation.  We are Windows AD as well. Our security team is pursuing the issue with Microsoft, and we are in the middle of an upgrade to 10.0 so it is good to know we are heading in the right direction.  

    I'm interested in your LBG configuration.  We had that set up, and one of our security engineers insisted we remove it because he could tell that DataPower, once it connected to a DC, kept using the same one rather than going round robin.  Have you ever seen that issue?

    Thank you again for your response.  I really appreciate it.

    Tracy Green



    ------------------------------
    Tracy Green
    ------------------------------



  • 4.  RE: Small LDAP Issues Cause Active Transaction Issue

    IBM Champion
    Posted Thu March 23, 2023 12:41 PM

    We ended up using the "Least Connections" algorithm and "Try every server before failing". 

    I really wouldn't think that should matter a whole bunch.  Someone has turned off the health checks, but I had it set to half-TCP connection.
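
    For readers unfamiliar with those two settings, the following Python sketch shows the general idea of least-connections selection combined with trying every member before failing. It is an illustration under assumptions, not DataPower's implementation: the class, the `connect` callback, and the server names are all hypothetical.

```python
class LeastConnectionsGroup:
    """Toy load balancer group: least connections, try every member."""

    def __init__(self, members):
        # Active connection count per member.
        self.active = {m: 0 for m in members}

    def order(self):
        # Fewest active connections first; sorted() is stable, so ties
        # keep the order in which members were configured.
        return sorted(self.active, key=self.active.get)

    def acquire(self, connect):
        """Return the first member that connects, least-loaded first."""
        for member in self.order():
            try:
                connect(member)
            except ConnectionError:
                continue  # try the next member before giving up
            self.active[member] += 1
            return member
        raise ConnectionError("every member failed")

    def release(self, member):
        self.active[member] -= 1


def connect(member):
    # Hypothetical connector: dc2 refuses connections.
    if member == "dc2":
        raise ConnectionError("dc2 is down")

group = LeastConnectionsGroup(["dc1", "dc2", "dc3"])
picked = [group.acquire(connect) for _ in range(4)]
print(picked)
```

    One consequence of this design is that a member holding many connections that never complete is ordered last, which may help steer new requests away from a stalled server.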



    ------------------------------
    Joseph Morgan
    ------------------------------



  • 5.  RE: Small LDAP Issues Cause Active Transaction Issue

    Posted Thu March 23, 2023 05:03 PM

    Ideally, DP as a client to an external service like AD/LDAP would not have to manage the availability of that external service using things like LBGs and "fail-over" schemes.  Those services should provide a VIP and manage their own availability and load balancing.  Leave DP, as a consumer of their service, out of it :-)



    ------------------------------
    Ivan Heninger
    ------------------------------



  • 6.  RE: Small LDAP Issues Cause Active Transaction Issue

    IBM Champion
    Posted Thu March 23, 2023 05:31 PM

    Ideally!  Right up to the moment it all comes tumbling down, as in the case of the post and something with which I had direct experience lasting at least 6 months of total mayhem on considerably expensive SLAs.



    ------------------------------
    Joseph Morgan
    ------------------------------



  • 7.  RE: Small LDAP Issues Cause Active Transaction Issue

    Posted Mon March 27, 2023 09:49 AM

    I agree as well.  Long term, that's where we want to be.  This particular problem has caused us a lot of trouble, though.  As in Joseph's case, it has caused multiple outages over several months, so we're trying to determine what we can do both short term and long term to avoid these outages.

    Thanks!

    Tracy Green



    ------------------------------
    Tracy Green
    ------------------------------



  • 8.  RE: Small LDAP Issues Cause Active Transaction Issue

    Posted Mon March 27, 2023 09:38 AM

    Hi Joseph,

    My security team is wondering if the Microsoft issue you saw was an error of "Active Directory Web Services was unable to determine if the computer is a global catalog server".

    Just to clarify, is the Least Connections setting in DataPower or in the load balancer for the domain controllers?  I'm assuming it's the latter.

    Thanks!

    Tracy Green



    ------------------------------
    Tracy Green
    ------------------------------



  • 9.  RE: Small LDAP Issues Cause Active Transaction Issue

    IBM Champion
    Posted Mon March 27, 2023 04:46 PM

    Tracy, 

    As for the Microsoft side of things, I don't recall that particular error, nor do I see it in what I still have from that time.  I didn't file the case, so I don't know.  What I do recall, though, is that the DC ran into some kind of runaway condition causing memory or CPU (or both) exhaustion.

    In our case, we didn't have a load balancer at all.  It was set up via DNS round robin only.  This is why we set up a DataPower load balancer, which we set to least connections to try to mitigate the strange affinity DataPower seemed to get with the "bad" server.

    The DP LBG saved us until we got the Microsoft issue fixed.  IBM gave us a special firmware, which was included in subsequent GA FW releases.



    ------------------------------
    Joseph Morgan
    ------------------------------



  • 10.  RE: Small LDAP Issues Cause Active Transaction Issue

    Posted Tue March 28, 2023 08:23 AM

    Joseph,

    Thank you again for your help and for the workaround!

    Tracy Green



    ------------------------------
    Tracy Green
    ------------------------------



  • 11.  RE: Small LDAP Issues Cause Active Transaction Issue

    IBM Champion
    Posted Wed April 05, 2023 05:27 PM

    Ah, the joys of having DataPower talk to Active Directory (AD) via a DataPower Load Balancer Group (DP LBG). Yeah, this is sometimes a struggle for us as well. We are forced to use LBGs because DataPower does not respect (take advantage of?) the virtual name that can front a pool of AD servers. You can read why in my RFE here: https://integration-development.ideas.ibm.com/ideas/DPGWY-I-161 Please vote for it even though IBM says it's not under consideration.

    So we are stuck with DP LBGs for now. And they don't always seem to work as well as you would hope, especially when the AD guys are patching their servers and don't wanna hear about it because they are only doing one server at a time and why can't we use the Virtual name like everyone else.

    So one interesting thing we learned thru testing is DP LBGs are kinda dumb. If you ask them to connect to a port on a server (members of the group) that is literally all that they will try to do and nothing more.
    A. Can't reach the server (bad IP,  no DNS, no network, etc)? Try the next member in the group.
    B. Can reach the server but nothing listening on the port? Go to the next member in the group.
    C. Can reach the server and the port responded? Success! Return control back to the DP service that called the DP LBG.

    See the problem with C in the context of making an LDAP call to AD? The DP LBG (by default) does not care what responded on that port. It treats all of the following as success:
    1. AD responds with an LDAP response
    2. AD accepts the socket connection, but doesn't respond (AD stopping, AD initializing, or AD sick and hung)
    3. Something completely unrelated to AD is on that port and accepts the socket connection, obviously not responding with an LDAP answer.

    In all 3 cases, the DP LBG pats itself on the back, happy to have made a network connection, and returns control to the calling MPG. In cases 2 and 3, the MPG is like what the heck, the LDAP call failed. Everyone wonders why the LBG didn't automatically try one of the many other AD servers that could have responded. The client sees an LDAP failure. The LBG is not concerned - it did its job of making a network connection.
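
    The difference between "the port accepted a connection" and "the server actually speaks LDAP" can be sketched in a few lines of Python. This is only an illustration of the two kinds of check, not how DataPower implements health checks. The function names are hypothetical, and the probe assumes the server answers an anonymous LDAPv3 simple bind (some directories may be configured to reject or restrict those, in which case they would still return a BindResponse with an error code).

```python
import socket

# BER-encoded LDAPv3 anonymous simple BindRequest with messageID 1.
ANONYMOUS_BIND = bytes.fromhex("300c020101600702010304008000")

def _read_length(buf, pos):
    """Return (length, next_pos) for a BER length field at buf[pos]."""
    first = buf[pos]
    if first < 0x80:                      # short form
        return first, pos + 1
    n = first & 0x7F                      # long form: n length bytes follow
    return int.from_bytes(buf[pos + 1:pos + 1 + n], "big"), pos + 1 + n

def is_bind_response(reply):
    """True if reply starts an LDAPMessage that carries a BindResponse."""
    try:
        if reply[0] != 0x30:              # LDAPMessage is a SEQUENCE
            return False
        _, pos = _read_length(reply, 1)
        if reply[pos] != 0x02:            # messageID is an INTEGER
            return False
        id_len, pos = _read_length(reply, pos + 1)
        return reply[pos + id_len] == 0x61  # BindResponse: [APPLICATION 1]
    except IndexError:
        return False

def tcp_check(host, port, timeout=2.0):
    """Case C above: success as soon as anything accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def ldap_check(host, port, timeout=2.0):
    """Success only if the peer answers the bind with an LDAP reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(ANONYMOUS_BIND)
            reply = sock.recv(64)
    except OSError:                       # refused, unreachable, or timed out
        return False
    return is_bind_response(reply)
```

    Against a hung server that accepts the socket but never replies (case 2), `tcp_check` returns True while `ldap_check` times out and returns False, which is the outcome you would want to trigger failover to another group member.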

    And for some reason IBM deprecated the LDAP option under "Health Check Type" on the Health tab of a Load Balancer Group.








    ------------------------------
    Peter Potkay
    ------------------------------