IBM Security Verify

  • 1.  webseald spiking CPU on appliance and stops processing requests

    Posted Tue April 27, 2021 12:40 AM
    Rather unusual but has a very clear pattern.

    On ISAM 9.0.7.2, every Monday at the start of peak traffic, one or two of our WebSEALs (out of 10 replicated WebSEALs) gradually start hitting 90 to 99% CPU and never recover until the appliance is hard rebooted. On investigation we found that the webseald process was consuming the CPU, but we have failed to identify the cause. We do have a number of slow backends with high response times and large request sizes, for which we defined per-junction thread limits. Any other misconfiguration in the WebSEAL config would presumably cause the issue on every day of the week instead of just Monday morning.
    Is it a capacity issue? No, the same capacity works fine the rest of the week, and the traffic volume is the same on all days. The issue occurs only on Monday morning, and if Monday happens to be a holiday, it occurs on Tuesday. So is some backend running batch jobs or crons at that time? How do I find the rogue backend that is most likely tanking the CPU? It could also be some other reason. What other causes should I consider for an issue with this pattern of occurrence? Before thinking of the likely root cause, please remember that the issue occurs only on Monday morning! All inputs are welcome!

    Thank you!
    -Raj.

    ------------------------------
    Rajkumar
    ------------------------------


  • 2.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Tue April 27, 2021 03:44 AM
    Hi Raj,

    If I had to make a wild guess, I would say the issue is most likely related to some connection that has gone stale over an extended period of inactivity. This might not be the connection to the backend servers; it could be a connection to a directory, a database, or another component in the architecture. Sometimes you can get locks when a firewall enforces a connection timeout that differs from what the connection endpoints expect: they think the connection is valid but the firewall is dropping packets. Usually these situations are recovered automatically, but perhaps something else is going on in your specific case.

    I would imagine you'll need to open a support case to get to the bottom of this.  If this issue happens each Monday, it should be possible to gather some debug or stats or something which can help point in the right direction.

    Jon.

    ------------------------------
    Jon Harry
    Consulting IT Security Specialist
    IBM
    ------------------------------



  • 3.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Tue April 27, 2021 11:55 AM
    Thanks for your inputs, Jon. Yes, we do have an ongoing support case, but I wanted to have the benefit of this forum as well, since this appears to be a unique issue of its kind, and I hate to say we have been debugging it for almost 10 months now :) We did collect tons of data but have no clues so far.
    The fact that the spiking instance does not auto-recover makes it harder to arrive at any obvious conclusions.

    ------------------------------
    Rajkumar
    ------------------------------



  • 4.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Fri April 30, 2021 06:01 AM
    From what I understand, ISVA uses permanent connections. This means that once a connection to a remote server is established, ISVA keeps it unless it is specifically closed.

    I'm not sure if you can execute netstat -an on ISVA, but if you could, you would be able to see all connections. Another alternative would be to generate a tcpdump and check what is going on. But I guess, from what you describe, you may have already tried this one!
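    If netstat output can be obtained (later in this thread it is retrieved through the LMI REST API), a quick summary by TCP state and remote endpoint makes stale or leaked connections easier to spot. The following is only a minimal sketch, assuming the usual Linux `netstat -an` column layout; the function name `summarize_netstat` and the sample addresses are made up for illustration. A growing pile of CLOSE_WAIT sockets toward a single peer, for example, would suggest that the local side is not closing connections the peer has already closed.

```python
from collections import Counter

def summarize_netstat(output: str):
    """Count TCP connections by state and by (remote address, state).

    Assumes Linux-style `netstat -an` rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    by_state = Counter()
    by_remote = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything without a State column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        by_state[state] += 1
        by_remote[(remote, state)] += 1
    return by_state, by_remote

# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:443            10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:40112          10.0.1.20:389           CLOSE_WAIT
tcp        0      0 10.0.0.5:40113          10.0.1.20:389           CLOSE_WAIT
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN
"""

if __name__ == "__main__":
    states, remotes = summarize_netstat(SAMPLE)
    for state, count in states.most_common():
        print(state, count)
    for (remote, state), count in remotes.most_common(3):
        print(remote, state, count)
```

    Comparing such a summary between a healthy WebSEAL and a spiking one, taken at the same time, would show whether a particular peer or state stands out.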

    I also understand that the CPU peak is a symptom of this behavior on ISVA. There are limited options in the ISVA configuration to set the keepalive parameter to disable permanent connections. For example, you may set this parameter for LDAP connections, but I am not sure you can set the same for the other types of external connections your ISVA may have.

    But I would focus on this issue, because from the symptoms you describe permanent connections might be the issue.

    Nevertheless, there are other reasons for that to happen. Here are a few:
    • Connection Leaks
    • Threads in deadlock situations
    But I don't think we have the capability of looking into these issues! I guess only IBM support would be able to check these possibilities.

    ------------------------------
    Joao Goncalves
    Pyxis, Lda.
    ------------------------------



  • 5.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Fri April 30, 2021 08:27 AM
    Hi Joao, I assume the frontend persistent connection timeouts, the max persistent connection settings, etc. (client connections) would take care of closing the connections, though I am not sure. Using the REST API we could get the netstat output from the LMI, but couldn't get any clues from it. I am not sure about the option for the keepalive parameter. We have played around with the persistent connection settings for client connections (frontend) and junctions (backend). With persistent connections enabled it would be hard to understand thread deadlock situations: it is difficult to see who is using what and how long threads are held, because a persistent connection stays open for as long as it is configured in WebSEAL and ties up the threads for that entire time.
    When the CPU is pegged at 99%, the active worker threads are 100% consumed and we see hard limit hits. It is most likely a bad connection on Monday that lands on one lucky WebSEAL and gets locked there, and we can't get rid of it until we reboot the appliance. We are unable to find what that connection is. I am also not able to determine whether it is the frontend or a backend that needs further digging.

    ------------------------------
    Rajkumar
    ------------------------------



  • 6.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Fri April 30, 2021 08:36 AM
    Thread deadlock situations are unrelated to persistent connections. Their cause is usually related to programming code.

    Keepalive_time is a kernel parameter, or it can also be set when creating a full socket, for a particular connection. This is the parameter that can be used to configure "permanent connections".
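    To illustrate the per-socket variant on a generic Linux host (not on the appliance itself, where socket options are controlled by the product), here is a minimal sketch. The helper name `enable_keepalive` and the timing values are assumptions for illustration. The relevance to stale connections is that the Linux kernel default for `net.ipv4.tcp_keepalive_time` is 7200 seconds, so a firewall with a shorter idle timeout can drop a connection long before the first keepalive probe is sent.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive for one socket with explicit timings.

    idle:     seconds of inactivity before the first probe
    interval: seconds between probes
    probes:   unanswered probes before the connection is dropped
    The TCP_KEEP* options are platform-specific, hence the hasattr guards.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

    Choosing an idle value below the shortest firewall idle timeout on the path keeps the firewall state alive, or at least lets the endpoint detect a dead connection quickly.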

    ------------------------------
    Joao Goncalves
    Pyxis, Lda.
    ------------------------------



  • 7.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Thu June 17, 2021 03:08 PM
    Thanks for clarifying, Joao, and sorry for the delay in my reply. I got consumed with debugging this CPU issue but still have had no luck identifying the root cause. I wish I had more information on keepalive_time for ISAM. This is for sure a connection issue: permanent or stale connection(s) over a period of time. But I don't know how to identify that, and I am not even able to find the right direction.

    Let me describe how the CPU behaves on Monday, hoping it gives someone an idea. One or two out of 10 WebSEALs show about 5% higher CPU as the traffic increases, and the difference grows every 10 minutes (the two WebSEALs that spike that day go from 10% CPU to 90% CPU over 30 to 40 minutes, rising about 20% every 10 minutes, while the other WebSEALs stay below 20%). After about 40 to 60 minutes they gradually go beyond 70% and 90%, drop in 30 seconds, then hit above 90% again and drop in 30 seconds; this continues until I pull the server out of the pool. Sometimes there is no drop at all once it is above 90%. So I pull it out of the pool, reboot the appliance (remember, simply rebooting the instance does not work; it re-spikes after I put it back in the pool), then put it back in the pool and allow traffic.

    So, if I understand your and Jon's feedback, permanent or stale connections pile up over time and get locked or something, resulting in high CPU utilization. Is that right? In any event, how can I debug this? Would packet captures from F5 to WebSEAL, WebSEAL to backend, and WebSEAL to the LDAP/policy/federation servers give any clues? Connection issues are most likely debugged with a network capture, right? I have no clue anyway, so any ideas are welcome! Thank you!

    ------------------------------
    Rajkumar
    ------------------------------



  • 8.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Thu June 17, 2021 08:40 PM
    Oh, and one more thing: when I look at the CPU graph in the LMI, it is the system CPU that spikes and not the user CPU.
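    For readers who want to reproduce the user versus system split outside the LMI graph, here is a rough sketch for a generic Linux host, assuming two snapshots of the aggregate `cpu` line from `/proc/stat` are available. The function name `cpu_split` and the sample numbers are hypothetical. A high system share means the time is spent in the kernel (system calls, network stack, interrupt handling) rather than in the process's own code.

```python
def cpu_split(before: str, after: str) -> dict:
    """Return user/system/idle percentages between two /proc/stat 'cpu' lines.

    Field order after the 'cpu' label:
    user nice system idle iowait irq softirq steal ...
    """
    def parse(line: str):
        return [int(x) for x in line.split()[1:9]]

    delta = [b - a for a, b in zip(parse(before), parse(after))]
    total = sum(delta)
    if total <= 0:
        return {"user_pct": 0.0, "system_pct": 0.0, "idle_pct": 0.0}
    user = delta[0] + delta[1]               # user + nice
    system = delta[2] + delta[5] + delta[6]  # system + irq + softirq
    idle = delta[3] + delta[4]               # idle + iowait
    return {
        "user_pct": 100 * user / total,
        "system_pct": 100 * system / total,
        "idle_pct": 100 * idle / total,
    }

if __name__ == "__main__":
    # Hypothetical snapshots taken a few seconds apart.
    t0 = "cpu  1000 0 500 8000 100 0 50 0 0 0"
    t1 = "cpu  1100 0 1300 8050 100 0 100 0 0 0"
    print(cpu_split(t0, t1))
```

    On a real host the two lines would come from reading the first line of `/proc/stat` twice with a short sleep in between.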

    ------------------------------
    Rajkumar
    ------------------------------



  • 9.  RE: webseald spiking CPU on appliance and stops processing requests

    Posted Mon August 22, 2022 04:34 AM
    We are facing a very similar situation.

    How has the problem evolved?

    Thanks


    ------------------------------
    Patrizio
    ------------------------------