DataPower

  • 1.  Event Triggers

    Posted Mon April 14, 2025 02:38 AM

    Hi Team,

    There is a small use case that I believe can be achieved using the Event Triggers function in a log target.

    We basically want to reboot an appliance once its memory utilization exceeds 75%.

    Can you please share the required configuration to achieve this?





    ------------------------------
    Thanks
    ------------------------------


  • 2.  RE: Event Triggers

    Posted Mon April 14, 2025 09:49 AM

    I think this may be a bad idea.  Please follow along with me:

    The appliance can do this automatically.  This generally happens when the "Memory Terminate Threshold" is met.   Traffic is rejected at the "Memory Throttle Threshold" to possibly allow the memory to recover.  You'll find both of these values in the Throttle Settings of the appliance.   Therefore, if you set the throttle threshold at, say, 35 (65% memory used) and the terminate threshold at 25 (75% used), it should reload on its own.
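
    To make the free-versus-used arithmetic above concrete, here is a minimal Python sketch. It assumes the thresholds are expressed as percent free memory, as in the examples in this thread, and that a threshold counts as met when free memory is at or below it. The function name and the exact comparison are illustrative assumptions, not the firmware's actual logic.

```python
def classify_memory_state(used_pct, throttle_free_pct, terminate_free_pct):
    """Classify memory state from thresholds given as percent FREE memory.

    Hypothetical helper for illustration only; the "<=" comparison is an
    assumption about when a threshold counts as met.
    """
    if not 0 <= terminate_free_pct < throttle_free_pct <= 100:
        raise ValueError("expected 0 <= terminate < throttle <= 100 (percent free)")
    free_pct = 100 - used_pct
    if free_pct <= terminate_free_pct:
        return "terminate"   # appliance would reload on its own
    if free_pct <= throttle_free_pct:
        return "throttle"    # new traffic is rejected so memory can recover
    return "normal"


# Thresholds from the post: throttle at 35 (65% used), terminate at 25 (75% used).
for used in (60, 70, 80):
    print(used, classify_memory_state(used, 35, 25))
```

    With those values, 60% used is normal, 70% used falls in the gap between the two thresholds, and 80% used is past the terminate threshold.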

    If, however, you are trying to mitigate a memory leak, this strategy may not work so well.   In that case, the box might reach the "Memory Throttle Threshold" to stop traffic, but then never recover down to accept traffic again, nor will it hit the "Memory Terminate Threshold" because it's not leaking anymore.   This is what I've come to call a "Zombie" appliance, which means it's alive, but effectively dead.  

    You may not be able to solve this with event triggers either because, if the appliance reaches Zombie mode, the "restart" memory limit may never be reached.   However, you could try to force it by setting the throttle threshold to, say, 20 (80% used) and then using the trigger for over 75% used.  Be sure to set the "Only Once" toggle "on" to prevent a race condition on restarts.

    Also, you may not want the appliance restarting on short usage spikes.   A memory spike may last mere milliseconds.  You don't want to restart on every spike.  Calculating an average or number of consecutive times won't be possible in an event trigger, and any attempt to do so will be a terrible idea.  If you have a cluster (most likely, yes?), then you could cause a cascading effect on the others in the cluster as the first is restarting.   Definitely consider a "reload" rather than a "restart" for this, as a reload is much faster than a restart.

    So, the happy day scenario is to use the threshold settings.   If you cannot because of a memory leak, you're limited until you find the memory leak and fix it, or IBM can fix it in a future firmware.  

    My advice, then, if you do indeed have a memory leak: monitor the appliance.  Set up alerts via your monitoring tools and, if it becomes a zombie because it stays in the gap, restart or reload it manually and, most importantly, safely.



    ------------------------------
    Joseph Morgan
    CEO - Independent
    ------------------------------



  • 3.  RE: Event Triggers

    Posted Mon April 14, 2025 12:46 PM
    Hi Joseph,
    Thanks for replying.

    As of now, we have BMC tracking utilization, with an alert placed at 80% memory utilization, and we are rebooting manually.
     
    In case we go for the throttle settings, can we apply the values below?
    Also, do you see any potential issues with rebooting via the throttle settings option?
     
    Memory Throttle At 22% (78% Utilization)
    Memory Terminate At 18% (82% Utilization)
     
     
    Or do you suggest it is better to keep the current approach, i.e. a manual reload after getting alerts?
    By Zombie mode, you mean the appliance is basically not receiving any further traffic, right?


    ------------------------------
    Sunil Chaurasia
    ------------------------------



  • 4.  RE: Event Triggers

    Posted Mon April 14, 2025 01:45 PM

    I still suggest you use the throttle settings.

    I only suggest an alternative if you have a memory leak or there is some other problem with the appliances reloading properly.

    Zombie mode happens when: 

    1. There is a memory leak where the appliances don't recover on their own.
    2. You set the thresholds, say, as you have noted.
    3. The memory reaches, say, 80% utilization and the appliance stops receiving traffic.  It doesn't recover due to the memory leak, but it also is not receiving traffic because it is in the gap between throttle and terminate.  It won't reload on its own, but it also won't recover.   Thus, like a zombie, it is alive but there is no way to talk to it.
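
    The three steps above can be sketched as a toy simulation. This is only a model under stated assumptions: leaked memory grows by a fixed amount per tick while traffic is served, nothing is ever released, and thresholds are percent free memory that count as met at or below the value. The function and parameter names are hypothetical.

```python
def simulate(start_used_pct, leak_per_tick_pct, throttle_free_pct,
             terminate_free_pct, ticks):
    """Toy model of a leaking appliance; returns a list of (used_pct, state)."""
    used = start_used_pct
    history = []
    for _ in range(ticks):
        free = 100 - used
        if free <= terminate_free_pct:
            history.append((used, "reload"))
            break
        if free <= throttle_free_pct:
            # Traffic is rejected: the leak stops growing, but nothing is freed,
            # so the terminate threshold is never reached.
            history.append((used, "throttled"))
            continue
        history.append((used, "serving"))
        used += leak_per_tick_pct  # leak grows only while traffic is served
    return history


# Throttle at 22 (78% used), terminate at 18 (82% used), slow leak.
for used, state in simulate(70, 3, 22, 18, 8):
    print(used, state)
```

    In this run the appliance serves traffic up to 79% used, then stays throttled for every remaining tick without ever reloading, which is the Zombie state. A leak that jumps past both thresholds in one step (for example 15 points per tick) would reach the reload branch instead.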

    As for what I do.   I do both wherever I configure appliances.    That is, I use the thresholds but also use external monitoring with alerting.  The alerts run external scripts which can keep up with how many times this occurs over some period of time, and, if those external thresholds are exceeded, then we have a script safely reload the appliance.
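
    A minimal sketch of that kind of external bookkeeping, in Python, might look like the following. It is an assumption-laden illustration rather than the actual scripts: the class name, the limits, and the idea of returning a flag for the caller to act on are all hypothetical, and how the reload itself is triggered is deliberately left out.

```python
from collections import deque


class AlertWindow:
    """Count alerts in a sliding time window (hypothetical helper).

    Timestamps are seconds and are assumed to arrive in non-decreasing order.
    """

    def __init__(self, max_alerts, window_seconds):
        self.max_alerts = max_alerts
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp):
        """Record one alert; return True when a safe reload should be requested."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop alerts that have fallen out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        if len(self._times) > self.max_alerts:
            self._times.clear()  # start counting afresh after requesting a reload
            return True
        return False


window = AlertWindow(max_alerts=2, window_seconds=600)
for t in (0, 100, 200):
    print(t, window.record(t))
```

    Here the third alert within ten minutes exceeds the limit of two, so only that call returns True. Isolated alerts spread further apart than the window never accumulate, which keeps short spikes from triggering a reload.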



    ------------------------------
    Joseph Morgan
    CEO - Independent
    ------------------------------



  • 5.  RE: Event Triggers

    Posted Wed April 16, 2025 11:38 AM

    Hi Joseph,

    Regarding point 3:
    The memory reaches, say, 80% utilization and stops receiving traffic.  It doesn't recover due to the memory leak, but it also is not receiving traffic because it is in the gap between throttle and terminate. 

    If I understand correctly, this situation can be averted to some extent if the gap between throttle and terminate is kept small,
    so that once the appliance is throttled, less time is left before it reboots, given the memory growth.



    ------------------------------
    Sunil Chaurasia
    ------------------------------



  • 6.  RE: Event Triggers

    Posted Wed April 16, 2025 12:00 PM

    I would say, possibly.  It is worth a try to experiment with the gap. 

    Have I touched upon your problem?  That is, do you have a memory leak, and are you trying to safely resolve it with regular reboots/restarts?

    The reason I ask is that you likely don't want the gap too small if the appliances are running fine.  You want the appliances to have a chance to recover on their own.  The "happy day" scenario is an attempt to prevent DoS-type attacks.   If you set the gap too small when the appliances do not have a memory leak, you could cause your own DoS.



    ------------------------------
    Joseph Morgan
    CEO - Independent
    ------------------------------



  • 7.  RE: Event Triggers

    Posted Thu April 17, 2025 10:58 AM

    Hi Sunil,

    I just wanted to emphasize the point Joseph is making: you really need to find the root cause of the whole thing. For me, this type of behavior has always come down to some kind of "human error", for example an infinite loop or bad use of GatewayScript, and never a memory leak in the appliance itself. Of course it is possible to have a memory leak, but it is more likely to be something else.



    ------------------------------
    Hermanni Pernaa
    Solutions Architect
    Digia Plc
    Helsinki
    ------------------------------