The reason I ask is that you likely don't want the gap too small if the appliances are running fine. You want the appliances to have a chance to recover on their own. The "happy day" scenario is an attempt to prevent DoS-type attacks. If you set the gap too small when the appliances do not have a memory leak, you could cause your own DoS.
Original Message:
Sent: Wed April 16, 2025 11:38 AM
From: Sunil Chaurasia
Subject: Event Triggers
Hi Joseph,
Regarding point 3:
The memory reaches, say, 80% utilization and stops receiving traffic. It doesn't recover due to the memory leak, but it also is not receiving traffic because it is in the gap between throttle and terminate.
If I understand correctly, this situation can be averted to some extent if the gap between Throttle and Terminate is kept small,
so that once the appliance is throttled it has less time left before the reboot, given the memory growth.
------------------------------
Sunil Chaurasia
Original Message:
Sent: Mon April 14, 2025 01:44 PM
From: Joseph Morgan
Subject: Event Triggers
I still suggest you use the throttle settings.
I only suggest an alternative if you have a memory leak or there is some other problem with the appliances reloading properly.
Zombie mode happens when:
- There is a memory leak where the appliances don't recover on their own.
- You set the thresholds, say, as you have noted.
- The memory reaches, say, 80% utilization and stops receiving traffic. It doesn't recover due to the memory leak, but it also is not receiving traffic because it is in the gap between throttle and terminate. It won't reload on its own, but it also won't recover. Thus, like a zombie, it is alive but there is no way to talk to it.
As for what I do: I do both wherever I configure appliances. That is, I use the thresholds but also use external monitoring with alerting. The alerts run external scripts that keep track of how many times this occurs over some period of time, and, if those external thresholds are exceeded, we have a script safely reload the appliance.
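A minimal sketch of the "how many times over some period" bookkeeping such an external script might do. The class name, the limits, and the idea of feeding it alert timestamps are all assumptions for illustration, not the actual scripts described above; the reload itself is left out because it depends on your tooling.

```python
from collections import deque


class ReloadDecider:
    """Count throttle alerts in a sliding time window (hypothetical helper)."""

    def __init__(self, max_alerts: int, window_seconds: float):
        self.max_alerts = max_alerts
        self.window_seconds = window_seconds
        self._alerts = deque()  # timestamps of recent alerts, oldest first

    def record_alert(self, now: float) -> bool:
        """Record one alert at time `now` (seconds).

        Returns True when more than `max_alerts` alerts fall inside the window,
        i.e. when the external threshold is exceeded and a reload is warranted.
        """
        self._alerts.append(now)
        # Drop alerts that are older than the window.
        while self._alerts and now - self._alerts[0] > self.window_seconds:
            self._alerts.popleft()
        return len(self._alerts) > self.max_alerts


# Example: allow up to 3 alerts per hour before asking for a reload.
decider = ReloadDecider(max_alerts=3, window_seconds=3600)
decisions = [decider.record_alert(t) for t in (0, 600, 1200, 1800)]
print(decisions)  # [False, False, False, True]
```

The window keeps a single short-lived alert from triggering a reload, while repeated alerts in a short period do.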
------------------------------
Joseph Morgan
CEO - Independent
Original Message:
Sent: Mon April 14, 2025 12:45 PM
From: Sunil Chaurasia
Subject: Event Triggers
Hi Joseph,
Thanks for replying.
As of now, we have BMC tracking utilization, with an alert set at 80% memory utilization, and we are rebooting manually.
In case we go for the throttle settings, we can apply the values below, correct?
Also, do you see any potential issues with rebooting via the Throttle Settings option?
Memory Throttle At 22% (78% Utilization)
Memory Terminate At 18% (82% Utilization)
Or do you suggest it is better to keep the current approach, i.e. a manual reload after getting alerts?
By Zombie mode, you mean the appliance is basically not receiving any further traffic, right?
------------------------------
Sunil Chaurasia
Original Message:
Sent: Mon April 14, 2025 09:48 AM
From: Joseph Morgan
Subject: Event Triggers
I think this may be a bad idea. Please follow along with me:
The appliance can do this automatically. This generally happens when the "Memory Terminate Threshold" is met. Traffic is rejected at the "Memory Throttle Threshold" to possibly allow the memory to recover. You'll find both of these values in the Throttle Settings of the appliance. Therefore, if you set the throttle threshold at, say, 35 (65% memory used) and the terminate threshold at 25 (75% used), it should reload on its own.
If, however, you are trying to mitigate a memory leak, this strategy may not work so well. In that case, the box might reach the "Memory Throttle Threshold" to stop traffic, but then never recover down to accept traffic again, nor will it hit the "Memory Terminate Threshold" because it's not leaking anymore. This is what I've come to call a "Zombie" appliance, which means it's alive, but effectively dead.
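To make the gap concrete, here is a small sketch that classifies a free-memory reading against the two thresholds. It assumes the thresholds are expressed as percent free (as in the example above) and that the comparison is "at or below"; the function and state names are made up for illustration and are not the appliance's own logic.

```python
from enum import Enum


class MemoryState(Enum):
    NORMAL = "normal"        # accepting traffic
    THROTTLED = "throttled"  # rejecting traffic, waiting for memory to recover
    TERMINATE = "terminate"  # low enough that the appliance should reload


def classify(free_pct: float, throttle_at: float, terminate_at: float) -> MemoryState:
    """Classify free memory against thresholds given as percent FREE.

    Assumption: a threshold is met when free memory is at or below it.
    """
    if terminate_at >= throttle_at:
        raise ValueError("terminate threshold must be below throttle threshold")
    if free_pct <= terminate_at:
        return MemoryState.TERMINATE
    if free_pct <= throttle_at:
        return MemoryState.THROTTLED
    return MemoryState.NORMAL


# Throttle at 35% free (65% used), terminate at 25% free (75% used).
for free in (50, 30, 20):
    print(free, classify(free, throttle_at=35, terminate_at=25).value)
```

A leaking appliance that stalls at, say, 30% free stays in the throttled state indefinitely: it never climbs back above 35 and never falls to 25. That is the Zombie case.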
You may not be able to solve this with event triggers either because, if the appliance reaches Zombie mode, the "restart" memory limit may never be reached. However, you could try to force it by setting the throttle threshold to, say, 20 (80% used) and then using the trigger for over 75% used. Be sure to set the "Only Once" toggle to "on" to prevent a race condition on restarts.
Also, you may not want the appliance restarting on short usage spikes. A memory spike may last mere milliseconds. You don't want to restart on every spike. Calculating an average or number of consecutive times won't be possible in an event trigger, and any attempt to do so will be a terrible idea. If you have a cluster (most likely, yes?), then you could cause a cascading effect on the others in the cluster as the first is restarting. Definitely consider a "reload" rather than a "restart" for this, as a reload is much faster than a restart.
So, the happy day scenario is to use the threshold settings. If you cannot because of a memory leak, you're limited until you find the memory leak and fix it, or IBM can fix it in a future firmware.
My advice, then (if you indeed have a memory leak): monitor the appliance. Set up alerts via your monitoring tools, and, if it becomes a zombie because it stays in the gap, then restart or reload manually and, most importantly, safely.
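As a rough illustration of what "stays in the gap" could mean to a monitoring script, the sketch below flags an appliance only when several consecutive free-memory samples all sit between the two thresholds, so a millisecond spike does not count. The function name, the sample format, and the number of samples are assumptions; pick values that match your polling interval.

```python
def stuck_in_gap(samples, throttle_at, terminate_at, min_consecutive):
    """Return True if the last `min_consecutive` samples all sit in the gap.

    samples: free-memory percentages, oldest first (from your monitoring tool).
    The gap is assumed to be: terminate_at < free <= throttle_at.
    """
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(terminate_at < s <= throttle_at for s in recent)


# Throttle at 22% free, terminate at 18% free, require 4 samples in a row.
print(stuck_in_gap([40, 21, 21, 20.5, 21], 22, 18, 4))  # True: candidate zombie
print(stuck_in_gap([40, 21, 40, 21], 22, 18, 3))        # False: it recovered in between
```

A True result is only a signal to alert a person or a carefully written reload script; it does not by itself make the reload safe.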
------------------------------
Joseph Morgan
CEO - Independent
Original Message:
Sent: Mon April 14, 2025 02:38 AM
From: Sunil Chaurasia
Subject: Event Triggers
Hi Team,
There is a small use case that I believe can be achieved using the Event Triggers function in a log target.
We basically want to reboot an appliance once its memory utilization exceeds 75%.
Can you please share the required configuration to achieve this?
------------------------------
Thanks
------------------------------