Informix


 Waiting for a buffer

Dennis Melnikov posted Mon November 11, 2024 04:40 AM

Hi,

IBM Informix Dynamic Server Version 11.70.FC5XE

If a session is waiting for a buffer, what does that mean?

How can we prevent it?

Andreas Legner  Best Answer

I feel I've got to demur a little from, or expand beyond, what's been said so far: normally, buffer waits are not waits for a free buffer to become available so that a new page can be read in. (In that case one of the least recently used buffers would simply be picked and evicted, in the worst case involving a foreground write, but that typically would not show up as a buffer wait, if I'm not mistaken.)

A buffer, holding a certain page, can be accessed by multiple parties simultaneously as long as everyone only wants to read from the page, and each party will simply increment the share lock counter. On the other hand, anyone with the intention to modify the buffer content will have to gain exclusive access.

So buffer waits can occur if

  • one party holds the buffer exclusively while one or more others want to gain access too, shared or exclusive - they'll have to wait
  • some parties are holding the buffer in share mode while someone intends to modify it - that someone has to wait, and so, I think, will any other share lockers arriving after them

A common special form of the first case is a fellow session thread or a readahead thread in the course of populating the buffer, i.e. reading in the designated page, while others already want access to it. The typical sign of this, especially in the readahead case, is a buffer owner of -1 (0xffffffff....) instead of a thread address (rstcb).

You don't immediately see a buffer waiter's intention, x-lock or s-lock, but if the buffer is currently held in X mode then the waiters will likely be share lockers, and vice versa.

All this is perfectly valid and should not cause a problem unless those buffer accesses, for whatever reason, are taking too long.

If long or frequent buffer waits occur, it's a sign of something not running smoothly and possibly a bottleneck.
Sometimes it's a very specific buffer, or set of buffers, with a lot of contention on it (mixed shared and exclusive) -> find out which pages; more often than not it turns out to be an index root node.  Also look for commonalities, e.g. in the SQL being executed, between the sessions having to wait.
It could also be slow disk I/O, especially in the readahead case.

One really has to find the pattern behind the buffer wait symptom.

HTH,

 Andreas

Mike Walker

Hi Dennis,

A session waiting on a buffer simply means that it is waiting on a free spot in the bufferpool pages.  That could be because it is waiting for a dirty page to be written to disk, or just searching the LRU queue to find an empty slot to use.  Waiting on a buffer is completely normal, but if the waiting is excessive then it may impact performance.  Art Kagel recommends calculating the buffer wait ratio - use onstat -p, or onstat -g buf, and calculate bufwaits / (pagreads + bufwrits) * 100.  You would want to see a value less than 7%.
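If it helps, here is a minimal sketch of that calculation as a single query against the sysmaster pseudo-tables instead of reading the onstat -p counters by hand. The counter names ('bufwaits', 'pagreads', 'bufwrits') and the name/value column layout are assumed to match the onstat -p labels - verify them on your instance with "SELECT name FROM sysmaster:sysprofile" first.

    -- buffer wait ratio: bufwaits / (pagreads + bufwrits) * 100
    SELECT ROUND(100.0 *
             SUM(CASE WHEN name = 'bufwaits' THEN value ELSE 0 END) /
             SUM(CASE WHEN name IN ('pagreads', 'bufwrits') THEN value ELSE 0 END), 2)
               AS bufwait_ratio_pct
    FROM  sysmaster:sysprofile
    WHERE name IN ('bufwaits', 'pagreads', 'bufwrits');

Anything much above that 7% figure is worth a closer look.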

As with many things, reducing the buffer waits may be a combination of various parameters and there is no simple answer.  You can increase the number of buffers (BUFFERPOOL), and increasing the number of LRUs will help.  The buffers are distributed over the number of LRUs, and if you have lots of buffers split over relatively few LRUs then it will take longer to search these queues for a free page.  Increasing the LRU min/max dirty may help as this will reduce the number of dirty pages in the queues.  BTW, check onstat -F to make sure that there are NO foreground writes.  Use Art's buffer turnover ratio (BTR) calculation to make sure that you are making good use of your bufferpools - if your bufferpool is too small with high turnover then you'll see more things waiting on buffers as more stuff needs to be read in from disk.
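If you want to see the current settings without digging through the onconfig file, a quick sketch against sysmaster should do it; the parameter list below is just an example selection, and the cf_name / cf_effective columns are the ones I believe sysconfig exposes, so treat this as a starting point:

    -- effective onconfig values relevant to buffer waits (example selection)
    SELECT cf_name, cf_effective
    FROM   sysmaster:sysconfig
    WHERE  cf_name IN ('BUFFERPOOL', 'CLEANERS', 'AUTO_LRU_TUNING');

Compare the lrus, lru_min_dirty and lru_max_dirty values in the BUFFERPOOL line against what you actually intended.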

Remember though that you shouldn't try to prevent all buffer waits, but first check to see that they are not excessive.

Mike 

Art Kagel

Just a couple of addenda to Mike's response:

  • When Mike said you could "increase the LRU min/max dirty settings" he meant decrease those values so that dirty buffers are flushed more aggressively. If you do that, you may need to increase the CLEANERS parameter to allow for more parallelism during those flushes, especially if you have many LRU queues or many chunks.
  • Note on the BTR: I no longer use the original BTR because on any insert-heavy system the BTR value is artificially inflated. I now use the newer BTR3 calculation, which nearly always returns the same (or a substantially identical) value as BTR for balanced systems, but on insert-heavy systems it returns a more accurate, lower value which can be used to better predict whether more buffers are required.
  • No need to calculate these and a few other metrics yourself. Just go to my web site (www.askdbmgt.com/my-utilities) and download the ratios.shr_ak package. It contains source for a stored procedure to be installed in your sysmaster database, and a shell script, newratios.ksh, that formats a report of the metric calculations.

Mike Walker

Thanks Art - yes, I meant to say LOWER the values for the lru min/max.  That way you will get more writes to disk between checkpoints and hopefully increase the number of free pages in the bufferpool.  Also, I am glad to see that I am not the only one to type "rations" instead of "ratios"!  :-) 

Dennis Melnikov

Mike, Art,

Thank you so much for your answers.

My issue is as follows,

The typical number of sessions waiting for a buffer does not exceed 1 (one). But at that unhappy moment it reached 562. At the same time the number of sessions waiting for a lock grew to 44. And then, within 8 minutes, it all dropped back to 0-1.

We have no foreground writes (onstat -F). There is no record in the OS error log (AIX).
What else should we check if the situation comes back?

Art Kagel

With around 600 concurrent queries sometimes running (as when the number of sessions waiting for buffers exceeds 560 and 44 sessions are waiting for locks), it may be that you do not have enough LRU queues. For that level of concurrency I would want each buffer pool to have about 256 LRU queues.

What happens when a session needs to read a new page into a buffer pool is that the session hashes to one of the LRU queues in that pool and checks to see if the queue is latched by another session trying to do the same. If so, it will spin on the latch for a few thousand loops; if it still cannot latch that queue, it will rehash to another queue. If there are not enough LRU queues, this poor session will not be the only one that cannot get the latch, so many sessions will chase each other around the buffer pool's LRU queues, essentially single-threading access. More queues means there are more available latches and queues, some of which might not be locked, reducing bufwaits (which are most often not buffer waits at all but ... LRU latch waits). I would note that this very issue was the one that originally got me interested in performance tuning and led me to develop the metrics that I have used now for 30+ years.

Sometimes, yes, as Mike pointed out, the cause is not having enough buffers or not flushing aggressively enough (in the onstat -F report I want to see about 60-70% of all writes being LRU writes and only 30-40% chunk writes; if your flushes are REALLY too infrequent or tame, you will see FG writes, which you are not seeing). The BTR3 will tell you if you have enough buffers (BTR3 < 10 turns/hour) and the onstat -F report will tell you if you are flushing aggressively enough.
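As a rough illustration of that write-type split, here is a sketch that turns the counters behind onstat -F into percentages. The sysprofile counter names 'fgwrites', 'lruwrites' and 'chunkwrites' are assumptions on my part, so check them with "SELECT name FROM sysmaster:sysprofile" on your own instance before relying on this:

    -- share of foreground, LRU and chunk writes since the counters were last reset
    SELECT ROUND(100.0 * SUM(CASE WHEN name = 'fgwrites'    THEN value ELSE 0 END) / SUM(value), 2) AS fg_write_pct,
           ROUND(100.0 * SUM(CASE WHEN name = 'lruwrites'   THEN value ELSE 0 END) / SUM(value), 2) AS lru_write_pct,
           ROUND(100.0 * SUM(CASE WHEN name = 'chunkwrites' THEN value ELSE 0 END) / SUM(value), 2) AS chunk_write_pct
    FROM   sysmaster:sysprofile
    WHERE  name IN ('fgwrites', 'lruwrites', 'chunkwrites');

If fg_write_pct is anything other than zero, or lru_write_pct is well below that 60-70% range, the cleaners are not keeping up.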

If you cannot figure this one out, then I would suggest engaging one of the consulting groups that specialize in Informix, those being Mike Walker and Tom Beebe's xDB Systems, Paul Watson's Oninit, or myself at ASK Database Management. That's what we do.