Hi David,
Thanks for replying to this. (Tom is a colleague of mine.)
We did find this blog post, which we could adapt for Informix, showing how to monitor KAIO usage directly, but it is not really suitable for production:
https://blog.pythian.com/troubleshooting-ora-27090-async-io-errors/

33 (!), 48 and 64 are common values we see for maxlen in 'onstat -g ioq'. Curiously, we never see a non-zero value in the len column.
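For a quick, non-invasive check on Linux, the kernel also exposes the system-wide AIO request counts via /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr (standard paths on RHEL 7). A minimal sketch to snapshot the remaining headroom; note aio-nr is system-wide, so it covers every process using kernel AIO, not just the Informix engine:

```python
"""Sketch: snapshot of kernel AIO slot usage on Linux (RHEL 7 paths)."""

def read_int(path):
    # Both files contain a single integer.
    with open(path) as f:
        return int(f.read().strip())

def aio_headroom():
    in_use = read_int("/proc/sys/fs/aio-nr")        # slots currently reserved
    ceiling = read_int("/proc/sys/fs/aio-max-nr")   # system-wide ceiling
    return in_use, ceiling, ceiling - in_use

in_use, ceiling, free = aio_headroom()
print(f"aio-nr={in_use}  aio-max-nr={ceiling}  free={free}")
```

Sampling this periodically (e.g. from cron or collectd's exec plugin) while the engine is under load shows how close the system gets to the ceiling.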
We found a couple of ways the effects of a lack of KAIO resources can be observed indirectly. The problem amounts to a limit on the number of I/O requests that can be in flight simultaneously, so under load:
* Parallel L0 backup durations shorten when more KAIO resources are allocated.
* The collectd disk plugin reports higher latency for storage access, especially when the system has a higher than normal number of ready threads. These latencies are not caused by our storage and either disappear or reduce significantly with more KAIO resources.
If anyone is interested, we did quite a bit of testing, including looking at AIO VPs and revisiting how they behave compared to KAIO. We found AIO VPs can be more effective than KAIO if KAIO lacks resources but, properly resourced, KAIO is faster and more efficient. Using AIO VPs also requires many more file handles, as each AIO VP needs a file handle for every chunk you have, which can be an issue if you need a lot of AIO VPs, have a lot of chunks or, worse, both. Instance start-up is also slower with AIO VPs, although on many systems this may not be very noticeable.
Ben.
------------------------------
Benjamin Thompson
------------------------------
Original Message:
Sent: Mon May 03, 2021 03:57 PM
From: David Williams
Subject: Kernel AI/O tuning
Hi,
Check the online.log for errors.
I would also check onstat -g ioq and see if the maxlen gets above 32 anywhere.
Regards,
David.
------------------------------
David Williams
Original Message:
Sent: Wed March 31, 2021 10:38 AM
From: Thomas Sherlock
Subject: Kernel AI/O tuning
I have a question about tuning kernel asynchronous I/O, specifically on Linux (RHEL 7). For a bit of background, kernel parameter aio-max-nr controls the maximum number of KAIO requests, divided among all CPU VPs. Informix environment variable KAIOON then determines how many request "slots" (my term) each CPU VP is allocated when the engine is started. Once the engine is running, you can see from kernel parameter aio-nr how many request slots Informix actually has across all CPU VPs.
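The relationship described above suggests a rough sizing check. A sketch under those assumptions; the function name is mine, and the example values (8 CPU VPs, KAIOON=32768) are purely illustrative, not recommendations:

```python
def required_aio_max_nr(cpu_vps, kaioon, other_consumers=0):
    """Minimum fs.aio-max-nr needed if each CPU VP reserves `kaioon`
    kernel AIO request slots, plus any slots reserved by non-Informix
    processes on the same host (aio-nr is a system-wide counter)."""
    return cpu_vps * kaioon + other_consumers

# e.g. 8 CPU VPs each reserving 32768 slots:
print(required_aio_max_nr(8, 32768))  # 262144
```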
My question is about how to tell whether there are enough slots and when disk I/O might be constrained by waiting for slots to become free. As far as I can see, none of the Informix onstat options, e.g. '-g iov', '-g ioq', etc., provides any insight into this, and excessive waiting on the logical log buffer or bufferpools does not directly point to a lack of slots.
With AIO VPs you can look at how the number of operations tails off as you read down the output from 'onstat -g iov' but this method does not work for KAIO.
Apart from increasing aio-max-nr and KAIOON and doing some testing, has anyone got any suggestions?
------------------------------
Thomas Sherlock
------------------------------
#Informix