Forgive the long post, but I have been working on a system showing poor query performance and am at a bit of a loss. It's Informix 10, and an upgrade is not a possibility. This is on RHEL and running on a VM.
The system had been working well until a week or so ago, when performance suddenly went bad. No changes were made, no restarts, no nothing. Nothing in the Linux message log or boot log indicating any issues.
We have traced the main performance issue to queries that use ORDER BY or GROUP BY. A query that takes about 1 second, turns to 20-30 seconds with an order by, and it's only 50,000 rows that are returned. Monitoring the thread when running shows that it spends most of the time in "
cond wait bufcond", which has to do with a sort waiting on output buffers, but that's all I know on that!
If we set PSORT_DBTEMP=/tmp, the query runs very quickly. If we set PSORT_DBTEMP to the same filesystem as the (cooked) chunks, the query still runs quickly.
The CPUs do show a fairly high iowait% when even a single query is running, so while we suspect some sort of disk issue, there is nothing to validate that. A "dd" is slower than we would expect, but other IO seems very respectable. There are other VMs running on the host but they are quiet. Also no swapping going on. Linux reports not a whole lot of memory free, but it's being used for filesystem caching. We have restarted the VM, and the problem remains.
Today we also found the same performance hit with a query that was using an AUTOINDEX. The query, which had not been a problem before, is now taking 30+ seconds. I replaced the AUTOINDEX with a real index and it now runs in 0.02 seconds. I suspect again that the autoindex query was slow because of temp space.
Another example is this...
This is fast (1-2 sec):
create temp table mw_mytemptable
(
col1 char(30),
col2 integer,
col3 char(4),
col4 char(4),
col5 char(4),
col6 char(1),
col7 datetime year to second,
col8 datetime year to second
) extent size 4096 next size 4096 lock mode page;
insert into mw_mytemptable select * from mytable;
"mytable" has about 600,000 records.
But this is really slow (30+ seconds):
select * from mytable into temp tmp_mw1 with no log;
I haven't tried the first insert without the extent sizes, but suspect it wouldn't make a whole lot of difference here.
When using iotop to monitor IO, the first SQL shows
dbaccess as the process doing IO at several MB/sec. The second SQL shows multiple
oninit processes doing IO at 100-200 KB/sec.
I have added additional CPU VPs and AIO VPs, and it doesn't help. Checkpoint times have increased substantially since the problems started.
Again, sounds like disk, but then I get thrown off because of things like the insert shown above was so very fast, even the first time it was run, in case caching is playing a part. Also when using PSORT_DBTEMP, the output is very fast. Unloading the large table to a file is fast.
Has anybody seen anything like this before, or have any ideas?
Thanks,
Mike
------------------------------
Mike Walker
xDB Systems, Inc
www.xdbsystems.com------------------------------
#Informix