Informix

 View Only
Expand all | Collapse all

Slow sorting/temp space

  • 1.  Slow sorting/temp space

    IBM Champion
    Posted Sat January 22, 2022 05:05 PM
    Forgive the long post, but I have been working on a system showing poor query performance and am at a bit of a loss.  It's Informix 10, and an upgrade is not a possibility.  This is on RHEL and running on a VM.

    The system had been working well until a week or so ago, when performance suddenly went bad.  No changes were made, no restarts, no nothing.  Nothing in the Linux message log or boot log indicating any issues.

    We have traced the main performance issue to queries that use ORDER BY or GROUP BY.  A query that takes about 1 second, turns to 20-30 seconds with an order by, and it's only 50,000 rows that are returned.  Monitoring the thread when running shows that it spends most of the time in "cond wait bufcond", which has to do with a sort waiting on output buffers, but that's all I know on that!

    If we set PSORT_DBTEMP=/tmp, the query runs very quickly.  If we set PSORT_DBTEMP to the same filesystem as the (cooked) chunks, the query still runs quickly.

    The CPUs do show a fairly high iowait% when even a single query is running, so while we suspect some sort of disk issue, there is nothing to validate that.  A "dd" is slower than we would expect, but other IO seems very respectable.  There are other VMs running on the host but they are quiet.  Also no swapping going on.  Linux reports not a whole lot of memory free, but it's being used for filesystem caching.  We have restarted the VM, and the problem remains.

    Today we also found the same performance hit with a query that was using an AUTOINDEX.  The query, which had not been a problem before, is now taking 30+ seconds.  I replaced the AUTOINDEX with a real index and it now runs in 0.02 seconds.  I suspect again that the autoindex query was slow because of temp space.

    Another example is this...

    This is fast (1-2 sec):

    create temp table mw_mytemptable
    (
    col1 char(30),
    col2 integer,
    col3 char(4),
    col4 char(4),
    col5 char(4),
    col6 char(1),
    col7 datetime year to second,
    col8 datetime year to second
    ) extent size 4096 next size 4096 lock mode page;
    insert into mw_mytemptable select * from mytable;

    "mytable" has about 600,000 records.

    But this is really slow (30+ seconds):

    select * from mytable into temp tmp_mw1 with no log;

    I haven't tried the first insert without the extent sizes, but suspect it wouldn't make a whole lot of difference here.

    When using iotop to monitor IO, the first SQL shows dbaccess as the process doing IO at several MB/sec. The second SQL shows multiple oninit processes doing IO at 100-200 KB/sec.

    I have added additional CPU VPs and AIO VPs, and it doesn't help.  Checkpoint times have increased substantially since the problems started.

    Again, sounds like disk, but then I get thrown off because of things like the insert shown above was so very fast, even the first time it was run, in case caching is playing a part.  Also when using PSORT_DBTEMP, the output is very fast.  Unloading the large table to a file is fast.

    Has anybody seen anything like this before, or have any ideas?

    Thanks,

    Mike




    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------

    #Informix


  • 2.  RE: Slow sorting/temp space

    Posted Sun January 23, 2022 02:07 AM
    Preface my comments with:
    * I have no experience with or access to Informix 10.
    * I have not seen this specific issue before but offer general suggestions.

    > The system had been working well until a week or so ago, when performance suddenly went bad.

    How long had it been working well?  days, weeks, months?
    Try to determine if this is a recent problem or something that has been occurring
    for "a while" and someone just started to notice it.

    How long has the VM been up?  Have you tried a reboot of the VM? 
    Many times the old one/zero switch does wonders.

    Can you export the VM and import it on another standalone system, or one that
    is not shared with other VMs to see if the problem persists?

    What other processes are running on the system at the time performance degrades?
    Check with top(1) or "ps -efl".
    Are other users on the system at the time?  What are they doing?
    Maybe a cron job is running at the same time?
    A disk intensive process like updatedb to update the locate(1) database,
    or a backup running?

    Is Informix using file systems or raw disk?
    If file system, what type?
    I remember reading that journaling file systems should be avoided with Informix
    because the database does it's own journaling.  IBM JFS was supposedly OK
    (surprise--IBM product), but avoid ext3/ext4, xfs and others.

    Are any disk partitions full or near full?  specifically:
    /tmp, /dev/shm, or the partition with the Informix chunks

    Hope those ideas help.
    scot


  • 3.  RE: Slow sorting/temp space

    IBM Champion
    Posted Sun January 23, 2022 06:06 PM
    Scot - thank you for your suggestions.

    The system has been working well for months, if not years.  The problem is new...performance suddenly just fell off a cliff.  It's not just a problem that slowly got worse.

    The VM has been restarted a couple of times now, and the problem remains.  The host has not yet been restarted though.  Exporting the VM may be possible, but we need to be fairly confident that will help before that option is pursued.

    We have confirmed that the performance issues persist even when nobody else is on the system and Informix is the only thing running.  "top" shows an idle system until we start running some SQL.  When we run SQL using temp space, we do see iowait jump up.

    Informix chunks are using cooked files.  I will check the filesystem type when I'm able to get on next time, but this is a recent issue, and no change to fs type.

    One thing that I do need to take a look at is that some filesystems are about 94% used.  I do know that can be a problem for some OS and some SANs.  This may be worth investigating further.

    Again, thank you for your ideas.

    Mike





    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 4.  RE: Slow sorting/temp space

    Posted Mon January 24, 2022 05:03 AM

    Hello Mike,

    there are differences between explicit and implicit temporary tables depending on settings
    like DBSPACETEMP (onconfig variable / onint, dbaccess environment variable).

    Have you tried CREATE TEMP TABLE ... WITH NO LOG; INSERT INTO ...   ?

    Which disk is showing iowaits (iostat -xmz 1 1000) ?

    /tmp is memory/ramdisk on linux, so sorts are much faster here with PSORT_DBTEMP set to /tmp..

    Cheers,

    Markus



    ------------------------------
    Markus Holzbauer
    ------------------------------

    #Informix


  • 5.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:44 AM
    Thanks for your feedback.

    I did not try creating the explicit temp table with no log and then the insert.  But without the "with no log", it was super fast.  

    I am unable to get on the system at the moment, but I think that there was just the one physical disk, and that was showing all of the iowaits.

    I should have said that we set PSORT_DBTEMP to various values, including the same filesystems as each of the temp dbspace chunks, and the queries were very fast regardless.  Just having it set improved performance tremendously over just using DBSPACETEMP.

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 6.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 06:42 AM

    HI,

    For difficult issues I normlaly start from the bottom up, validate each layer is working fine before checking the next layer.

    Has the storage been checked?

    Check for failed disks or RAID failures/rebuilds.

    Sounds like battery backed cache where the battery died and the write cache is no longer operative.

    Check any SAN switches for ports erroring or flapping.

    Also check host os error logs for cpu/memory errors.

    Once the hardware is validated check for any firmware or software changes.

    Regards,
    David.

    ------------------------------
    David Williams
    ------------------------------