Informix

 View Only
Expand all | Collapse all

Slow sorting/temp space

  • 1.  Slow sorting/temp space

    IBM Champion
    Posted Sat January 22, 2022 05:05 PM
    Forgive the long post, but I have been working on a system showing poor query performance and am at a bit of a loss.  It's Informix 10, and an upgrade is not a possibility.  This is on RHEL and running on a VM.

    The system had been working well until a week or so ago, when performance suddenly went bad.  No changes were made, no restarts, no nothing.  Nothing in the Linux message log or boot log indicating any issues.

    We have traced the main performance issue to queries that use ORDER BY or GROUP BY.  A query that takes about 1 second, turns to 20-30 seconds with an order by, and it's only 50,000 rows that are returned.  Monitoring the thread when running shows that it spends most of the time in "cond wait bufcond", which has to do with a sort waiting on output buffers, but that's all I know on that!

    If we set PSORT_DBTEMP=/tmp, the query runs very quickly.  If we set PSORT_DBTEMP to the same filesystem as the (cooked) chunks, the query still runs quickly.

    The CPUs do show a fairly high iowait% when even a single query is running, so while we suspect some sort of disk issue, there is nothing to validate that.  A "dd" is slower than we would expect, but other IO seems very respectable.  There are other VMs running on the host but they are quiet.  Also no swapping going on.  Linux reports not a whole lot of memory free, but it's being used for filesystem caching.  We have restarted the VM, and the problem remains.

    Today we also found the same performance hit with a query that was using an AUTOINDEX.  The query, which had not been a problem before, is now taking 30+ seconds.  I replaced the AUTOINDEX with a real index and it now runs in 0.02 seconds.  I suspect again that the autoindex query was slow because of temp space.

    Another example is this...

    This is fast (1-2 sec):

    create temp table mw_mytemptable
    (
    col1 char(30),
    col2 integer,
    col3 char(4),
    col4 char(4),
    col5 char(4),
    col6 char(1),
    col7 datetime year to second,
    col8 datetime year to second
    ) extent size 4096 next size 4096 lock mode page;
    insert into mw_mytemptable select * from mytable;

    "mytable" has about 600,000 records.

    But this is really slow (30+ seconds):

    select * from mytable into temp tmp_mw1 with no log;

    I haven't tried the first insert without the extent sizes, but suspect it wouldn't make a whole lot of difference here.

    When using iotop to monitor IO, the first SQL shows dbaccess as the process doing IO at several MB/sec. The second SQL shows multiple oninit processes doing IO at 100-200 KB/sec.

    I have added additional CPU VPs and AIO VPs, and it doesn't help.  Checkpoint times have increased substantially since the problems started.

    Again, sounds like disk, but then I get thrown off because of things like the insert shown above was so very fast, even the first time it was run, in case caching is playing a part.  Also when using PSORT_DBTEMP, the output is very fast.  Unloading the large table to a file is fast.

    Has anybody seen anything like this before, or have any ideas?

    Thanks,

    Mike




    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------

    #Informix


  • 2.  RE: Slow sorting/temp space

    Posted Sun January 23, 2022 02:07 AM
    Preface my comments with:
    * I have no experience with or access to Informix 10.
    * I have not seen this specific issue before but offer general suggestions.

    > The system had been working well until a week or so ago, when performance suddenly went bad.

    How long had it been working well?  days, weeks, months?
    Try to determine if this is a recent problem or something that has been occurring
    for "a while" and someone just started to notice it.

    How long has the VM been up?  Have you tried a reboot of the VM? 
    Many times the old one/zero switch does wonders.

    Can you export the VM and import it on another standalone system, or one that
    is not shared with other VMs to see if the problem persists?

    What other processes are running on the system at the time performance degrades?
    Check with top(1) or "ps -efl".
    Are other users on the system at the time?  What are they doing?
    Maybe a cron job is running at the same time?
    A disk intensive process like updatedb to update the locate(1) database,
    or a backup running?

    Is Informix using file systems or raw disk?
    If file system, what type?
    I remember reading that journaling file systems should be avoided with Informix
    because the database does it's own journaling.  IBM JFS was supposedly OK
    (surprise--IBM product), but avoid ext3/ext4, xfs and others.

    Are any disk partitions full or near full?  specifically:
    /tmp, /dev/shm, or the partition with the Informix chunks

    Hope those ideas help.
    scot


  • 3.  RE: Slow sorting/temp space

    IBM Champion
    Posted Sun January 23, 2022 06:06 PM
    Scot - thank you for your suggestions.

    The system has been working well for months, if not years.  The problem is new...performance suddenly just fell off a cliff.  It's not just a problem that slowly got worse.

    The VM has been restarted a couple of times now, and the problem remains.  The host has not yet been restarted though.  Exporting the VM may be possible, but we need to be fairly confident that will help before that option is pursued.

    We have confirmed that the performance issues persist even when nobody else is on the system and Informix is the only thing running.  "top" shows an idle system until we start running some SQL.  When we run SQL using temp space, we do see iowait jump up.

    Informix chunks are using cooked files.  I will check the filesystem type when I'm able to get on next time, but this is a recent issue, and no change to fs type.

    One thing that I do need to take a look at is that some filesystems are about 94% used.  I do know that can be a problem for some OS and some SANs.  This may be worth investigating further.

    Again, thank you for your ideas.

    Mike





    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 4.  RE: Slow sorting/temp space

    Posted Mon January 24, 2022 05:03 AM

    Hello Mike,

    there are differences between explicit and implicit temporary tables depending on settings
    like DBSPACETEMP (onconfig variable / onint, dbaccess environment variable).

    Have you tried CREATE TEMP TABLE ... WITH NO LOG; INSERT INTO ...   ?

    Which disk is showing iowaits (iostat -xmz 1 1000) ?

    /tmp is memory/ramdisk on linux, so sorts are much faster here with PSORT_DBTEMP set to /tmp..

    Cheers,

    Markus



    ------------------------------
    Markus Holzbauer
    ------------------------------

    #Informix


  • 5.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:44 AM
    Thanks for your feedback.

    I did not try creating the explicit temp table with no log and then the insert.  But without the "with no log", it was super fast.  

    I am unable to get on the system at the moment, but I think that there was just the one physical disk, and that was showing all of the iowaits.

    I should have said that we set PSORT_DBTEMP to various values, including the same filesystems as each of the temp dbspace chunks, and the queries were very fast regardless.  Just having it set improved performance tremendously over just using DBSPACETEMP.

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 6.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 06:42 AM

    HI,

    For difficult issues I normlaly start from the bottom up, validate each layer is working fine before checking the next layer.

    Has the storage been checked?

    Check for failed disks or RAID failures/rebuilds.

    Sounds like battery backed cache where the battery died and the write cache is no longer operative.

    Check any SAN switches for ports erroring or flapping.

    Also check host os error logs for cpu/memory errors.

    Once the hardware is validated check for any firmware or software changes.

    Regards,
    David.

    ------------------------------
    David Williams
    ------------------------------



  • 7.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:49 AM
    Thanks David.  It hasn't been easy to get the storage checked, and hope to have some info on that later today.  Also some info on the host, as the client is showing that all is well.

    I was thinking that a rebuild of the RAID disk may be happening and impacting the write speed, but don't know the answer to this...yet.  I think some of the confusion is why some writes are very fast, including the explicit temp table, and including output to disk, and when PSORT_DBTEMP is set to a filesystem, but the implicit temp table with an order by or group by is so painfully slow.

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 8.  RE: Slow sorting/temp space

    IBM Champion
    Posted Wed February 02, 2022 07:56 PM
    We have a winner!! :-)

    I just got an update on the performance problem.  It turned out that it was indeed that the batteries for the write cache had failed, and so the write cache was disabled.  The batteries were replaced, the system restarted, and performance has returned to normal.  Thank you David for your very specific suggestion!

    It was interesting how some writes were fine, but writes to Informix temp spaces using implicit temporary tables were poor.  That threw me, but glad that the problem has been solved.

    Thank you everyone for all of your suggestions.


    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 9.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 03:31 AM
    Hi Mike,

    you try many things :-)  And I agree Informix10 should be fast.

    Maybe two more things to try: PSORT_NPROCS=nn  and play with buffered mode...

    Good Luck
    Henri

    ------------------------------
    Henri Cujass
    leolo IT, CTO
    Germany
    IBM Champion 2021
    ------------------------------



  • 10.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:51 AM
    Thank Henri.  We haven't played with PSORT_NPROCS much as as soon as we set PSORT_DBTEMP, all is well.  PSORT_NPROCS isn't set in the environment, so doesn't explain why things ran fine one day, and not the next.  It's a strange one...


    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 11.  RE: Slow sorting/temp space

    Posted Mon January 24, 2022 04:01 AM
    Hi Mike,
    looks like the problem is related to operations using the tempdbspace(s). As they're cooked files, maybe they've got to a state causing perf. issues (fragmentation or whatever). How about dropping them and re-create from scratch?
    Cheers, -tz-

    ------------------------------
    -tz-
    https://www.linkedin.com/in/tomzahr/
    ------------------------------



  • 12.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:54 AM
    I have been thinking about doing just this, and wish that I had tried it already so that I could provide some feedback!  Recreating the temp space may be my next thing to try if the disks and hardware report as okay.  Thank you.

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 13.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 04:44 AM
    So is "with no log" making the difference?

    Or what else would be the difference between

    insert into mw_mytemptable select * from mytable;
    vs.
    select * from mytable into temp tmp_mw1 with no log;

    So how would this run:

    select * from mytable into temp tmp_mw1;

     Andreas

    ------------------------------
    Andreas Legner
    ------------------------------



  • 14.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 10:56 AM
    Thanks Andreas - I do need to try this.  I don't have access to the system at the moment to try the with and without the logging, and also with and without the extent sizes when I used the explicit temp table.  When I get a chance, I will report back.  Thank you.

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 15.  RE: Slow sorting/temp space

    Posted Mon January 24, 2022 06:58 AM
    Hi Mike,

    The evidence you describe does point to an I/O problem. All writing to disk sounds slow except where the filesystem cache can mask the issue.

    Can you get someone to look at the host? If the problem is there, I would not expect to see anything in the system message log on the guest. You say you didn't change anything.

    One other thought: I don't think v10 has DS_NONPDQ_QUERY_MEM parameter so you are restricted to 128 kB sort memory. Did the sort of the 50000 row table just become too large to fit in memory?

    Ben.

    ------------------------------
    Benjamin Thompson
    ------------------------------



  • 16.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 07:23 AM
    Ben:

    DS_NONPDQ_QUERY_MEM came in in v10.00 so ....

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 17.  RE: Slow sorting/temp space

    IBM Champion
    Posted Mon January 24, 2022 11:01 AM
    Hi Ben - and thank you for the response.

    We have somebody looking at the host configuration and the disks today.  I am hoping that it will reveal the issue.

    Informix 10 does indeed have DS_NONPDQ_QUERY_MEM.  It is set to a low value, and while I will recommend that it's made a bit bigger, I think that there's more to it.  I like your idea that we just crossed a threshold where we have shifted from in-memory sorting to sorting on disk, but it's not just this one query - there are quite a few (all with order bys) that have slowed down.  Also another query that used an AUTOINDEX went from under 1 second to over 30 seconds - so temp space, or at least some disk writes, appear to be the common denominator here.

    Thanks!

    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 18.  RE: Slow sorting/temp space

    Posted Mon January 24, 2022 10:33 PM
    According to my past experience, it seems to be related to the performance of virtual machines.
    Because in the virtual machine environment, the performance and indicators of the operating system we see may not be true, just for reference.
    It is recommended that you observe disk IOPs from the virtualization level. Or try to migrate the virtual machine to an idle physical machine.

    ------------------------------
    ZhiWei Cui
    GBASE
    ------------------------------