Informix

 View Only
  • 1.  Optimizer question

    Posted Tue October 24, 2023 04:17 PM

    In an Informix class many years ago, back in the 7.x days, the instructor told us that the optimizer would always do a table scan when accessing tables 8 pages or less.  There may have been some exceptions if the access was to check a referential integrity constraint or something like that, but if you were simply doing a "SELECT ... FROM some_small_table", it would not access any indexes that might exist on the table.  The reason given was that it would be faster to read the 8 2k pages into memory in a single I/O operation and scan that 16kb than to do one or two index reads to identify the rowid and then fetch the correct data page and return the desired row to the client. 

    Does anyone know whether that still is the case?  If so, is the cutoff still 8 pages?  Or given that we can now have different page sizes, is it a certain number of kbytes?  

    Thanks.



    ------------------------------
    mark collins
    ------------------------------


  • 2.  RE: Optimizer question

    IBM Champion
    Posted Tue October 24, 2023 04:36 PM

    Hi Mark,

    That is not the case anymore.  Even though it makes sense to scan a small table, I find that if there is an index on even a small table, the index will be used.  Perhaps not always but most of the time.

    My thought is that because the optimizer relies on statistics to choose a good query plan, then it now takes a "better safe than sorry" approach when accessing tables that it "thinks" are tiny.  So, if you have an enormous table with indexes, but the stats for the table say that the table is close to empty, the penalty of scanning the table is much worse than using the index and performing a few extra reads against a genuinely tiny table.  That's just what I think though...I know that there is much more cleverness going into the decision making process that the optimizer does. 



    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 3.  RE: Optimizer question

    Posted Tue October 24, 2023 06:44 PM

    Hi Mike,

    Thanks for the response.  I figured I could run a few test cases to see how it behaved in our environment, but I suspected that there would always be the possibility that there would be some edge case that I had not considered, so I didn't want to make too many blanket statements.  Your experience with small tables with indexes seems to bear that out.



    ------------------------------
    mark collins
    ------------------------------



  • 4.  RE: Optimizer question

    IBM Champion
    Posted Tue October 24, 2023 06:15 PM
    Mark:

    There is no hard and fast rule. It will always depend on what the optimizer decides is the most efficient way to accomplish the query. For example, here is a query against a table with 7184 rows on 36 pages, 898 rows each with an "id" column value between 1 and 8. Doing a SELECT * from tb_test2 where id = 8; will perform a sequential scan. However, another table has two rows on a single page and doing select * from child where key = 3 performs an index lookup. Witness:


    QUERY: (OPTIMIZATION TIMESTAMP: 10-24-2023 17:53:01)
    ------
    select * from child where key = 3

    Estimated Cost: 1
    Estimated # of Rows Returned: 1

     1) art.child: INDEX PATH

       (1) Index Name: art.child_two_cols
           Index Keys: key two   (Key-Only)  (Serial, fragments: ALL)
           Lower Index Filter: art.child.key = 3  


    Query statistics:
    -----------------

     Table map :
     ----------------------------
     Internal name     Table name
     ----------------------------
     t1                child

     type     table  rows_prod  est_rows  rows_scan  time       est_cost
     -------------------------------------------------------------------
     scan     t1     1          1         1          00:00.00   2        


    QUERY: (OPTIMIZATION TIMESTAMP: 10-24-2023 18:05:00)
    ------
    select  * from tb_test2 where id = 8

    Estimated Cost: 252
    Estimated # of Rows Returned: 898

     1) informix.tb_test2: SEQUENTIAL SCAN

           Filters: informix.tb_test2.id = 8  


    Query statistics:
    -----------------

     Table map :
     ----------------------------
     Internal name     Table name
     ----------------------------
     t1                tb_test2

     type     table  rows_prod  est_rows  rows_scan  time       est_cost
     -------------------------------------------------------------------
     scan     t1     898        898       7184       00:00.00   253      





    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 5.  RE: Optimizer question

    Posted Tue October 24, 2023 06:41 PM

    Hi Art,

    I can certainly see why the 7184 row table would be scanned, as an index with only 8 distinct values isn't very selective.  And if I understand correctly, the 8 distinct values each has the same number of rows (898), so distributions wouldn't be a factor in optimizing.  Unless the index was a clustering index, in which case the optimizer might think that it could eliminate some pages from the scan.  Like if it thought it could scan just 5 pages rather than the entire 36.  However, the clustering could get messed up by update activity after the UPDATE STATISTICS had been run, so as Mike said above, "better safe than sorry."

    I'm curious about your two row table.  How many columns are in the table?  The column is named child_two_cols, so I was thinking that if the table only had two columns, an index scan could be favored so that it could do the key-only scan, which is noted in the explain output.  

    The table that prompted the original posting has 11 columns, 163 rows, taking up 9 pages.  I was wondering whether the original "8 pages or less" rule had been expanded so that this one would fall under the new rule.

    I subsequently dug into the main queries against this table and found that they were filtering on a non-indexed column.  Unfortunately, an index on that column would be even less selective than your 36 page example, as there are only two values for that column.  I found that I could eliminate the scans by either creating a composite index and rewriting the query to use that new index, or by rewriting the query another way that did not require any new indexes.



    ------------------------------
    mark collins
    ------------------------------