Hi Art,
I can certainly see why the 7184 row table would be scanned, as an index with only 8 distinct values isn't very selective. And if I understand correctly, the 8 distinct values each has the same number of rows (898), so distributions wouldn't be a factor in optimizing. Unless the index was a clustering index, in which case the optimizer might think that it could eliminate some pages from the scan. Like if it thought it could scan just 5 pages rather than the entire 36. However, the clustering could get messed up by update activity after the UPDATE STATISTICS had been run, so as Mike said above, "better safe than sorry."
I'm curious about your two row table. How many columns are in the table? The column is named child_two_cols, so I was thinking that if the table only had two columns, an index scan could be favored so that it could do the key-only scan, which is noted in the explain output.
The table that prompted the original posting has 11 columns, 163 rows, taking up 9 pages. I was wondering whether the original "8 pages or less" rule had been expanded so that this one would fall under the new rule.
I subsequently dug into the main queries against this table and found that they were filtering on a non-indexed column. Unfortunately, an index on that column would be even less selective than your 36 page example, as there are only two values for that column. I found that I could eliminate the scans by either creating a composite index and rewriting the query to use that new index, or by rewriting the query another way that did not require any new indexes.
------------------------------
mark collins
------------------------------
Original Message:
Sent: Tue October 24, 2023 06:14 PM
From: Art Kagel
Subject: Optimizer question
Mark:
There is no hard and fast rule. It will always depend on what the optimizer decides is the most efficient way to accomplish the query. For example, here is a query against a table with 7184 rows on 36 pages, 898 rows each with an "id" column value between 1 and 8. Doing a SELECT * from tb_test2 where id = 8; will perform a sequential scan. However, another table has two rows on a single page and doing select * from child where key = 3 performs an index lookup. Witness:
QUERY: (OPTIMIZATION TIMESTAMP: 10-24-2023 17:53:01)
------
select * from child where key = 3
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) art.child: INDEX PATH
(1) Index Name: art.child_two_cols
Index Keys: key two (Key-Only) (Serial, fragments: ALL)
Lower Index Filter: art.child.key = 3
Query statistics:
-----------------
Table map :
----------------------------
Internal name Table name
----------------------------
t1 child
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 1 1 1 00:00.00 2
QUERY: (OPTIMIZATION TIMESTAMP: 10-24-2023 18:05:00)
------
select * from tb_test2 where id = 8
Estimated Cost: 252
Estimated # of Rows Returned: 898
1) informix.tb_test2: SEQUENTIAL SCAN
Filters: informix.tb_test2.id = 8
Query statistics:
-----------------
Table map :
----------------------------
Internal name Table name
----------------------------
t1 tb_test2
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 898 898 7184 00:00.00 253
Art S. Kagel, President and Principal Consultant
ASK Database Management
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
Original Message:
Sent: 10/24/2023 4:17:00 PM
From: mark collins
Subject: Optimizer question
In an Informix class many years ago, back in the 7.x days, the instructor told us that the optimizer would always do a table scan when accessing tables 8 pages or less. There may have been some exceptions if the access was to check a referential integrity constraint or something like that, but if you were simply doing a "SELECT ... FROM some_small_table", it would not access any indexes that might exist on the table. The reason given was that it would be faster to read the 8 2k pages into memory in a single I/O operation and scan that 16kb than to do one or two index reads to identify the rowid and then fetch the correct data page and return the desired row to the client.
Does anyone know whether that still is the case? If so, is the cutoff still 8 pages? Or given that we can now have different page sizes, is it a certain number of kbytes?
Thanks.
------------------------------
mark collins
------------------------------