Malek,
Logically, nearly every DISTINCT could be replaced internally by a GROUP BY on all columns of the result set in most cases. However DB2 does not do that for us. It can be that a subquery contains OLAP specs, as example, that may cause GROUP BY to be executed before OLAP, but DISTINCT afterwards on the results of the OLAP, in this case making DISTINCT different to GROUP BY. Also it can be that the SQL is coded to specify far more columns than necessary to uniquely identify rows.
Uniqueness actually has less requirements than GROUP BY because it does not need the result data to be in any particular sequence.
It would be quite hard to be very definitive about particular SQLs as to why GROUP BY performs better without looking at the specific SQL and access path.
I have noticed that Column Store Databases can have nasty access paths for some things better suited to Row store databases, like subqueries with FETCH FIRST 1 ROWS ONLY as an example. Things like this may find an optimization for GROUP BY that is not found for DISTINCT. This may not be issue in your queries, however sometimes queries need to be designed to suit the platform being run on.
I would comment that DISTINCT is a heavily over used "technique" in SQLs often not really needed, but used in subqueries or Common Table Expressions to force materialise an intermediate result set (applicable to row store databases more so)
On a column store database, normal use of DISTINCT ought to perform O.K. just doing a sort to workfile, perhaps degraded if far too many columns are included in the DISTINCT.
Having a really bad performing DISTINCT, one needs to analyse the access path to determine why it is so bad, and if there is a reason it might need to behave differently to a GROUP BY in that case.
------------------------------
Michael Hannan
------------------------------
Original Message:
Sent: Thu December 05, 2024 01:36 PM
From: malek shabou
Subject: distinct vs group by
Hi,
we are moving many workflow from DB2 iSeries to DB2 LUW (with BLU) and we have some queries that dont finish on DB2 LUW
We have noticed that DB2 BLU some time is not able to execute complex queries with many subqueries that include DISTINCT, query take hours without finishing or fill all the temps TBS.
changing the DISTINCT by a GROUP BY solve the probleme.
is distinct and group by produce differents execution plan ?
are there any special setup (db2set) so that DB2 execute distinct like a group by.
------------------------------
malek shabou
------------------------------