Introduction
This article highlights some of the ways Db2 helps administrators compare and analyze workloads, covering general SQL statement analysis and optimization as well as some real-time monitoring tools and approaches to managing a database's performance. By taking full control of your database's performance monitoring tools, you can optimize your statements and environment to run the most efficient workloads possible, which can have a huge impact on your bottom line.
Example Tuning Scenario
Let's run through a sample tuning scenario from beginning to end to see how it's actually done. We will work with the SAMPLE database in Db2 (run db2sampl to create it if you haven't already).
We will work with the following SQL statement:
`SELECT e.empno, e.lastname FROM emp e WHERE e.empno IN
(SELECT d.mgrno FROM dept d WHERE d.deptno LIKE 'D%');`
If you run this statement against the SAMPLE database, you should receive the following two rows of output:
EMPNO LASTNAME
------ ---------------
000060 STERN
000070 PULASKI
In plain language, this is a simple query against two tables, the employee and department tables, that identifies any employee who manages a department whose department number starts with 'D' (hypothetically, perhaps the managers of the Data and the Demo departments).
So first, let's actually run EXPLAIN on the initial query we're analyzing.
To explain and compile a statement from the command line use the following:
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 -tvf <sqlfile>
db2exfmt -d <dbname> -1 -o <output_filename>
db2 SET CURRENT EXPLAIN MODE NO (when finished)
- The statement db2 SET CURRENT EXPLAIN MODE EXPLAIN causes all subsequent statements to be explained only, not executed
- db2exfmt can either format the most recently explained statement or you can identify the statement by the timestamp
- db2exfmt option -1 specifies the most recently explained statement
- db2exfmt option -w <timestamp> specifies the timestamp of the explained statement to be formatted.
Applying these steps to our example, first turn on EXPLAIN mode with the command:
db2 set current explain mode EXPLAIN
Then run the SQL once through db2 with:
db2 -tvf <sql_file_name>
Then run db2exfmt, direct the output to whatever filename you like, and examine it:
db2exfmt -d sample -1 -o jms_q1_explain.1.out
The access plan looks like this for the initial query:
Reading the EXPLAIN plan:
The EXPLAIN plan is read from bottom to top (but each individual operation is read from top to bottom). This just means that the operations at the very bottom are executed first. If there is a branch or a split and two operations are on the same horizontal level, they will execute simultaneously (or essentially simultaneously).
There are five main parameters for each operation; the most important to consider are probably the ROWS RETURNED and the cumulative cost of each. The parameters are as seen in the example ACCESS PLAN chart of a sample join statement below. (Ignore the actual output; the following chart is for LABELLING purposes only.)
So we can see the TOTAL COST is estimated to be 95.5182.
We can see the query uses two different branches to execute this access plan. Perhaps we can rewrite it without the IN clause, which can be notoriously inefficient?
Let's try that and rewrite the query like this:
`SELECT e.empno, e.lastname FROM emp e WHERE EXISTS
(SELECT 1 FROM dept d WHERE d.mgrno = e.empno AND d.deptno LIKE 'D%');`
Doing so, and re-running the steps to generate the EXPLAIN access plan, we immediately see a drastic performance improvement of roughly a factor of SEVEN, with the new total query cost being only 13.63, as compared to about 95 before:
Where did this improvement come from? From the looks of it, roughly 80 of the 95 cost units of the initial query were taken up by the TABLESCAN of the employee table. By changing the IN to a WHERE EXISTS clause, we greatly reduce the processing cost of the query, because we have eliminated most of that table scan work.
Minimizing TABLESCANS is a HUGE part of tuning and optimizing complex SQL statements.
SQL Performance Tuning Concepts
Db2 provides features designed to help database engineers evaluate how efficiently their SQL executes, displaying visual graphs of the fundamental operations taking place so that the performance plans of different queries can be compared.
It does this using an arbitrary unit-of-work cost estimate for each operation, which is what is referred to as the operation "cost" (Db2 calls these units timerons). It is an arbitrary value assigned to gauge how much processor power and time is needed to execute each operation, as well as the total cost of the query. These are the fundamental numbers to focus on when evaluating performance plans.
Db2 performs all the behind-the-scenes compilation of queries into their necessary operations and access plans through a component called the Db2 Optimizer, a term we'll refer to again later.
Db2’s Optimizer compiles Source SQL and chooses an Execution Plan.
The following chart shows how the Db2 Optimizer generates and executes SQL under the hood of Db2.
Db2 EXPLAIN
Db2 provides several tools by which you can access and display execution plans.
- Visual Explain – a Data Studio tool, with a user friendly GUI
- db2exfmt – command-line tool, the most comprehensive explain output (requires the explain tables)
- db2expln – a simple, line-mode explain
- dynexpln – a simple, dynamic line-mode explain - calls db2expln
- Explainable statements include:
- INSERT, UPDATE, DELETE
- MERGE
- REFRESH
- SELECT
- SET INTEGRITY
- VALUES
- You must create the explain tables prior to issuing the EXPLAIN statement.
- Special registers CURRENT EXPLAIN MODE or CURRENT EXPLAIN SNAPSHOT control the behavior of the explain facility.
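Before running db2exfmt for the first time, the explain tables have to exist. As a sketch of two common ways to create them (the exact script path depends on your installation, and the defaults of SYSINSTALLOBJECTS are worth checking against the documentation for your Db2 version):

```sql
-- Option 1: run the DDL script shipped with Db2
-- (typically found under the instance's sqllib/misc directory)
--   db2 CONNECT TO sample
--   db2 -tf ~/sqllib/misc/EXPLAIN.DDL

-- Option 2: call the stored procedure that installs the EXPLAIN tables
-- ('C' = create; NULL tablespace and schema accept the defaults)
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', NULL, NULL);
```

Once these tables exist, the SET CURRENT EXPLAIN MODE / db2exfmt workflow above works as described.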
db2exfmt is my preferred tool for accessing explain plans on the command line.
Finally, let's discuss some other general principles of SQL tuning.
SQL Tuning: Optimization Concepts
By tweaking your statements, and perhaps certain database parameters such as relevant tablespaces or bufferpools, or adding indexes, and then comparing access plan charts, it is possible to GREATLY optimize your queries and drastically improve your database's performance.
You may not think there are many performance considerations to ponder with such a simple query, but in fact there are.
First, there are a number of questions one can ask about the statement you want to tune:
- Any Table Scans? What’s causing it?
- Any Index Scans? What’s causing it?
- Any Partition Scans? What’s causing it?
- Which Index? Matching columns? Screening?
- Any Sorts? What’s causing it? How big is the sort?
- Any Join sorts? What other queries join to that table?
- Any subqueries? Can they be rewritten?
And finally, and most importantly: "Can the query be RE-written at all?"
The final, easiest, and most common way to improve query performance is to RUN STATISTICS on the relevant tables. That, and creating indexes, are the most obvious and effective built-in tools for optimizing query performance.
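As a sketch of what that looks like against the SAMPLE database (the schema name db2inst1 and the index name are assumptions for illustration; substitute your own):

```sql
-- Collect table, distribution, and detailed index statistics so the
-- optimizer has accurate row counts and value distributions to plan with
RUNSTATS ON TABLE db2inst1.employee
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;

-- A hypothetical supporting index for the earlier predicate on DEPT.MGRNO
CREATE INDEX dept_mgrno_ix ON db2inst1.department (mgrno);

-- Refresh statistics after creating the index so the optimizer sees it
RUNSTATS ON TABLE db2inst1.department
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```

After a RUNSTATS, dynamic SQL picks up the new statistics on the next compile; static packages need a rebind.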
The primary use for explain information is the analysis of access paths for query statements. There are a number of ways in which analyzing the explain data can help you to tune your queries and environment.
Consider the following kinds of analysis:
The proper indexes can significantly benefit performance. Using explain output, you can determine whether the indexes that you have created to help a specific set of queries are being used. Look for index usage in the following areas:
- Join predicates
- Local predicates
- GROUP BY clause
- ORDER BY clause
- WHERE XMLEXISTS clause
- The select list
You can also use the explain facility to evaluate whether a different index or no index at all might be better. After you create a new index, use the RUNSTATS command to collect statistics for that index, and then recompile your query. Over time, you might notice (through explain data) that a table scan is being used instead of an index scan. This can result from a change in the clustering of the table data. If the index that was previously being used now has a low cluster ratio, you might want to:
- Reorganize the table to cluster its data according to that index
- Use the RUNSTATS command to collect statistics for both index and table
- Recompile the query
To determine whether reorganizing the table has improved the access plan, examine explain output for the recompiled query.
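Put together, the reorganize-and-recheck cycle from the steps above might look like this on the command line (a sketch: the schema, index, and file names here are hypothetical):

```shell
db2 CONNECT TO sample
# Recluster the table's data according to the chosen index
db2 "REORG TABLE db2inst1.employee INDEX db2inst1.emp_ix1"
# Refresh statistics for both the table and its indexes
db2 "RUNSTATS ON TABLE db2inst1.employee WITH DISTRIBUTION AND DETAILED INDEXES ALL"
# Re-explain the query and compare the new access plan with the old one
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 -tvf query.sql
db2exfmt -d sample -1 -o after_reorg.out
db2 SET CURRENT EXPLAIN MODE NO
```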
Analyze the explain output, and look for data access types that are not usually optimal for the type of application that you are running. For example:
- Online transaction processing (OLTP) queries
OLTP applications are prime candidates for index scans with range-delimiting predicates, because they tend to return only a few rows that are qualified by an equality predicate against a key column. If your OLTP queries are using a table scan, you might want to analyze the explain data to determine why an index scan is not being used.
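As a small illustration of the kind of statement meant here (the index shown is hypothetical; in the SAMPLE database EMPNO is already the primary key), an equality predicate on a key column with a matching index lets the optimizer delimit the scan to a single key range:

```sql
-- Typical OLTP lookup: equality predicate on a key column
SELECT empno, lastname
FROM employee
WHERE empno = '000060';

-- Without an index on EMPNO this degrades to a table scan;
-- a matching unique index supports a range-delimited index scan
CREATE UNIQUE INDEX emp_empno_ix ON employee (empno);
```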
The search criteria for a browse type query can be very vague, resulting in a large number of qualifying rows. If users usually look at only a few screens of output data, you might specify that the entire answer set need not be computed before some results are returned. In this case, the goals of the user are different than the basic operating principle of the optimizer, which attempts to minimize resource consumption for the entire query, not just the first few screens of data.
For example, if the explain output shows that both merge scan join and sort operators were used in the access plan, the entire answer set will be materialized in a temporary table before any rows are returned to the application. In this case, you can attempt to change the access plan by using the OPTIMIZE FOR clause on the SELECT statement. If you specify this option, the optimizer can attempt to choose an access plan that does not produce the entire answer set in a temporary table before returning the first rows to the application.
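For instance, if the application only ever displays the first screen or two of results, a sketch of the clause might look like this (the row count is whatever fits your screen size):

```sql
-- Hint to the optimizer that the application wants the first rows quickly,
-- rather than the lowest total cost for the entire answer set
SELECT e.empno, e.lastname
FROM employee e
ORDER BY e.lastname
OPTIMIZE FOR 20 ROWS;
```

Note that OPTIMIZE FOR influences the plan choice; it does not limit how many rows the query can ultimately return.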
If a query joins two tables, check the type of join being used. Joins that involve more rows, such as those in decision-support queries, usually run faster with a hash join or a merge join. Joins that involve only a few rows, such as those in OLTP queries, typically run faster with nested-loop joins. However, there might be extenuating circumstances in either case, such as the use of local predicates or indexes, that could change how these typical joins work.
In conclusion, by utilizing these and other tools, database administrators can take full control of the queries running in their database environment. It is essential to have some understanding of how queries are actually executed at the fundamental operation level in order to optimize more complex queries. Doing so can have drastic impacts on your database's performance, your time, and your bottom line.
#Db2