Figure 5.
Let's rewrite the query to sort the transaction history by date (see Listing 3).
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
Listing 3
The new execution plan now contains a sort operation (see Figure 6).
Figure 6.
Now let me describe a simple two-step process for tuning performance when executing several SQL statements against the database system. The first step is to identify the most costly statement in the sequence. The second step is to focus on the most costly operation within that statement.
Now let's see how to get detailed information about the execution plan using the T-SQL statements SET SHOWPLAN_ALL and SET SHOWPLAN_TEXT. SET SHOWPLAN_ALL returns a report of the query execution plan in tabular form, with multiple rows and columns. SET SHOWPLAN_TEXT returns a report in a single column, with one or more rows for each operation. Let's execute the query in Listing 3 with SET SHOWPLAN_ALL enabled to see the execution plan (see Listing 4).
set showplan_all on
go
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
go
set showplan_all off
go
Listing 4
The result resembles Figure 7.
Figure 7.
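SET SHOWPLAN_TEXT works the same way; as a sketch following the same pattern as Listing 4, you can run:

```sql
set showplan_text on
go
-- the statement is not executed; only its estimated plan is returned,
-- as indented text in a single column
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
go
set showplan_text off
go
```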
In addition to the SHOWPLAN commands, SQL Server 2005 provides two important commands for gathering query and batch execution statistics: SET STATISTICS IO and SET STATISTICS TIME. SET STATISTICS IO reports the disk activity generated while returning a result set. SET STATISTICS TIME reports the number of milliseconds required to parse, compile, and execute SQL statements. Let's test these two commands using the query in Listing 3. In the first example, SET STATISTICS IO is enabled to report where I/O hot spots occur (see Listing 5).
set statistics io on
go
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
go
set statistics io off
go
Listing 5
And the result is shown in Listing 6.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,
  lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0,
  lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TransactionHistory'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0,
  lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Listing 6
You can see the physical read values (reads from disk) and the logical read values (reads from cache). Notice the pseudo-table named Worktable in the report, which is associated with GROUP BY, ORDER BY, hash join, and UNION operations. This table is created in the tempdb database for the duration of the query and is automatically removed when the SQL statement finishes.
Now let's see an example with the SET STATISTICS TIME command (see Listing 7).
set statistics time on
go
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
go
set statistics time off
go
Listing 7
The resulting report is shown in Listing 8.
SQL Server Execution Times:
   CPU time = 10 ms, elapsed time = 150 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.
Listing 8
In this case, the parse and compile time is 0 because the plan was already stored in the plan cache, but you can still measure the amount of time the query takes to execute. To force the query to be parsed and compiled again, you need to run the DBCC FREEPROCCACHE command first (see Listing 9).
dbcc freeproccache
go
set statistics time on
go
select th.TransactionID, th.TransactionDate, th.Quantity, pr.[Name]
from Production.TransactionHistory th
inner join Production.Product pr on th.ProductID = pr.ProductID
where th.TransactionID between 100000 and 100060
order by th.TransactionDate;
go
set statistics time off
go
Listing 9
Now we're able to see the parse and compile time (see Listing 10).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
SQL Server parse and compile time:
   CPU time = 10 ms, elapsed time = 14 ms.

(61 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 123 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.
Listing 10
Now let's discuss how statistics are used for efficient query processing, allowing the database system to correctly choose the physical operations when generating an execution plan. These statistics on tables and views are generated both manually and automatically, and they are used to create efficient execution plans. Statistics enable the database system to choose among auxiliary structures such as indexes, taking into account the actual content of the database objects involved in the execution plan.
I'll show how to work directly with statistics in SQL Server, but it's worth noting that SQL Server automatically manages the creation and update of statistics whenever possible. So you should reserve these T-SQL statements for special scenarios, for example, after you change a significant amount of data.
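For reference, the automatic behavior is controlled by two per-database options. This sketch (assuming the AdventureWorks sample database) shows how to check and set them:

```sql
-- check whether automatic creation and update of statistics are enabled
select databasepropertyex('AdventureWorks', 'IsAutoCreateStatistics') as AutoCreate,
       databasepropertyex('AdventureWorks', 'IsAutoUpdateStatistics') as AutoUpdate;

-- both options are on by default; they can be set explicitly if needed
alter database AdventureWorks set auto_create_statistics on;
alter database AdventureWorks set auto_update_statistics on;
```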
To manually create statistics, you need to use the CREATE STATISTICS statement (Listing 11).
CREATE STATISTICS statistics_name
ON {table | view} (column [,...n])
[WITH
    [FULLSCAN | SAMPLE number {PERCENT | ROWS} | STATS_STREAM]
    [NORECOMPUTE]
]
Listing 11
Let's see an example by creating the statistics on the Color attribute of the Production.Product table (see Listing 12).
create statistics stats_product_color
on Production.Product(Color)
with fullscan;
Listing 12
Whenever you want to refresh existing statistics, you use the UPDATE STATISTICS statement. For example, if you have inserted many new rows and updated others and you want to refresh the stats_product_color statistics, you run the following statement (see Listing 13).
update statistics Production.Product stats_product_color
with fullscan;
Listing 13
If you want to generate statistics across all the tables in a database (rather than running the CREATE STATISTICS and UPDATE STATISTICS statements for each table), you can use the sp_createstats system stored procedure.
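As a sketch, the procedure can be run with no arguments, or with its documented 'indexonly' and 'fullscan' options:

```sql
-- create single-column statistics for all eligible columns in the current database
exec sp_createstats;

-- or only for columns that already participate in an index, using a full scan
exec sp_createstats @indexonly = 'indexonly', @fullscan = 'fullscan';
```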
Once you have generated the statistics, you may want to inspect them. You can do so with the DBCC SHOW_STATISTICS command (see Listing 14).
DBCC SHOW_STATISTICS ('table_name' | 'view_name', statistics_name)
Listing 14
For example, to see the statistics stats_product_color, you have to run the following command (see Listing 15).
dbcc show_statistics('Production.Product', stats_product_color);
Listing 15
The results of this operation are shown in Listing 16.

Name                 Updated              Rows  Rows Sampled  Steps  Density  Average key length  String Index
-------------------  -------------------  ----  ------------  -----  -------  ------------------  ------------
stats_product_color  Oct 10 2008 10:09PM  504   504           9      0        5.178571            YES

(1 row(s) affected)

All density  Average Length  Columns
-----------  --------------  -------
0.1          5.178571        Color

(1 row(s) affected)

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
Black         0           93       0                    1
Blue          0           26       0                    1
Grey          0           1        0                    1
Multi         0           8        0                    1
Red           0           38       0                    1
Silver        0           43       0                    1
Silver/Black  0           7        0                    1
White         0           4        0                    1
Yellow        0           36       0                    1
Listing 16
Finally, if you want to remove your own statistics, you have to use the DROP STATISTICS command (see Listing 17).
drop statistics Production.Product.stats_product_color;
Listing 17
Oracle database systems
Now let's discuss the principles and techniques of performance tuning in Oracle Database. Initially, Oracle used a rule-based optimizer, but this type of optimizer has some drawbacks, so in version 7 Oracle introduced the cost-based optimizer. The Oracle optimizer has two basic modes of operation: FIRST_ROWS and ALL_ROWS. The FIRST_ROWS mode's goal is to get information back to the application as fast as possible. The ALL_ROWS mode's goal is to complete all the work as fast as possible. You can set the optimizer mode for a session using the OPTIMIZER_MODE initialization parameter (see Listing 18).
ALTER SESSION SET OPTIMIZER_MODE=mode
Listing 18
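For example, you might switch a session between the two goals like this (FIRST_ROWS_n variants such as FIRST_ROWS_10 are also supported):

```sql
-- favor returning the first rows quickly
alter session set optimizer_mode = first_rows;

-- favor total throughput (the default)
alter session set optimizer_mode = all_rows;
```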
As in SQL Server, Oracle defines three basic types of operations that can be part of an execution plan: data access, join operations, and other operations such as sorting, aggregation, and windowing.
Data access can be achieved in two ways: by accessing the data blocks of a table directly, or by using an index to retrieve data through the pointers stored in the index. There are also various ways to access clustered and nonclustered tables. The main physical operations in data access are:
- Full table scan. This operation scans the entire table.
- Partition. This operation performs one or more access operations for each partition needed in the query.
- ROWID. This operation is the fastest way to access data, because ROWID is an identifier which includes information to go directly to the row inside a particular data block. Indexes include a ROWID for each row represented by the index entry, thus this operation follows the use of indexes.
- Cluster. This operation enables access to rows in a cluster table.
In the case of accessing data through an index, you can have several operations as well:
- Unique scan. Uses the index to retrieve a single row.
- Range scan. Uses the index to retrieve the ROWID for a range of index values.
- Full scan. This operation performs a full scan on the index.
- Fast full scan. This operation performs a full scan with multi block reads.
- Skip scan. This operation is used when the query references the later columns of an index without using the leading columns; Oracle can skip index blocks whose leading columns contain the same value. For example, if an index has a search key on columns x, y, z of a table, and a query uses a selection criterion of z={some value}, the index skip scan looks for the first combination of values in columns x and y, and then checks whether there is an entry with column z equal to {some value}. If no entry is found, Oracle can skip the remaining index blocks where the values of columns x and y are identical.
- Bitmap. This operation uses a bitmap index.
- Partition. It's similar to the partition operation on tables.
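To make the skip scan described above concrete, here is a sketch; the table, index, and column names are assumptions for illustration only:

```sql
-- composite index whose leading column (gender) has very few distinct values
create index ix_staff on staff(gender, staffno);

-- this query omits the leading column; rather than scanning the whole table,
-- Oracle may choose an INDEX SKIP SCAN, probing the index once for each
-- distinct value of gender and searching for staffno within each
select * from staff where staffno = 7788;
```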
Finally, one important category is the join operations. Joins work by taking a row from one table, called the driver table, and checking the value of the join key against another table. The Oracle optimizer tries to pick the driver table that will produce the smallest number of rows in order to reduce overhead. There are several physical join operations, such as the hash join.
Now let's talk about statistics in Oracle. The Oracle database gathers various statistics on database objects. For columns, the statistics include the number of distinct values and the average column length. For indexes, they include the number of rows, the number of distinct values, the height of the index, and the number of leaf blocks. For tables, they include the number of rows, the average row length, and the total number of blocks.
To generate statistics on database objects in Oracle, you have to use the built-in package DBMS_STATS. For example, to gather statistics on objects in a schema, you have to use the GATHER_SCHEMA_STATS procedure. If you are logged into the system as SCOTT, you can gather statistics on the database objects on the schema SCOTT as shown in Listing 19.
exec dbms_stats.gather_schema_stats(ownname=>'scott');
Listing 19
If you want to collect statistics for a particular table, you must use the dbms_stats.gather_table_stats procedure. For example, Listing 20 gathers the statistics associated with the emp table in the SCOTT schema.
begin
  dbms_stats.gather_table_stats(user, 'emp');
end;
/
Listing 20
Collecting statistics is a task for database administrators, and typically statistics are collected at off-peak times so as not to overload the system. Oracle 10g automatically collects statistics during a maintenance window, which by default runs between 10 P.M. and 6 A.M. every day.
To illustrate with examples, we're going to use database objects within the SCOTT schema in the default ORCL database. To get information concerning the execution plan, you can use the autotrace capability, which turns on automatic tracing of SQL statements and displays the execution plan information (see Listing 21).
set autotrace on
Listing 21
The EXPLAIN PLAN functionality is the basis of AUTOTRACE; it instructs the Oracle optimizer to persist the results of its execution plan in a table called PLAN_TABLE in the user's schema. The syntax is shown in Listing 22.
EXPLAIN PLAN FOR sql_statement;
Listing 22
Now let's see some examples. To test them, you must be logged in as a user with privileges to gather statistics and display execution plans. We're going to base our examples on tables within the SCOTT schema. The first example creates a table based on the SCOTT.emp table, then creates an associated index and executes a query that uses this index, so we can see the execution plan (see Listing 23).
create table tb_emp as select * from scott.emp;

create unique index tb_emp_uidx on tb_emp(empno, ename);

begin
  dbms_stats.gather_table_stats(user, 'tb_emp');
end;
/

set autotrace on;

select * from tb_emp where empno=7788;
Listing 23
Sometimes it is tricky to read the execution plan by selecting all the columns from the PLAN_TABLE table directly. In that case, we can use the built-in DBMS_XPLAN package to format the output (see Listing 24).
delete from plan_table;

explain plan for select * from scott.emp;

select * from table(dbms_xplan.display);
Listing 24
Conclusion
In this article, we've covered the principles and techniques you can use to troubleshoot and evaluate query performance, illustrated with examples in Microsoft SQL Server and Oracle Database. Now you can apply these techniques to your own scenarios.