This series of articles will discuss SQL server performance. In the title, I use SQL Performance because the articles could include SQL statement (Query) optimization and also include SQL Server performance issues, and probably include stored procedure performance issues.
Introduction
Performance tuning SELECT statements can be a time-consuming task that follows the Pareto principle’s 80/20 rule: 20% effort is likely to give you an 80% performance improvement. To get another 20% performance improvement you probably need to spend 80% of the time.
In this article, we will emphasize the 20% effort with 80% benefits to improve performance before we move to query plans and some complicated or detailed issues that I leave to the second article, SQL Performance (2), Optimized SELECT Query (B). In this way, it will be easier for both me and the readers to follow.
So, this article will be the easy part, but the important part for tuning the SQL query to improve the performance.
The Causes affecting query run time
Obviously, the factors affecting query run time are the size of the query return and the complexity of the query,
- Size
The larger the table (more columns), the longer the records (more rows), and the worse the query performance will be.
- Complexity
The more complex the query, the worse the query performance will be. Some major factors for query complexity as,
- Joins
If your joins substantially increase the row count of the result set, your query is likely to be slow.
- Aggregations
Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
The Major Improvement for Tuning Query
Obviously reducing the size and avoiding the complexities of the query will be the solutions,
- Reduce Size. Limit the Size of the query Set
- Limit Columns: Only Select Fields You Need
- Limit Rows: Select Reasonable Record length
- Remove Unnecessary Tables in the query
- Remove OUTER JOINS
- Enforce Indexes to Improve Performance
- Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement
- Remove Calculated Fields in JOIN and WHERE Clauses
- Reduce Complexity. will be discussed in detail in SQL Performance (2), Optimized SELECT Query (B)
- Run your query during off-peak hours
- Selecting from large tables (>1,000,000 records)
- Cartesian Joins or CROSS JOINs
- Looping statements
- SELECT DISTINCT statements
- Nested subqueries
- Wildcard searches in long text or memo fields
- Multiple schema queries
We will discuss the 20% effort to improve the performance here and leave the 80% work in the next article, SQL Performance (2), Optimized SELECT Query (B).
1. Limit the Size of the query Set
a. Limit Columns. Only Select the Fields You Need
SELECT fields instead of using SELECT *.
Do not use,
SELECT *
FROM Customers
Use,
SELECT FirstName, LastName, Address, City, State, Zip
FROM Customers
Assume the Customers table has 10 columns with 1 million rows, if the second query used 2 seconds including the network traffic to get UI, the first one probably will cost double the time.
b. Limit Rows: Select reasonable Record Length
For very long return records, say, thousands of lines, such as,
SELECT Product, Price, Amount, Total
FROM Orders
you should get a reasonable length by either adding a restricting condition, such as,
SELECT Product, Price, Amount, Total
FROM Orders
WHERE OrderDate >= '2014-03-01' AND OrderDate < '2014-04-01'
or by using a LIMIT statement,
SELECT Product, Price, Amount, Total
FROM Orders
LIMIT 1000
TOP in SQL Server,
SELECT TOP 1000 Product, Price, Amount, Total
FROM Orders
c. Remove Unnecessary Tables in the query
During development, it is possible that you add tables to the query that may not finally have any impact on the data returned. Removing the JOINS to these unnecessary tables reduce the time of processing the extra data, such as,
SELECT c.FirstName, c.LastName, c.Address, c.City, c.State, c.Zip
FROM Customers c INNER JOIN Orders o
ON c.ID = o.CustomerID
The Orders table finally is not used at all, that should be eliminated from the Query.
2. Enforce Indexes
a. Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement
There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Reason
When fields are not indexed, SQL Server will typically do a full table scan and this may reduce performance. Unless the table is very small, a table scan tends to yield the worst performance out of all the types of database reads.
b. Remove Calculated Fields in JOIN and WHERE Clauses
This can be done by creating a field with the calculated values used in the join on the table. See below,
FROM sales a JOIN budget b
ON ((year(a.sale_date) * 100) + month(a.sale_date)) = b.budget_year_month
Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows,
FROM sales a JOIN budget b
ON a.sale_year_month = b.budget_year_month
Reason
When calculated fields are used, SQL must do an on-the-fly computing of the field before it can do the comparison even if the fields are indexed.
Summary
According to Pareto principle’s 80/20 rule, we put 20% effort here to discuss the minimum actions we need to do for SQL performance tuning SELECT statements, and hopefully to get 80% performance improvement; while in the next article, SQL Performance (2), Optimized SELECT Query (B), we will discuss the 80% effort to get another 20% performance improvement.