SQL Performance (1), Optimized SELECT Query (A)

George
2y
11.6k
0
4

Article

This series of articles will discuss SQL server performance. In the title, I use SQL Performance because the articles could include SQL statement (Query) optimization and also include SQL Server performance issues, and probably include stored procedure performance issues.

SQL Performance (0), Summary
SQL Performance (1), Optimized SELECT Query (A) - this article
SQL Performance (2), Optimized SELECT Query (B)
SQL Performance (3), Optimized Data Modifying Query
SQL Performance (4), Tune the SQL Server Database

Introduction

Performance tuning SELECT statements can be a time-consuming task that follows the Pareto principle’s 80/20 rule: 20% effort is likely to give you an 80% performance improvement. To get another 20% performance improvement you probably need to spend 80% of the time.

In this article, we will emphasize the 20% effort with 80% benefits to improve performance before we move to query plans and some complicated or detailed issues that I leave to the second article, SQL Performance (2), Optimized SELECT Query (B). In this way, it will be easier for both me and the readers to follow.

So, this article will be the easy part, but the important part for tuning the SQL query to improve the performance.

The Causes affecting query run time

Obviously, the factors affecting query run time are the size of the query return and the complexity of the query,

Size
The larger the table (more columns), the longer the records (more rows), and the worse the query performance will be.
Complexity
The more complex the query, the worse the query performance will be. Some major factors for query complexity as,
- Joins
  If your joins substantially increase the row count of the result set, your query is likely to be slow.
- Aggregations
  Combining multiple rows to produce a result requires more computation than simply retrieving those rows.

The Major Improvement for Tuning Query

Obviously reducing the size and avoiding the complexities of the query will be the solutions,

Reduce Size. Limit the Size of the query Set
- Limit Columns: Only Select Fields You Need
- Limit Rows: Select Reasonable Record length
- Remove Unnecessary Tables in the query
- Remove OUTER JOINS
Enforce Indexes to Improve Performance
- Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement
- Remove Calculated Fields in JOIN and WHERE Clauses
Reduce Complexity. will be discussed in detail in SQL Performance (2), Optimized SELECT Query (B)
- Run your query during off-peak hours
  - Selecting from large tables (>1,000,000 records)
  - Cartesian Joins or CROSS JOINs
  - Looping statements
  - SELECT DISTINCT statements
  - Nested subqueries
  - Wildcard searches in long text or memo fields
  - Multiple schema queries

We will discuss the 20% effort to improve the performance here and leave the 80% work in the next article, SQL Performance (2), Optimized SELECT Query (B).

1. Limit the Size of the query Set

a. Limit Columns. Only Select the Fields You Need

SELECT fields instead of using SELECT *.

Do not use,

SELECT *    
FROM Customers

Use,

SELECT FirstName, LastName, Address, City, State, Zip    
FROM Customers

Assume the Customers table has 10 columns with 1 million rows, if the second query used 2 seconds including the network traffic to get UI, the first one probably will cost double the time.

b. Limit Rows: Select reasonable Record Length

For very long return records, say, thousands of lines, such as,

SELECT Product, Price, Amount, Total    
FROM Orders

you should get a reasonable length by either adding a restricting condition, such as,

SELECT Product, Price, Amount, Total      
FROM Orders   
WHERE OrderDate  >= '2014-03-01'  AND OrderDate <  '2014-04-01'

or by using a LIMIT statement,

SELECT Product, Price, Amount, Total          
FROM Orders      
LIMIT 1000

TOP in SQL Server,

SELECT TOP 1000 Product, Price, Amount, Total            
FROM Orders

c. Remove Unnecessary Tables in the query

During development, it is possible that you add tables to the query that may not finally have any impact on the data returned. Removing the JOINS to these unnecessary tables reduce the time of processing the extra data, such as,

SELECT c.FirstName, c.LastName, c.Address, c.City, c.State, c.Zip      
FROM Customers c INNER JOIN Orders o  
ON c.ID = o.CustomerID

The Orders table finally is not used at all, that should be eliminated from the Query.

2. Enforce Indexes

a. Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement

There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.

Reason

When fields are not indexed, SQL Server will typically do a full table scan and this may reduce performance. Unless the table is very small, a table scan tends to yield the worst performance out of all the types of database reads.

b. Remove Calculated Fields in JOIN and WHERE Clauses

This can be done by creating a field with the calculated values used in the join on the table. See below,

FROM sales a JOIN budget b
ON ((year(a.sale_date) * 100) + month(a.sale_date)) = b.budget_year_month

Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows,

FROM sales a JOIN budget b
ON a.sale_year_month = b.budget_year_month

Reason

When calculated fields are used, SQL must do an on-the-fly computing of the field before it can do the comparison even if the fields are indexed.

Summary

According to Pareto principle’s 80/20 rule, we put 20% effort here to discuss the minimum actions we need to do for SQL performance tuning SELECT statements, and hopefully to get 80% performance improvement; while in the next article, SQL Performance (2), Optimized SELECT Query (B), we will discuss the 80% effort to get another 20% performance improvement.