Performance optimization on SQL Server is difficult. A vast array of information already exists explaining how to address performance problems in general. However, there is not much information on the specifics, and there is even less information on how to apply that specific knowledge to your own environment.
In this series, I will discuss 10 things I think you should know about SQL Server performance, split across two posts. Each item is a nugget of practical knowledge I picked up from Patrick O’Keeffe, and each can be applied immediately to your own environment.
THE TAO OF INDEXES
SQL Server 2005 gives us some very useful new data on indexes.
sys.dm_db_index_operational_stats
sys.dm_db_index_operational_stats contains information on current low-level I/O, locking, latching and access method activity for each index.
Use this DMV to answer the following questions (a sample query follows the list):
- Do I have a ‘hot’ index, one on which there is contention? The row_lock_wait_in_ms and page_lock_wait_in_ms columns can tell us whether there have been lock waits on this index.
- Do I have an index that is being used inefficiently? Which indexes are currently I/O bottlenecks? The page_io_latch_wait_in_ms column can tell us whether there have been I/O waits while bringing index pages into the buffer cache, a good indicator of a scan access pattern.
- What sort of access patterns are in use? The range_scan_count and singleton_lookup_count columns can tell us what sort of access patterns are used on a particular index.
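Here is a minimal sketch of a query over this DMV (the column list, join and ordering are my own illustration, not from the original material), for SQL Server 2005 and later:

select
    object_name(ios.object_id) as table_name,
    i.name as index_name,
    ios.row_lock_wait_in_ms,
    ios.page_lock_wait_in_ms,
    ios.page_io_latch_wait_in_ms,
    ios.range_scan_count,
    ios.singleton_lookup_count
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
    inner join sys.indexes i
        on ios.object_id = i.object_id
        and ios.index_id = i.index_id
order by ios.page_io_latch_wait_in_ms desc   -- indexes with the most I/O latch waits first

Indexes near the top of the list with both a high page_io_latch_wait_in_ms and a high range_scan_count are good candidates for the scan-heavy access pattern described above.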
THE MYSTERY OF THE BUFFER CACHE
The buffer cache is a large area of memory used by SQL Server to optimize physical I/O. No SQL Server query execution reads data directly off the disk. The database pages are read from the buffer cache. If the sought-after page is not in the buffer cache, a physical I/O request is queued. Then, the query waits and the page is fetched from the disk.
Changes made to the data on a page from a DELETE or an UPDATE operation are also made to pages in the buffer cache. These changes are later flushed out to the disk. This whole mechanism allows SQL Server to optimize physical I/O in several ways.
- Multiple pages can be read and written in one I/O operation.
- Read-ahead can be implemented. SQL Server may notice that for certain types of operations it is useful to read sequential pages, the assumption being that right after you read the requested page, you will want to read the adjacent page.

There are two indicators of buffer cache health.
- MSSQL$Instance:Buffer Manager\Buffer cache hit ratio – This is the ratio of pages found in the cache to total page requests; pages not found in the cache have to be read off disk. Ideally, you want this number to be as high as possible, but be aware that it is possible to have a high hit ratio and still experience cache thrashing.
- MSSQL$Instance:Buffer Manager\Page Life Expectancy – This is the amount of time, in seconds, that SQL Server keeps pages in the buffer cache before they are evicted. Microsoft says that a page life expectancy greater than five minutes (300 seconds) is fine. If the life expectancy falls below this, it can be an indicator of memory pressure (not enough memory) or cache thrashing.
Both counters can also be read directly from T-SQL, as sketched below.
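A minimal sketch of pulling both counters from sys.dm_os_performance_counters (the object_name filter assumes you want the Buffer Manager object, whatever the instance prefix):

select rtrim(counter_name) as counter_name, cntr_value
from sys.dm_os_performance_counters
where object_name like '%Buffer Manager%'
  and counter_name in ('Buffer cache hit ratio',
                       'Buffer cache hit ratio base',
                       'Page life expectancy')

Page life expectancy is reported directly in seconds, so anything comfortably above 300 meets the five-minute guideline; the hit ratio has to be divided by its ‘base’ counter to get the actual percentage.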
Cache thrashing is the term used when a large table or index scan is occurring. Every page in the scan must pass through the buffer cache. This is very inefficient because the cache is being used to hold pages that are not likely to be read again before they are evicted.
Since every page in the scan must pass through the cache, other pages have to be evicted to make room, and a physical I/O cost is incurred because each page must be read off disk. Cache thrashing is usually an indication that large tables or indexes are being scanned. To find out which tables and indexes are taking up the most space in the buffer cache, you can examine syscacheobjects on SQL Server 2000 or sys.dm_os_buffer_descriptors on SQL Server 2005.
The example query below shows how to list the tables and indexes that are consuming space in the buffer cache on SQL Server 2005.
select
    o.name as table_name,
    i.name as index_name,
    bd.*
from sys.dm_os_buffer_descriptors bd
    inner join sys.allocation_units a on bd.allocation_unit_id = a.allocation_unit_id
    inner join sys.partitions p
        on (a.container_id = p.hobt_id and a.type in (1, 3))   -- in-row and row-overflow data
        or (a.container_id = p.partition_id and a.type = 2)    -- LOB data
    inner join sys.objects o on p.object_id = o.object_id
    inner join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
where bd.database_id = db_id()   -- consider only pages belonging to the current database
You can also use the new index DMVs to find out which tables/indexes have large amounts of physical I/O.
PLAN REUSE – RECYCLING FOR SQL
Before executing a SQL statement, SQL Server first creates a query plan. This defines the method SQL Server will use to satisfy the query. Creating a query plan requires significant CPU. Thus, SQL Server will run more efficiently if it can reuse query plans instead of creating a new one each time a SQL statement is executed.
There are some performance counters available in the SQL Statistics performance object that will tell you whether you are getting good plan reuse.
(Batch Requests/sec – SQL Compilations/sec) / Batch Requests/sec

This formula tells you the proportion of submitted batches that reused an existing plan, so you want the result to be as close to 1 as possible. Put another way, you want the ratio of compilations to batch requests to be as small as possible; a 1:1 ratio of compilations to batches means that every batch submitted is being compiled, and there is no plan reuse at all. You can compute an overall version of this ratio directly from the counters, as sketched below.
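A minimal sketch of that calculation in T-SQL (note that these counters are cumulative since the instance last started, so this yields an overall ratio rather than a true per-second figure):

select
    cast(1.0 - (compilations.cntr_value * 1.0 / nullif(batches.cntr_value, 0))
         as decimal(5, 4)) as plan_reuse_ratio
from sys.dm_os_performance_counters as batches
    cross join sys.dm_os_performance_counters as compilations
where batches.object_name like '%SQL Statistics%'
  and batches.counter_name = 'Batch Requests/sec'
  and compilations.object_name like '%SQL Statistics%'
  and compilations.counter_name = 'SQL Compilations/sec'

A result close to 1 means almost every batch reused a plan; a result near 0 means almost every batch was compiled.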
It’s not easy to pin down the exact workload responsible for poor plan reuse, because the problem usually lies in the client application code that is submitting queries. You therefore may need to look at that client code: is it using prepared, parameterized statements? Using parameterized queries not only improves plan reuse and reduces compilation overhead, it also reduces the risk of the SQL injection attacks that come with building statements via string concatenation.
Figure 7 shows two code examples. Though they are contrived, they illustrate the difference between building a statement through string concatenation and using prepared statements with parameters.
SQL Server cannot reuse the plan from the ‘Bad’ example, and if the parameter had been a string type, the function could also have been used to mount a SQL injection attack. The ‘Good’ example is not susceptible to SQL injection because a parameter is used, and SQL Server is able to reuse the plan.
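Since Figure 7 is not reproduced here, below is my own contrived T-SQL illustration of the same contrast (dbo.Customer and the column names are hypothetical): the first statement is built by string concatenation, the second uses sp_executesql with a typed parameter.

declare @LastName nvarchar(50);
set @LastName = N'Smith';

-- 'Bad': string concatenation. Every distinct value produces different statement
-- text (a new plan), and a malicious string could mount a SQL injection attack.
declare @sql nvarchar(max);
set @sql = N'select * from dbo.Customer where LastName = ''' + @LastName + N'''';
exec (@sql);

-- 'Good': sp_executesql with a parameter. The statement text never changes,
-- so the plan is compiled once and reused, and the value is never parsed as SQL.
exec sp_executesql
    N'select * from dbo.Customer where LastName = @LastName',
    N'@LastName nvarchar(50)',
    @LastName = @LastName;

Each distinct value in the first version produces a different statement text and therefore a different plan; the parameterized version always submits the same text, so the plan is compiled once and reused.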
THE HORROR OF CURSORS (AND OTHER BAD T-SQL)
There is a blog I read every day — http://www.thedailywtf.com (wtf stands for Worse Than Failure, of course). Readers post real experiences they had with bad organizations, processes, people and code. In it, I found this gem:
DECLARE PatientConfirmRec CURSOR FOR
    SELECT ConfirmFlag
    FROM Patient
    WHERE policyGUID = @PolicyGUID

OPEN PatientConfirmRec
FETCH NEXT FROM PatientConfirmRec
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Patient
    SET ConfirmFlag = 'N'
    WHERE CURRENT OF PatientConfirmRec

    FETCH NEXT FROM PatientConfirmRec
END
CLOSE PatientConfirmRec
DEALLOCATE PatientConfirmRec
This is real code from a real production system. It can actually be reduced to:
UPDATE Patient SET ConfirmFlag = 'N'
WHERE PolicyGUID = @PolicyGUID
This refactored code of course will run much more efficiently, allow the optimizer to work its magic and take far less CPU time. In addition, it will be far easier to maintain. It’s important to schedule a code review of the T-SQL in your applications, both stored code and client side, and to try to refactor such nonsense.
Bad T-SQL can also appear as inefficient queries that do not use indexes, mostly because the index is incorrect or missing. It’s important to learn how to tune queries using query plans in SQL Server Management Studio. Figure 6 shows an example of a large query plan.
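If you prefer to capture the plan from T-SQL rather than the graphical view (Ctrl+M in Management Studio toggles “Include Actual Execution Plan”), here is a minimal sketch with a stand-in query:

-- Returns the actual execution plan as XML alongside the results.
set statistics xml on;

select name, object_id
from sys.objects
where type = 'U';   -- stand-in query; substitute the statement you are tuning

set statistics xml off;

(SET SHOWPLAN_XML ON does the same for the estimated plan, without executing the statement.)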
ZEN AND THE ART OF NEGOTIATING WITH YOUR SAN ADMINISTRATOR
Storage area networks (SANs) are fantastic. They offer the ability to provision and manage storage in a simple and easy way. Even though SANs can be configured for fast performance from a SQL Server perspective, they often aren’t. Organizations usually implement SANs for reasons such as storage consolidation and ease of management, not for performance. To make matters worse, generally you do not have direct control over how the provisioning is done on a SAN. Thus, you will often find that the SAN has been configured for one logical volume where you have to put all the data files.
Having all the files on a single volume is generally not a good idea if you want the best I/O performance. As an alternative, you will want to:
- Place log files on their own volume, separate from data files. Log files are almost exclusively written and not read, so you want to configure that volume for fast write performance.
- Place tempdb on its own volume. tempdb is used for a myriad of purposes by SQL Server internally, so having it on its own I/O subsystem will help.

To further fine-tune performance, you will first need some stats. There are, of course, the Windows disk counters, which give you a picture of what Windows thinks is happening (don’t forget to adjust the raw numbers based on your RAID configuration). SAN vendors often have their own performance data available as well. SQL Server also exposes file-level I/O information through the fn_virtualfilestats function. From this function, you can:
- Derive I/O rates for both reads and writes
- Get I/O throughput
- Get average time per I/O
- Look at I/O wait times
Figure 5 shows the output of a query using this function ordered by IoStallMS, which is the amount of time users had to wait for I/O to complete on a file.
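Since Figure 5 itself is not included here, this is a minimal sketch of such a query on SQL Server 2005 (the join to sys.master_files, to show each file's path, is my own addition):

select
    db_name(vfs.DbId) as database_name,
    mf.physical_name,
    vfs.NumberReads,
    vfs.BytesRead,
    vfs.NumberWrites,
    vfs.BytesWritten,
    vfs.IoStallMS        -- total ms users waited for I/O to complete on this file
from sys.fn_virtualfilestats(null, null) as vfs
    inner join sys.master_files as mf
        on mf.database_id = vfs.DbId
        and mf.file_id = vfs.FileId
order by vfs.IoStallMS desc

The numbers are cumulative since the instance started, so to get rates or averages (for example, average stall per I/O as IoStallMS / (NumberReads + NumberWrites)) you need to sample the function twice and work with the differences.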
Using these numbers, you can quickly narrow down which files are responsible for consuming I/O bandwidth and ask questions such as:
- Is this I/O necessary? Am I missing an index?
- Is it one table or index in a file that is responsible? Can I put this index or table in another file on another volume? (A sketch of doing this follows.)
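As one hedged example of acting on that last question, assuming a hypothetical database MyDb, a faster volume F:\, and the Patient index from the earlier example: add a filegroup backed by a file on the other volume, then rebuild the index onto it.

-- Add a filegroup backed by a file on a different volume (names and paths are illustrative)
alter database MyDb add filegroup FastIO;
alter database MyDb add file
    (name = MyDb_FastIO_1, filename = 'F:\Data\MyDb_FastIO_1.ndf', size = 1GB)
    to filegroup FastIO;

-- Rebuild an existing nonclustered index onto the new filegroup
create nonclustered index IX_Patient_PolicyGUID
    on dbo.Patient (PolicyGUID)
    with (drop_existing = on)
    on FastIO;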
I will cover the next five things in tomorrow's post.
Hope this helps!