When running workshops on the subject of SQL Query Performance, the first question I always ask is: how do we measure the performance of a query? Or to look at it another way – how do we measure the resources it consumes?
Attendees have come up with a range of answers, from “With a stopwatch” (which I like a lot) to the slightly more technical “Using Profiler”. I like the first answer because that’s where we all start. We run something from an application and literally time how long it takes, or we run something in SSMS and use the counter near the bottom right of the query window to tell us how long it took. We may even run something 100 times in a loop and capture the overall time so we can take an average.
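If you fancy trying that looping approach, here’s a rough sketch in T-SQL (dbo.Users is just an example table from the database used later in this post – substitute your own query):

-- A rough sketch of the stopwatch-in-a-loop approach
DECLARE @Start DATETIME2 = SYSDATETIME();
DECLARE @i INT = 0, @c INT;

WHILE @i < 100
BEGIN
    SELECT @c = COUNT(*) FROM dbo.Users; -- the query being timed
    SET @i += 1;
END;

-- Average elapsed time per execution in milliseconds
SELECT DATEDIFF(MILLISECOND, @Start, SYSDATETIME()) / 100.0 AS AvgElapsedMs;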
SQL Server Profiler is also good when trying to capture what is actually happening in a given environment.
But there is a better way…
Using STATISTICS commands
When we’re working on a query in SSMS there is an easier, more accurate, and more detailed approach than any of the others available, which is to use the following two commands:
SET STATISTICS TIME ON;
This command tells SSMS to capture the consumed CPU time and elapsed time related to any queries you subsequently run in that query window. The output is sent to the Messages tab.
SET STATISTICS IO ON;
This command tells SSMS to capture the amount of data read from disk or memory by any subsequent queries in that window. Again the output is sent to the Messages tab.
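Both settings apply only to your current session and stay on until you turn them off (or close the query window):

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;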
Let’s look at some quick examples. For these I’ll use the StackOverflow database (here’s a post from Brent Ozar on how you can download this database for your own use https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/):
SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.Users;
Output:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 2296 ms.
You can see we now have a fairly accurate measure of how long that query took to run (and how long it took to parse and compile). The CPU time is how much CPU was used; the elapsed time is how long the query took overall. You’ll see the elapsed time is much longer than the CPU time, which generally means SQL was waiting for something for the rest of the time, perhaps for data to be read from disk, or perhaps for something else (like locking and blocking).
Now let’s look at STATISTICS IO for the same query.
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.Users;
Output:
Table ‘Users’. Scan count 5, logical reads 24914, physical reads 1, read-ahead reads 24770, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
You’ll see that there’s a bit more info here. Let’s run through each piece in order:
- Table – First we have the name of the table the reads are being generated from. When a query references a bunch of tables, this helps us work out where to focus our tuning efforts.
- Scan Count – In simple terms this is the number of times the table was accessed. This isn’t always a massively useful metric. It can also be zero in some specific scenarios, such as searching for one specific value in a clustered index (there’s a quick example after this list).
- Logical Reads – This is the number of data “pages” read from memory. A data page is 8KB in size, so this query has read about 195MB (24,914 pages × 8KB) from the Users table (which is basically the whole table).
- Physical Reads – This is the number of data pages read from disk.
- Read-Ahead Reads – These are also physical reads: pages read from disk, but in theory read ahead of time, e.g. while the processor is busy doing something else and doesn’t need them yet. In practice SQL has started reading these before it needs them, but it is still reading them as the query consumes them, so we do end up waiting for the disk to supply most of the data. If we look back at the CPU time and elapsed time above, it is a pretty reasonable assumption that the time spent waiting for these physical reads accounts for much of the difference between the CPU time and elapsed time.
- LOB – The columns are repeated for LOB data, which is “Large Object” data such as varbinary(max) columns and other such “blobs”. These get stored outside of the main data page. We’re not going to worry about these too much for the moment.
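Here’s a quick illustration of that scan count of zero mentioned above. In the StackOverflow database, Id is the clustered primary key on dbo.Users, so a lookup on a single value (12345 is just an arbitrary example) becomes an index seek:

SET STATISTICS IO ON;
SELECT DisplayName FROM dbo.Users WHERE Id = 12345;

The IO output should show Scan count 0 with only a handful of logical reads – the seek walks straight down the clustered index to the one row it needs (the exact numbers will vary on your copy of the database).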
Physical vs Logical Reads
It is worth knowing that SQL will (in general) try to retain data it has read in memory for as long as possible. This is because Logical Reads (from memory) are much, much faster than Physical Reads (from disk). You may have heard people say that SQL Server is very aggressive about using memory, or heard complaints like “SQL is consuming all the memory on the server”. This is why. SQL wants to do as much querying as possible from memory and save on those expensive physical reads. We allocate a maximum amount of memory to SQL Server and over time it will use as much of that allowance as it can. This is also one of the reasons why querying slows down after you restart SQL Server: all your queries go back to reading from disk until SQL has that data in memory once more.
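If you’re curious how much of that memory allowance is currently being used for data, here’s a sketch (assuming you have the VIEW SERVER STATE permission) that totals up buffer cache usage per database:

-- How many data pages each database currently has in the buffer cache
SELECT
    DB_NAME(database_id) AS DatabaseName,
    COUNT(*) AS PagesInCache,
    COUNT(*) * 8 / 1024 AS CacheUsedMB -- data pages are 8KB each
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY PagesInCache DESC;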
It’s also good to understand that in the numbers above SQL is not reading one set of data from disk and another set from memory. Rather it is a two-stage process: required data is read from disk into memory (the physical/read-ahead reads) and then the whole set is read from memory (the logical reads). So the number of logical reads includes the pages first read from disk.
We can see that effect if we run the query again. This time I’ll include both commands:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.Users;
Output:
Table ‘Users’. Scan count 5, logical reads 24982, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 546 ms, elapsed time = 164 ms.
You’ll see our physical reads (including read-ahead reads) have now dropped to zero. This is because we already read this data from disk the first time we ran the query, and SQL has retained the data in memory. Logical reads have stayed about the same. You’ll hopefully also see that our CPU time is reasonably similar at around 500ms, but our elapsed time has dropped dramatically as we no longer had to wait for slow reads from disk.
You might also notice that the elapsed time is now actually shorter than the CPU time, which can be a bit confusing. The query has consumed 546ms of CPU time, but only took 164ms from beginning to end. How can this be? The answer is that our query has gone parallel over several CPU cores. Counting all the records in a large table is actually a pretty resource-hungry operation as you can see, so SQL has decided to save time by breaking the task into pieces and running those subtasks concurrently.
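If you want to see the single-threaded version for comparison, you can cap the query at one core with a hint. Your timings will differ from mine, but CPU time and elapsed time should end up much closer together:

SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.Users OPTION (MAXDOP 1); -- force a serial plan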
I should probably add a caveat here in that I have actually seen STATISTICS TIME return results where the elapsed time is marginally shorter than the CPU time even when the query is single-threaded. I queried this with a performance tuning specialist from Microsoft whom I was working with at one of our clients, and he admitted that these figures aren’t always 100% accurate – they are, however, the most accurate you’ll get.
Flushing Data From Memory
You might sometimes want to see how your query would perform if it’s having to read all its data from disk. This might be more appropriate to how it is likely to be working in the wild. In that case you can run the following command which will flush all the data from memory (so don’t run it on a production environment!).
DBCC DROPCLEANBUFFERS;
If you execute this and then re-run the above query and capture the IO, you’ll see that the physical/read-ahead reads are back.
In SQL Server the area of memory where data is stored is referred to as the Buffer Cache. Within that Cache we have clean pages where the data in memory is the same as the data on disk, and dirty pages which are pages that contain modifications that have yet to be written to disk. The above command only flushes (or drops) the clean pages (or buffers). Just in case you were wondering why it was called DROPCLEANBUFFERS…
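As a consequence, if you want to be sure the cache is fully emptied, the usual pattern is to issue a CHECKPOINT first so any dirty pages are written to disk (and so become clean pages that can then be dropped):

CHECKPOINT; -- write dirty pages to disk, making them clean
DBCC DROPCLEANBUFFERS; -- now all data pages can be dropped from memory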
A Slightly More Complex Example
The above example with just the one table should make it clear what these commands do, but let’s look at a query that goes across a couple of tables and does a bit more work so you can see the sort of output you’ll get:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT TOP 10
u.Id,
u.DisplayName,
COUNT(p.Id) AS NumberOfPosts
FROM dbo.Users u
INNER JOIN dbo.Posts p
ON u.Id = p.OwnerUserId
GROUP BY u.Id, u.DisplayName
ORDER BY NumberOfPosts DESC;
Output:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 54 ms.
(10 row(s) affected)
Table ‘Posts’. Scan count 5, logical reads 51352, physical reads 1, read-ahead reads 51192, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Users’. Scan count 0, logical reads 94, physical reads 45, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 336, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 8002 ms, elapsed time = 5597 ms.
You can see here that the IO for both tables referenced in the query has been captured. This is very useful in telling us where the bulk of the work is being done.
Notable here also are the references to “Worktable” and “Workfile”. These often appear in the output of STATISTICS IO for more complex queries. In general it is worth thinking of these as internal objects SQL uses to store temporary resultsets while processing your query. The difference between the two lies in the type of resultsets they store and the particular operations they support. This can make it a bit more challenging to work out which part of your query is generating the most work when the highest number of reads appear against these. You end up needing to look at the execution plan and trying to work out what operations these could be related to. We may get to more of that in a later post.
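If you do want to capture the plan alongside the statistics output, the “Include Actual Execution Plan” button in SSMS does the job, or there is another SET command in the same family:

SET STATISTICS XML ON; -- returns the actual execution plan (as XML) alongside each statement’s results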
Using This Information to Tune Your Queries
To close, I just want to mention how you should use this information to focus your efforts when tuning queries. Intuitively we all think in terms of time when we think about query performance, but in practice it is much better to focus on the output of STATISTICS IO when you are trying to make improvements.
The IO output gives you much more information, and while the figures for both CPU and IO will vary depending on what else is going on on the server, the IO figures are more stable, so you can have more confidence that the changes you see are the result of your tuning efforts. It also allows you to focus on tables one at a time as you tune multi-table queries.
Both IO and TIME are measures of work being done on a server. If you can reduce the number of reads generated by your query, that is generally the best measure of improvement, and the overall time taken by the query will usually come down with it.
So in your tuning efforts, focus on the logical and physical reads and try to get those totals down as low as possible. Generally you’ll start with operations on the table that generates the highest number of reads and go from there.
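As a purely hypothetical illustration using the earlier example: the join to dbo.Posts was driving the bulk of the reads, so an index on the join column (the index name here is my own invention, and whether it helps depends on your workload) might let SQL read a narrow index rather than scanning the whole table:

CREATE INDEX IX_Posts_OwnerUserId -- hypothetical index to support the join
ON dbo.Posts (OwnerUserId);

Re-running the query with STATISTICS IO on before and after a change like this shows you directly whether the reads against Posts have come down.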