Introduction
One of the keys to SQL Server performance is ensuring that you have the proper indexes on a table so that any queries written against this table can run efficiently. There are more articles written about designing indexes, choosing columns, etc for optimizing performance, so I will refrain from repeating most of what is written elsewhere. I have included a few resources at the end of this article for this topic.
However once you have built the indexes, there is still work to be done. As your data sets grow over time, SQL Server will continue to rebuild indexes and move data around as efficiently as possible. This happens in a number of ways, but the result is that you may need to perform maintenance on your indexes over time despite all of the automatic tools built into SQL Server. This article will discuss some of the issues with data growth over time as well as a technique to find tables in need of maintenance and how to perform this maintenance.
NOTE: I am including some features of SQL 2000 in this article, but the techniques are still useful in SQL v7. and v6.5 with minor modifications.
What happens over time?
If SQL Server includes auto statistic updating, a query optimizer that can learn to be more efficient with your queries, etc., why do we need to perform maintenance? Well, let's examine what happens over time.
When you build an index on a table (let's assume a clustered index), SQL Sever parcels the data across pages and extents. With v7.x and above, extents can be shared between objects (with v6.5 extents contain a single object). As a result, let's assume you create a table with rows that are < of a page in size. If you have 20 rows, then you have 5 pages worth of data. Is your data stored on 5 pages? Only if your FILLFACTOR is 100%. The fillfactor determines how much, percentage wise, your pages are filled. let's assume a FILLFACTOR of 50%, then you would have 10 pages of data allocated to this table. This is getting complicated quickly, but let's examine it a bit more.
If you assume that we expand this example over time, we may grow to 100 pages of data. These (at a minimum) require 7 extents if this object does not share any extents. Each page within the extents links to another page with a pointer. The next page in the chain, however, may not be in the same extent. Therefore as we read the pages, we may need to "switch" to another extent.
The simplest example is assume we take 3 consecutive pages of data in the following order:
Extent 1 Extent 2 Page n Page n + 1 Page n + 2
These are any three pages where page n links to page n+1 next, then to page
n+2 and so on. To read these three pages we read extent 1, then switch to extent
2, then switch back to extent 1. These "switches" do not necessarily
entail physical I/O, but all of these switches add up. They may not be a big deal on
your local server or even a lightly loaded server, a web application that has
hundreds or thousands of users could see a large performance impact from
repeated scans of this table. Why does the table end up looking like this? This
is how the table is designed to function over time. SQL Server will allocate
space for each row based on the space available at that time. As a result, while
a clustered index stores the data in physical order on a page, the pages may not
be in physical order. Instead each page has a linkage to the next page in the
sequence. Just as your hard disk can become fragmented over time as you delete
and insert files, the allocations of pages for a table can be fragmented over
time across extents as the data changes.
So why doesn't SQL Server just rebuild the indexes? I am
not sure if I would even want it to do so. I would hate for this to occur right
after a large web marketing campaign! Instead the engineers in Redmond have left
it up to the DBA to track this fragmentation and repair it as necessary. How do
we remove this fragmentation? Read on...
Running DBCC SHOWCONTIG
Prior to SQL Server 2000, you had to first get the object ID using the following
command
select object_id('<object name>')
For the user table, I ran
select object_id('user')
This returned me some long number (from sysobjects) that means nothing to me, but the
SQL team in Redmond must use this often and did not feel like including the join
in their code. I guess someone complained long and loud enough because in SQL
2000 you can use the name of the object in dbcc showcontig like this:
dbcc showcontig (user)
This produces the following statistics on your indexes:
DBCC SHOWCONTIG scanning 'User' table... Table:'User' (962102468); index ID: 1, database ID: 7 TABLE level scan performed. -Pages Scanned................................: 899 -Extents Scanned..............................: 121 -Extent Switches..............................: 897 -Avg. Pages per Extent........................: 7.4 -Scan Density [Best Count:Actual Count].......: 12.58% [113:898] -Logical Scan Fragmentation ..................: 99.89% -Extent Scan Fragmentation ...................: 80.99% -Avg. Bytes Free per Page.....................: 2606.5 -Avg. Page Density (full).....................: 67.80%
Let's decode this output:
Pages Scanned - Gives the # physical pages in the database scanned in this index. Not
really relevant, but gives you the total size occupied by this index ( each
page is 8k)
Extents scanned - An extent is 8 pages. So this should be pretty close to Pages Scanned
/ 8. In this example we have 121 extents which is 968 pages. Since the index is
only 899 pages, we have a number of shared extents. Not necessarily a bad thing,
but this gives you an idea that you are slightly fragmented. Of course, you do
not know how much physical fragmentation this is which can contribute to longer
query times. The minimum number for the 899 pages above would be 113. (899/8)
Extent Switches - # times the scan forced a switch from one extent to another. As this gets
close to # pages, you have pretty high fragmentation. . If you see number close
to # pages, then you may want to rebuild the index. See a Detailed Example.
Average Pages/Extent - Gives the math of Pages Scanned / Extents Scanned. Not of any great value
other than you don't have to run Calculator to get the number. Fully populated
extents would give a value of 8 here. I guess this is good for me
Scan Density [Best Count:Actual Count].......: 12.58% [113:898]
This is the tough one. This shows a percentage and 2 numbers separated by
a colon. I explain this as I missed it the first two times around. The
percentage is the result of dividing number 1 (113) by number 2 (898). So what
are the two numbers?
The first number is the ideal number of extent changes if everything was
linked in the a contiguous chain. The second number is the number of extents
moved through which is 1 more than the number of extent switches (by
definition). This is really another view of fragmentation. 100% would be minimal
(I hate to say zero) fragmentation. As you can see, this table is fairly
fragmented. The scan is constantly switching back and forth from one extent to
another instead of finding a link from one page to another within an extent.
Logical Scan Fragmentation ..................: 99.89%
The official definition from Books Online (v7.x and 2000 Beta 2) is:
"Percentage of out-of-order pages returned from scanning the leaf
pages of an index. This number is not relevant to heaps and text indexes. An out
of order page is one for which the next page indicated in an IAM is a different
page than the page pointed to by the next page pointer in the leaf page."
I am still not sure what this means. I have not gotten a good explanation
of this anywhere, so here is my best interpretation. This shows how many pages
(as a percentage) in the index which have a pointer to the next page that is
different than the pointer to the next page that is stored in the leaf (data)
page. This is only relevant for clustered indexes as the data (leaf pages)
should be physically in the order of the clustered index.
So how do you use this? If you figure it out, let me know. Since this number is
high for me and other items lead me to think this index is fragmented, I think
this is bad. So try for a low number in OLAP systems and a medium number in OLTP
systems.
Extent
Scan Fragmentation ...................: 80.99%
Again, here is the official BOL explanation (v7.x and 2000 Beta 2).
Percentage of out-of-order extents in scanning the leaf pages of an
index. This number is not relevant to heaps. An out-of-order extent is one for
which the extent containing the current page for an index is not physically the
next extent after the extent containing the previous page for an index.
This shows the percentage of pages where the next page in the index is
not physically located next to the current page. This tells me the I/O system
must move fairly often (80% of the time) when scanning the index to find the
next page. A Detailed Explanation is given below.
Avg. Bytes Free per Page.....................: 2606.5
This tells you (on average) how many bytes are free per page. Since a
page is 8096 bytes, it appears that I have on average, filled about 68% of the
pages. This can be good or bad. If this is an OLTP system with frequent inserts
to this table, then with more free space per page, there is less likely going to
be a page split when an insert occurs. You want to monitor this on tables with
heavy activity and periodically rebuild this index to spread out the data and
create free space on pages. Of course you do this during periods of low activity
(read as 3am) so that there is free space and page splits are minimal during
periods of high activity (when everyone can yell at you for a slow database).
Since this is an OLTP system, I am in good pretty shape.
If this were an OLAP system, then I would rather have this be closer to
zero since most of the activity would be read based and I would want the reads
to grab as much data as possible from each page (to reduce the time it takes to
read the index). As your OLAP table grows, this becomes more critical and can
impact (substantially) the query time for a query to complete.
(build test data of 10,000,000 rows and test index of 99% v 1%
fillfactor).
Avg. Page Density (full).....................: 67.80%
This gives the percentage based on the previous number (I calculated the
number above as 1 - (2606.5 / 8096) and rounded up.
So what does this all mean?
Well, to me this says I need to defragment this table. There are a large
number of extent switches that occur, each of which could potentially cause a
large I/O cost to queries using this table and index.
Defragmenting Indexes
In previous versions, and I guess in this one you can still rebuild the
clustered index which causes the server to read this clustered index and then
begin moving the data to new extents and pages which should start putting
everything back in physical order and reduce fragmentation. There is another
way:
In SQL 2000, the SQL developers added another DBCC option which is INDEXDEFRAG.
This can defragment both clustered and nonclustered indexes which (according to
BOL) should improve performance as the physical order will match the logical
order and (theoretically) reduce the I/O required by the server to scan the
index.
A couple of caveats about this: If your index spans files, then it defragments
each file separately and does NOT move pages between files. Not a good thing if
you have added a new filegroup and allowed objects to grow across files. If you need to move a table to a single filegroup, you need to move the clustered index.
A good thing that is way, way, way, extremely, absolutely, without-a-doubt long
overdue is the reporting of progress by DBCC INDEXDEFRAG as it works. Every 5
minutes this will report the estimated progress back to the user. Of course many
of us who have installed software with a feedback progress bar often wonder why
the bar moves quickly to 99% and remains there for 80% of the total install
time. So time will tell whether this is of any use, but I think some feedback is
better than none.
Another addition that is way, way, way, (you get the idea) overdue is the ability to stop the DBCC. I cannot tell you how many late nights I wished I could do this in v6.5. In fact I often held off on running DBCC until the latest possible time since I could not stop it once it started. (well, there was that O-N-O-F-F switch.)
Still one further addition, that ranks above the other two is that this is an online operation. Let me repeat that. Its an ONLINE operation. It does not hold locks on the table since it operates as a series of short transactions to move pages. It also operates more quickly than a rebuild of a new index and the time required is related to the amount of fragmentation for the object. Of course this means that you must have extensive log space if this is a large index. Something to keep in mind and watch the log growth when you run this to see how much space it eats up. Everything's a trade-off though.
Conclusion
Maintaining indexes still requires DBA intervention for optimal performance
for your database server. How often do you have to maintain these indexes? That
depends (like everything else in the RBDMS world) on your system. This is still
an art and requires some practice, testing, and careful notes over time.
While Microsoft continues to improve this process over time and automate some
of the work, it will probably still require some DBA intervention for the
foreseeable future. Practice with tuning indexes is an often overlooked
maintenance item, but one which will become more important to your users as
their numbers swell.
Good luck and as always, please include yout comments, questions, or suggestions at the bottom in the "Your Opionion" section.
References
Here are some of the references for Indexing on the Internet and in publication. I have used
these in the past, though not necessarily for this article. These are the ones I recommend.
- Professional
SQL Server 7 Programming
by Robert Vieira. Chapter 9 devoted to indexes. - The Gurus Guide To Transact SQL by Ken Henderson. See my
book review page for more information.
- Index Optimizaton
Tips
by Alexander Chigrik on Swynk. A concise list of tips for building good indexes. - Planning and Creating
Indexes
from Microsoft SQL Server v7.0 Database Implementation Training Kit. Thisis a good explanation of how indexes are structured in the server.
- Performance Tuning
Guide
from Microsoft. Everyone should read this at leastonce and refer to it whenever you are trying to improve performance.
- Most SQL Server books include basic information in indexes which is useful IF you read it.
Indexing is definitely an art and you have to practice it to get better.
Return to Steve Jones' Home