Recently, I read the articles Stop Worrying About SQL Server Fragmentation and When Does Index Fragmentation Matter?, and that got me starting to think about how index fragmentation in Azure SQL Database compares to index fragmentation in on-premise versions of SQL Server. I also started to wonder, "Does Fragmentation matter in SQL Azure"? I decided to compare index fragmentation in Azure SQL Database vs an on-premise SQL Server using the same techniques used in the articles just mentioned.
Method of Comparison
In order to compare fragmentation, I used the script written by Jeremiah Peschka in the article When Does Index Fragmentation Matter? with a few modifications. This script basically does the following:
- Creates a table with a Primary Key and a non-clustered index
- Inserts 100000 records into the table
- Deletes 70% of the records in order to fragment the non-clustered index
- Rebuilds the indexes
If you want to get more detail as to how the script works, read Jerimiah's excellent article. Along the way, the size of the indexes is shown in pages and KB to show how each of these operations affect the size of the indexes. In order for the script to run in SQL Azure, I modified the method used to create random numbers (because master.dbo.spt_values does not exist in Azure SQL Database). I also increased the number of records that were inserted in order to have a better picture of the changes in size involved. The entire script (with my amendments) is shown below. There is a link to download the script at the end of the article.
For SQL Server, I used SQL Server 2012 Standard. For Azure SQL Database, I experimented on a variety of different Service Tiers. All of the results of the Fragmentation script came out the same, although the insert of 100000 records did happen faster at higher Service Tiers due to the increase in Database Throughput Units (DTU's).
Results of Running the Tests - SQL Server Standard
- Baseline Results - This result shows the size of the indexes in pages and KB on the truncated table. As expected, both are 0.
- 100000 Records Inserted - This result shows that there were 100000 records inserted into the table.
- 100000 record index size - This result shows the size of the index after inserting the 100000 records.
- 70% Deleted - This result shows the size of the index after deleting 70% of the records and fragmenting the non-clustered index.
- After Defragmentation - This result shows the size of the index after defragmentation.
- Baseline Results - This is the same. Empty databases have empty indexes.
- 100000 Records Inserted - Same. 100000 rows inserted.
- 100000 record Index Size - The thing to notice here is that the size of both the non-clustered and clustered indexes are much larger in Azure than the size of indexes in SQL Server Standard. The size of the non-clustered index is 129% larger while the clustered index is 37% larger than in SQL Server Standard.
- 70% Deleted - This result is interesting as well in that it shows that the non-clustered index dramatically grows in size as it gets fragmented. This result is completely at odds with the result from SQL Server Standard where the indexes remain the same size. It also doesn't make a lot of sense. If you think about the analogy of indexes as phone books (for those of you old enough to remember phone books), deleting the records should be removing the pages from the phone book. The size of the phone book shouldn't grow because we have removed pages. The other interesting thing here is that the size of the fragmented index is now 347% larger than in SQL Server Standard.
- After Defragmentation - Once the Azure non-clustered index has been defragmented, it is only 14% larger than the non-clustered index in SQL Server Standard.
Does Fragmentation matter in SQL Azure?
Pretty interesting results, but is it time to panic? I don't think so, but I am going to sit on the fence on this one. Realistically, your can pretty much assume that you will not have sequential access to the disk for your database in Azure SQL Database so having a fully defragmented index is not going to improve the performance of your application greatly. That said, it is definitely concerning that the index grew by 347% as it became fragmented, while in SQL Server Standard, the index remained the same size. Personally, I'm going to keep an eye on things on my production databases and defragment where I think its necessary. I would be interested if anyone has any theories as to why this is happening or how others are handling the problem.