September 5, 2013 at 8:21 pm
KTD (9/4/2013)
However the new server creates massive IO and doesn't make use of some indexes that it should be using.
The first thing to say is that your index on RecordDate does not, in fact, support ordering by RecordDate. This might seem counter-intuitive, but the index is partitioned by ClaimID (just like the base table is).
The nonclustered index records are in RecordDate key order, but only within each partition. Effectively, the nonclustered index key is (PartitionID, RecordDate) because of the partitioning.
This explains why you see an expensive sort on 194,656,980 rows even on SQL Server 2005. The partitioned nonclustered index cannot return rows in RecordDate order when more than one partition is involved, so SQL Server has to perform a full sort.
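To make that concrete, here is a minimal sketch of the kind of schema being discussed. The partition function name PF matches the $PARTITION calls used elsewhere in this thread; the boundary values, data types, and column list are assumptions for illustration only:

```sql
-- Hypothetical schema for illustration only (boundary values assumed)
CREATE PARTITION FUNCTION PF (bigint)
AS RANGE RIGHT FOR VALUES (20000000, 40000000, 60000000, 80000000, 100000000);

CREATE PARTITION SCHEME PS
AS PARTITION PF ALL TO ([PRIMARY]);

CREATE TABLE dbo.Claims
(
    ClaimID    bigint   NOT NULL,
    RecordDate datetime NOT NULL
    -- other claim columns...
) ON PS (ClaimID);

-- Because this index is created on the partition scheme, it is aligned:
-- rows are ordered by RecordDate only WITHIN each partition, so the
-- effective key is (PartitionID, RecordDate), not RecordDate alone.
CREATE NONCLUSTERED INDEX IX_Claims_RecordDate
ON dbo.Claims (RecordDate)
ON PS (ClaimID);
```

An ORDER BY RecordDate across the whole table therefore cannot be satisfied by this index without a merge or full sort of the per-partition runs.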
The situation is logically the same in SQL Server 2005 and SQL Server 2008 R2, but the physical implementation is different. SQL Server 2005 used an APPLY implementation model: it processed one partition at a time in the query plan. The Constant Scan you see in the SQL Server 2005 plan is a list of partition IDs. The Index Scan operation on the inner side of the Nested Loops Join following the Constant Scan operates on a single partition at a time, with the current partition ID as the correlated join parameter.
One problem with the 2005 implementation was that it did not always distribute work across parallel threads very well. You can see an example of this in the plan you submitted using the @random value. For that particular test run, all the qualifying rows happened to be in a single partition. If you look at the per-thread actual rows, you will see that one thread did all the work. The plan is effectively a serial plan, just with some extra overhead for the additional unused threads and the parallelism operators!
The physical implementation for partitioned tables completely changed in SQL Server 2008. The APPLY model (Constant Scan + Nested Loops Join, per-partition processing) is no longer used. Instead, the effective index keys (PartitionID, RecordDate) are processed directly, exactly as if that were the real index definition. You can find more details about the implementation and "enhancements" for SQL Server 2008 in the following TechNet article (please do read this):
Query Processing Enhancements on Partitioned Tables and Indexes
Unfortunately, dramatic changes to query processing always mean significant query plan changes. SQL Server 2008 (and all later versions) can no longer use the same query plans for partitioned tables that SQL Server 2005 did, because the internal implementation is so radically different.
You will see all-new query plans for partitioned table queries when upgrading from SQL Server 2005. Many of these changes will result in improved performance, but some will not, and some will even be much worse. This means that real, detailed testing is needed before upgrading from 2005. As part of that testing, expect to end up rewriting some partitioned queries to produce good query plans.
The particular case here is one example of a performance regression. Without an index hint, SQL Server seeks to the partitions and ClaimIDs using the clustered index and then sorts by RecordDate. The clustered index is much, much wider than the nonclustered index, so you see a huge amount of I/O for this compared with the 2005 plan.
When the nonclustered index is forced, an unfortunate side-effect results in two sorts being needed. As Erland mentioned, a general-purpose index optimization pre-sorts rows on ClaimID keys for the lookup into the clustered index. The set is then re-sorted on RecordDate later on.
This is a performance regression, and undoubtedly unfortunate, but the Top operation cannot be performed earlier because the rows are not in RecordDate order before the explicit sort (they are in PartitionID, RecordDate order). The SQL Server 2005 plan just decided to sort on RecordDate alone, and take the expected random-I/O hit when performing lookups into the base table.
One way to rewrite the test query for better performance is:
WITH QualifyingRows AS
(
    SELECT TOP (10000)
        C.ClaimID,
        C.RecordDate
    FROM dbo.Claims AS C
    WHERE
        C.ClaimID > 100781325
    ORDER BY
        C.RecordDate ASC
)
SELECT C.*
FROM dbo.Claims AS C
WHERE
    EXISTS
    (
        SELECT 1
        FROM QualifyingRows
        WHERE
            QualifyingRows.ClaimID = C.ClaimID
            AND QualifyingRows.RecordDate = C.RecordDate
    )
ORDER BY
    C.RecordDate;
The query plan produced uses a single Top-N Sort on RecordDate, while still making effective use of parallelism.
You are very likely to encounter similar query plan issues with some types of aggregate query:
http://sqlmag.com/t-sql/max-and-min-aggregates-against-partitioned-tables
September 6, 2013 at 1:05 am
Paul White (9/5/2013)
When the nonclustered index is forced, an unfortunate side-effect results in two sorts being needed. As Erland mentioned, a general-purpose index optimization pre-sorts rows on ClaimID keys for the lookup into the clustered index. The set is then re-sorted on RecordDate later on.
I am stunned... Thanks Paul for stepping in and providing a great answer. I should have realised that it could not use the index directly because of the partitioning, but a question: shouldn't the optimizer be able to come up with the same plan as for your rewritten query? That is, it could first do a TOP N sort on the small set, and then sort for the key lookup?
I noticed that in the SQL 2005 plan it does something like this, but the first sort produces some 15000 rows and not 10000 rows, which I guess is because there are rows with the same RecordDate.
Erland Sommarskog, SQL Server MVP, www.sommarskog.se
September 6, 2013 at 3:22 am
Erland Sommarskog (9/6/2013)
a question: shouldn't the optimizer be able to come up with the same plan as for your rewritten query? That is, it could first do a TOP N sort on the small set, and then sort for the key lookup?
Yes, it could, and I would argue it should. Conor would say the optimizer can't consider every possible transformation, there haven't been enough customer requests for it, etc. etc.
More to the point, MIN and MAX are hard work on 2008+ partitioned indexes too, and that is an even more obvious optimization that is still missing.
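For what it's worth, the standard workaround there follows the same shape as the rewrite earlier in this thread: compute the aggregate per partition, then aggregate the per-partition results. A sketch only, assuming the dbo.Claims table and partition function PF discussed here:

```sql
-- Per-partition MAX, then overall MAX (sketch only).
-- sys.partitions supplies one row per partition; filtering on index_id
-- 0 or 1 avoids counting each nonclustered index's partitions again.
SELECT MAX(PerPartition.MaxDate) AS MaxRecordDate
FROM sys.partitions AS P
CROSS APPLY
(
    SELECT MAX(C.RecordDate) AS MaxDate
    FROM dbo.Claims AS C
    WHERE $PARTITION.PF(C.ClaimID) = P.partition_number
) AS PerPartition
WHERE P.object_id = OBJECT_ID(N'dbo.Claims')
AND P.index_id IN (0, 1);
```

With an aligned index on RecordDate, each per-partition MAX can be answered with a single-row seek, instead of the full scan the naive MAX produces.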
By the by, the idea of manual rewrites to select keys first and then perform a small number of expensive lookups is a good general technique. I used the same basic idea for my SQL Server Central paging article:
http://www.sqlservercentral.com/articles/paging/69892/
I noticed that in the SQL 2005 plan it does something like this, but the first sort produces some 15000 rows and not 10000 rows, which I guess is because there are rows with the same RecordDate.
The extra rows are due to nested loops prefetching, something I blogged about just a few days ago:
September 6, 2013 at 6:19 am
Wow, I am so impressed. This is of great help to me in understanding what is going on. I really need to read this forum more; I have learned a great deal just searching for information on this issue. Thank you both so much for taking the time to help.
I don't always test my SQL scripts, but when I do, I test in Production.
September 6, 2013 at 6:39 am
Your version of the query ran in 30 seconds without any high disk latencies observed.
I don't always test my SQL scripts, but when I do, I test in Production.
September 6, 2013 at 8:42 am
KTD (9/6/2013)
Your version of the query ran in 30 seconds without any high disk latencies observed.
Good to know, thanks. Just for a bit of fun (and because other people might wonder) you might find the following performs even better:
SELECT Result.*
FROM
(
    -- Top 10,000 keys over all partitions
    SELECT TOP (10000)
        AllPartitions.ClaimID,
        AllPartitions.RecordDate
    FROM
    (
        -- Top 10,000 keys per partition
        SELECT
            TopPerPartition.ClaimID,
            TopPerPartition.RecordDate
        FROM
        (
            -- Partition ID list
            VALUES
            (01),(02),(03),(04),(05),(06),(07),(08),(09),(10),
            (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),
            (21),(22),(23),(24),(25),(26),(27),(28),(29),(30),
            (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),
            (41),(42),(43),(44),(45),(46),(47),(48),(49),(50),
            (51)
        ) AS PartitionIDs (ID)
        CROSS APPLY
        (
            -- Per-partition query
            SELECT TOP (10000)
                C.ClaimID,
                C.RecordDate
            FROM dbo.Claims AS C
            WHERE $PARTITION.PF(C.ClaimID) = PartitionIDs.ID
                AND PartitionIDs.ID >= $PARTITION.PF(100781325)
                AND C.ClaimID > 100781325
            ORDER BY C.RecordDate ASC
        ) AS TopPerPartition
    ) AS AllPartitions
    ORDER BY
        AllPartitions.RecordDate ASC
) AS TopKeys
CROSS APPLY
(
    -- Add extra columns
    SELECT C.*
    FROM dbo.Claims AS C
    WHERE C.ClaimID = TopKeys.ClaimID
        AND C.RecordDate = TopKeys.RecordDate
) AS Result
ORDER BY
    Result.RecordDate ASC;
September 30, 2013 at 12:14 pm
Great Insights, Paul.
- See you at your PreCon at PASS Summit
😉
Andreas
---------------------------------------------------
MVP SQL Server
Microsoft Certified Master SQL Server 2008
Microsoft Certified Solutions Master Data Platform, SQL Server 2012
www.insidesql.org/blogs/andreaswolter
www.andreas-wolter.com
September 30, 2013 at 7:04 pm
Andreas.Wolter (9/30/2013)
Great Insights, Paul. - See you at your PreCon at PASS Summit 😉
Thanks. See you there - make sure you introduce yourself to me at some stage 🙂
October 1, 2013 at 3:59 am
Paul White (9/5/2013)
As Erland mentioned, a general-purpose index optimization pre-sorts rows on ClaimID keys for the lookup into the clustered index.
Paul,
Thanks for this detailed information. I read it almost 5 times to grab as much as I can, but I have one confusion about the quoted text above: why would the clustered index data be sorted here, when the clustered index is sorted by nature?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
October 1, 2013 at 5:17 am
Bhuvnesh (10/1/2013)
Paul White (9/5/2013)
As Erland mentioned, a general-purpose index optimization pre-sorts rows on ClaimID keys for the lookup into the clustered index.
Paul,
Thanks for this detailed information. I read it almost 5 times to grab as much as I can, but I have one confusion about the quoted text above: why would the clustered index data be sorted here, when the clustered index is sorted by nature?
The rows arriving at the Key Lookup (a singleton seek into the clustered index) will not be in clustered index order without the sort. Sorting the lookups into clustered key order promotes a sequential access I/O pattern. See the following link for more information:
http://blogs.msdn.com/b/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx
October 1, 2013 at 11:30 am
To provide some follow-up on what is going on with this: I removed the table from partitioning and tested the query, with much better results. In fact, it seems there is an all-around performance gain. I think this comes down to the way the queries were written and the choice of partitioning column.
I don't always test my SQL scripts, but when I do, I test in Production.
October 4, 2013 at 8:56 am
Well it sounds like you've been "around the block a few times" on these types of issues.
I've taken over the responsibility of 20+ SQL Server boxes and have found a variety of issues/problems that would affect overall performance.
Things I look at are:
1) Are the Data, Logs & TempDB on RAID 1/0? Or 5? And on separate LUNs?
2) Is tempdb split into one file per processor, or something close to it?
3) I tend to throttle back memory usage for SQL Server, allowing room for the OS. However, with such a large machine I'm not sure you really need to do that.
4) You say your indexes & stats are up to date; it may benefit you to update them for the one table just to be sure.
I'd also have the boys in Network Operations double-check the I/O saturation levels and RAID levels of your LUNs. There is some magic that can be performed on the LUNs by introducing more cache, as well as SSDs, to increase their performance. There is no reason why things should perform poorly if all of the pieces are correctly configured.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 9, 2013 at 6:06 am
Kurt W. Zimmerman (10/4/2013)
Things I look at are:
1) Are the Data, Logs & TempDB on RAID 1/0? or 5? and on separate LUNs?
Which are you looking for? RAID 10 or RAID 5?
...if you can comment on my question about separate disks for data / logs / tempdb in this forum, I'd appreciate it. I'm trying to learn if there is an advantage to having separate disk arrays on the same RAID controller as opposed to aggregating the I/O across a larger array.
October 9, 2013 at 8:03 am
inevercheckthis2002 (10/9/2013)
Kurt W. Zimmerman (10/4/2013)
Things I look at are:
1) Are the Data, Logs & TempDB on RAID 1/0? or 5? and on separate LUNs?
Which are you looking for? RAID 10 or RAID 5?
...if you can comment on my question about separate disks for data / logs / tempdb in this forum, I'd appreciate it. I'm trying to learn if there is an advantage to having separate disk arrays on the same RAID controller as opposed to aggregating the I/O across a larger array.
I've adopted my own best practices for configuring a SQL Server box for the best performance. These best practices are a culmination of various white papers along with advice from knowledgeable SQL Server experts.
1) Disk storage is going to be one of the crucial areas of concern for overall performance. I recommend that all Data, Logs and TempDB be on RAID 1/0 to start.
2) System LUN can reside on a RAID 5.
3) In a virtual environment it is quite possible to allocate solid state drives to a LUN. If that were the case then moving TempDB to that LUN is ideal.
4) System, Data, Logs & TempDB should be on their separate LUNs.
5) Depending on the level of transactions and sizes of databases, the more memory you can add to a machine the better. Starting with SQL Server 2008 R2, I've been throttling SQL Server max memory usage to about 2-3 GB less than the total machine memory. This gives the OS some breathing room to run.
6) If you have 16 or fewer CPU cores in the machine, then allocate one TempDB data file per core. With more than 16 cores, it again depends on the volume of data passing into/out of the database(s): I'd consider 1/core up to 32 cores, otherwise cores/2 or cores/3 depending on the number of cores.
7) I always pre-allocate each TempDB file (both data & log). There is a best practice on setting up TempDB worth googling for and reading. Typically I choose a size based on how much room I have on the TempDB data drive, and allocate about 70% of the drive to TempDB. For the TempDB log file, I make it double the size of a single TempDB data file: so if I'm making 2 GB files at 16 files, I'd make a 4 GB log file. (I tend to over-provision TempDB, but I have never had any kind of latency due to problems with TempDB.)
8) Finally I try to determine how much is going on with the database(s) that are going to reside on the server. If there are going to be a large volume of transactions I guesstimate growth and pre-allocate the data & log files.
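As a rough sketch of points 6 and 7 above, the pre-allocation might look something like this (file names, path, and sizes here are placeholders, not recommendations for any specific hardware):

```sql
-- Illustrative pre-allocation only; adjust names, paths, and sizes
-- to your own drive layout.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 256MB);

-- One data file per core (repeat as needed)
ALTER DATABASE tempdb
ADD FILE
(
    NAME = tempdev2,
    FILENAME = N'T:\TempDB\tempdb2.ndf',
    SIZE = 2048MB,
    FILEGROWTH = 256MB
);

-- Log file sized at double a single data file, as described above
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 4096MB, FILEGROWTH = 512MB);
```

A restart is not required for the new sizes to take effect, but tempdb is rebuilt at startup, so pre-sizing avoids autogrowth stalls after each restart.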
This should be a good starting point. I'm sure there are others that may feel differently than what I've suggested here. However I have never had any issues with all of the machines that I've provisioned. I can't say the same for machines that were set up before I started.
Machines that I've inherited often run into troubles at some point in the future. Case in point, one box I inherited was one of the largest boxes I've ever managed. However there were some major latency issues that no one could resolve. I found that the person who configured the server did not properly allocate TempDB, which I have found to be a very common problem. After TempDB was reconfigured the machine has been performing quite well with no issues. Performance is where it was to be expected with such a large box. In general after I've applied my "touch" to problematic machines I've never had any issues with them.
So I feel this is a good standard to live by.
All the best.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 9, 2013 at 11:40 am
Kurt W. Zimmerman (10/9/2013)
4) System, Data, Logs & TempDB should be on their separate LUNs.
With sincere apologies for the thread hijacking... I'm trying my best to gain insight into the best use of the hardware at hand. It will be in use next week and I'm trying to get it right from the start. I'd follow best practices and recommendations whenever I am able, but sometimes there are limited resources.
I have 16 disks - direct attached storage, one RAID controller. The OS is already on a 2-disk RAID 1.
Which is best?
A.) One RAID 10 array across 14 disks, giving me the best I/O from an IOPS perspective. Install data, logs, tempdb - could be separate partitions, if that matters.
B.) One RAID 10 array across 12 disks for Data and Logs (could be separate partitions). One RAID 1 (2 disks) for tempdb files, sized as you describe above.
C.) One RAID 10 array across 10 disks for Data. One RAID 1 (2 disks) for logs (simple recovery model). One RAID 1 (2 disks) for tempdb files.
It's the same controller and the same disks - wouldn't option A be best?