September 17, 2010 at 9:22 am
/* setup */
CREATE TABLE X (
HistoryID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ObjID INT NOT NULL,
Date1 DATETIME,
Date2 DATETIME,
Info1 CHAR(400),
Info2 CHAR(400),
Int1 INT,
Int2 INT,
Int3 INT
);
/* create test data */
DECLARE @i INT; SET @i = 0;
DECLARE @objid INT; SET @objid = 0;
WHILE (@i < 4000000)
BEGIN
INSERT INTO X VALUES(@objid, GETDATE(), CURRENT_TIMESTAMP, 'Info1','Info2',1,2,3);
SET @i = @i + 1;
SET @objid = @objid + 1
IF (@objid > 5000) SET @objid = 0;
END;
CREATE INDEX X_IX1 ON X (ObjID);
/* end setup */
/* timed query */
DBCC DROPCLEANBUFFERS;
WITH maxed AS
(
SELECT
MAX(HistoryID) AS MaxID
FROM X
GROUP BY ObjID
)
SELECT
X.*
FROM X
INNER JOIN maxed X2
ON X.HistoryID = X2.MaxID;
/* end timed query */
The code from and including DBCC onwards runs in two seconds on a laptop class machine, sql 2005 x64 dev edition sp3. In certain situations I've found cte's helpful. Without it your original query does take a longer time.
hth
September 17, 2010 at 1:35 pm
This question has turned into an excellent learning experience for me. I took the opportunity to run the new queries against Paul White's optimized schema on both my laptop and the server using the sample data. The times were calculated, as before, using SET STATISTICS TIME ON.
SELECT
X1.*
FROM X X1
WHERE HistoryID = (
SELECT
MAX(HistoryID) AS MaxID
FROM X X2
WHERE X2.ObjID = X1.ObjID
GROUP BY ObjID
)
Laptop:
Table 'Y'. Scan count 1, logical reads 32780, physical reads 6, read-ahead reads 10779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1607 ms, elapsed time = 24411 ms.
Server:
Table 'Y'. Scan count 1, logical reads 33065, physical reads 5, read-ahead reads 10779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1235 ms, elapsed time = 8044 ms.
SELECT
X1.*
FROM X X1
WHERE HistoryID = (
SELECT
MAX(HistoryID) AS MaxID
FROM X X2
WHERE X2.ObjID = X1.ObjID
GROUP BY ObjID
)
OPTION (FAST 100)
Laptop:
Table 'Y'. Scan count 1, logical reads 160692, physical reads 399, read-ahead reads 160335, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3089 ms, elapsed time = 32309 ms.
Server:
Table 'Y'. Scan count 1, logical reads 160880, physical reads 618, read-ahead reads 160335, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2703 ms, elapsed time = 12883 ms.
;with cte as( select distinct objid
from dbo.X
)
select ca.*
from cte
cross apply (select top 1 *
from dbo.X
where objid = cte.objid
order by HistoryID desc) ca
Laptop:
Table 'Y'. Scan count 5001, logical reads 42886, physical reads 16, read-ahead reads 10779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1560 ms, elapsed time = 22924 ms.
Server:
Table 'Y'. Scan count 5001, logical reads 43061, physical reads 8, read-ahead reads 10779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 968 ms, elapsed time = 7574 ms.
The CROSS APPLY method seems to be the top performer and that seems to follow a point in Paul's blog post...
"Nevertheless, the APPLY plan might be the optimal choice if a list of distinct bins and shelves is available from another table (eliminating the DISTINCT) and if there are relatively few shelf and bin combinations (so limiting the number of index seeks)."
In this case we only have 5000 unique objIDs and perhaps that is why it performs so well.
The downside is that the original clustered index is there for a reason. The majority of searching is done on HistoryID while this query is more infrequent use case. That said, it makes me wonder if having an indexed view clustered on ObjID and HistoryID might be the solution to the problem.
Thanks again for everyone's assistance.
September 17, 2010 at 1:49 pm
If you can afford the disk space and maintenance load, specialized covering indexes are always on the path to optimal query performance.
Thanks for posting your results 8kb, and good luck to you.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 17, 2010 at 9:22 pm
The Dixie Flatline (9/17/2010)
Paul: I wasn't aware this was a Speed Phreak challenge. 😀
:laugh: No, but you beat me to my favourite solution (APPLY) so I thought I'd have a go with the alternatives.
There are a number of very interesting features of these 'top n per group' problems, probably too much for a forum post. I did intend to return to this topic on my blog (after the Segment Top entry) and this thread has prompted me to do just that. I'll post a link as soon as it is up, in case anyone is interested.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 17, 2010 at 10:36 pm
Please do, Paul. I hear your blog is well written and enjoys a HUGE following in Hyderabad. 😉
I am behind on my reading but the articles you've posted recently are bookmarked for when I have time to catch up.
Using TOP 1/ORDER BY instead of MAX() is the key here. Last year in his Friday post-Summit presentation Itzik Ben-Gan demonstrated how the optimizer walks the index logically row by row for the number of times specified by TOP. It follows that,with the proper supporting index in place, TOP 1 / ORDER BY can supply a MIN or MAX value with only one logical read per "group", in this case the objid. That's why the scan count is always the number of objIDs, plus 1 for the DISTINCT scan.
The DISTINCT CTE was an efficient way to get the list of groups (objIDS) because of the clustered index structure you set up. But I was just testing a simple nonclustered index on (objID, historyID descending) and it runs even faster. The nonclustered index is tighter, and the clustered index seeks are the same no matter where the list of ObjIDs comes from.
By the way, I've rebuilt the 4 million row X table with 50,000 and 500,000 distinct objIDs and the cross apply technique is still holding fairly well. At a half million differtent ObjIDs it is running in about 3 seconds on our server.
8kb: You might try building such a non-clustered index on objid, historyID over your existing table schema. It should work just fine with your historyID clustered index and the CROSS APPLY.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 18, 2010 at 8:02 pm
The Dixie Flatline (9/17/2010)
Please do, Paul. I hear your blog is well written and enjoys a HUGE following in Hyderabad.
:laugh: Well as long as they just read it 😉
I am behind on my reading but the articles you've posted recently are bookmarked for when I have time to catch up.
The latest four-part series is just copied from my blog.
Using TOP 1/ORDER BY instead of MAX() is the key here. Last year in his Friday post-Summit presentation Itzik Ben-Gan demonstrated how the optimizer walks the index logically row by row for the number of times specified by TOP. It follows that,with the proper supporting index in place, TOP 1 / ORDER BY can supply a MIN or MAX value with only one logical read per "group", in this case the objid. That's why the scan count is always the number of objIDs, plus 1 for the DISTINCT scan.
Yeah, I explained that to Jeff on a very interesting thread back in May: http://www.sqlservercentral.com/Forums/FindPost917491.aspx
The DISTINCT CTE was an efficient way to get the list of groups (objIDS) because of the clustered index structure you set up. But I was just testing a simple nonclustered index on (objID, historyID descending) and it runs even faster. The nonclustered index is tighter, and the clustered index seeks are the same no matter where the list of ObjIDs comes from.
Yup. Just to be clear on this, the APPLY solution is the one I normally post - but do try changing the outer SELECT * to SELECT objID, HistoryID instead; with a non-clustered index on (objID DESC, HistoryID DESC), and see how APPLY compares to Segment Top...:-P
By the way, I've rebuilt the 4 million row X table with 50,000 and 500,000 distinct objIDs and the cross apply technique is still holding fairly well. At a half million differtent ObjIDs it is running in about 3 seconds on our server.
For sure. The worst case for APPLY is when there are a large number of groups (objID), very few rows per group (HistoryID) and none of the data pages are in memory yet. Be sure to run CHECKPOINT (to flush dirty pages to disk) and DBCC DROPCLEANBUFFERS (to drop the clean pages) before every test run.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 18, 2010 at 8:10 pm
8kb (9/17/2010)
The CROSS APPLY method seems to be the top performer and that seems to follow a point in Paul's blog post..."Nevertheless, the APPLY plan might be the optimal choice if a list of distinct bins and shelves is available from another table (eliminating the DISTINCT) and if there are relatively few shelf and bin combinations (so limiting the number of index seeks)."
In this case we only have 5000 unique objIDs and perhaps that is why it performs so well.
Absolutely, yes! The other thing is the outer SELECT star. The Segment Top really flies when there's a covering non-clustered index for the final output - I had to play games with the optimizer (the XML plan) to get it to use the clustered index. Even so, it's not ideal, for all sorts of good reasons.
Changing the outer SELECT to just the ObjID and History columns (with an n/c index on (ObjID, HistoryID)) makes all the difference: no hints, no XML, just a super-fast single scan. Guaranteed to beat APPLY 🙂
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 18, 2010 at 9:44 pm
Paul White NZ (9/18/2010)
Yeah, I explained that to Jeff on a very interesting thread back in May: http://www.sqlservercentral.com/Forums/FindPost917491.aspx
Actually, the triangular join thread that leads to is even more interesting. I'm still not done playing with that one.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2010 at 10:04 pm
Jeff Moden (9/18/2010)
Paul White NZ (9/18/2010)
Yeah, I explained that to Jeff on a very interesting thread back in May: http://www.sqlservercentral.com/Forums/FindPost917491.aspxActually, the triangular join thread that leads to is even more interesting. I'm still not done playing with that one.
I remember! In fact there were two interesting 'triangle' threads:
http://www.sqlservercentral.com/Forums/Topic896583-338-1.aspx
http://www.sqlservercentral.com/Forums/Topic911849-392-1.aspx
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply