March 25, 2011 at 8:46 am
Good morning,
I was wondering if there is a way to optimize a query that has a MAX() function. Here are the detailed information:
Table:
[ID] INT NOT NULL,
[Date] SMALLDATETIME NOT NULL
PK Clustered Index:
[ID] ASC,
[Date] ASC
Query:
SELECT [ID], MAX([Date]) AS [Max_Date]
FROM [dbo].[Table]
GROUP BY [ID]
The query plan is doing a clustered index scan using the PK index mentioned above and is scanning the entire table (20 million total records, returns 400 records). I know that because the query is using the aggregate function, it has to execute that function on every record. Is there no other way(s) to optimize this query?
Thank you in advance.
March 25, 2011 at 8:48 am
That's already as optimal as it will get.
Since the table has 2 columns only, that clustered index scan is returning the minimum amount of information possible to execute this query. It's returning it in an order optimal for a stream aggregate, the fastest aggregate type.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 27, 2011 at 12:40 pm
Assuming that you have a finite list of ID's used, you can use cross apply with TOP, as demonstrated at the end of this article http://www.sqlservercentral.com/articles/T-SQL/71571/
As you want 400 out of the 20 million , it should improve performance.
March 28, 2011 at 9:26 am
is this a partitioned table perchance? there is a very nasty bug for those for max/min.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 29, 2011 at 11:14 am
Dave Ballantyne (3/27/2011)
Assuming that you have a finite list of ID's used, you can use cross apply with TOP, as demonstrated at the end of this article http://www.sqlservercentral.com/articles/T-SQL/71571/As you want 400 out of the 20 million , it should improve performance.
Dave,
Even if this produces a Segment/Top type plan, each row would have to be touched upon to do it since there are only 2 columns in the table and they are both in the clustered index. I doubt that it would change the performance. There's really no way around hitting each row in the table.
Todd Fifield
March 29, 2011 at 1:37 pm
tfifield (3/29/2011)
Dave Ballantyne (3/27/2011)
Assuming that you have a finite list of ID's used, you can use cross apply with TOP, as demonstrated at the end of this article http://www.sqlservercentral.com/articles/T-SQL/71571/As you want 400 out of the 20 million , it should improve performance.
Dave,
Even if this produces a Segment/Top type plan, each row would have to be touched upon to do it since there are only 2 columns in the table and they are both in the clustered index. I doubt that it would change the performance. There's really no way around hitting each row in the table.
Todd Fifield
My assumption is that as this is looking like a child table, there should exist a table which has ID as a unique PK somewhere.
If that exists then
Select PK,MaxDate.Date
from <Theoretical Table > TT Cross apply(Select top(1) Date
from Table
where Table.ID = TT.ID
order by date desc) as MaxDate(Date)
400 seeks Should be faster that the single scan through 20 million rows
Take a look under 'the caveat' section right at the end of the article i referenced ( and wrote ).
Same thing here..
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply