November 25, 2009 at 11:25 am
I need to order 2 columns on a 10m+ row table with no indexes, and the query I have runs about 8 mins, how can I speed it up. thanks.
create table tmpT
(
model char(2),
date1 datetime,
date2 datetime,
date3 datetime
)
insert into tmpT
select 'm1', getdate() - 7, getdate() - 5, getdate() -4 union all
select 'm1', getdate() - 6, getdate() - 5 , getdate() -4 union all
select 'm1',getdate() - 5, getdate() - 5 , getdate() -4 union all
select 'm1',getdate() - 4, getdate() - 5 , getdate() -4 union all
select 'm1',getdate() - 3, getdate() - 5 , getdate() -4 union all
select 'm1',getdate() - 2, getdate() - 5 , getdate() -4 union all
select 'm1',getdate() - 1, getdate() - 5 , getdate()-4 union all
select 'm2', getdate() - 7, getdate() - 5, getdate() -4 union all
select 'm2', getdate() - 6, getdate() - 5 , getdate() -4 union all
select 'm2',getdate() - 5, getdate() - 5 , getdate() -4 union all
select 'm2',getdate() - 4, getdate() - 5 , getdate() -4 union all
select 'm2',getdate() - 3, getdate() - 5 , getdate() -4
select * from tmpT
select
t.model,
t.date1,
t.date2,
t.date3,
case when t.date1 < t.date2
then 0
when t.date1 = t.date2
then 1
else row_1
end as c1,
case when t.date1 < t.date3
then 0
when t.date1 = t.date3
then 1
else row_2
end as c2
from tmpT t left join (
select row_number() over (partition by model order by date1) + 1 as row_1, * from tmpT where date1 > date2
) t1
on t.model = t1.model and t.date1 = t1.date1
left join (
select row_number() over (partition by model order by date1) + 1 as row_2, * from tmpT where date1 > date3
) t2
on t.model = t2.model and t.date1 = t2.date1
November 25, 2009 at 2:36 pm
Create a clustered index on (model, date1) ?
That alone would likely have a massive performance impact.
November 25, 2009 at 2:50 pm
Perhaps this isn't the actual SQL you're using? Your 't1' and 't2' sub-queries don't contain a 'Model' column - and neither alias is used in the rest of the query. As it stands, it appears the two left-joins can be removed. Additionally, the 'row_1' and 'row_2' columns referred to in the CASE statements are not defined anywhere in the query. Or, am I missing something?
November 25, 2009 at 3:12 pm
You're missing the else's in the case statements that point to the rownumbers in the left joins. You can actually copy/paste/run that in SSMS to see what it does(although the 'why' is very much still a mystery to me as well).
November 25, 2009 at 10:28 pm
shanghaigirl-1125377 (11/25/2009)
I need to order 2 columns on a 10m+ row table with no indexes, and the query I have runs about 8 mins, how can I speed it up. thanks.
If it needs an index, either add one yourself or get the DBA to do it. Just make sure the query is "pat" before you do.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 27, 2009 at 9:26 am
1) you have no where clause - do you REALLY need to return all 10+M rows?
2) can you please provide expected output? I can't for the life of me figure out why you have multiple left joins here for a simple order by need...
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 3, 2009 at 7:17 pm
thanks for all your replies, and sorry for my late response, I was away for a few days.
the left join query gave me the exact result that I wanted. It needs to compare 2 dates partitioned by each model, if date1 is earlier than date2 then assign 0, otherwise start ordering 1,2,3... same as date3. yes, real query has to order 10m records. I have added an index model + date1, it worked much faster, now is 1 min 20sec.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply