February 5, 2004 at 10:51 am
This query is part of a DTS package that never seems to complete (after 3 days we canceled):
SELECT distinct Prcs_dte
FROM tbl_Tracking
WHERE prcs_dte not in (Select distinct prcs_dte FROM Acct_Stats)
tbl_Tracking has 8 million rows, Accts_stats has 3 million rows. Both tables have Prcs_dte indexes.
If I put the outer query results in a temp table and the inner query results in another temp table and then select the records not appearing in both, the query executes in 10 sec.
I'm thinking that maybe the above query is generating an intermediate table that is a cross join of the two tables before coming up with a result. That would be 24 trillion rows. That might be why it never completes.
Does anyone have any ideas? I guess I know how it can be fixed... I just don't understand why such an apparently simple query is causing such a problem.
February 5, 2004 at 11:47 am
Try outer join syntax?
SELECT distinct tr.Prcs_dte
FROM tbl_Tracking tr left outer join Acct_Stats asts
on tr. prcs_dte = asts. prcs_dte
where asts. prcs_dte is null
This may work faster.
Russel Loski, MCSD
Russel Loski, MCSE Business Intelligence, Data Platform
February 5, 2004 at 12:20 pm
I'm a firm believer in the use of temp tables, especially when using stored procedures in datawarehouse/decision support environments, where you always have tables with rows in the millions, often hundreds of millions, or even billions. Why try and right one query that runs for days when 2 querys that build temp talbes and 1 query joining them runs in practically no time at all.
February 5, 2004 at 4:30 pm
February 5, 2004 at 10:32 pm
Every row in tbl_Tracking will scan through Acct_Stats. thus
8 million * 3 million at least.
Not to mention distint's impact.
Corellated join is an option.
February 6, 2004 at 5:05 am
Ran into a very similar problem here. Since you have the correct indexes, try using:
SELECT distinct Prcs_dte
FROM tbl_Tracking
WHERE NOT EXISTS (Select prcs_dte FROM Acct_Stats where prcs_dte = tbl_Tracking.Prcs_dtr)
I made a similar change to some of our queries here (very small tables), and execution time came down by factors of 100.
I think it's because the NOT IN has to look through the whole list, while the not exists query just has to do a quick index seek.
Greg Walker
DBA, ExpenseWatch.com
February 6, 2004 at 5:24 am
use " NOT EXISTS " ... that's the way to do it !
speed it up by adding an index on table/view Acct_Stats column prcs_dte.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
February 6, 2004 at 7:14 am
Check out this previous thread on NOT IN vs NOT EXISTS vs LEFT OUTER JOIN. (read through to the end.)
http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=18745&FORUM_ID=8&CAT_ID=1&Forum_Title=T%2DSQL&Topic_Title=Select+Statement+Using+Not+In+and+Null+Values
We had similar problems with NOT IN queries on a table with a half million records taking FOREVER to finish and have started switching them to LEFT OUTER JOINs (even if you have to draw pictures then squint and look at them sideways to understand LOJs the first couple of times!!). They are much harder to understand and much harder to read & figure out what they do 6 months later, but the speed difference makes it worth the effort for larger tables.
Someone in this thread said he tested LEFT OUTER JOIN vs NOT EXISTS querries and found the LEFT OUTER JOIN performed slightly faster.
February 6, 2004 at 8:32 am
I appreciate all of your input on this.
I have tried the different approaches suggested. I have found that the temp table approach is the fastest in my situation. Here is the code I settled on:
SELECT DISTINCT Prcs_dte
INTO #A_Dates
FROM tbl_Tracking
SELECT DISTINCT Prcs_dte
INTO #B_Dates
FROM Acct_Stats
SELECT A.Prcs_dte
FROM #A_Dates A
LEFT OUTER JOIN #B_Dates B
ON A.Prcs_dte = B.Prcs_dte
WHERE B.Prcs_dte IS NULL
I get a result in about 10 secs.
Thanks
February 6, 2004 at 8:40 am
replace the distinct by a group by and you'll gain more !
SELECT Prcs_dte
INTO #A_Dates
FROM tbl_Tracking group by Prcs_dte
SELECT Prcs_dte
INTO #B_Dates
FROM Acct_Stats group by Prcs_dte
Still I'd suggest you'd profile it and compare execution-plans for all approches.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
February 8, 2004 at 7:55 pm
Dear Experts,
As the forum become "hot" here, I want to come out with some kind of similar problem. This query run quite long and sometimes caused my DTS failed.
SELECT * FROM TBL1 WHERE FLD1 = (SELECT MIN(FLD1) FROM TBL1)
Is there any way to make this query run with best performance?
Thanks in advance.
Regards,
kokyan
February 8, 2004 at 8:21 pm
What are you trying to do with this query?
Russ
Russel Loski, MCSE Business Intelligence, Data Platform
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply