Why does query run so long?

  • This query is part of a DTS package that never seems to complete (after 3 days we canceled):

    SELECT distinct Prcs_dte

    FROM tbl_Tracking

    WHERE prcs_dte not in (Select distinct prcs_dte FROM Acct_Stats)

    tbl_Tracking has 8 million rows, Accts_stats has 3 million rows.  Both tables have Prcs_dte indexes.

    If I put the outer query results in a temp table and the inner query results in another temp table and then select the records not appearing in both, the query executes in 10 sec.

    I'm thinking that maybe the above query is generating an intermediate table that is a cross join of the two tables before coming up with a result.  That would be 24 trillion rows.  That might be why it never completes.

    Does anyone have any ideas?  I guess I know how it can be fixed... I just don't understand why such an apparently simple query is causing such a problem.

     

     

  • Try outer join syntax?

     

    SELECT distinct tr.Prcs_dte

    FROM tbl_Tracking tr left outer join Acct_Stats asts

    on tr. prcs_dte = asts. prcs_dte

    where asts. prcs_dte is null

     

    This may work faster.

    Russel Loski, MCSD

    Russel Loski, MCSE Business Intelligence, Data Platform

  • I'm a firm believer in the use of temp tables, especially when using stored procedures in datawarehouse/decision support environments, where you always have tables with rows in the millions, often hundreds of millions, or even billions.  Why try and right one query that runs for days when 2 querys that build temp talbes and 1 query joining them runs in practically no time at all.

     

  • DISTINCT does not take advantage of Indexes!

    Try using GROUP BY Instead with an INDEX on Prcs_dte


    Kindest Regards,

  • Every row in tbl_Tracking will scan through Acct_Stats. thus

    8 million * 3 million at least.

    Not to mention distint's impact.

    Corellated join is an option.

  • Ran into a very similar problem here. Since you have the correct indexes, try using:

    SELECT distinct Prcs_dte

    FROM tbl_Tracking

    WHERE NOT EXISTS (Select prcs_dte FROM Acct_Stats where prcs_dte = tbl_Tracking.Prcs_dtr)

     

    I made a similar change to some of our queries here (very small tables), and execution time came down by factors of 100.

    I think it's because the NOT IN has to look through the whole list, while the not exists query just has to do a quick index seek.

    Greg Walker

    DBA, ExpenseWatch.com

     

     


    Greg Walker
    DBA, ExpenseWatch.com

  • use " NOT EXISTS " ... that's the way to do it !

    speed it up by adding an index on table/view Acct_Stats column prcs_dte.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Check out this previous thread on NOT IN vs NOT EXISTS vs LEFT OUTER JOIN.  (read through to the end.)

    http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=18745&FORUM_ID=8&CAT_ID=1&Forum_Title=T%2DSQL&Topic_Title=Select+Statement+Using+Not+In+and+Null+Values

    We had similar problems with NOT IN queries on a table with a half million records taking FOREVER to finish and have started switching them to LEFT OUTER JOINs (even if you have to draw pictures then squint and look at them sideways to understand LOJs the first couple of times!!).  They are much harder to understand and much harder to read & figure out what they do 6 months later, but the speed difference makes it worth the effort for larger tables. 

    Someone in this thread said he tested LEFT OUTER JOIN vs NOT EXISTS querries and found the LEFT OUTER JOIN performed slightly faster.

  • I appreciate all of your input on this.

    I have tried the different approaches suggested.  I have found that the temp table approach is the fastest in my situation.  Here is the code I settled on:

    SELECT  DISTINCT Prcs_dte

    INTO #A_Dates

    FROM tbl_Tracking

    SELECT  DISTINCT Prcs_dte

    INTO #B_Dates

    FROM Acct_Stats

    SELECT A.Prcs_dte

    FROM  #A_Dates A

    LEFT OUTER JOIN  #B_Dates B

    ON A.Prcs_dte = B.Prcs_dte

    WHERE  B.Prcs_dte IS NULL

    I get a result in about 10 secs.

    Thanks

  • replace the distinct by a group by and you'll gain more !

    SELECT  Prcs_dte

    INTO #A_Dates

    FROM tbl_Tracking group by Prcs_dte

    SELECT  Prcs_dte

    INTO #B_Dates

    FROM Acct_Stats group by Prcs_dte

     

    Still I'd suggest you'd profile it and compare execution-plans for all approches.

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Dear Experts,

    As the forum become "hot" here, I want to come out with some kind of similar problem. This query run quite long and sometimes caused my DTS failed.

    SELECT * FROM TBL1 WHERE FLD1 = (SELECT MIN(FLD1) FROM TBL1)

    Is there any way to make this query run with best performance?

    Thanks in advance.

     



    Regards,
    kokyan

  • What are you trying to do with this query?

     

    Russ

    Russel Loski, MCSE Business Intelligence, Data Platform

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply