Unmatched Records query performance problems

  • I think this should be an easy thing, but I am not really sure.

    Using SQL Server 2008r2, Core I7 950 @ 4.0 ghz, 12GB memory, and primary storage is a RevoR2 Solid State PCI device.

    I have table A and table B

    Table A has all records. Table B has a subset of those records. (Aproximatly 150,000 records in each table)

    I need to find all records in table A that do not have a coresponding entry in B.

    I have indexes on the comparison field.

    I did a Join where the matching field is null. Average run time is 5 minutes. Execution plan shows proper parallelism and index scans.

    I did a join where not in subselect. Average run time is 3 minutes (same data). Execution plan shows a full table scan an no parallelism.

    I was expecting the join on null to run faster, especially since the execution plan shows a full table scan.

    I was also expecting the entire thing to run faster all around.

  • Since you're in SQL 2k8, look up the syntax for the EXCEPT clause in a SELECT. It's like the LEFT JOIN/NULL setup, but built in and optimized, you may find better performance out of that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • For that data volume it seems far too slow either way.

    Can you post the DDL of both tables, specifically what data type is/are the join columns? What constraints are there on the joining column(s), and what else do you have in the query - any functions, multiple OR conditions etc. Best bet - post the full text of the query.

    Mike

  • I think we found the issue. The select statement was wrapped in an IF that was behaving erraticly. We corrected the IF statement and it runs sub-second times now.

    A bit of testing lead us to use the except statement. Thank you for the help!

  • Depending on the width of the join columns (e.g. if it's just on a PK field or a large combination of columns you're performing the match on) you could potentially get the best performance from using NOT EXISTS.

    E.g.

    Select ID, column1, column2 from Table1

    WHERE NOT EXISTS (SELECT 1 FROM TABLE2 WHERE TABLE2.ID=TABLE1.ID)

    As always, worth testing a few different ways.

  • There are many ways to write these types of query - not all of which are strictly the same. There are frequently slight semantic differences which mean that LEFT JOIN is asking a different question from NOT EXISTS, or EXCEPT for example. Usually, these differences concern the treatment of duplicates and NULLs. There is nothing magic or optimized about EXCEPT by the way.

    The only thing that matters is the execution plan. Different TSQL constructions that are exactly logically equivalent may still be optimized slightly differently, resulting in different plans. It really makes no sense to compare 'performance' of different TSQL constructions that do not have the same semantic.

    Paul

  • Gail Shaw did some interesting bench marks on this. I found them quite informative.

    http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    Todd Fifield

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply