Comparing two very large data sets - which join should I use?

  • I have two very big tables (close to 1 billion records each), and the two tables are pretty much a 1-1 relationship. I compare each record from table A against the matching record in table B. Which type of join or hint should I be using? I am currently using a LEFT JOIN and have some performance issues, and would like to know how others would approach this.

  • It really depends on what you're trying to do. If you need all records from one table and only those that match from the other, LEFT JOIN is the way to go. As far as performance goes, well, that really depends. If you don't have any kind of filtering criteria in the query, so you're just retrieving the entire data set, there's no magic to be had from one JOIN type over another or from any kind of hints. Instead, you need lots of big, fast hardware. If you're applying filtering criteria and you're not satisfied with performance, that could come down to whether the right kinds of indexes are available, and whether the statistics on those indexes are up to date, to get things to perform better.
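
    For illustration, a minimal sketch of the LEFT JOIN comparison pattern described above. The table and column names (dbo.TableA, dbo.TableB, Id, SomeValue) are assumptions for the example, not taken from the original post, and NULL handling in the value comparison is deliberately simplified:

    -- All rows from TableA, plus the matching row from TableB where one exists;
    -- rows with no match in TableB come back with NULLs on the B side.
    SELECT a.Id, a.SomeValue AS ValueA, b.SomeValue AS ValueB
    FROM dbo.TableA AS a
    LEFT JOIN dbo.TableB AS b
        ON b.Id = a.Id
    WHERE b.Id IS NULL                      -- in A but missing from B
       OR b.SomeValue <> a.SomeValue;       -- in both tables but different

    -- A join hint is rarely the fix; an index covering the join key and the
    -- compared column usually is (hypothetical example):
    -- CREATE INDEX IX_TableB_Id ON dbo.TableB (Id) INCLUDE (SomeValue);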

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • With that many rows, you'll probably want to consider other, more efficient methods of comparing tables.

    For example, change tracking would allow you to determine the rows in each table that have been modified (DELETE / INSERT / UPDATE) since the last time you compared the tables. That should drastically reduce the number of rows you have to compare. You could then use standard EXCEPT | INTERSECT logic to determine row differences.
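
    As a rough sketch of the EXCEPT | INTERSECT idea above (table and column names are assumptions for illustration only):

    -- Rows in TableA that have no identical counterpart in TableB, and vice
    -- versa. EXCEPT compares column by column and treats NULLs as equal,
    -- which makes it convenient for row-difference checks.
    SELECT Id, Col1, Col2 FROM dbo.TableA
    EXCEPT
    SELECT Id, Col1, Col2 FROM dbo.TableB;

    SELECT Id, Col1, Col2 FROM dbo.TableB
    EXCEPT
    SELECT Id, Col1, Col2 FROM dbo.TableA;

    -- With change tracking enabled on the tables, the comparison could be
    -- restricted to the keys returned by CHANGETABLE(CHANGES ...) since the
    -- last sync version, instead of scanning both billion-row tables.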

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (10/17/2014)

    With that many rows, you'll probably want to consider other, more efficient methods of comparing tables.

    For example, change tracking would allow you to determine the rows in each table that have been modified (DELETE / INSERT / UPDATE) since the last time you compared the tables. That should drastically reduce the number of rows you have to compare. You could then use standard EXCEPT | INTERSECT logic to determine row differences.

    I can give it a shot, but I think one of the tables is completely deleted and reloaded each time. Would change tracking still help?

  • Probably not, since everything changed. You might want to modify that process too.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
