SQL JOINS BETTER WAY

  • Which way is better way of joining two large tables :

    Select a.col2

    ,b.col2

    from table1 a

    Inner join table2 b

    on a.col1 = b.col1

    Vs.

    Select a.col2

    ,b.col2

    from table1 a

    Inner join ( select col1, col2 from table2 ) b

    on a.col1 = b.col1

    Will second query result in better perfornance ?

  • They will more than likely have identical execution plans.

    You should look at the query plans for both so that you can tell for yourself.

  • I ran 3 quick tests over some tables on our test server...

    Test 1: I ran with several columns selected (not exactly your query but one that made sense for the data). Query B ran 1 second faster than Query A for 2.5 million records.

    Test 2: I ran over the same data set, with just 2 columns. Equal time.

    Test 3: I ran over 24 million records, with just 2 columns selected. Query B ran 1 second faster than Query A.

    I'm not seeing much of a difference. Is there a certain instance where we can expect a significant difference in this approach?

  • I think it would make difference where table2 will have a large number of columns . I am not sure where I read it but it seems to make a difference when you write Select col1... query rather than doing a join on entire table.

    I suppose when JOIN is on entire table , internally it is translating as 'Select * from table2' and hence if we write 'Select col1, col2 from table2" where table2 has more than 50 columns , it would make a difference

    What do you think ?

  • The optimizer won't grab a field if it doesn't have to. If you use, for instance, a covering index, it won't even hit the main table at all but just use the index. Neither query you were showing use all of the fields (assuming that table B has more fields). Therefore if there is a covering index for those two fields, it won't even use the base table and there would be no difference in performance. And if you DID need more fields, then it won't matter whether it's in a derived table or a straight join. The work to get the data should basically be the same.

  • Alan Vogan (7/9/2010)


    Test 1: I ran with several columns selected (not exactly your query but one that made sense for the data). Query B ran 1 second faster than Query A for 2.5 million records.

    Test 2: I ran over the same data set, with just 2 columns. Equal time.

    Test 3: I ran over 24 million records, with just 2 columns selected. Query B ran 1 second faster than Query A.

    One second out of how many seconds total? How many times did you run each test? Are the execution plans different, if so how?

    Small variations of execution time are expected and normal.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/9/2010)


    Alan Vogan (7/9/2010)


    Test 1: I ran with several columns selected (not exactly your query but one that made sense for the data). Query B ran 1 second faster than Query A for 2.5 million records.

    Test 2: I ran over the same data set, with just 2 columns. Equal time.

    Test 3: I ran over 24 million records, with just 2 columns selected. Query B ran 1 second faster than Query A.

    One second out of how many seconds total? How many times did you run each test? Are the execution plans different, if so how?

    Small variations of execution time are expected and normal.

    Test 1: 14 vs. 13 seconds

    Test 2: 14 vs. 14 seconds

    Test 3: 5:51 vs. 5:50 (min:sec)

    I only ran the test once each time. The execution plans were the same for each test pair.

  • Alan Vogan (7/9/2010)


    GilaMonster (7/9/2010)


    Alan Vogan (7/9/2010)


    Test 1: I ran with several columns selected (not exactly your query but one that made sense for the data). Query B ran 1 second faster than Query A for 2.5 million records.

    Test 2: I ran over the same data set, with just 2 columns. Equal time.

    Test 3: I ran over 24 million records, with just 2 columns selected. Query B ran 1 second faster than Query A.

    One second out of how many seconds total? How many times did you run each test? Are the execution plans different, if so how?

    Small variations of execution time are expected and normal.

    Test 1: 14 vs. 13 seconds

    Test 2: 14 vs. 14 seconds

    Test 3: 5:51 vs. 5:50 (min:sec)

    I only ran the test once each time. The execution plans were the same for each test pair.

    The last one is no where close to statistically significant. Small variations in execution time are normal and I suspect a second run would produce different results. If the exec plans are identical, it means SQL will execute those two queries identically.

    p.s. If you had the actual execution plan enabled for the tests, that would account for at least a portion of the execution time, generating and rendering of it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • If you look closely at the execution plan for your first example, you should see that it only pulls the column(s) necessary for filtering or output from the joined table. The optimizer is intelligent enough to know just what is needed without your having to specify them in advance, as you did in the second example.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • There should be no difference in performance because the plans should be the same. The plans should be the same because the optimizer is smart enough to only output the columns you are selecting. So the reality is the derived table query and the traditional query will be treated the same; therefore, making the plans the same.

    There are scenrarios where putting complex queries in derived tables can save IO and increase performance, but this simplistic example is not one.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply