Blog Post

TSQL Tuesday #26 or #23 – Identity Crisis

,

The first opportunity of this New Year to participate in TSQLTuesday, we have been invited by David Howard (blog) to take a second shot at a previous TSQLTuesday.

This second shot is giving me fits.  I have no clue if it is TSQLTuesday 26 or if it is TSQLTuesday 23.  Is it some sort of amoeba of both TSQLTuesdays combined?

While I try to figure that out, how about we look at some pictures of what the New Year might have looked like as it was rung in.  Let’s begin with Paris.

Oooh…Aaaaahhh…Those are quite impressive.

Next up on our tour is where a twin resides for a famous lady.  How did they ring in the New Year in New York City?

And now, let’s DBCC Timewarp to the other side of the world.  Here is what you might have seen if you were in Sydney Australia.

Ahhh.  Yes, that did the trick.  This little diversion sure gave me enough time to think about which TSQLTuesday this is.  Et merci a Stuart Ainsworth (Blog | Twitter).  Le Sujet qu’il a propose est celui laquelle dont je vais parler ce mois.  Dans le TSQLTuesday qu’il a organise, il nous a invite de parler a propos des JOINS.

J’ai completement rate cet occasion de parler a propos de “Joins” en participantes dans TSQLTuesday.  Voyez, TSQLTuesday 23 etait tenu pendant le premier semain au lieu de deuxieme semain ce fois ci.  Et, maintenant, je vais terminer cet article en Anglais.  I was going to write the whole thing in French, but will save that for another time.  I should have a second chance to do that someday.

This second chance, gives me the opportunity to finally talk about a topic that has been on my to-blog list for quite some time.  I hope this post will show some different ways of joining in TSQL.  They are certainly methods I had never considered – until it was required.

Business Requirement

I have some tables that I need query.  One of the tables has lookup information with a bitmask applied to the id field.  Another table references this table but the ids can be a direct match or an indirect match to the id of the lookup table.  For this case, they will always only differ by the value of 1 if the two values do not directly match.  No other match possibility is considered for this example.

Based on this need, the solution dictates some sort of bitwise math.  I have several examples of how a join could be written to accomplish this primary objective.  I am only going to show the possible ways of performing this.  In my environment these all yield the same results and the data is unique and large enough (4.2 million records).  I will compare performance of these different queries in a later post as I demonstrate a query tuning method to drive the query down from nine seconds to 100ms or less.  For the record, I would choose any of queries 5, 6, or 7 for this particular need based on plan cost and performance.

The JOINS

First up is the ANSI style INNER Join using addition in one of the conditions as well as an OR to fulfill the second portion of the business requirement.

PRINT 'Query 1 -- Join with Or and source+1'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV
Inner Join SourceType US
On (PPV.SourceID = US.SourceID
Or PPV.SourceID = US.SourceID+1)

This is probably the easiest to understand and it performs well enough.  Until running into this business requirement, I hadn’t considered putting an OR in the JOIN conditions.  But it makes sense considering that an AND can be used there.

Next is a NON-ANSI style of JOIN.

PRINT 'Query 2 -- Non-Ansi Join with Or and source+1'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV, SourceType US
Where (PPV.SourceID = US.SourceID
Or PPV.SourceID = US.SourceID+1)

Through 2008 R2, this works just as well as the ANSI JOIN already shown.  I haven’t tested in SQL 2012 but I do know that the NON-ANSI syntax of *= (for example) no longer works.  I am not a big fan of this style JOIN because it is far too easy to end up with a Cartesian product.

Another type of JOIN that I like is the use of APPLY.

PRINT 'Query 3 -- Cross Apply with Or and source+1'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV
Cross Apply SourceType US
Where (PPV.SourceID = US.SourceID
Or PPV.SourceID = US.SourceID+1)

This particular code segment is the equivalent of the first query shown.  This is the last in the set of using basic math and an OR in the JOIN conditions.  The remaining queries all rely on bitwise operations to perform the JOIN.  Again, until this particular need, I had never even considered using a bitwise operation in a JOIN.  First in this series is the NON-ANSI style JOIN.

PRINT 'Query 4 -- Non-Ansi with COALESCE and Bit compare'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV, SourceType US
Where (PPV.SourceID|1 = COALESCE(US.SourceID|1,Us.SourceID))

The big change here is in the where clause.  Notice the use of COALESCE and the first comparison value in that COALESCE.  This is called a BITWISE OR.  From MSDN: “The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.”

So I am comparing the bit values of 1 and the SourceID.  The SourceID from RumorView will create a match meeting the requirements put forth thanks in large part to the BIT OR operation being performed on both sides of the equality in the WHERE clause.  It is also worth mentioning that the COALESCE is completely unnecessary in this query but it I am leaving it as a pseudo reference point for the performance tuning article that will be based on these same queries.

Next on tap is the CROSS Apply version.

PRINT 'Query 5 -- Cross with Bit compare'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV
Cross Apply SourceType US
Where (PPV.SourceID|1 = US.SourceID|1)

And the last two queries that the optimizer equate to the same query.

PRINT 'Query 6 -- Join with Bit compare'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV
Inner Join SourceType US
On (PPV.SourceID|1 = US.SourceID|1)
------
PRINT 'Query 7 -- Join with ISNULL and Bit compare'
------
Select TOP 1000
PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
,US.Source,Us.SourceID, PPV.SourceID
FROMRumorView PPV
Inner Join SourceType US
On (PPV.SourceID|1 = ISNULL(US.SourceID|1,Us.SourceID))

The query optimizer in this case is smart and eliminates the ISNULL.  These two queries use the same exact plan, have the same cost and the same execution statistics.  The version with COALESCE is considered more expensive and takes longer to run than these queries.  It is also important to note that the Cross Apply Join also produces the exact same plan as these two queries.

Conclusion

So there you have it.  Many different ways to write the JOIN for this little query.  Performance and results may vary.  It is good to have a few different ways of writing this particular JOIN.  During my testing, it was evident that various methods performed better under different circumstances (such as how the indexes were configured – which will be discussed in the follow-up article). Here are some other mind-blowing (fireworks) articles that you should read as well (here and here).

Notes: Names have been concealed to protect the innocent ;).  Also, the fireworks images are all links to external sites.  I have no affiliation with those sites…disclaimer yada yada yada…I am not responsible for content on those sites but they can have the credit for the images.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating