Am I taking crazy pills? Dispute over inner vs. outer join.

Question

Am I taking crazy pills? Dispute over inner vs. outer join.

TheGreenShepherd

SSCertifiable

Points: 6979
More actions
February 14, 2014 at 12:10 pm

#285588

We have the following code (dumbed down):
SELECT Customer.CustomerId, Location.Description
FROM Customer
LEFT JOIN Location ON Customer.LocationId = Location.LocationId
It's a left join here because LocationId may be NULL in the Customer table. Our report designer is suggesting, on the basis of "better performance from inner joins", that we add a record to the Location table with a LocationId value of (-1), and then use the following query.
SELECT Customer.CustomerId, Location.Description
FROM Customer
JOIN Location ON ISNULL(Customer.LocationId, -1) = Location.LocationId
To me, this just seems like bad design and, even if it is faster (I estimate something like a possible .003% improvement), it's just a hackish way of doing things. What say you, colleagues?

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Matt Crowley SSCarpal Tunnel Points: 4163 More actions · Answer 1

Estimates are fine and well, but have the report designer demonstrate how much of a gain it is. My guess is as yours, that this will not add up to even one hour of development time over a few thousand uses.

Also, the query may not be able to make use of an index on Customer.LocationID, due to that column being wrapped in an isnull function.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

Set up a test environment, test both multiple times (run each at least 10 times, more is better), analyse results, conclude.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQLRNNR SSC Guru Points: 281334 More actions · Answer 3

I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

TheGreenShepherd (2/14/2014)
We have the following code (dumbed down):
SELECT Customer.CustomerId, Location.Description
FROM Customer
LEFT JOIN Location ON Customer.LocationId = Location.LocationId
It's a left join here because LocationId may be NULL in the Customer table. Our report designer is suggesting, on the basis of "better performance from inner joins", that we add a record to the Location table with a LocationId value of (-1), and then use the following query.
SELECT Customer.CustomerId, Location.Description
FROM Customer
JOIN Location ON ISNULL(Customer.LocationId, -1) = Location.LocationId
To me, this just seems like bad design and, even if it is faster (I estimate something like a possible .003% improvement), it's just a hackish way of doing things. What say you, colleagues?

BWAHAAHAA!!!! That is the funniest thing I have heard so far today. Let me get this straight. We are going to add an actual row to a table as a placeholder for NULL. It is true that a LEFT JOIN may perform slower than an INNER JOIN but that is because it can return more rows. Even though I think the very thought of this is absolutely horrible, the implementation suggestion is even worse. This is going to effectively make LocationId a non nullable column. Why not make the column non nullable with a default of -1. That would at least eliminate the ridiculous ISNULL in your join.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286590 More actions · Answer 5

SQLRNNR (2/14/2014)
I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

Really? That just seems like such a horribly bad idea it couldn't possibly work.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183706 More actions · Answer 6

That inner join won't give you a better performance as your condition is non-SARGable. You should remain with a LEFT JOIN like this:

SELECT Customer.CustomerId, ISNULL( Location.Description, 'Unknown Location') Description

FROM Customer

LEFT JOIN Location ON Customer.LocationId = Location.LocationId

Or if you want to add the Location -1, you should update as well your Customer table and maybe make the column non nullable.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

SQLRNNR SSC Guru Points: 281334 More actions · Answer 7

Sean Lange (2/14/2014)
SQLRNNR (2/14/2014)
I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.
Really? That just seems like such a horribly bad idea it couldn't possibly work.

Yeah, I know. It blew me away to see it work properly and faster.

I think the best conclusion is to test, analyse and conclude as Gail said.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

TheGreenShepherd SSCertifiable Points: 6979 More actions · Answer 8

Unfortunately, this is a project still in the design phase, so it's not like we're trying to address an existing performance problem or anything. And, we only have about 100k rows of mock data. I haven't seen much difference between the two queries when I run them. My analysis was based off of the query cost in the execution plan.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 9

Ask your designer about the performance implications of ISNULL around a join column - and research it yourself first if you're not already 100% sure.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Sean Lange SSC Guru Points: 286590 More actions · Answer 10

SQLRNNR (2/14/2014)
Sean Lange (2/14/2014)
SQLRNNR (2/14/2014)
I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.
Really? That just seems like such a horribly bad idea it couldn't possibly work.
Yeah, I know. It blew me away to see it work properly and faster.
I think the best conclusion is to test, analyse and conclude as Gail said.

I was going to throw together a quick million row table test but I took so long writing my initial response that Gail had already suggested testing. I decided to let the OP roll with it.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

TheGreenShepherd SSCertifiable Points: 6979 More actions · Answer 11

Yeah, I know, it's non-SARGable. That was really my last line of defense for shooting this idea down, because his next suggestion is to make that column non-nullable, and default the values to -1, which also seems like a bit of a hack.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 12

TheGreenShepherd (2/14/2014)
Yeah, I know, it's non-SARGable. That was really my last line of defense for shooting this idea down, because his next suggestion is to make that column non-nullable, and default the values to -1, which also seems like a bit of a hack.

That is a better option than allowing nulls imho (making the column non-nullable). There are savings in the pages by not having null values and the reduced need to throw coding hacks at the database to get around the null values.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 13

SQLRNNR (2/14/2014)
Sean Lange (2/14/2014)
SQLRNNR (2/14/2014)
I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.
Really? That just seems like such a horribly bad idea it couldn't possibly work.
Yeah, I know. It blew me away to see it work properly and faster.
I think the best conclusion is to test, analyse and conclude as Gail said.

I've seen it too but in VFP years ago. It worked ok. Not all of the kludges in the same system worked as well and they make the ERD and the code significantly more tricky to work with when properly-written code would have performed better in almost all cases.

Set up a test which picks 3 customers out of a mockup sample set of a million 😉

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Matt Crowley SSCarpal Tunnel Points: 4163 More actions · Answer 14

The estimated cost of a query is not a guarantee to how the query will actually perform (in fact,t he cost value has an er....interesting origin...). Run statistics IO, and statistics time on the query to see how many logical reads each one will have to do.