The Potential of Joins

Question

The Potential of Joins

ashutosh.kumarpandey

Old Hand

Points: 341
More actions
February 25, 2015 at 10:30 pm

#398082

Comments posted to this topic are about the item The Potential of Joins

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

redtux1 Old Hand Points: 320 More actions · Answer 1

Generally a good article, however there is a potential pitfall which should have been mentioned. This is that when using joins you do always have to be aware of the potential for unintential duplicates, which is not the case with exists/in

h.tobisch SSCommitted Points: 1671 More actions · Answer 2

the general ideas outlined in this article are worth thinking over them.

However, what is stated therein, cannot be taken as matter of fact, because the optimizer

has its own ways.

Just an example: the queries (used my own database)

select * from kunden_zp_status where id_zp in (select id_zp from kd_zp where zp_name like 'aso%')

select s.* from kunden_zp_status s join kd_zp z on z.id_zp = s.id_zp and z.zp_name like 'aso%' option

behave alike : index seek on kd_zp - loop join - clustered index seek on kunden_zp_status ,

neither execution plan does show a sub-select

if you, however, force the optimizer to do what you have ordered it to do,

you will see the penalty from the sub-select

select * from kunden_zp_status where id_zp in (select id_zp from kd_zp where zp_name like 'aso%') option (force order)

select s.* from kunden_zp_status s join kd_zp z on z.id_zp = s.id_zp and z.zp_name like 'aso%' option (force order)

now the execution plan has a loop join from a sub-select for the first query, and a merge join for the second.

radu.gheorghiu Old Hand Points: 363 More actions · Answer 3

I'm curious as to why in the last example he used a LEFT OUTER JOIN, although in other queries in the article he used LEFT JOIN. From my knowledge, there is no difference between LEFT OUTER and LEFT type of JOINs in SQL Server.

Assuming that it doesn't return the same data, wouldn't it be a redundant condition to put WHERE Q.CustomerName IS NULL if you have already used LEFT OUTER JOIN?

~ Just some guy trying to tune queries ~

redtux1 Old Hand Points: 320 More actions · Answer 4

radu.gheorghiu (2/26/2015)
I'm curious as to why in the last example he used a LEFT OUTER JOIN, although in other queries in the article he used LEFT JOIN. From my knowledge, there is no difference between LEFT OUTER and LEFT type of JOINs in SQL Server.
Assuming that it doesn't return the same data, wouldn't it be a redundant condition to put WHERE Q.CustomerName IS NULL if you have already used LEFT OUTER JOIN?

I would not say so.

SELECT T.CustomerName,T.CustomerResponse,T.CalledOn

FROM #TMP T

LEFT OUTER JOIN (SELECT CustomerName

FROM #TMP

WHERE CustomerResponse = 'Not Interested'

) Q

ON Q.CustomerName = T.CustomerName

INNER JOIN (SELECT CustomerName,

MAX(CONVERT(DATE,CalledOn)) AS MaxCalledOn

FROM #TMP

GROUP BY CustomerName

)Q2

ON Q2.CustomerName = T.CustomerName

WHERE Q.CustomerName IS NULL

AND T.CalledOn = Q2.MaxCalledOn

First the inner join (Q2) restricts data to the last called date

Then this record set is joined against the main data set and finally the where statement filters out records that are not in the left join (Q)

paul.chaplin SSC Enthusiast Points: 138 More actions · Answer 5

I think a previous post alluded to this, but the last bit of SQL is fine where a customer would only be called once on a given date. If a customer could have been called twice, either the customer would appear twice or there would need to be a max/group in the code to pick one of the outcomes randomly.

Phrozt SSC Veteran Points: 222 More actions · Answer 6

The English of this article needs to be cleaned up. I'm thinking the author does not speak English as a primary language, which is totally fine, but if you're going to *publish* an article in English, the extra effort needs to be made in order to be professional.

There were also some inconsistencies such as the one radu pointed out, and one in scenario 2 where it was about a "NOT IN" sub query, but the query was put together "... where IN (select... where <> "Mountain Works")", which is slightly different.

Also, there are no benchmarks or Execution plans to prove the point, so we're just supposed to take the author's word for it???

Finally, I'm wondering how sub queries stack up when they are used in conjunction w/EXISTS and/or NOT EXISTS conditions.

robert.drew.dudley SSC Rookie Points: 25 More actions · Answer 7

It has been my experience when I have to refactor scripts that use sub-queries, that the use of join's has improved the efficiency. It would be nice though to see some stats that prove it out.

josh.henderson SSC Rookie Points: 34 More actions · Answer 8

Thinking of the aggregate query.. curious if efficiencies could be gained using a WITH Statement or temp table and joining.

With Statement:

With S as (select ProductID,SalesOrderID,SUM(OrderQty) AS OrderQty FROM Sales.SalesOrderDetail GROUP BY ProductID,SalesOrderID)

Then join it as a table

Inner join S

ON P.ProductID = S.ProductID

Or create a temp table and you could take advantage of creating an index off your results if you happen to be linking off some data that isn't a key (like Customer_ID from an Order table)

macrostarrphish Old Hand Points: 353 More actions · Answer 9

Scenario 2 doesn't have a NOT IN in the original statement...I believe it belongs here:

select DISTINCT Name

from Production.Product

where ProductID IN (Select ProductID from Purchasing.ProductVendor

where VendorID NOT IN (select VendorID from Purchasing.Vendor

where Name<>'Mountain Works'

)

Adam Angelini SSCrazy Eights Points: 8548 More actions · Answer 10

Adam Angelini

SSCrazy Eights

Points: 8548

February 26, 2015 at 9:01 am

#1780297

Yeah, scenario 2 examples are the same as scenario 1.

penny.casey SSC Rookie Points: 25 More actions · Answer 11

My boss is a subquery junkie. When I try to make sense of the queries she has saved as examples for me, I find this confusing. You've intrigued me with your use of joins especially in regards to a substitute for multiple group by's.

THANKS!

jnp15 SSC-Addicted Points: 497 More actions · Answer 12

I realize that the point of this article was to compare joins to subqueries but I would have thought to use the rank function. Here is how I would have done it:

;WITH Test AS (

SELECT

CustomerName,

CalledOn,

CustomerResponse,

RANK() OVER (PARTITION BY customername ORDER BY CONVERT(DATE,CalledOn) DESC) AS Rank

FROM #TMP

)

SELECT CustomerName,CustomerResponse,CalledOn

FROM Test

WHERE Rank=1

AND CustomerResponse<>'Not Interested'

There weren't enough records in the table to know which is fastest. I am curious to know what people think of using the Rank function for these purposes instead.

josh.henderson SSC Rookie Points: 34 More actions · Answer 13

I had the same thought as you but I think you would want to use row_number not rank. Here is a good example of the difference between rank, and row_number.

http://stackoverflow.com/questions/7747327/sql-rank-versus-row-number

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 14

Joins vs Exists. Neither is always right. Just be flexible.