Sub query expressions vs. Joins.

brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:21 am

#1197347

tim.napier (7/21/2010)
What happens if you use a left join in the second of the two queries. The fact they both returned 40000 rows is by coincidence, cos the sub select would returns if their isn't an equi joiin
Thanks for your feedback!
I populated the data intentionally so that both queries would return the same amount of rows. As you correctly noticed, in case that not all rows qualify for an equi join, a left/right outer join would have to be used, however in this case I just wanted to show the performance difference between an Inner join and a sub-select. I should have expanded more the article by also including Outer joins and compare their performance.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:26 am

#1197349

crackbridge (7/22/2010)
Its interesting your article, one thing its not clear enought to me is why do you open your article with a GROUP and COUNT issue but later you develop a SELECT JOIN for 1-m topic ?
.
Thanks for your feedback!
The first SELECT statement using the GROUP BY and COUNT was used just to display the rows distribution count. I could have left this simple statement out from the article but was adviced by the editor to show how I arrived at the record counts table.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:28 am

#1197350

paul-876346 (7/22/2010)
Thank you - I also wonder how the sub-query or join would compare to Cross (or Outer) Apply - as in the following query?
Thanks for your feedback! Next time I'll take your suggestion and will not only compare Inner Joins with Sub-Query expressions but also with CROSS, LEFT, RIGHT joins...
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:29 am

#1197351

jainraj_2000 (7/22/2010)
The Script is not working in SQL SERVER 2005 the populate tables procedure is having an insert into managers with multiple values.
In order to run it correctly it should be changed to
INSERT INTO Managers (ManagerName)VALUES ('Manager1');
INSERT INTO Managers (ManagerName) VALUES('Manager2');
INSERT INTO Managers (ManagerName)VALUES('Manager3');
Regards
Rajendra
Thanks for your feedback!
You're right, I should have specified which platform I've used, i.e. in this case SQL 2008. Thanks for pointing this out.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:33 am

#1197354

Johannes Krackowizer (7/22/2010)
There is a big flaw in the article, you take two different statements with the same result but only by accident. The subselect ist quite the same as an outer join. The second statement uses an inner join which is in this example the same because ther is no employe without manager. But if there would be an employe without manager the statments would have different results. so it depends on what you try to accomplish. If you have two tables that could be inner joined, the inner join will be better than a subselect. But if you have to do an outer join, than the subselect will be mostly equal to the join. A third scenario could bethat you need an outer join AND more than one column from the subselect (multiple subselects), than the outer join will be much better than the subselect.
So there is a big difference in which scenario you are using subselects. in some scenarios there will be virtually no difference between both approches.
And if you are using oracle databases than all of this is the exact opposite than in mssql. 😉 I tried this example on my oracle db and found out that an subselect has only half the costs than the join even if i do multiple subselects.
Thanks for your feedback!
I agree with you.... 'it depends on what you try to accomplish'. The scope of this article was just to demostrate the performance difference between an INNER join and a Sub Query expression.... should have expanded more to compare between other JOIN types...
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 12:43 am

#1197355

glen.Gorman (7/22/2010)
I was thinking something is missing from this article, and whats missing is the reasoning why joins are generally better than sub queries. The reasoning give you more knowledge of why and when you should use joins as opposed sub queries. It would be erroneous otherwise.
Thanks for your feedback!
I did show why a sub query will be slower than an inner join (when returning many rows) since a temporarily table is introduced (tempdb), however I'm sure that there are other resources such as books and papers which will better explain why this happens.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
brian118

Hall of Fame

Points: 3506
More actions
July 23, 2010 at 1:06 am

#1197361

gary.strange-1058508 (7/22/2010)
Some of the comments posted seem to be following the lines of "what if this, what if that, this isn't quite right".
I would agree that the article may leave some discrepancies but I think it highlights a novice error that I've seen at every company I've worked for.
If you're querying tables that have a few thousand, maybe even tens of thousands of records, with ever lowering costs of high performing hardware, you probably don't need to care about how the query is constructed or executed. 50ms or 500ms means nothing to an end user and you'll get no reward for your troubles.
But if you are dealing with millions/billions of records this is definitely a lesson to be learnt. You'll achieve massive performance increase by helping the query optimizer and giving it a well structured query before it has to process it. Then just sit back and enjoy the praise from your boss.:-P
Thanks to all for your feedback!
I should have made it more clear in the introduction that in this article I'm highlighting a novice error which I frequently encounter, were INNER Joins are replaced by Sub-Query expressions when in these cases, the INNER joins should be the prefered method. I agree that you can't replace the Sub-Query with an INNER join in all cases, since it ALL depends on the data (and just replacing them blindly will introduce bugs with the data), therefore should have better highlighted my point.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
Arjun Sivadasan

SSCertifiable

Points: 5902
More actions
July 23, 2010 at 1:44 am

#1197365

Arjun Sivadasan (7/22/2010)
Brian, Thanks for the article. I have a question here. I am not gonna post the DDL scripts as I have used your sample tables.
Consider the following statement that you had posted:
Select COUNT(*),ManagerID
From Employees
Group By ManagerID
Order By ManagerID
Now say, I need a third column which shows all the subordinates (IDs in this case) of a manager as csv. Can this be done in a better way than this?
select
e.ManagerID,
COUNT(*) [No of Subs],
(select(stuff((select ',' + cast(EmployeeID as varchar)
from Employees where managerid = e.managerid
for xml path('')),1,1,''))) [SubIDs]
from Employees e
group By e.ManagerID
order By e.ManagerID
I know the requirement (showing IDs as csv) is a little weird but I had to get similar output in some reports. Please comment.
- arjun
Hey, no one answered my question. I would like to add that I'm talking of tens of millions of rows in the employees table. Is there a better way to do this?
- arjun
https://sqlroadie.com/
Jeff Moden

SSC Guru

Points: 1004704
More actions
July 23, 2010 at 6:04 am

#1197447

Arjun Sivadasan (7/23/2010)
Arjun Sivadasan (7/22/2010)
Brian, Thanks for the article. I have a question here. I am not gonna post the DDL scripts as I have used your sample tables.
Consider the following statement that you had posted:
Select COUNT(*),ManagerID
From Employees
Group By ManagerID
Order By ManagerID
Now say, I need a third column which shows all the subordinates (IDs in this case) of a manager as csv. Can this be done in a better way than this?
select
e.ManagerID,
COUNT(*) [No of Subs],
(select(stuff((select ',' + cast(EmployeeID as varchar)
from Employees where managerid = e.managerid
for xml path('')),1,1,''))) [SubIDs]
from Employees e
group By e.ManagerID
order By e.ManagerID
I know the requirement (showing IDs as csv) is a little weird but I had to get similar output in some reports. Please comment.
- arjun
Hey, no one answered my question. I would like to add that I'm talking of tens of millions of rows in the employees table. Is there a better way to do this?
- arjun
It's a little beyond the scope of the article, Arjun. I believe I'd open a separate thread on the appropriate forum... you'll get more answers that way.
Also... ten of millions of rows in the employees table? I'd really like to know what company that is. 😉
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Arjun Sivadasan

SSCertifiable

Points: 5902
More actions
July 23, 2010 at 6:25 am

#1197459

lol, no it is not the employees table. As i said the scenario is similar but not the same. And i don't have to show values from all the rows; there's always a filter criteria such that the csv will contain, say 100 values tops. Anyway, I will post it in the dev forums. Thanks.
- arjun
https://sqlroadie.com/
alen teplitsky

SSC-Dedicated

Points: 30014
More actions
July 23, 2010 at 12:51 pm

#1197691

gary.strange-1058508 (7/22/2010)
Some of the comments posted seem to be following the lines of "what if this, what if that, this isn't quite right".
I would agree that the article may leave some discrepancies but I think it highlights a novice error that I've seen at every company I've worked for.
If you're querying tables that have a few thousand, maybe even tens of thousands of records, with ever lowering costs of high performing hardware, you probably don't need to care about how the query is constructed or executed. 50ms or 500ms means nothing to an end user and you'll get no reward for your troubles.
But if you are dealing with millions/billions of records this is definitely a lesson to be learnt. You'll achieve massive performance increase by helping the query optimizer and giving it a well structured query before it has to process it. Then just sit back and enjoy the praise from your boss.:-P
we have tables like this and my advice is always to break your searches into smaller data sets. i've seen people query a few years of data at once and it literally takes days to run. if you were to break this into smaller queries you would have your data in hours.
gary.strange-sqlconsumer

SSCommitted

Points: 1816
More actions
July 24, 2010 at 5:37 am

#1197849

bit vague ???
got any examples of what you're talking about?
hrc

Old Hand

Points: 333
More actions
July 25, 2010 at 5:33 am

#1197928

.. glad I always try to avoid the use of subqueries in my applications. This was done by intuition as I didn't know of the tabs in Management Studio. I hate the fact that I discovered this functionality so late.
I added an index to a table citizen(lastname asc, firstname asc) as this is the normal way I order selects for that table. Assumed it would make a difference but according to the execution plan the index wasn't even used. Is my assumption wrong or am I doing it wrong?
Jeff Moden

SSC Guru

Points: 1004704
More actions
July 25, 2010 at 1:08 pm

#1197960

hrc_public (7/25/2010)
.. glad I always try to avoid the use of subqueries in my applications.
Ummmm.... why? We just proved that the article is incorrect and that certain correctly formed subqueries can actually be faster than a classic inner join. Read the rest of the discussion above.
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
cmapowers

SSC Veteran

Points: 209
More actions
July 25, 2010 at 11:21 pm

#1198007

This seems to be a correlated sub-query. Does this apply to non correlated sub-query?

Viewing 15 posts - 31 through 45 (of 57 total)

You must be logged in to reply to this topic. Login to reply