Count across multiple tables with left outer join

Question

Post reply

Count across multiple tables with left outer join

Kiara

SSCarpal Tunnel

Points: 4185
More actions
August 12, 2011 at 8:04 am

#246192

I inherited code that looks like the following. According to my colleagues, this used to work in a SQL 2000 database, but it doesn't work in SQL 2008. I don't have a SQL 2000 database to run this against, so I can't test to see if it ever actually gave the intended results. I feel like I'm missing something obvious here...
Note that the below code is *much* simpler than the actual - the actual code is checking for the existence and number of transactions across about 15 tables that have a couple of million rows each.
create table #projectList(
projectID char(1)
)
insert into #projectList(projectID)
(select 'A'
union all
select 'B'
union all
select 'C'
)
create table #APData(
projectID char(1),
tranAmt float
)
insert into #APData(projectID, tranAmt)
(select 'A',101.01
union all
select 'B', 202.02
union all
select 'C', 303.03
union all
select 'C', 909.09
)
create table #ARData(
projectID char(1),
tranAmt float
)
insert into #ARData(projectID, tranAmt)
(select 'A',44.44
union all
select 'A',55.55
union all
select 'B',66.66
union all
select 'B',77.77
union all
select 'B',88.88
)
/*
the existing code that doesn't return data the way the programmer expected it to
*/
select p.projectid, apCount = COUNT(ap.projectid), arCount = COUNT(ar.projectid)
from #projectList p
left outer join #APData ap
on p.projectID = ap.projectID
left outer join #ARData ar
on p.projectID = ar.projectID
group by p.projectID
-- this shows the expected results
select p.projectid,
apCount = (select COUNT(ap.projectid) from #APData ap where ap.projectID = p.projectID),
arCount = (select COUNT(ar.projectid) from #ARData ar where ar.projectID = p.projectID)
from #projectList p
drop table #projectList
drop table #APData
drop table #ARData
Thanks in advance,
-Ki

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 1

I think somebody has a faulty memory. When I run against Sql2000 and sql2008 I get the same results from your sample.

projectidapCountarCount

A22

B33

C20

projectidapCountarCount

A12

B13

C20

Note: I did run both from Sql2008 SSMS but the functionality is at the server.

sql-programmers SSC-Addicted Points: 494 More actions · Answer 2

Hi Kiara,

Please check the below modified code.It shows your expected result.

select p.projectid,

apCount = COALESCE(countap,0)

, arCount = COALESCE(countar,0)

from #projectList p

left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap

on p.projectID = ap.projectID

left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar

on p.projectID = ar.projectID

SQL Sever Performance tuning

Kiara SSCarpal Tunnel Points: 4185 More actions · Answer 3

sql-programmers (8/13/2011)
Hi Kiara,
Please check the below modified code.It shows your expected result.
select p.projectid,
apCount = COALESCE(countap,0)
, arCount = COALESCE(countar,0)
from #projectList p
left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap
on p.projectID = ap.projectID
left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar
on p.projectID = ar.projectID

Thanks - and very true.

I was actually just trying to figure out if the original code *should* work - because I couldn't find any way to *get* it to.

Your suggestion works just fine, although I'd probably run with something more like this if I do this rewrite with derived tables:

select p.projectid,

apCount = coalesce(apc.apCount,0),

arCount = coalesce(arc.arCount,0)

from #projectList p

left outer join (select distinct ap.projectid, apCount = count(*) over (partition by ap.projectid) from #APData ap) apc

on p.projectid = apc.projectid

left outer join (select distinct ar.projectid, arCount = count(*) over (partition by ar.projectid) from #ARData ar) arc

on p.projectID = arc.projectid

Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet. I'm just more likely to use window functions when I can - but until I take a look at the execution plans, I'm not going to swear that they're faster than not using them. It just normally works out that way.

Given what this particular section of code is trying to accomplish, I may just gut the entire section of the program and rewrite it. When I originally saw this, my reaction was "oh, cool! I've never done x that way!" followed by "but it doesn't do what it claims to do..." 🙂

-Ki

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Kiara (8/15/2011)

...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...

Have a look at the actual execution plans - the OVER() version is much more expensive.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Kiara SSCarpal Tunnel Points: 4185 More actions · Answer 5

ChrisM@Work (8/15/2011)

Kiara (8/15/2011)
...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...
Have a look at the actual execution plans - the OVER() version is much more expensive.

LOL - of *course* I will! Against the actual data involved, not the temp tables I used for the example. When you're dealing with millions of rows per table, and checking for existence on multiple tables, there's no way on earth I'm making a production decision based on a quick reply to a post that I've only tested far enough to make sure I didn't make a typo in the code.

Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...

-Ki

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Kiara (8/15/2011)

...Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...

Cool. I've run the same code against a 5krow table and a 40mrow table, each generating the same plan for the same query. The plans for the two different queries however are wildly different, and on both of these tables, the GROUP BY is about 8 times faster than DISTINCT with OVER(). Each table has a non-unique clustered index on the aggregated column.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden