August 12, 2011 at 8:04 am
I inherited code that looks like the following. According to my colleagues, this used to work in a SQL 2000 database, but it doesn't work in SQL 2008. I don't have a SQL 2000 database to run this against, so I can't test to see if it ever actually gave the intended results. I feel like I'm missing something obvious here...
Note that the below code is *much* simpler than the actual - the actual code is checking for the existence and number of transactions across about 15 tables that have a couple of million rows each.
create table #projectList(
projectID char(1)
)
insert into #projectList(projectID)
(select 'A'
union all
select 'B'
union all
select 'C'
)
create table #APData(
projectID char(1),
tranAmt float
)
insert into #APData(projectID, tranAmt)
(select 'A',101.01
union all
select 'B', 202.02
union all
select 'C', 303.03
union all
select 'C', 909.09
)
create table #ARData(
projectID char(1),
tranAmt float
)
insert into #ARData(projectID, tranAmt)
(select 'A',44.44
union all
select 'A',55.55
union all
select 'B',66.66
union all
select 'B',77.77
union all
select 'B',88.88
)
/*
the existing code that doesn't return data the way the programmer expected it to
*/
select p.projectid, apCount = COUNT(ap.projectid), arCount = COUNT(ar.projectid)
from #projectList p
left outer join #APData ap
on p.projectID = ap.projectID
left outer join #ARData ar
on p.projectID = ar.projectID
group by p.projectID
-- this shows the expected results
select p.projectid,
apCount = (select COUNT(ap.projectid) from #APData ap where ap.projectID = p.projectID),
arCount = (select COUNT(ar.projectid) from #ARData ar where ar.projectID = p.projectID)
from #projectList p
drop table #projectList
drop table #APData
drop table #ARData
Thanks in advance,
-Ki
August 12, 2011 at 9:34 am
I think somebody has a faulty memory. When I run against Sql2000 and sql2008 I get the same results from your sample.
projectidapCountarCount
A22
B33
C20
projectidapCountarCount
A12
B13
C20
Note: I did run both from Sql2008 SSMS but the functionality is at the server.
August 13, 2011 at 12:04 am
Hi Kiara,
Please check the below modified code.It shows your expected result.
select p.projectid,
apCount = COALESCE(countap,0)
, arCount = COALESCE(countar,0)
from #projectList p
left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap
on p.projectID = ap.projectID
left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar
on p.projectID = ar.projectID
August 15, 2011 at 5:30 am
sql-programmers (8/13/2011)
Hi Kiara,Please check the below modified code.It shows your expected result.
select p.projectid,
apCount = COALESCE(countap,0)
, arCount = COALESCE(countar,0)
from #projectList p
left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap
on p.projectID = ap.projectID
left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar
on p.projectID = ar.projectID
Thanks - and very true.
I was actually just trying to figure out if the original code *should* work - because I couldn't find any way to *get* it to.
Your suggestion works just fine, although I'd probably run with something more like this if I do this rewrite with derived tables:
select p.projectid,
apCount = coalesce(apc.apCount,0),
arCount = coalesce(arc.arCount,0)
from #projectList p
left outer join (select distinct ap.projectid, apCount = count(*) over (partition by ap.projectid) from #APData ap) apc
on p.projectid = apc.projectid
left outer join (select distinct ar.projectid, arCount = count(*) over (partition by ar.projectid) from #ARData ar) arc
on p.projectID = arc.projectid
Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet. I'm just more likely to use window functions when I can - but until I take a look at the execution plans, I'm not going to swear that they're faster than not using them. It just normally works out that way.
Given what this particular section of code is trying to accomplish, I may just gut the entire section of the program and rewrite it. When I originally saw this, my reaction was "oh, cool! I've never done x that way!" followed by "but it doesn't do what it claims to do..." 🙂
-Ki
August 15, 2011 at 5:50 am
Kiara (8/15/2011)
...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...
Have a look at the actual execution plans - the OVER() version is much more expensive.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 15, 2011 at 5:56 am
ChrisM@Work (8/15/2011)
Kiara (8/15/2011)
...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...
Have a look at the actual execution plans - the OVER() version is much more expensive.
LOL - of *course* I will! Against the actual data involved, not the temp tables I used for the example. When you're dealing with millions of rows per table, and checking for existence on multiple tables, there's no way on earth I'm making a production decision based on a quick reply to a post that I've only tested far enough to make sure I didn't make a typo in the code.
Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...
-Ki
August 15, 2011 at 6:09 am
Kiara (8/15/2011)
...Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...
Cool. I've run the same code against a 5krow table and a 40mrow table, each generating the same plan for the same query. The plans for the two different queries however are wildly different, and on both of these tables, the GROUP BY is about 8 times faster than DISTINCT with OVER(). Each table has a non-unique clustered index on the aggregated column.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply