December 24, 2011 at 8:05 pm
I'm working on a report that needs both the detail rows and a summary from a single table. I'm using a self-join query. It doesn't perform really well, but it certainly does what I need. Of course the redundant data would be poor database design, but this is just a report. The report is based on a select like the one at the end of the code below. Is there a better way to do this sort of thing? Here's the code:
declare @transactions table (
TranId int not null primary key identity(1,1)
,TranCode varchar(4) not null
,TranDescription varchar(30) null
,TranAmount money
)
insert into @transactions (TranCode, TranDescription, TranAmount)
values ('100', 'REST', 100.00)
      ,('100', 'REST', 100.00)
      ,('100', 'REST', 100.00)
      ,('100', 'REST', 100.00)
      ,('200', 'BAR', 200.00)
      ,('200', 'BAR', 120.00)
      ,('200', 'BAR', 160.00)
      ,('200', 'BAR', 9.00)
      ,('200', 'BAR', 45.00)
      ,('300', 'GIFT', 78.00)
      ,('300', 'GIFT', 82.00)
      ,('300', 'GIFT', 35.00)
      ,('300', 'GIFT', 100.00)
      ,('300', 'GIFT', 120.00)
      ,('300', 'GIFT', 72.00)
      ,('300', 'GIFT', 66.00)
select t.TranCode, t.TranDescription, t.TranAmount, TranTotals.TranCodeTotal
from @transactions t
left join (
select TranCode, SUM(TranAmount) as TranCodeTotal
from @transactions
group by TranCode
) as TranTotals on t.TranCode = TranTotals.TranCode
order by t.TranCode
December 24, 2011 at 9:57 pm
You could use a SUM window function as an alternative...
select
TranCode
,TranDescription
,TranAmount
,sum(TranAmount) over (partition by TranCode) as TranCodeTotal
from @transactions
order by TranCode
December 24, 2011 at 11:10 pm
Thanks for the alternative. It sure is cleaner syntax-wise. The interesting thing is that if I understand the execution plan correctly, it takes twice as long. Rather shocking! I would expect them to be virtually identical.
When I look at the execution plan, the self join takes 4% of the total batch, while the "window" version takes 8%. I'm not really great at reading execution plans, but that seems like twice as long.
December 25, 2011 at 3:12 am
For this kind of analysis, you either have to look closely at the execution plan or test with a realistic amount of data. If you look at the execution plan, you'll see that the self join does two clustered index scans against your table (one for the detail rows and one for the aggregate). While that may be more efficient with a few rows, it is definitely not the most efficient when you get to millions or billions of rows. So, for larger amounts of data, I do believe the windowing approach is more efficient.
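As a quick way to get a realistic row count, you could multiply the sample rows into a temp table with a numbers source, something like this (a rough sketch; the #transactions name and the 10,000 multiplier are arbitrary):
select t.TranCode, t.TranDescription, t.TranAmount
into #transactions -- arbitrary name, just for testing
from @transactions t
cross join (
    -- sys.all_objects cross joined with itself gives plenty of rows to pick from
    select top (10000) 1 as n
    from sys.all_objects a
    cross join sys.all_objects b
) as nums
Then run both versions against #transactions and compare; 16 sample rows won't tell you much.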
December 25, 2011 at 6:00 pm
Makes good sense. I'll have to study the plan. I would have thought the window version and the self join would end up with the same plan. Probably a good lesson to be learned here.
Thanks and Happy Holidays.
December 26, 2011 at 3:04 pm
Don't be fooled by the % of total batch numbers in the execution plan. 8% of a batch that runs in 20 ms is much better than 4% of a batch that runs in 500 ms. You have to weigh these numbers against CPU and logical reads.
Check the execution plan against the numbers you get from SET STATISTICS IO ON and SET STATISTICS TIME ON.
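Something like this (run each version separately and compare logical reads, CPU time, and elapsed time on the Messages tab):
set statistics io on;
set statistics time on;
-- run the self-join version here, then the window version,
-- and compare the stats reported for each one
set statistics io off;
set statistics time off;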
Todd Fifield
December 27, 2011 at 7:50 am
I agree with Todd; the cost percentages by themselves aren't that useful. You'll have to compare I/O, CPU, and duration between the two queries with varying amounts of data to see which is more efficient for your purposes.
December 27, 2011 at 7:55 am
Costs in execution plans are, by and large, junk data. Use the I/O and time stats instead.
Another option would be to Outer Apply the sub-query, but that's likely to get the same execution plan as the Outer Join.
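Something like this (untested sketch, but it should be equivalent):
select t.TranCode, t.TranDescription, t.TranAmount, TranTotals.TranCodeTotal
from @transactions t
outer apply (
    -- correlated aggregate per TranCode
    select SUM(TranAmount) as TranCodeTotal
    from @transactions t2
    where t2.TranCode = t.TranCode
) as TranTotals
order by t.TranCode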
Are you, in the real query, running against a table variable or a "real" table? That'll have a huge impact on the execution plan and the overall speed, in most cases. (Table variables reduce the estimated "cost", but usually take more work and time to run. That's one of the reasons "costs" are largely a junk value.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 27, 2011 at 7:59 pm
Yeah, I've come to that conclusion. I was trying to use that stat as a simple shortcut in this case. Should have known better!
Thanks all!