August 9, 2009 at 8:12 pm
Hello, I'm looking for a way to carry a continuous value across rows to calculate a weighted average.
create table t1
(
record_id int identity(1,1),
input_number int,
weight decimal(2,2),
weighted_average decimal(5,2)
)
INSERT INTO t1 (input_number, weight, weighted_average) VALUES (0,1,20)
INSERT INTO t1 (input_number, weight) VALUES (13,.25)
INSERT INTO t1 (input_number, weight) VALUES (18,.1)
INSERT INTO t1 (input_number, weight) VALUES (51,.18)
INSERT INTO t1 (input_number, weight) VALUES (23,.2)
INSERT INTO t1 (input_number, weight) VALUES (21,.23)
I need to fill in weighted_average using the formula: prev(weighted_average) + (weight * (current(input_number) - prev(weighted_average))). Any ideas on how to do this? A recursive CTE is not an option.
August 9, 2009 at 8:26 pm
Gabriel P (8/9/2009)
Hello, I'm looking for a way to carry a continuous value across rows to calculate a weighted average.
create table t1
(
record_id int identity(1,1),
input_number int,
weight decimal(2,2),
weighted_average decimal(5,2)
)
INSERT INTO t1 (input_number, weight, weighted_average) VALUES (0,1,20)
INSERT INTO t1 (input_number, weight) VALUES (13,.25)
INSERT INTO t1 (input_number, weight) VALUES (18,.1)
INSERT INTO t1 (input_number, weight) VALUES (51,.18)
INSERT INTO t1 (input_number, weight) VALUES (23,.2)
INSERT INTO t1 (input_number, weight) VALUES (21,.23)
I need to fill in weighted_average using the formula: prev(weighted_average) + (weight * (current(input_number) - prev(weighted_average))). Any ideas on how to do this? A recursive CTE is not an option.
Now, how about expected results based on the sample data provided? It would help to have something to check results against.
August 9, 2009 at 8:34 pm
Sure
input_number,weight,weighted_average
00,1.00,20.00
13,0.25,18.25
18,0.10,18.22
51,0.18,24.12
23,0.20,23.90
21,0.23,23.23
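For anyone following along, here is how those expected figures chain together (my worked example, not part of the original post): each row starts from the previous row's weighted_average, so row 2 is 20.00 + 0.25 * (13 - 20.00) = 18.25, and row 5 is 24.12 + 0.20 * (23 - 24.12) = 23.90 after rounding to two decimals. The intermediate rows can differ by a hundredth or so depending on where the rounding is applied.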
August 9, 2009 at 8:42 pm
Okay, I'm close. I think the difference I see in my numbers may be due to rounding because of the small size of the decimal fields. The following code is based on the same method of computing running totals using the "quirky update", as Jeff Moden calls it. Here is my code:
create table dbo.t1
(
record_id int identity(1,1),
input_number int,
weight decimal(4,2),
weighted_average decimal(5,2)
);
create clustered index IX_WeightedAvg on dbo.t1 (
record_id asc
);
INSERT INTO dbo.t1 (input_number, weight, weighted_average) VALUES (0,1,20)
INSERT INTO dbo.t1 (input_number, weight) VALUES (13,.25)
INSERT INTO dbo.t1 (input_number, weight) VALUES (18,.1)
INSERT INTO dbo.t1 (input_number, weight) VALUES (51,.18)
INSERT INTO dbo.t1 (input_number, weight) VALUES (23,.2)
INSERT INTO dbo.t1 (input_number, weight) VALUES (21,.23);
select * from dbo.t1;
declare @PrevWeightedAvg decimal(5,2);
set @PrevWeightedAvg = null;
update dbo.t1 set
@PrevWeightedAvg = weighted_average = case when @PrevWeightedAvg is null
then t.weighted_average
else @PrevWeightedAvg + (t.weight * (t.input_number - @PrevWeightedAvg))
end
from
dbo.t1 t with (index(IX_WeightedAvg)) ;
select * from dbo.t1;
drop table dbo.t1;
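If it helps to sanity-check the results, here is a minimal verification sketch (my addition, not part of the code above); it assumes the record_id values are gapless, as they are in this sample, and it should be slotted in just before the DROP TABLE line:
-- Recompute each row from the stored value of the row before it and
-- compare against the value the quirky update wrote.
select cur.record_id,
cur.weighted_average,
cast(prev.weighted_average + cur.weight * (cur.input_number - prev.weighted_average) as decimal(5,2)) as recomputed
from dbo.t1 cur
inner join dbo.t1 prev
on prev.record_id = cur.record_id - 1;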
Please review and ask questions.
August 9, 2009 at 8:46 pm
Jeff is always ahead of the game. Thank you Lynn. Numbers being off is the least of my concerns - I just needed to see a way to carry numbers. I would've never thought to use update with variables.
Thanks - I'll let you know how it turns out.
August 10, 2009 at 4:20 am
I did it slightly differently and realized that I have a lot to learn as far as SQL coding is concerned, but I just wanted to present another way of looking at this problem.
create table dbo.t1
(
record_id int identity(1,1),
input_number int,
weight decimal(3,2),
weighted_average decimal(5,2)
);
INSERT INTO dbo.t1 (input_number, weight, weighted_average) VALUES (0,1,20)
INSERT INTO dbo.t1 (input_number, weight) VALUES (13,.25)
INSERT INTO dbo.t1 (input_number, weight) VALUES (18,.1)
INSERT INTO dbo.t1 (input_number, weight) VALUES (51,.18)
INSERT INTO dbo.t1 (input_number, weight) VALUES (23,.2)
INSERT INTO dbo.t1 (input_number, weight) VALUES (21,.23);
select *, convert(decimal(5,2), 0.0) as prev_weighted_avg into dbo.t2
from dbo.t1
Declare @I int
Set @I = 1
While @I<7
Begin
-- Pull the previous row's current weighted_average forward by one row
update dbo.t2
set prev_weighted_avg = (select weighted_average from dbo.t2 b
where b.record_id = dbo.t2.record_id - 1 and dbo.t2.record_id > 1)
-- Recompute this row from the value just carried forward; one pass per row,
-- which is why the loop runs six times for the six sample rows
update dbo.t2
set weighted_average = prev_weighted_avg + weight*(input_number - prev_weighted_avg)
where record_id > 1
Set @I = @I+1
End
select * from dbo.t2
Drop table dbo.t1;
Drop table dbo.t2;
Saurabh Dwivedy
___________________________________________________________
My Blog: http://tinyurl.com/dwivedys
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537
August 10, 2009 at 10:48 am
The only problem with the WHILE loop solution is scalability. Test it against a one million row table and you'll see what I mean.
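For anyone who wants to see that difference for themselves, here is one common way to generate a large test table (a hedged sketch of my own; the table name, index name, row count, and constant weight are my assumptions, not anything posted above):
-- Build a million-row copy of the sample structure by cross joining a
-- couple of large system views, then index it on record_id for the update.
select top (1000000)
identity(int, 1, 1) as record_id,
abs(checksum(newid())) % 100 as input_number,
cast(0.10 as decimal(4,2)) as weight,
cast(null as decimal(5,2)) as weighted_average
into dbo.t1_big
from master.sys.all_columns ac1
cross join master.sys.all_columns ac2;
create clustered index IX_WeightedAvg_big on dbo.t1_big (record_id);
Timing the WHILE loop against the quirky update on that table should make the scalability point obvious.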
August 10, 2009 at 11:46 am
I'm just curious - you're calling that a "weighted average", but your calculation isn't anywhere near what is mathematically called a weighted average (or a running average, or really any average I can figure).
What are you looking at doing? It might help to figure out the best way to approach it. It would be a shame to get a spiffy new set of code to spit out bad results at the speed of light.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 10, 2009 at 12:30 pm
Fixing the math should be the easy part. I didn't really look at what the calculation was doing; I just wanted to be sure I returned results close (and I was close, off in the hundredths position) to what the OP expected.
August 10, 2009 at 1:35 pm
Lynn Pettis (8/10/2009)
Fixing the math should be the easy part. I didn't really look at what the calculation was doing; I just wanted to be sure I returned results close (and I was close, off in the hundredths position) to what the OP expected.
Oh agreed. I just have a feeling the OP needs to carry TWO values to get what he wants, once the math shakes out of course.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 10, 2009 at 1:55 pm
True, but we work with what we are given, whether here on SSC or at work. 😉
August 10, 2009 at 6:52 pm
Gabriel P (8/9/2009)
Jeff is always ahead of the game.
Heh... thanks for the wonderful compliment but that's not always true. Lynn will tell you that I'm way behind on the rewrite of the article for that method and he's right.
It's good to see Matt Miller and Lynn Pettis both on this thread. Matt worked with me on a couple of posts concerning this method and discovered that you need an "anchor" column to guarantee that this method will work in 2005 and above. Lynn busted a hump on a great article that shows that the method really doesn't like partitioning as most people do it. Several other caveats have also surfaced over time like what happens when parallelism occurs.
With all of that in mind, and notwithstanding any mathematical errors ;-), you must make the following marked additions to the code you posted in order to ensure the code will run without error... again, I did not make any changes for possible math errors... you need to bang that out...
declare @PrevWeightedAvg decimal(5,2),
@PrevRecord_ID INT; --Add this
set @PrevWeightedAvg = null;
update dbo.t1 set
@PrevWeightedAvg = weighted_average = case when @PrevWeightedAvg is null
then t.weighted_average
else @PrevWeightedAvg + (t.weight * (t.input_number - @PrevWeightedAvg))
end,
@PrevRecord_ID = Record_ID --Add this
from
dbo.t1 t with (index(IX_WeightedAvg),TABLOCKX) --Add the TABLOCKX
OPTION (MAXDOP 1) --Add this
;
The reason for the @PrevRecord_ID is that Matt found that SQL Server 2005 will sometimes lose its mind unless there's a two-part equality in the update. I've never been able to duplicate that problem but, as far as folks like Lynn and Matt go, I pretty much trust them when they say they've identified a potential problem. Since it costs virtually nothing to be absolutely safe in this area, I've included it as a "gotta-have-it" requirement for the method.
Lynn didn't say it out loud (he implied it with the addition of the clustered index in his code), but the ONLY way this thing works is if the clustered index is in the same order you want the updates in. During the rewrite I'm doing, I've found that the table hint of Index(clusteredindexname) or Index(0) is actually superfluous and does nothing but give people the nice warm fuzzies and it actually slows the code down, a lot. You can remove that hint (or keep it for the warm fuzzies) but add the TABLOCKX hint.
The TABLOCKX hint is there because you don't want anyone updating, inserting, or deleting any rows while this baby is in action. Rather than wait for the natural escalation from row locks to page locks to a full table lock, just go for the full table lock and be safe. TABLOCKX will do that for you.
Last but not least, although parallelism doesn't normally kick in on these things, it would be absolutely devastating to the results if it did. The OPTION (MAXDOP 1) absolutely prevents that eventuality from happening.
There are other rules, but that covers the gamut for most single table updates. Any questions on this?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 10, 2009 at 7:22 pm
saurabh.dwivedy (8/10/2009)
I did it slightly differently and realized that I have a lot to learn as far as sql coding is concerned. But I just wanted to present another way of looking at this problem
It takes a strong and wise person to say that. Well done.
Differing opinions and different code are always welcome! Even though none have been able to break the code if they follow the rules, many other MVP's don't trust the "quirky update" method and others outright condemn it. When folks feel that way and they don't want to take the time to write a verification routine (all included in the rewrite I'm almost done with), then I tell them to use the "other" procedural methods such as While Loops. Of course, I also warn them of the impact on performance that will have.
As a sidebar, if you think about it, cursors and While Loops are almost infinitely scalable even if they do have some performance problems. Unless a triangular join is involved, and if they are well formed, they will always operate in a totally linear and predictable fashion, and they almost never reach the "tipping point" that a large update (usually somewhere between 3 and 5 million rows on most systems) will. The "tipping point" is when the time to run an update goes ballistic because it has to start spooling or hits some other resource limit. Every system has one.
Like I said, thank you for the alternate solution. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 10, 2009 at 8:30 pm
I am truly honored to hear the good words from you, Jeff. It's a big encouragement and kind of vindicates my decision to post my code (I had debated whether posting it would add any value after Lynn's solution was out) despite its obvious inelegance.
Nonetheless, I thought it wouldn't hurt to present an alternative thought pattern!
I would like to know more about the 'quirky update' about which I have only heard so far. Is there some material already available on this site?
Regards
Saurabh Dwivedy
___________________________________________________________
My Blog: http://tinyurl.com/dwivedys
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537
August 10, 2009 at 9:24 pm
saurabh.dwivedy (8/10/2009)
I am truly honored to hear the good words from you, Jeff. It's a big encouragement and kind of vindicates my decision of posting my code (I had debated whether posting it would add any value after Lynn's solution was out) despite its obvious inelegance.Nonetheless I thought it wouldn't harm to present an alternative thought pattern!
I would like to know more about the 'quirky update' about which I have only heard so far. Is there some material already available on this site?
Regards
There used to be... I wrote an article on it and had to take it down... it needed some changes and I haven't quite finished the rewrite, although I should be done with it any day now. Actually, I've had to split the article into two parts. The first part will explain what it is, how it works, and the rules to make it work correctly. It will also demonstrate how to do running totals, grouped running totals, and, for SQL Server 2k, how to do running counts (like ROW_NUMBER/RANK) and grouped running counts (like partitioned ROW_NUMBER/RANK). The second part will explore some of the many other uses that many of us have found for the "quirky update" in a lot of "carry previous row information forward" problems.
To whet the appetite a bit more, all 4 "running problems" are solved in a single update on a million rows in about 7 seconds. Verification code will be included for the faint of heart and some of my nay-saying fellow MVP's and others. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.