July 9, 2012 at 11:08 am
I want a single UPDATE query that produces cumulative data in the last column.
Balance is the column that needs to add up as a running total. I have stated an example below.
Ex:
Table structure:
cust_no, cust_name, deposit, balance
Data:
C001, Jack, 100, 100
C002, Mick, 500, 600
C003, Rock, 200, 800
Please share your thoughts on it
July 9, 2012 at 11:14 am
It's not clear what you are asking... why would I want to update "Mick"'s balance with the information from "Jack", for example? I'm sure that's not what you meant, but you didn't provide any good sample data to really show us what you wanted.
Aside from that, one important rule is to never store the calculated/summary balance in the row data or in a static table;
create a view that calculates the totals on demand, something like:
CREATE VIEW vw_Balances
AS
SELECT
    cust_no,
    cust_name,
    SUM(deposit) AS balance
FROM SomeTable
GROUP BY
    cust_no,
    cust_name
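For instance, looking up a single customer's current balance against a view like that (names as in the sketch above) is just:

```sql
-- Balance is always computed from the raw rows at query time,
-- so it can never drift out of sync with the detail data.
SELECT cust_no, cust_name, balance
FROM vw_Balances
WHERE cust_no = 'C001';
```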
Lowell
July 9, 2012 at 11:58 am
Aside from that, one important rule is to never store the calculated/summary balance in the row data or in a static table;
Hi Lowell
I'm interested in your thoughts on this. Why not store in a reporting table, where you can have the complex calculations performed during an overnight ETL stage with no overhead during office hours, rather than create a view and have the calculations performed on the fly?
I've come across a few DBs where running totals have been required, and rather than perform these during office hours we have had the running totals calculated via the ETL process.
As I said, just curious, as I'm always interested in other opinions/methods 🙂
Cheers
Andy
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
July 9, 2012 at 12:06 pm
I am actually against Lowell's thoughts in many cases. Specifically, we generate balances for a customer on a monthly or weekly basis, then store those balances along with the invoice date. Of course, if I need to know the customer's balance as of right now, the stored data is of no use, since the underlying data could have been updated within the hour; a current view would need to do the calculation on the fly. But I would certainly not say "NEVER" store sums or running totals. It really depends on what the use is.
Another example was a very large report sent out to the Board of Directors. Calculating this data on the fly would take way too long, so it was done on a nightly basis and could be viewed the next day. The data was stored in such a way that they could look at a report from two days ago and compare it with yesterday's. Again, in this case the Board did not care about today's numbers, as the day was not yet over. So it did not matter whether the data was current for the day, only for the previous day. The requirement was also that yesterday's report would not change, no matter when they requested it.
Jared
CE - Microsoft
July 9, 2012 at 12:13 pm
I've also had to pre-calculate data like running totals, running averages, and so on, in data warehouses populated by scheduled ETL processes. After all, a large part of the purpose of data warehousing is to pre-calculate values so that you don't have to do complex calculations at runtime.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2012 at 12:21 pm
Well, Andy, this is all one man's opinion, although it is built from experience.
My basic premise: a static table of rolled-up totals is never as accurate as the raw data itself. If a single value changes in the details, the static rollup data is inaccurate.
At least in my circle of the data world, I have multiple clients using the same application. For any one given client, the typical database and load are rather light: never more than 200 users, and maybe 10% of them are heavy reporting users. I don't have processes that run so long that they have to be scheduled and batched; my biggest tables are just a few million rows or so. Database size is typically not more than 10 gig, and often closer to one gig.
Mostly statistics and totals, and funding/expenses, if I had to label the data.
The application I took over did tons of RBAR logic and stored summary data in static tables;
a "batch" kind of process, or triggers, or VB6 code embedded in an application, would update the data in those summary tables to account for any changes in the detail/row-level data.
Because it was all rather disconnected, you could run different reports and they would not return the same totals, when they were in theory against the same data.
My goal in life fueled my desire to be RBAR- and trigger-free. With that done, I could concentrate on performance and indexing.
I started adding views to automatically calculate totals, eliminating any need for the RBAR code to update tables, and got rid of triggers as well; such a simple solution to a complex problem.
In your example, where huge ETL processes occur and it's probably just as easy to dump summary data into tables as part of the ETL, I'd fall back on the old "it depends".
If the data changes AT ALL between ETLs, I'd insist on views to perform any calculations for totals for reports.
If your reports only report against the ETL results, I cannot say anything bad about it: as long as the data is accurate and performs well, that's good.
Lowell
July 9, 2012 at 12:30 pm
I should also mention my reports are mostly against the OLTP system, with minimal data warehousing; I agree with the other posts, it makes sense in a data warehouse scenario to create static tables for reporting against;
my thoughts for the OP were more towards a dynamically changing invoice/account balance scenario.
Lowell
July 9, 2012 at 12:33 pm
I've had hybrid solutions for that kind of thing.
Read today's data and perform on-the-fly calculations (running totals, running averages, et al) on that.
ETL data prior to today and store pre-calculated values in there.
Use "UNION ALL" operators to put both into reports that need it.
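A rough sketch of that hybrid pattern, assuming a hypothetical DailyBalances table populated by the nightly ETL and a Transactions table holding the live rows (table and column names are mine, not from the thread):

```sql
-- Hybrid view: pre-calculated history from the ETL, plus today's rows on the fly
CREATE VIEW vw_HybridBalances
AS
SELECT cust_no, balance_date, balance
FROM DailyBalances                              -- loaded by the nightly ETL
WHERE balance_date < CAST(GETDATE() AS date)
UNION ALL
SELECT cust_no, CAST(GETDATE() AS date), SUM(deposit)
FROM Transactions                               -- today's live transactions
WHERE txn_date >= CAST(GETDATE() AS date)
GROUP BY cust_no;
```

Reports query the one view; only today's slice is ever computed at runtime.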
On-the-fly totals work in the kind of lightweight environments you're talking about, Lowell. Data warehousing is generally meant to either simplify really complex data models, or to reduce the overhead of really large computations.
In most transactional environments, already-committed transactions can't change. A deposit made one second ago can't be edited, but a new transaction with a correction in it can be added. So pre-calculating on data older than some arbitrary threshold (best determined by actual business and computational needs, but usually picked out of a hat by management) simply means you don't have to run the same calculations over and over and over again.
Done correctly, you don't end up with non-deterministic queries on that kind of thing. Pre-calculating, done correctly, actually tends towards deterministic functionality.
With the kind of live-as-you-go precalculation you're writing about (triggers, VB objects, et al), the most common problem is lack of ACID control on the data. The same people who want to do that don't want to hold locks on the rows used in it, allow asynchronous issues to break ACIDity, or even get so stupid about it that they use NOLOCK hints on the rows used in the computations! I've seen that kind of thing, and seen "running totals" resulting from it that a six-year-old could tell were wrong, but the devs insisted "they have to be right".
Control for ACIDity in the computations and you generally end up with the right data.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2012 at 12:37 pm
Agreed all around. So, when we needed to look up a balance on a current customer we could do that calculation on a small number of records from just that customer. However, at the end of month we had a lot of calculations to do. We don't care what the balance was after midnight on the last day of the month, we only care what it was at the end of the month. Again, we don't ever want that number to change if we reference that date. So in this sense, it makes sense to store an EOM or EOY balance as calculated.
Jared
CE - Microsoft
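A minimal sketch of that kind of frozen end-of-month snapshot, run from a scheduled job just after month end (MonthEndBalances and Transactions are hypothetical names for illustration):

```sql
-- Freeze each customer's balance as of the end of last month.
-- Once written, these rows are never recalculated, so the EOM report never changes.
INSERT INTO MonthEndBalances (cust_no, month_end, balance)
SELECT cust_no,
       DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)),  -- last day of previous month
       SUM(deposit)
FROM Transactions
WHERE txn_date < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)           -- strictly before this month
GROUP BY cust_no;
```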
July 9, 2012 at 3:06 pm
Jared:
Agreed all around. So, when we needed to look up a balance on a current customer we could do that calculation on a small number of records from just that customer. However, at the end of month we had a lot of calculations to do. We don't care what the balance was after midnight on the last day of the month, we only care what it was at the end of the month. Again, we don't ever want that number to change if we reference that date. So in this sense, it makes sense to store an EOM or EOY balance as calculated.
Agreed. If you want to store EOD, EOW, EOM or EOY calculations at a point in time, or even summaries to show the movement between end of month, year, week or day, then ETLs and pre-calculations are the way to go. This is the only way to achieve the goal of data warehousing... to quote Kimball, 'The One Version of The Truth'.
GSquared:
Use "UNION ALL" operators to put both into reports that need it.
I agree with this to a point. Ask 'The Business' the question, 'How frequently do you want your reports refreshed?' Inevitably the answer will be real time, which in the real world is not reasonable (for larger datasets anyway; for a small transactional system, maybe not so much of a problem), but the business's expectations should be set accordingly (unless they want to throw a ton of resources into your box(es)).
Lowell:
In your example, where huge ETL processes occur and it's probably just as easy to dump summary data into tables as part of the ETL, I'd fall back on the old "it depends".
If the data changes AT ALL between ETLs, I'd insist on views to perform any calculations for totals for reports.
I would have to say it depends on whether real time is actually required; again, this can be a tricky beast.
Really I think that there are two different approaches being discussed: (a) a hybrid of real-time reporting, which is more BAU / daily operational reporting using a combination of ETLs, jobs and views, and (b) the business reporting (EOM etc.), where you wish to look at the same data a week or month later, or to compare movement between those months (using ETLs and jobs).
I have always found that a middle ground, i.e. implementing a fast, stable incremental loading process (for example hourly) for the BAU reporting (after setting expectations) and a daily/weekly process for the business reporting (in addition to performing the ETL calculations), has worked for myself and the businesses.
I thank you all for your insights; as I said, it's always good to listen to other people's views and have a discussion 🙂
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
July 9, 2012 at 8:07 pm
In 2008 or less, I'd REALLY like to know how to create a view that does a running total without forming a Triangular Join. I don't believe it's possible but sure am willing to learn a new trick.
--Jeff Moden
Change is inevitable... Change for the better is not.
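For what it's worth, on SQL Server 2012 (so not the 2008-and-earlier case Jeff is asking about), the windowed SUM finally allows a running-total view without a triangular join. A sketch against the SomeTable layout from earlier in the thread:

```sql
-- SQL Server 2012+ only: running total via a window aggregate, no triangular join
CREATE VIEW vw_RunningBalance
AS
SELECT cust_no,
       cust_name,
       deposit,
       SUM(deposit) OVER (ORDER BY cust_no
                          ROWS UNBOUNDED PRECEDING) AS balance
FROM SomeTable;
```

Each row's balance is the sum of its own deposit plus all earlier rows in cust_no order, computed in a single pass.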
July 9, 2012 at 8:45 pm
Jeff Moden (7/9/2012)
In 2008 or less, I'd REALLY like to know how to create a view that does a running total without forming a Triangular Join. I don't believe it's possible but sure am willing to learn a new trick.
And there you go bringing up the actual topic of the OP's question... 🙂
Jared
CE - Microsoft
July 9, 2012 at 9:13 pm
Please try the code below.
declare @test table (id varchar(10), name varchar(10), sal int, cum_sal int)
insert into @test (id, name, sal)
select 'C001', 'Jack', 100
union all
select 'C002', 'Mick', 500
union all
select 'C003', 'Rock', 200
--update stmt: correlated subquery (a triangular join) builds the running total
update t1 set cum_sal = (select SUM(t.sal) from @test t where t.id <= t1.id)
from @test t1
select * from @test
--cum_sal: C001 = 100, C002 = 600, C003 = 800
July 9, 2012 at 9:16 pm
pls try below code declare @test-2 table (id varchar(10),name varchar(10),sal int,cum_sal int)
insert into @test-2(id,name,sal)
select 'C001','Jack',100
union
select 'C002','Mick', 500
union
select 'C003','Rock', 200
--update stmt
update t1 set cum_sal=(select SUM(t.sal) from @test-2 t where t.id<=t1.id)
from @test-2 t1
select * from @test-2