February 9, 2008 at 5:30 am
Guys,
I have following scenario where I have to calculate balance for each unique code
CODE   PAID   DUE    BALANCE
___________________________________
AA     10     40
AA     10     40
AA     10     40
BB     80     100
BB     10     100
In the above example, for CODE 'AA' the balance should read 30, 20, 10 and for CODE 'BB' the balance should read 20, 10.
Is there any way to accomplish this using T-SQL?
Any suggestions and inputs would help
Thanks
February 9, 2008 at 5:53 am
Hi,
Try this
--separate into two tables. The total due, from what I can see, is the same across a code's transactions. Even if it
--can change over time it should still live in a separate table, since it's repeated on every transaction row here.
declare @Trans table( transNum int identity(1,1) primary key clustered, Code char(2), Paid money )
declare @Owing table( Code char(2) primary key clustered, Due money )
insert into @Trans( code, paid )
select 'AA', 10
UNION ALL
select 'AA', 10
UNION ALL
select 'AA', 10
UNION ALL
select 'BB', 80
UNION ALL
select 'BB', 10
insert into @Owing ( Code, Due )
select 'AA', 40
UNION ALL
select 'BB', 100
--running total. For max efficiency you should look up Jeff Moden's recent article on the subject and also look up triangular joins and RBAR
select
    Trans.transNum, Trans.Code, Paid,
    Owing.Due - ( select sum(Paid) as sumPaid
                  from @Trans TransRBAR
                  where TransRBAR.transNum <= Trans.transNum
                    and TransRBAR.code = Trans.code
                ) as Balance
from @Trans Trans
inner join @Owing Owing
    on Trans.Code = Owing.Code
order by Trans.transNum, Trans.Code
Note that I've separated the data into two tables. Does the DUE column's value change over time? If so, or even if it doesn't, I think you should have it in a separate table.
Also, you'll need a column to imply an ordering to the transactions - I've added a TransNum column for that purpose. In a banking scenario you'll probably have a similar column as well as a datetime.
The above code could be more efficient - look up Jeff Moden's article on running totals on this site. If you don't have too many rows it should be ok. The other alternative is to actually calculate and store the balance as each transaction's entered. This way you needn't calculate anything and you wouldn't accumulate rounding errors, etc (I've not written financial stuff but that's something I'd worry about if interest, etc is involved).
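Just to sketch that last option (purely illustrative - the table and procedure names are made up, and you'd want to think about transactions and concurrency before doing this for real), you'd store the balance on the row as each payment is entered:
create table Owing
( Code char(2) primary key clustered, Due money not null )
create table Trans
( transNum int identity(1,1) primary key clustered,
  Code     char(2) not null references Owing( Code ),
  Paid     money   not null,
  Balance  money   not null )  -- balance remaining after this payment
go
create procedure AddPayment
    @Code char(2),
    @Paid money
as
begin
    set nocount on
    declare @PrevBalance money
    -- previous stored balance for this code, or the full Due if this is the first payment
    select @PrevBalance = coalesce(
        ( select top 1 Balance from Trans where Code = @Code order by transNum desc ),
        ( select Due from Owing where Code = @Code ) )
    insert into Trans ( Code, Paid, Balance )
    values ( @Code, @Paid, @PrevBalance - @Paid )
end
go
That way each new row only needs the one previous balance rather than a sum over the whole payment history.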
February 9, 2008 at 10:23 am
AM,
Ian is absolutely correct... for very small groups of rows by account, the example he gave will work just fine, albeit a little slower than you might expect (just as Ian said). The reason it runs horribly slow on larger groups of rows is a thing called a "Triangular Join", which is half as bad as a full "Square" or Cross Join. Here's why Triangular Joins are so bad...
http://www.sqlservercentral.com/articles/T-SQL/61539/
As Ian also suggested, if you do have accounts with large numbers of rows, you can calculate the new balance every time a new row is added. You do have to be a bit careful to make sure the code is perfect so the balance never gets out of sync.
Or, (again, as Ian suggested) you can recalculate the balance either for a given account or the whole shootin' match using the code that Ian pointed out at the following URL...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
... it'll do a grouped running balance across a million rows, regardless of the number of rows per account, in about 7-10 seconds.
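Just so you can see the general shape of it, here's a very simplified sketch of the variable-assignment UPDATE that approach is built around. This is NOT the tested code from the article - the clustered index, the TABLOCKX, the MAXDOP 1 and the other rules it spells out are what make it safe, so please read the article before trusting anything like this:
--simplified sketch only; work table clustered on (Code, transNum) so the update
--walks each account's payments in order
create table #Work
( Code     char(2) not null,
  transNum int     not null,
  Paid     money   not null,
  Due      money   not null,
  Balance  money   null,
  primary key clustered ( Code, transNum ) )
insert into #Work ( Code, transNum, Paid, Due )
select 'AA', 1, 10, 40  union all
select 'AA', 2, 10, 40  union all
select 'AA', 3, 10, 40  union all
select 'BB', 4, 80, 100 union all
select 'BB', 5, 10, 100
declare @PrevCode char(2), @Balance money
update w
   set @Balance = Balance = case when w.Code = @PrevCode
                                 then @Balance - w.Paid
                                 else w.Due - w.Paid
                            end,
       @PrevCode = w.Code
  from #Work w with (tablockx)
option (maxdop 1)
select Code, transNum, Paid, Due, Balance
  from #Work
 order by Code, transNum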
Ian, thanks for the honorable mention!
--Jeff Moden
Change is inevitable... Change for the better is not.
February 11, 2008 at 8:38 am
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
create table #z
( seq int not null, code char(2), paid int, due int )
go
insert into #z values (1,'AA',10,40)
insert into #z values (2,'AA',10,40)
insert into #z values (3,'BB',80,100)
insert into #z values (4,'AA',10,40)
insert into #z values (7,'BB',10,100)
go
select Z1.seq, Z1.code, min(Z1.paid) as paid,
min(Z1.due) as due, min(Z1.due) - sum(Z2.paid) as balance
from #z as Z1 join #z as Z2
on Z1.code = Z2.code and Z2.seq <= Z1.seq
group by Z1.seq, Z1.code
go
result:
seq  code  paid  due   balance
1    AA    10    40    30
2    AA    10    40    20
4    AA    10    40    10
3    BB    80    100   20
7    BB    10    100   10
February 11, 2008 at 9:06 am
antonio.collins (2/11/2008)
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
Sure it can, and as a matter of fact - that's the "recommended way to do it". The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempDB, steal all of the RAM, etc... when you try to run this against something production size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10000*10001/2 rows, or 50 million rows (give or take 50,000 rows).
Read up on Jeff's articles - it's a good read and a compelling argument (IMO).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 11, 2008 at 9:21 am
The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempDB, steal all of the RAM, etc... when you try to run this against something production size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10000*10001/2 rows, or 50 million rows (give or take 50,000 rows).
What subquery are you referring to? With the self join on 'code', you're only dealing with the average number of payments per 'code', so the total impact would be more like 250K-500K rows (assuming 50-100 transactions per code).
February 11, 2008 at 9:36 am
I should have worded that as a 10,000 row GROUP. Making an assumption about the size of your groups is really dangerous. All you would need is one really large group and your logical row count goes through the roof.
Even if your data is very even - you might find that the overhead of creating a temp table and performing the running count is well worth it, since it has a linear growth factor (compared to a second-order polynomial growth factor). When you get to bigger sets, this can add up quickly (turning into queries running for hours).
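To put rough numbers on that growth (purely illustrative - this just plugs group sizes into the (N*N+N)/2 formula):
--rows touched by one linear pass vs. a "<=" triangular join for ONE group of N rows
select N,
       N                                 as linear_pass_rows,
       ( cast(N as bigint) * N + N ) / 2 as triangular_join_rows
from ( select 100 as N union all
       select 1000     union all
       select 10000    union all
       select 100000 ) as GroupSizes
The 10,000-row case is where the 50 million figure above comes from, and by 100,000 rows in a group you're already at roughly 5 billion internal rows.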
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 11, 2008 at 4:22 pm
antonio.collins (2/11/2008)
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
Yep... and that "simple self join" is classified as a "Triangular Join"... doesn't need to be in a correlated subquery to be a "Triangular Join".
For groups with very small rowcounts, the "simple self join" you posted is quite effective, as Matt and I have both suggested... try it, though, when groups of 10,000 or 20,000 rows are present...
As Matt also suggested, go take a look at the articles on Triangular Joins and the way to get around those for running totals and running counts. And, they're not mere speculation... there's Proof of Principle code in the articles that you can run. Here are the URLs, again...
http://www.sqlservercentral.com/articles/T-SQL/61539/
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
February 12, 2008 at 8:03 am
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced.
also, knowing the type and frequency of data that you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per account per month), this simple approach will be efficient and easier for a less-than-expert sql developer to understand, implement, and maintain.
February 12, 2008 at 9:58 am
antonio.collins (2/12/2008)
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced.
Heh... Exactly what Matt and I've both said... 😉
Even with groups as small as, say, 4 rows per account number, you're still looking at touching (4*4+4)/2 or 10 rows per account internally (instead of the 4 that are there) for a "<=" relationship. It's still going to be slower, but it will be tolerable.
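If you want to see that count for yourself, something like this against the #z table you posted will show it (illustrative only - it just counts what the "<=" join actually touches per code):
--for AA (3 rows) this returns (3*3+3)/2 = 6... for BB (2 rows), (2*2+2)/2 = 3
select Z1.code, count(*) as rows_touched
  from #z as Z1
  join #z as Z2
    on Z1.code = Z2.code
   and Z2.seq <= Z1.seq
 group by Z1.code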
--Jeff Moden
Change is inevitable... Change for the better is not.
February 12, 2008 at 10:49 am
antonio.collins (2/12/2008)
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced. also, knowing the type and frequency of data that you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per account per month), this simple approach will be efficient and easier for a less-than-expert sql developer to understand, implement, and maintain.
Knowing the rough patterns in your data is great, but in this case averages don't do you any favors. Just one outlier will wreck your day, and what was "efficient" is now carving out large new chunks of real estate in your tempdb. It really doesn't take all that much (I've seen some stores do 5 times their typical transaction volume, say right around Christmas).
But again - if your data is always going to be small (like - really small), yes - this works fine I suppose.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 12, 2008 at 5:36 pm
Heh... correct... there's always that "one time" that justifies the cold water you have running through the garden hose to the baseplate of the CPU in your server 😉
--Jeff Moden
Change is inevitable... Change for the better is not.