February 9, 2008 at 5:30 am
Guys,
I have following scenario where I have to calculate balance for each unique code
CODE   PAID   DUE    BALANCE
___________________________________
AA     10     40
AA     10     40
AA     10     40
BB     80     100
BB     10     100
In the above example, for CODE 'AA' the balance should read 30, 20, 10 and for CODE 'BB' the balance should read 20, 10.
Is there any way to accomplish this using T-SQL?
Any suggestions and inputs would help
Thanks
February 9, 2008 at 5:53 am
Hi,
Try this
--separate into two tables. The total due, from what I can see, is the same across a code's transactions. Even if it
--can change over time it should still live in a separate table, since it's repeated on every transaction row here.
declare @Trans table( transNum int identity(1,1) primary key clustered, Code char(2), Paid money )
declare @Owing table( Code char(2) primary key clustered, Due money )
insert into @Trans( code, paid )
select 'AA', 10
UNION ALL
select 'AA', 10
UNION ALL
select 'AA', 10
UNION ALL
select 'BB', 80
UNION ALL
select 'BB', 10
insert into @Owing ( Code, Due )
select 'AA', 40
UNION ALL
select 'BB', 100
--running total. For max efficiency you should look up Jeff Moden's recent article on the subject and also look up triangular joins and RBAR
select
    Trans.transNum, Trans.Code, Paid,
    Owing.Due - ( select sum(Paid) as sumPaid
                  from @Trans TransRBAR
                  where TransRBAR.transNum <= Trans.transNum
                    and TransRBAR.code = Trans.code
                ) as Balance
from @Trans Trans
inner join @Owing Owing
    on Trans.Code = Owing.Code
order by Trans.transNum, Trans.Code
Note that I've separated the data into two tables. Does the DUE column's value change over time? If so, or even if it doesn't, I think you should have it in a separate table.
Also, you'll need a column to imply an ordering to the transactions - I've added a TransNum column for that purpose. In a banking scenario you'll probably have a similar column as well as a datetime.
The above code could be more efficient - look up Jeff Moden's article on running totals on this site. If you don't have too many rows it should be ok. The other alternative is to actually calculate and store the balance as each transaction's entered. This way you needn't calculate anything and you wouldn't accumulate rounding errors, etc (I've not written financial stuff but that's something I'd worry about if interest, etc is involved).
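Just to sketch that last option (purely illustrative - the table and procedure names are made up, and you'd want to think about transactions and concurrency before doing this for real), you'd store the balance on the row as each payment is entered:
create table Owing
( Code char(2) primary key clustered, Due money not null )
create table Trans
( transNum int identity(1,1) primary key clustered,
  Code     char(2) not null references Owing( Code ),
  Paid     money   not null,
  Balance  money   not null )  -- balance remaining after this payment
go
create procedure AddPayment
    @Code char(2),
    @Paid money
as
begin
    set nocount on
    declare @PrevBalance money
    -- previous stored balance for this code, or the full Due if this is the first payment
    select @PrevBalance = coalesce(
        ( select top 1 Balance from Trans where Code = @Code order by transNum desc ),
        ( select Due from Owing where Code = @Code ) )
    insert into Trans ( Code, Paid, Balance )
    values ( @Code, @Paid, @PrevBalance - @Paid )
end
go
That way each new row only needs the one previous balance rather than a sum over the whole payment history.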
February 9, 2008 at 10:23 am
AM,
Ian is absolutely correct... for very small groups of rows by account, the example he gave will work just fine, albeit a little slower than you might expect (just as Ian said). The reason it runs horribly slow on larger groups of rows is a thing called a "Triangular Join", which is half as bad as a full "Square" or Cross Join. Here's why Triangular Joins are so bad...
http://www.sqlservercentral.com/articles/T-SQL/61539/
As Ian also suggested, if you do have accounts with large numbers of rows, you can calculate the new balance every time a new row is added. You do have to be a bit careful to make sure the code is perfect so the balance never gets out of sync.
Or, (again, as Ian suggested) you can recalculate the balance either for a given account or the whole shootin' match using the code that Ian pointed out at the following URL...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
... it'll do a grouped running balance across a million rows, regardless of the number of rows per account, in about 7-10 seconds.
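Just so you can see the general shape of it, here's a very simplified sketch of the variable-assignment UPDATE that approach is built around. This is NOT the tested code from the article - the clustered index, the TABLOCKX, the MAXDOP 1 and the other rules it spells out are what make it safe, so please read the article before trusting anything like this:
--simplified sketch only; work table clustered on (Code, transNum) so the update
--walks each account's payments in order
create table #Work
( Code     char(2) not null,
  transNum int     not null,
  Paid     money   not null,
  Due      money   not null,
  Balance  money   null,
  primary key clustered ( Code, transNum ) )
insert into #Work ( Code, transNum, Paid, Due )
select 'AA', 1, 10, 40  union all
select 'AA', 2, 10, 40  union all
select 'AA', 3, 10, 40  union all
select 'BB', 4, 80, 100 union all
select 'BB', 5, 10, 100
declare @PrevCode char(2), @Balance money
update w
   set @Balance = Balance = case when w.Code = @PrevCode
                                 then @Balance - w.Paid
                                 else w.Due - w.Paid
                            end,
       @PrevCode = w.Code
  from #Work w with (tablockx)
option (maxdop 1)
select Code, transNum, Paid, Due, Balance
  from #Work
 order by Code, transNum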
Ian, thanks for the honorable mention!
--Jeff Moden
Change is inevitable... Change for the better is not.
February 11, 2008 at 8:38 am
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
create table #z
( seq int not null, code char(2), paid int, due int )
go
insert into #z values (1,'AA',10,40)
insert into #z values (2,'AA',10,40)
insert into #z values (3,'BB',80,100)
insert into #z values (4,'AA',10,40)
insert into #z values (7,'BB',10,100)
go
select Z1.seq, Z1.code, min(Z1.paid) as paid,
min(Z1.due) as due, min(Z1.due) - sum(Z2.paid) as balance
from #z as Z1 join #z as Z2
on Z1.code = Z2.code and Z2.seq <= Z1.seq
group by Z1.seq, Z1.code
go
result:
seq  code  paid  due   balance
1    AA    10    40    30
2    AA    10    40    20
4    AA    10    40    10
3    BB    80    100   20
7    BB    10    100   10
February 11, 2008 at 9:06 am
antonio.collins (2/11/2008)
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
Sure it can, and as a matter of fact - that's the "recommended way to do it". The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempDB, steal all of the RAM, etc... when you try to run this against something production size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10000*10001/2 rows, or 50 million rows (give or take 50,000 rows).
Read up on Jeff's articles - it's a good read and a compelling argument (IMO).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 11, 2008 at 9:21 am
The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempDB, steal all of the RAM, etc... when you try to run this against something production size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10000*10001/2 rows, or 50 million rows (give or take 50,000 rows).
What subquery are you referring to? With the self join on 'code', you're only dealing with the average number of payments per 'code', so the total impact would be more like 250K-500K rows (assuming 50-100 transactions per code).
February 11, 2008 at 9:36 am
I should have worded that as a 10,000 row GROUP. Making an assumption about the size of your groups is really dangerous. All you would need is one really large group and your logical row count goes through the roof.
Even if your data is very even - you might find that the overhead of creating a temp table and performing the running count is well worth it, since it has a linear growth factor (compared to a second-order polynomial growth factor). When you get to bigger sets, this can add up quickly (turning into queries running for hours).
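To put rough numbers on that growth (purely illustrative - this just plugs group sizes into the (N*N+N)/2 formula):
--rows touched by one linear pass vs. a "<=" triangular join for ONE group of N rows
select N,
       N                                 as linear_pass_rows,
       ( cast(N as bigint) * N + N ) / 2 as triangular_join_rows
from ( select 100 as N union all
       select 1000     union all
       select 10000    union all
       select 100000 ) as GroupSizes
The 10,000-row case is where the 50 million figure above comes from, and by 100,000 rows in a group you're already at roughly 5 billion internal rows.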
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 11, 2008 at 4:22 pm
antonio.collins (2/11/2008)
i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?
Yep... and that "simple self join" is classified as a "Triangular Join"... doesn't need to be in a correlated subquery to be a "Triangular Join".
For groups with very small rowcounts, the "simple self join" you posted is quite effective, as Matt and I have both suggested... try it, though, when groups of 10,000 or 20,000 rows are present...
As Matt also suggested, go take a look at the articles on Triangular Joins and the way to get around those for running totals and running counts. And, they're not mere speculation... there's Proof of Principle code in the articles that you can run. Here are the URLs, again...
http://www.sqlservercentral.com/articles/T-SQL/61539/
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
February 12, 2008 at 8:03 am
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced.
also, knowing the type and frequency of data that you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per account per month), this simple approach will be efficient and easier for a less-than-expert sql developer to understand, implement, and maintain.
February 12, 2008 at 9:58 am
antonio.collins (2/12/2008)
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced.
Heh... Exactly what Matt and I've both said... 😉
Even with groups as small as, say, 4 rows per account number, you're still looking at touching (4*4+4)/2 or 10 rows per account internally (instead of the 4 that are there) for a "<=" relationship. It's still going to be slower, but it will be tolerable.
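If you want to see that count for yourself, something like this against the #z table you posted will show it (illustrative only - it just counts what the "<=" join actually touches per code):
--for AA (3 rows) this returns (3*3+3)/2 = 6... for BB (2 rows), (2*2+2)/2 = 3
select Z1.code, count(*) as rows_touched
  from #z as Z1
  join #z as Z2
    on Z1.code = Z2.code
   and Z2.seq <= Z1.seq
 group by Z1.code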
--Jeff Moden
Change is inevitable... Change for the better is not.
February 12, 2008 at 10:49 am
antonio.collins (2/12/2008)
this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code' and the impact is greatly reduced. also, knowing the type and frequency of data that you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per account per month), this simple approach will be efficient and easier for a less-than-expert sql developer to understand, implement, and maintain.
Knowing the rough patterns in your data is great, but in this case averages don't do you any favors. Just one outlier will wreck your day, and what was "efficient" is now carving out large new chunks of real estate in your tempdb. It really doesn't take all that much (I've seen some stores do 5 times their typical transaction volume, say right around Christmas).
But again - if your data is always going to be small (like - really small), yes - this works fine I suppose.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 12, 2008 at 5:36 pm
Heh... correct... there's always that "one time" that justifies the cold water you have running through the garden hose to the baseplate of the CPU in your server 😉
--Jeff Moden
Change is inevitable... Change for the better is not.