TSQL: Computing Balance

  • Guys,

I have the following scenario where I have to calculate a balance for each unique code:

    CODE  PAID  DUE  BALANCE
    ___________________________________

    AA    10    40
    AA    10    40
    AA    10    40
    BB    80    100
    BB    10    100

    In the above example, for CODE 'AA' the balance should read 30, 20, 10, and for CODE 'BB' the balance should read 20, 10.

    Is there any way using TSQL code to accomplish this??

    Any suggestions and inputs would help

    Thanks

  • Hi,

    Try this

    --Separate the data into two tables. The total due, from what I can see, is constant. If it's not
    --constant then it should at least still be in a separate table, as it remains constant across
    --multiple transactions.
    declare @Trans table( transNum int identity(1,1) primary key clustered, Code char(2), Paid money )
    declare @Owing table( Code char(2) primary key clustered, Due money )

    insert into @Trans( Code, Paid )
    select 'AA', 10 UNION ALL
    select 'AA', 10 UNION ALL
    select 'AA', 10 UNION ALL
    select 'BB', 80 UNION ALL
    select 'BB', 10

    insert into @Owing( Code, Due )
    select 'AA', 40 UNION ALL
    select 'BB', 100

    --Running total. For max efficiency you should look up Jeff Moden's recent article on the
    --subject, and also look up triangular joins and RBAR.
    select
        Trans.transNum, Trans.Code, Paid,
        Owing.Due - ( select sum(Paid) as sumPaid
                      from @Trans TransRBAR
                      where TransRBAR.transNum <= Trans.transNum
                        and TransRBAR.Code = Trans.Code
                    ) as Balance
    from @Trans Trans
    inner join @Owing Owing
        on Trans.Code = Owing.Code
    order by Trans.transNum, Trans.Code

    Note that I've separated the data into two tables. Does the DUE column's value change over time? Either way, I think you should keep it in a separate table.

    Also, you'll need a column to imply an ordering to the transactions - I've added a TransNum column for that purpose. In a banking scenario you'll probably have a similar column as well as a datetime.

    The above code could be more efficient - look up Jeff Moden's article on running totals on this site. If you don't have too many rows it should be ok. The other alternative is to actually calculate and store the balance as each transaction's entered. This way you needn't calculate anything and you wouldn't accumulate rounding errors, etc (I've not written financial stuff but that's something I'd worry about if interest, etc is involved).
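    (A side note for later readers: on SQL Server 2012 and up - which postdates this thread - the same grouped running balance can be written with a windowed SUM, which avoids the triangular join entirely. A sketch against the @Trans/@Owing tables declared above:)

```sql
--Windowed running total (SQL Server 2012+ only; shown for comparison).
--Uses the same @Trans and @Owing tables declared above.
select
    Trans.transNum, Trans.Code, Trans.Paid,
    Owing.Due - sum(Trans.Paid) over ( partition by Trans.Code
                                       order by Trans.transNum
                                       rows unbounded preceding ) as Balance
from @Trans Trans
inner join @Owing Owing
    on Trans.Code = Owing.Code
order by Trans.transNum, Trans.Code
```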

  • AM,

    Ian is absolutely correct... for very small groups of rows by account, the example he gave will work just fine, albeit a little slower than you may expect (just as Ian said). The reason it runs horribly slowly on larger groups of rows is a thing called a "Triangular Join", which is half as bad as a full "Square" or Cross Join. Here's why Triangular Joins are so bad...

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    As Ian also suggested, if you do have accounts with large numbers of rows, you can calculate the new balance every time a new row is added. You do have to be a bit careful to make sure the code is perfect so the balance never gets out of sync.
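    (To sketch that insert-time idea - table and column names here are illustrative, not from the original post; a permanent Trans table is assumed with a transNum identity column plus a stored Balance column, and production code would also need locking to keep concurrent inserts from getting out of sync:)

```sql
--Illustrative sketch only: store the balance as each transaction is entered.
--Assumes hypothetical permanent tables Trans(transNum identity, Code, Paid, Balance)
--and Owing(Code, Due).
declare @Code char(2), @Paid money
select @Code = 'AA', @Paid = 10

insert into Trans ( Code, Paid, Balance )
select @Code, @Paid,
       isnull( ( select top 1 Balance          --previous balance, if any
                 from Trans
                 where Code = @Code
                 order by transNum desc ),
               ( select Due                    --first transaction: start from Due
                 from Owing
                 where Code = @Code ) ) - @Paid
```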

    Or, (again, as Ian suggested) you can recalculate the balance either for a given account or the whole shootin' match using the code that Ian pointed out at the following URL...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    ... it'll do a grouped running balance across a million rows, regardless of the number of rows per account, in about 7-10 seconds.

    Ian, thanks for the honorable mention!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?

    create table #z
    ( seq int not null, code char(2), paid int, due int )
    go

    insert into #z values (1,'AA',10,40)
    insert into #z values (2,'AA',10,40)
    insert into #z values (3,'BB',80,100)
    insert into #z values (4,'AA',10,40)
    insert into #z values (7,'BB',10,100)
    go

    select Z1.seq, Z1.code, min(Z1.paid) as paid,
           min(Z1.due) as due, min(Z1.due) - sum(Z2.paid) as balance
    from #z as Z1 join #z as Z2
        on Z1.code = Z2.code and Z2.seq <= Z1.seq
    group by Z1.seq, Z1.code
    go

    result:

    seq  code  paid  due  balance
    1    AA    10    40   30
    2    AA    10    40   20
    4    AA    10    40   10
    3    BB    80    100  20
    7    BB    10    100  10
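    (and if there's no sequence column to begin with but there is, say, a transaction date, ROW_NUMBER() on SQL Server 2005+ can synthesize one - a sketch with assumed table/column names:)

```sql
--Sketch: synthesize a per-code sequence from a datetime column.
--Table name Payments and column TransDate are assumed for illustration.
select row_number() over ( partition by code order by TransDate ) as seq,
       code, paid, due
from Payments
```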

  • antonio.collins (2/11/2008)


    i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?

    Sure it can, and as a matter of fact - that's the "recommended way to do it". The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempdb, steal all of the RAM, etc... when you try to run this against something production-size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10,000*10,001/2 rows, or just over 50 million rows.

    Read up on Jeff's articles - it's a good read and a compelling argument (IMO).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

    The problem is - this turns into a real dog of a query, and will have the tendency to blow out your tempdb, steal all of the RAM, etc... when you try to run this against something production-size (because of the cardinality of the dataset you have to process). If you had a 10,000 row table, you'd need to process (through the sub-query) 10,000*10,001/2 rows, or just over 50 million rows.

    What subquery are you referring to? With the self join on 'code', you're only dealing with the average number of payments per 'code', so the total impact would be more like 250K-500K rows (assuming 50-100 transactions per code).

  • I should have worded that as a 10,000 row GROUP. Making an assumption about the size of your groups is really dangerous. All you would need is one really large group and your logical row count goes through the roof.

    Even if your data is very even - you might find that the overhead of creating a temp table, and performing the running count is well worth it since it has a linear growth factor (compared to a second order polynomial growth factor). When you get to bigger sets, this can add up quickly (turning into queries running for hours).
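    (To put rough numbers on that growth factor: a triangular join touches n*(n+1)/2 rows internally for a group of n rows, so the cost grows with the square of the group size, while a stored running total grows linearly:)

```sql
--Rows touched internally by a triangular join, per group of n rows: n*(n+1)/2
select n, n * (n + 1) / 2 as rowsTouched
from ( select 4 as n
       UNION ALL select 100
       UNION ALL select 10000
       UNION ALL select 20000 ) as groupSizes
--n = 4      ->          10
--n = 100    ->       5,050
--n = 10000  ->  50,005,000
--n = 20000  -> 200,010,000
```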


  • antonio.collins (2/11/2008)


    i'm confused. assuming there's some column that can order the payments sequentially/chronologically, can't this be solved with a simple self join?

    Yep... and that "simple self join" is classified as a "Triangular Join"... doesn't need to be in a correlated subquery to be a "Triangular Join".

    For groups with very small rowcounts, the "simple self join" you posted is quite effective, as Matt and I have both suggested... try it when groups of 10,000 or 20,000 rows are present...

    As Matt also suggested, go take a look at the articles on Triangular Joins and the way to get around them for running totals and running counts. And, they're not mere speculation... there's Proof of Principle code in the articles that you can run. Here are the URLs, again...

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    --Jeff Moden



this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code', and the calculated impact is greatly reduced.

    also, knowing the type and frequency of the data you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per month), this simple approach will be efficient and easier for a less-than-expert SQL developer to understand, implement, and maintain.

  • antonio.collins (2/12/2008)


    this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code', and the calculated impact is greatly reduced.

    Heh... Exactly what Matt and I've both said... 😉

    Even with groups as small as, say, 4 rows per account number, you're still looking at touching (4*4+4)/2 or 10 rows per account internally (instead of the 4 that are there) for a "<=" relationship. It's still going to be slower, but it will be tolerable.

    --Jeff Moden



  • antonio.collins (2/12/2008)


this particular example is not a pure triangle as illustrated in your article. that's because the 'code' (account number) is used as the primary join criterion. so, the triangle is limited to the transactions for a particular 'code', and the calculated impact is greatly reduced.

    also, knowing the type and frequency of the data you are dealing with does influence your algorithms and even your database design. so if the original poster can determine that there are only a reasonable number of transactions per 'code' (even a credit card processor only has to deal with at most a couple hundred transactions per month), this simple approach will be efficient and easier for a less-than-expert SQL developer to understand, implement, and maintain.

    Knowing the rough patterns in your data is great, but in this case averages don't do you any favors. Just one outlier will wreck your day, and what was "efficient" is now carving out large new chunks of real estate in your tempdb. It really doesn't take all that much (I've seen some stores do 5 times their typical transaction volume, say right around Christmas).

    But again - if your data is always going to be small (like - really small), yes - this works fine I suppose.


  • Heh... correct... there's always that "one time" that justifies the cold water you have running through the garden hose to baseplate of the CPU in your server 😉

    --Jeff Moden


