Filling Gaps - Running Totals

  • I have a problem I can’t figure out and I’m sure someone here can enlighten me.  First, let me setup my scenario.

     

    I have the following table:

     

    CREATE TABLE dbo.table1(

         Acct_nbr           int,

         Month_ind           int,

         end_balance        decimal(15,2),

         credit_amt         decimal(15,2),

         debit_amt          decimal(15,2),

         amt_change         decimal(15,2))

     

    I insert data using the following:

     

    INSERT INTO dbo.table1 VALUES(123, 1, 0, 10, -5, 5)

    INSERT INTO dbo.table1 VALUES(123, 4, 0, 7, 0, 7)

    INSERT INTO dbo.table1 VALUES(123, 5, 0, 12, -5, 7)

    INSERT INTO dbo.table1 VALUES(123, 17, 0, 3, -3, 0)

    INSERT INTO dbo.table1 VALUES(789, 7, 0, 9, 0, 9)

    INSERT INTO dbo.table1 VALUES(789, 12, 0, 53, -10, 43)

    INSERT INTO dbo.table1 VALUES(789, 13, 0, 25, 0, 25)

    INSERT INTO dbo.table1 VALUES(789, 14, 0, 5, -17, -12)

    INSERT INTO dbo.table1 VALUES(789, 19, 0, 38, -17, 21)

    INSERT INTO dbo.table1 VALUES(789, 25, 0, 14, -11, 3)

     

    Now the data looks like the following:

     

    Acct_nbr

    Month_ind

    End_balance

    Credit_amt

    Debit_amt

    Amt_change

    123

    1

    0

    10

    -5

    5

    123

    4

    0

    7

    0

    7

    123

    5

    0

    12

    -5

    7

    123

    17

    0

    3

    -3

    0

    789

    7

    0

    9

    0

    9

    789

    12

    0

    53

    -10

    43

    789

    13

    0

    25

    0

    25

    789

    14

    0

    5

    -17

    -12

    789

    19

    0

    38

    -17

    21

    789

    25

    0

    14

    -11

    3

     

     

    I want to do two things.  One, I need to fill in the gaps in the month_ind field for each account.  These entries will have zeroes in all of the fields except acct_nbr and month_ind.  Two, I then want to update the end_balance by taking the previous month’s end_balance plus the current month’s amt_change and keep this running total specific to each account

     

    Then final result will look like the following:

     

    Acct_nbr

    Month_ind

    End_balance

    Credit_amt

    Debit_amt

    Amt_change

    123

    1

    5

    10

    -5

    5

    123

    2

    5

    0

    0

    0

    123

    3

    5

    0

    0

    0

    123

    4

    12

    7

    0

    7

    123

    5

    19

    12

    -5

    7

    123

    6

    19

    0

    0

    0

    123

    7

    19

    0

    0

    0

    123

    8

    19

    0

    0

    0

    123

    9

    19

    0

    0

    0

    123

    10

    19

    0

    0

    0

    123

    11

    19

    0

    0

    0

    123

    12

    19

    0

    0

    0

    123

    13

    19

    0

    0

    0

    123

    14

    19

    0

    0

    0

    123

    15

    19

    0

    0

    0

    123

    16

    19

    0

    0

    0

    123

    17

    19

    3

    -3

    0

    789

    7

    9

    9

    0

    9

    789

    8

    9

    0

    0

    0

    789

    9

    9

    0

    0

    0

    789

    10

    9

    0

    0

    0

    789

    11

    9

    0

    0

    0

    789

    12

    52

    53

    -10

    43

    789

    13

    77

    25

    0

    25

    789

    14

    65

    5

    -17

    -12

    789

    15

    65

    0

    0

    0

    789

    16

    65

    0

    0

    0

    789

    17

    65

    0

    0

    0

    789

    18

    65

    0

    0

    0

    789

    19

    76

    38

    -17

    21

    789

    20

    76

    0

    0

    0

    789

    21

    76

    0

    0

    0

    789

    22

    76

    0

    0

    0

    789

    23

    76

    0

    0

    0

    789

    24

    76

    0

    0

    0

    789

    25

    79

    14

    -11

    3

     

     

     Thanks in advance.

     

    hawg

     

     

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • Being a programmer, I like to tackle problems like this with cursors first and then massage it into a set-based (a.k.a. "good" or "real") solution.

    With that in mind, here you go:

    --loop the accounts

    DECLARE acct CURSOR LOCAL FAST_FORWARD FOR

    SELECT Acct_nbr FROM dbo.table1

    DECLARE @acctid INT

    DECLARE @month_min INT, @month_max INT

    DECLARE @m INT

    DECLARE @prev_bal decimal(15,2)

    OPEN acct

    FETCH NEXT FROM acct INTO @acctid

    WHILE @@FETCH_STATUS = 0

    BEGIN

    --get the start and end months for the account

    SELECT

    @month_min = MIN(Month_ind)

    , @month_max = MAX(Month_ind)

    FROM dbo.table1

    WHERE Acct_nbr = @acctid

    --starting month

    SET @m = @month_min

    --loop each month

    WHILE @m <= @month_max

    BEGIN

    --get the previous end_balance

    -- SELECT @prev_bal = ISNULL(SUM(credit_amt + debit_amt), 0)

    SELECT @prev_bal = ISNULL(SUM(amt_change), 0)

    FROM dbo.table1

    WHERE

    Acct_nbr = @acctid AND

    Month_ind < @m

    --does this month exist for this account?

    IF EXISTS (

    SELECT Acct_nbr

    FROM dbo.table1

    WHERE

    Acct_nbr = @acctid AND

    Month_ind = @m

    )

    --update the end_balance

    UPDATE dbo.table1 SET

    end_balance = @prev_bal

    WHERE

    Acct_nbr = @acctid AND

    Month_ind = @m

    ELSE --insert the month

    INSERT INTO dbo.table1 (

    Acct_nbr,

    Month_ind,

    end_balance,

    credit_amt,

    debit_amt,

    amt_change

    ) VALUES (

    @acctid,

    @m,

    @prev_bal,

    0,

    0,

    0

    )

    --next month

    SET @m = @m + 1

    END

    --next account

    FETCH NEXT FROM acct INTO @acctid

    END

    CLOSE acct

    --all done

    SELECT * FROM dbo.table1 ORDER BY Acct_nbr, Month_ind


    --Eric Swanson
    http://www.ericis.com/

    Posts++;

  • Thanks for the response.  I too am a programmer at heart so the procedural approach is the easiest but, and I forgot to mention this, the problem there is that this job will have to run every night and it will be dealing with millions of records.  Right now it stands at 3.2 million records (without the records I need to add).  This number is going to grow dramatically as we start loading our new data and keeping historical data.

    With that said, I am hoping to find some usage of set-based queries, as much as possible, to solve this problem.  I don't think I can avoid some kind of looping (whether cursor or a WHILE loop) but I am hoping to make this as fast as possible.

     

    hawg

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • Will you really have a month indicator > 12...last time I looked at the  calendar...December finished the year...you have 17 + 25.  What would be the max month in your sequence...if the gap was at the end of the sequence?

    I think the best way would be to have a tally/numbers table with the numbers 1-12 are sperate records...and join to it trying to fin the records which don't exist....and creating empty ones, and then resolve the movement records afterewards.

    On a large dataset or with a repeatable exercise....it's best to stay away from cursors....unless absolutely necessary.

  • Oops, my bad.  I was trying to setup a sample of what I am doing because the actual tables and data has many fields and combinations.  I was trying to put together a simple sample to illustrate my problem.

    Ignore the fact that I called it month_ind and just assume it is some kind of counter.  The title, for this sample, really is insignificant. 

    This was just my error.

    hawg

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • Are you going to need to process the "end_balance" every time? Can you initialize with NULL and then ignore on subsequent runs, or can it change?


    --Eric Swanson
    http://www.ericis.com/

    Posts++;

  • My initial data is in a staging table and I am  having to write the data to a new table.

    I have to truncate the new table every time and reload so, yes, I have to re-calculate the end_balance every time.

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • Aw yeah!!! Well, I took care of the inserting missing rows dealio. With some freakin' crazy idea that hit me upside the head. *The update calculations have dependencies on other rows in the same table, so you will have to incrementally update these. I used a cursor for each month, but updated all account records for the month.

    SET NOCOUNT ON;
    
    CREATE TABLE #table1
    (
       Acct_nbr INT NOT NULL,
       Month_ind INT NOT NULL,
       end_balance DECIMAL(15, 2) NOT NULL DEFAULT(0),
       credit_amt DECIMAL(15, 2) NOT NULL DEFAULT(0),
       debit_amt DECIMAL(15, 2) NOT NULL DEFAULT(0),
       amt_change DECIMAL(15, 2) NOT NULL DEFAULT(0),
       PRIMARY KEY ( Acct_nbr, Month_ind )
    );
    
    --populate the test table
    INSERT INTO #table1
    (
       Acct_nbr,
       Month_ind,
       end_balance,
       credit_amt,
       debit_amt,
       amt_change
    )
    SELECT 123, 1, 0, 10, -5, 5 UNION ALL
    SELECT 123, 4, 0, 7, 0, 7 UNION ALL
    SELECT 123, 5, 0, 12, -5, 7 UNION ALL
    SELECT 123, 17, 0, 3, -3, 0 UNION ALL
    SELECT 789, 7, 0, 9, 0, 9 UNION ALL
    SELECT 789, 12, 0, 53, -10, 43 UNION ALL
    SELECT 789, 13, 0, 25, 0, 25 UNION ALL
    SELECT 789, 14, 0, 5, -17, -12 UNION ALL
    SELECT 789, 19, 0, 38, -17, 21 UNION ALL
    SELECT 789, 25, 0, 14, -11, 3;
    
    --per account the smallest and largest months in the set
    SELECT
       Acct_nbr,
       MIN(Month_ind) AS Month_ind_min,
       MAX(Month_ind) AS Month_ind_max
    INTO #acct_month_range
    FROM #table1
    GROUP BY Acct_nbr;
    
    DECLARE @min_Month_ind INT, @max_Month_ind INT;
    
    --get the smallest and largest months in the set
    SELECT
       @min_Month_ind = MIN(Month_ind_min),
       @max_Month_ind = MAX(Month_ind_max)
    FROM #acct_month_range;
    
    --store all of the possible months
    CREATE TABLE #all_months ( Month_ind INT NOT NULL PRIMARY KEY );
    DECLARE @Month_ind INT;
    SET @Month_ind = @min_Month_ind; --start at the min
    WHILE @Month_ind  prev.Month_ind
       GROUP BY
          curr.Acct_nbr,
          curr.Month_ind,
          curr.amt_change
    ) calc
    WHERE 
       #table1.Acct_nbr = calc.Acct_nbr AND
       #table1.Month_ind = calc.Month_ind;
    
    SELECT * FROM #table1;
    
    --clean up
    DROP TABLE #acct_month_range;
    DROP TABLE #all_months;
    DROP TABLE #table1;
    


    --Eric Swanson
    http://www.ericis.com/

    Posts++;

  • I happen to read this articles on "Joy of Number" and had some explaination about avoiding cursors. So Here is how we can avoid cursors.

    1.Create a table to hold Numbers. All this holds is incremental numbers.

    2.Load the table with Numbers till you reach the max of month

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Numbers_Loop]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[Numbers_Loop]

    GO

    CREATE TABLE [dbo].[Numbers_Loop] (

     [Num] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Numbers_Loop] WITH NOCHECK ADD

     CONSTRAINT [PK_Numbers_Loop] PRIMARY KEY  CLUSTERED

     ([Num])  ON [PRIMARY]

    GO

     Declare @i int;

     Declare @i_Max int

     Select @i_Max = max(Month_ind) from table1

     Select @i = 1;

     While @i <= @i_Max

     Begin

      If Not exists (Select 1 from Numbers_Loop where Num = @i)

      Begin

          Insert INTO Numbers_Loop(Num)

        select @i

       

      End

          Select @i = @i + 1;

     End;

     3.Using the "Numbers" table as the starting point will get us the "Missing Months". So load the data along with "Missing Months" into temp table.

     select MinMax.Acct_Nbr,Numbers_1.Num as Month_ind,--MinMax.MinID,MinMax.MaxID,

      isnull(table1.credit_amt,0) as credit_amt,

      isnull(table1.debit_amt,0) as debit_amt,

      isnull(table1.amt_change,0) as amt_change

     into #tmpFillUp

     from Numbers_1 JOIN ( select Acct_Nbr,min(Month_ind) as MinID,max(Month_ind) as MaxID

         from table1

         group by Acct_Nbr) MinMax

     On Numbers_1.Num between MinID and MaxID

     LEFT OUTER JOIN table1 ON Numbers_1.Num = table1.Month_ind and MinMax.Acct_Nbr = table1.Acct_Nbr

     order by 1,2

    4.Run a query to summarize the ending balance.

     select Acct_Nbr,Month_ind,

      (Select sum(amt_change) from #tmpFillUp EndBalance

       where #tmpFillUp.Acct_Nbr = EndBalance.Acct_Nbr

       and EndBalance.Month_ind <= #tmpFillUp.Month_ind) as end_balance,

     Credit_amt,debit_amt,amt_change

     from #tmpFillUp

    Hope this helps.

    Thanks

    Sreejith

     

     

  • Step #4 is an excellent set-based method for determining the end_balance for the current month. May I also suggest that you eliminate the column "amt_change" and use "SUM(credit_amt + debit_amt)" if possible? If other queries depend on "amt_change", then may I also suggest using a VIEW?


    --Eric Swanson
    http://www.ericis.com/

    Posts++;

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply