Running Total

  • Luis Cazares - Friday, April 7, 2017 8:27 PM

    robertopmorris - Friday, April 7, 2017 2:04 PM

    Jeff:

    Thanks for the response.

    There are about 16,000 to 20,000 rows in the data set in total currently.
    I want the running total to start over on each change of PartNum. There are around 10,000 Parts, and that number is ever increasing - by around 2,000 per annum.
    The running total needs to be run once or twice a day - rarely three times.
    The current SQL code returns the results very quickly - in less than 5 seconds - but it only gives a running total per date, because the data set contains multiple entries for the same Part on the same date. We need to view this data row by row and have the running total displayed likewise.
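
    For example, with made-up values, the desired output would look like this (the running total restarts at B-200, and the two A-100 rows on the same date each get their own row-by-row total):

    PartNum   DueDate      Quantity   RunningTotal
    A-100     2017-01-02         10             10
    A-100     2017-01-02          5             15
    A-100     2017-01-05         20             35
    B-200     2017-01-03          7              7
    B-200     2017-01-04          3             10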
    I hope this makes sense.

    Thank you in advance.
    Roberto.

    Did you read the article that we mentioned before?

    Heh... probably not.  He might after my next post, though. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK.  Here's a test data generator for folks to play with.  I normally use a million rows just to get some idea of how well whatever code is being tested will scale.  Details are in the code.

    --=====================================================================================================================
    --      Create and populate a test table.
    --      Nothing in this section is a part of the solution. We're just creating test data here.
    --      Written so that it works in all versions of SQL Server from 2005 and up.
    --      Refer to the following URL for tips on "Minimal Logging".
    --          https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
    --      To find out more about creating random constrained data in huge quantities quickly, see the following URLs.
    --          http://www.sqlservercentral.com/articles/Data+Generation/87901/
    --          http://www.sqlservercentral.com/articles/Test+Data/88964/
    --=====================================================================================================================
    --===== If the test table exists, drop it to make reruns in SSMS easier.
         IF OBJECT_ID('tempdb..#MyTable','U') IS NOT NULL
       DROP TABLE #MyTable
    ;
    --===== Create the test table.
         -- Note that there is nothing to uniquely identify rows in this table and there can be dupes.
     CREATE TABLE #MyTable
            (
             PartNum            VARCHAR(50)
            ,DueDate            DATETIME
            ,RequirementFlag    TINYINT
            ,Quantity           DECIMAL(22,8)
            )
    ;
    --===== I don't know if there are any indexes on the table but I included one that seems most likely.
     CREATE CLUSTERED INDEX CI_PartNum_DueDate
         ON #MyTable (PartNum, DueDate)
    ;
    --===== Obviously-named local variables to control what the test data contains.
    DECLARE  @RowCount  INT
            ,@PartCount INT
            ,@StartDate DATETIME
            ,@EndDate   DATETIME
    ;
     SELECT  @RowCount  = 1000000
            ,@PartCount = 10000
            ,@StartDate = '20160101'
            ,@EndDate   = GETDATE() 
    ;
    --===== Populate the test table with random constrained data similar to the real table.
       WITH cte AS
    (
     SELECT TOP (@RowCount)
             PartNum         = STUFF(RIGHT(ABS(CHECKSUM(NEWID())%@PartCount)+1000001,6),3,0,'-')
            ,DueDate         = @StartDate + ABS(CHECKSUM(NEWID())%(DATEDIFF(dd,@StartDate,@EndDate)+1))
            ,RequirementFlag = SIGN(ABS(CHECKSUM(NEWID())%10))
            ,Quantity        = ABS(CHECKSUM(NEWID())%2000)+1
       FROM      sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    )
     INSERT INTO #MyTable WITH (TABLOCK) --Allow "Minimal Logging" if not in FULL Recovery Model
            (PartNum, DueDate, RequirementFlag, Quantity)
     SELECT PartNum, DueDate, RequirementFlag, Quantity
       FROM cte
      ORDER BY PartNum, DueDate --Allow "Minimal Logging" when filling empty Clustered Index without a Trace Flag
     OPTION (RECOMPILE) --Helps "Minimal Logging" happen because value of variables unknown until run time.
    ;
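
    A quick sanity check on the generated data (optional and, like the generator itself, not part of the solution) confirms the row count, part count, and date range before you test anything against it:

    --===== Optional sanity check on the generated test data.
     SELECT  TotalRows   = COUNT(*)
            ,UniqueParts = COUNT(DISTINCT PartNum)
            ,MinDueDate  = MIN(DueDate)
            ,MaxDueDate  = MAX(DueDate)
       FROM #MyTable
    ;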

    Here's my shot at the problem using the technique that Luis and I have both provided links to.  It takes only 3 seconds on a million rows, and if you were to convert the Temp Table to a permanent table, you could run the code several times a day to keep the table up to date.  The code is a bit more complicated, but being able to do a running total over a million rows that quickly makes it worth it in my book.  Additionally, it touches only 2 million rows in total; the Triangular Join methods have to touch almost 52 million internal rows for this particular test.
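
    For reference, the "Triangular Join" style being compared against is the correlated-subquery approach sketched below; each row's total re-reads all preceding rows for its part (hence the tens of millions of internal rows), and it can't give distinct row-by-row totals when a part has several rows on the same date:

    --===== Triangular Join sketch (for comparison only - do not use this on large tables).
     SELECT  mt.PartNum
            ,mt.DueDate
            ,mt.Quantity
            ,RunningTotal = (SELECT SUM(mt2.Quantity)
                               FROM #MyTable mt2
                              WHERE mt2.PartNum = mt.PartNum
                                AND mt2.DueDate <= mt.DueDate)
       FROM #MyTable mt
    ;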

    --=====================================================================================================================
    --      Use a "Quirky Update" modified to have a safety check to do the Running Total.
    --      Please refer to the following article on how a "Quirky Update" works.
    --          http://www.sqlservercentral.com/articles/T-SQL/68467/
    --      To find out more about the safety check, please see the following post.
    --          https://www.sqlservercentral.com/Forums/802558/Solving-the-quotRunning-Totalquot-quotOrdinal-Rankquot-Problems-Rewritten?PageIndex=5#bm981258    
    --=====================================================================================================================
    --===== If the running total table exists, drop it to make reruns in SSMS easier.
         -- This could be done on a "real" table and updated several times each day.
         IF OBJECT_ID('tempdb..#RunningTotal','U') IS NOT NULL
       DROP TABLE #RunningTotal
    ;
    --===== Create the running total table with both a unique "counter" column and a running total column.
     CREATE TABLE #RunningTotal
            (
             RowNum             INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
            ,PartNum            VARCHAR(50)
            ,DueDate            DATETIME
            ,RequirementFlag    TINYINT
            ,Quantity           DECIMAL(22,8)
            ,RunningTotal       DECIMAL(22,8) DEFAULT (0.0)
            )
    ;
    --===== Populate the table with ALL of the data from the source table.
         -- This is very fast (about 1.5 seconds per million rows on my box).
     INSERT INTO #RunningTotal WITH (TABLOCK) --Allow "Minimal Logging" if not in FULL Recovery Model
            (PartNum, DueDate, RequirementFlag, Quantity)
     SELECT PartNum, DueDate, RequirementFlag, Quantity
       FROM #MyTable
      ORDER BY PartNum, DueDate --Allow "Minimal Logging" when filling empty Clustered Index without a Trace Flag
     OPTION (RECOMPILE) --Not needed here but I've "developed the habit" to support minimal logging
    ;
    --===== Declare the variables we'll need to support the "Quirky Update" method of calculating Running Totals.
    DECLARE  @PrevPartNum   VARCHAR(50)
            ,@RunningTotal  DECIMAL(22,8)
            ,@SafetyCounter INT
    ;
    --===== Presets
     SELECT  @PrevPartNum   = ''
            ,@RunningTotal  = 0.0
            ,@SafetyCounter = 1 --You can test the safety feature by changing this to any number other than 1.
    ;
    --===== Calculate the running total with a safety check.
     UPDATE rt
        SET  @RunningTotal = RunningTotal = CASE
                                            WHEN RowNum = @SafetyCounter
                                            THEN
                                                CASE
                                                WHEN PartNum = @PrevPartNum
                                                THEN @RunningTotal + Quantity
                                                ELSE Quantity
                                                END
                                            ELSE 1/0 --Forces an error if the counter doesn't match the row being worked on.
                                            END
            ,@PrevPartNum   = PartNum
            ,@SafetyCounter = @SafetyCounter + 1
       FROM #RunningTotal rt WITH (TABLOCKX, INDEX(1))
     OPTION (MAXDOP 1)
    ;
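
    As a side note for anyone on SQL Server 2012 or later: the same per-PartNum running total can also be produced with a windowed SUM, which is fully documented and needs no safety check. A minimal sketch against the same #RunningTotal table (an alternative, not the method used above):

    --===== Alternative sketch for SQL Server 2012+ only: a windowed running total.
         -- RowNum already encodes the PartNum, DueDate load order, so it drives the window order.
     SELECT  PartNum
            ,DueDate
            ,RequirementFlag
            ,Quantity
            ,RunningTotal = SUM(Quantity) OVER (PARTITION BY PartNum
                                                ORDER BY RowNum
                                                ROWS UNBOUNDED PRECEDING)
       FROM #RunningTotal
      ORDER BY RowNum
    ;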

    --Jeff Moden



  • Jeff Moden - Friday, April 7, 2017 11:48 PM

    Here's my shot at the problem using the technique that Luis and I have both provided links to.  It takes only 3 seconds on a million rows, and if you were to convert the Temp Table to a permanent table, you could run the code several times a day to keep the table up to date.  The code is a bit more complicated, but being able to do a running total over a million rows that quickly makes it worth it in my book.  Additionally, it touches only 2 million rows in total; the Triangular Join methods have to touch almost 52 million internal rows for this particular test.

    @jeff

    Just for clarity ...
    Your recommendation for running totals is  

    SELECT ...
    INTO #Temp
    FROM ...
    WHERE ...

    Exec QU on #Temp

    SELECT ...
    FROM #Temp

  • DesNorton - Sunday, April 9, 2017 11:57 PM

    Jeff Moden - Friday, April 7, 2017 11:48 PM

    Here's my shot at the problem using the technique that Luis and I have both provided links to.  It takes only 3 seconds on a million rows, and if you were to convert the Temp Table to a permanent table, you could run the code several times a day to keep the table up to date.  The code is a bit more complicated, but being able to do a running total over a million rows that quickly makes it worth it in my book.  Additionally, it touches only 2 million rows in total; the Triangular Join methods have to touch almost 52 million internal rows for this particular test.

    @jeff

    Just for clarity ...
    Your recommendation for running totals is:

    SELECT ...
    INTO #Temp
    FROM ...
    WHERE ...

    Exec QU on #Temp

    SELECT ...
    FROM #Temp

    Correct.  Most people don't want a "Running Total" column on their main table (and I agree, because it can go out of date very quickly, especially if in-row updates are allowed rather than "corrections" in the form of additional rows), and the clustered index needed to correctly support the Quirky Update is frequently contrary to what the main table needs for other purposes.  Working in a separate table also guarantees exclusive use of the data at the given moment and a point-in-time snapshot of the data at the time of the running total.
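
    In outline, and reusing the running-total table structure from the earlier post (dbo.SourceTable and the date filter below are placeholders, not names from the real system), the pattern looks like this:

    --===== 1. Snapshot the source rows into a temp table built for the Quirky Update
         -- (IDENTITY "counter" column, RunningTotal column, and clustered index as shown earlier).
     INSERT INTO #Temp WITH (TABLOCK)
            (PartNum, DueDate, RequirementFlag, Quantity)
     SELECT PartNum, DueDate, RequirementFlag, Quantity
       FROM dbo.SourceTable --Placeholder for the real table name.
      WHERE DueDate >= '20160101' --Placeholder for whatever filter applies.
      ORDER BY PartNum, DueDate
    ;
    --===== 2. Run the Quirky Update from the earlier post against #Temp.
    --===== 3. Read the results; the running total is now just another column.
     SELECT PartNum, DueDate, RequirementFlag, Quantity, RunningTotal
       FROM #Temp
      ORDER BY RowNum
    ;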

    --Jeff Moden


