Cumulative subtraction

  • Jacob Wilkins - Friday, November 17, 2017 9:42 AM

    Seems to be an odd recursive thing, where for each row you compute a SubtractionResult, with SubtractionResult=ColumnValue-(SubtractionResult from previous row, or 0 if no previous row).

    Fairly simple to implement in whatever is the most appropriate recursive solution here (likely Quirky Update will be most efficient, with all its usual caveats).

    Another way of saying the same thing is that you're computing these terms for each row, with RowN standing in for the column value in question for the row N:

    Row1-0, or simply Row1
    Row2-Row1
    Row3-(Row2-Row1), or Row3-Row2+Row1
    Row4-(Row3-(Row2-Row1)), or Row4-Row3+Row2-Row1
    ...

    So, the other way you could do it with the allowance of ORDER BY in windowed aggregate functions in 2012 is to do a SUM of the column value in question for all preceding rows inclusive of the current row, switching the signs of rows with a different (row number in specified order)%2 value than the current row (note that pattern in the expanded forms of each expression above).

    Of course, you have to have something that defines the order of each row; the OP will have to adapt to whatever defines order for his data (I've just used ascending numerical order as his sample seems to use).

    Something like this:


    IF OBJECT_ID ('tempdb.dbo.#somenumbers') IS NOT NULL DROP TABLE #somenumbers;

    CREATE TABLE #somenumbers (somenumber INT);

    INSERT INTO #somenumbers VALUES
    (10),
    (20),
    (30),
    (40);

    WITH numbered AS
    (
    SELECT rn=ROW_NUMBER() OVER (ORDER BY somenumber ASC), somenumber
    FROM #somenumbers
    ),
    odds_and_evens AS
    (
    SELECT *,
    odds=CASE WHEN rn%2=0 THEN -1*somenumber ELSE somenumber END,
    evens=CASE WHEN rn%2=1 THEN -1*somenumber ELSE somenumber END
    FROM numbered
    )

    SELECT
    somenumber
    ,cumulative_subtraction
    =CASE WHEN rn%2=0 THEN SUM(evens) OVER (ORDER BY rn ASC) ELSE SUM(odds) OVER (ORDER BY rn ASC) END
    FROM odds_and_evens
    ORDER BY rn ASC;

    EDIT: Tidied up some word choices.

    I used a similar approach, but it seems to require fewer scans and logical reads.  I also added a sequence number (identity) to use for sorting.

    ;
    WITH Mults AS
    (
        SELECT *, 2 * (ROW_NUMBER() OVER(ORDER BY Seq) % 2) - 1 AS mult
        FROM @t
    )

    SELECT *, mult * SUM(Col1 * mult) OVER(ORDER BY Seq ROWS UNBOUNDED PRECEDING)
    FROM Mults
    ;

    Here are the results:

    /*  Drew  */
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#B626E99C'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    /*  Jacob  */
    Table 'Worktable'. Scan count 5, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#somenumbers________________________________________________________________________________________________________00000001AA97'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


    As you can see, my solution requires no scans or logical reads of the worktable (CTE), but Jacob's requires 5 scans and 25 logical reads.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Tuesday, November 21, 2017 9:05 AM

    Jacob Wilkins - Friday, November 17, 2017 9:42 AM

    Seems to be an odd recursive thing, where for each row you compute a SubtractionResult, with SubtractionResult=ColumnValue-(SubtractionResult from previous row, or 0 if no previous row).

    Fairly simple to implement in whatever is the most appropriate recursive solution here (likely Quirky Update will be most efficient, with all its usual caveats).

    Another way of saying the same thing is that you're computing these terms for each row, with RowN standing in for the column value in question for the row N:

    Row1-0, or simply Row1
    Row2-Row1
    Row3-(Row2-Row1), or Row3-Row2+Row1
    Row4-(Row3-(Row2-Row1)), or Row4-Row3+Row2-Row1
    ...

    So, the other way you could do it with the allowance of ORDER BY in windowed aggregate functions in 2012 is to do a SUM of the column value in question for all preceding rows inclusive of the current row, switching the signs of rows with a different (row number in specified order)%2 value than the current row (note that pattern in the expanded forms of each expression above).

    Of course, you have to have something that defines the order of each row; the OP will have to adapt to whatever defines order for his data (I've just used ascending numerical order as his sample seems to use).

    Something like this:


    IF OBJECT_ID ('tempdb.dbo.#somenumbers') IS NOT NULL DROP TABLE #somenumbers;

    CREATE TABLE #somenumbers (somenumber INT);

    INSERT INTO #somenumbers VALUES
    (10),
    (20),
    (30),
    (40);

    WITH numbered AS
    (
    SELECT rn=ROW_NUMBER() OVER (ORDER BY somenumber ASC), somenumber
    FROM #somenumbers
    ),
    odds_and_evens AS
    (
    SELECT *,
    odds=CASE WHEN rn%2=0 THEN -1*somenumber ELSE somenumber END,
    evens=CASE WHEN rn%2=1 THEN -1*somenumber ELSE somenumber END
    FROM numbered
    )

    SELECT
    somenumber
    ,cumulative_subtraction
    =CASE WHEN rn%2=0 THEN SUM(evens) OVER (ORDER BY rn ASC) ELSE SUM(odds) OVER (ORDER BY rn ASC) END
    FROM odds_and_evens
    ORDER BY rn ASC;

    EDIT: Tidied up some word choices.

    I used a similar approach, but it seems to require fewer scans and logical reads.  I also added a sequence number (identity) to use for sorting.

    ;
    WITH Mults AS
    (
        SELECT *, 2 * (ROW_NUMBER() OVER(ORDER BY Seq) % 2) - 1 AS mult
        FROM @t
    )

    SELECT *, mult * SUM(Col1 * mult) OVER(ORDER BY Seq ROWS UNBOUNDED PRECEDING)
    FROM Mults
    ;

    Here are the results:

    /*  Drew  */
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#B626E99C'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    /*  Jacob  */
    Table 'Worktable'. Scan count 5, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#somenumbers________________________________________________________________________________________________________00000001AA97'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


    As you can see, my solution requires no scans or logical reads of the worktable (CTE), but Jacob's requires 5 scans and 25 logical reads.

    Drew

    After reviewing, I noticed that Jacob's solution did not specify a frame, so it used the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is known to perform worse than specifying ROWS....  When I specified ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to his windowed functions, the scans and logic reads went to 0.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Indeed. Changing the frame to ROWS UNBOUNDED PRECEDING is responsible for that difference (instead of just letting it use the default RANGE like I did).

    If my window functions are changed to use the same frame then the IO is the same between the two queries.

    I still prefer yours for its brevity. 

    Cheers!

    EDIT: Heh, my usual weakness strikes, starting a post and wandering off to do other things. Drew already caught all this. I really ,really need to get into that habit of checking the page again before I actually post 🙂

  • pity the OP @selpoivre seems to have left the building !

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • The solutions are really nice, but I just want to add something to the mix. This is a simple approach, but should be handled with care (with great power comes great responsibility). Read the article that describes it: Solving the Running Total and Ordinal Rank Problems (Rewritten) - SQLServerCentral


    USE Test

    CREATE TABLE #Sample(
      id int PRIMARY KEY,
      Col1 int)
    INSERT INTO #Sample
    VALUES(1,10), (2,20), (3,30), (4,40);

    DECLARE @id int, @Col1 int = 0;

    UPDATE #Sample WITH (TABLOCKX) SET
      @Col1 = Col1 = Col1 - @Col1,
      @ID = ID
    OPTION ( MAXDOP 1);

    SELECT *
    FROM #Sample

    GO
    DROP TABLE #Sample;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply