Limit Resources to UGLY query

  • Heh... no... not real painful 😀 Let me explain...

    Let's say you have a table that looks like this...

    --===== Create and populate a 1,000,000 row test table.
         -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeNumber" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Takes about 77 seconds to execute.
     SELECT TOP 1000000
            RowNum     = IDENTITY(INT,1,1),
            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
            SomeCSV    = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME),
            Change     = CAST(NULL AS INT)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
      ALTER TABLE dbo.JBMTest
        ADD CONSTRAINT PK_jbmTest_RowNum PRIMARY KEY CLUSTERED (RowNum)
    GO

    ... and, let's say we want to do something similar to what you are doing... working in RowNum order (forgetting account numbers, etc., for now), we want to identify the change in the SomeNumber column... if the next row has a larger value than the previous, we'll mark it 1... if the same, then 0... if smaller, then -1. We'll do a little setup and then don't blink... it runs nasty fast... notice the handy way we do an update that remembers the previous value without a self-join, a join, or an explicit loop... that's what makes it so fast...

    --===== Do a little setup
    DECLARE @PrevVal MONEY
        SET @PrevVal = 0
    DECLARE @Dummy INT

    --===== Do the update to the Change column
     UPDATE jbmTest
        SET @Dummy   = Change = SIGN(SomeNumber-@PrevVal),
            @PrevVal = SomeNumber
       FROM jbmTest WITH (INDEX(PK_jbmTest_RowNum),TABLOCK)

    That update on a million rows takes 7 seconds on my box.

    But, add 1 "reporting" index...

     CREATE INDEX IDX_DODAH
         ON jbmTest (SomeDate,SomeNumber,Change)
    GO

    ... and, suddenly, the same "previous row comparison" update takes over 2 minutes to run instead of 7 seconds.

    You can do the same type of update in your temp tables... the clustered primary key (whatever it turns out to be) is what matters most for the blazing speed. Updates on a table with a bazillion reporting indexes are going to continue to kill you... with such a large update, you may want to drop all of the indexes, do your inserts/updates, and rebuild the indexes (sketched just below)... even that's going to take a while and might not be worth it (although it will optimize the indexes for reporting :hehe: )
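    A minimal sketch of that drop/update/rebuild pattern, reusing the table and index names from the example above (DROP INDEX syntax as in SQL Server 2005+; a sketch, not a tested script):

       DROP INDEX IDX_DODAH ON dbo.JBMTest  --Drop the reporting index first

    --===== The fast "previous row" update, exactly as above
    DECLARE @PrevVal MONEY
        SET @PrevVal = 0
    DECLARE @Dummy INT

     UPDATE dbo.JBMTest
        SET @Dummy   = Change = SIGN(SomeNumber-@PrevVal),
            @PrevVal = SomeNumber
       FROM dbo.JBMTest WITH (INDEX(PK_jbmTest_RowNum),TABLOCK)

    --===== Rebuild the reporting index afterwards (it comes back defragmented)
     CREATE INDEX IDX_DODAH
         ON dbo.JBMTest (SomeDate,SomeNumber,Change)
    GO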

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I love that trick you did. I tried to do that myself, but was unable to find a way to do it.

    @dummy = change = sign(type1 - @previous), @previous = type1

    Heh, yeah... the key was the @Dummy variable in this case. Needed it to get past the mix of columns and variables in the SET list, since the value itself wasn't needed. The other key is the index hint on an index that sorts in the order you need.

    Thanks for the feedback, Bob. 🙂

    --Jeff Moden



    Good proof that index maintenance really DOES have a price! I like the update mechanism you used too, Jeff.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    It's going to take a little while to fix the code. I should have it done later today and will let you know the results. Now for the other nightmare issue I have.

    Since I often mix days and weeks, I need to do something like the following.

    Date                     Week  Month  Daily_Type1        Monthly_Type2
    2006-11-29 00:00:00.000  5577  1283   0.176767241809893  NULL
    2006-11-30 00:00:00.000  5578  1283   0.206843646769383  0.467677951418334

    Now you see that the value for Monthly_Type2 on 11/29 is NULL because I don't have a value for that date, but 11/30 holds the monthly value.

    Since they are both in the same month, I need to back-fill the NULLs with the value for that month. Any other really cool tricks like the previous one to do that? Here's what I'm doing now:

     UPDATE #ComplexFilter
        SET Monthly_FM        = MonthlyValue.Monthly_FM,
            Monthly_FM_Change = MonthlyValue.Monthly_FM_Change
       FROM #ComplexFilter
      INNER JOIN #ComplexFilter AS MonthlyValue
         ON #ComplexFilter.MonthNumber = MonthlyValue.MonthNumber
        AND MonthlyValue.StoredCalcDataID = #ComplexFilter.StoredCalcDataID
        AND MonthlyValue.Monthly = 1
        AND MonthlyValue.Monthly_FM IS NOT NULL
        AND MonthlyValue.Monthly_FM_Change IS NOT NULL
      WHERE #ComplexFilter.Monthly = 0

    Yes, I would agree, but not in this case. The process I'm referring to deals with a large volume of data.

    Thanks for your reply

    Kurt

    DBA

    RHWI, Inc

    Poughkeepsie, NY


    I was able to update 11M+ rows 3x, and update every 20th row (months) 1x, in 210 seconds. I am going to see if I can get this down to no more than one pass per day, week, and month, but at this point that is already an improvement. I am re-running now to see how significant it is, but I am thinking it is about a 600% improvement.

    DECLARE @Dummy INT, @PrevID INT
     SELECT @PrevID = 0
    DECLARE @PreDaily_D FLOAT
    DECLARE @PreDaily_M FLOAT
    DECLARE @PreDaily_RM FLOAT

    --===== First pass: daily rows. The CASEs reset the carried values
         -- whenever StoredCalcDataID changes.
     UPDATE #ComplexFilter
        SET @PreDaily_D  = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_D  ELSE NULL END,
            @PreDaily_M  = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_M  ELSE NULL END,
            @PreDaily_RM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_RM ELSE NULL END,
            @Dummy = Daily_D_Change  = SIGN(Daily_D -@PreDaily_D),  @PreDaily_D  = Daily_D,
            @Dummy = Daily_M_Change  = SIGN(Daily_M -@PreDaily_M),  @PreDaily_M  = Daily_M,
            @Dummy = Daily_RM_Change = SIGN(Daily_RM-@PreDaily_RM), @PreDaily_RM = Daily_RM,
            @PrevID = StoredCalcDataID
       FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK))
      WHERE Daily = 1

     SELECT @PrevID = 0
    DECLARE @PreMonthly_FM FLOAT

    --===== Second pass: monthly rows.
     UPDATE #ComplexFilter
        SET @PreMonthly_FM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreMonthly_FM ELSE NULL END,
            @Dummy = Monthly_FM_Change = SIGN(Monthly_FM-@PreMonthly_FM), @PreMonthly_FM = Monthly_FM,
            @PrevID = StoredCalcDataID
       FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK))
      WHERE Monthly = 1

    FYI... it is now down to only 105 seconds, versus about 800 seconds for a 4-field, 11M-row update with my past method.

    Thanks Jeff!!! That's an improvement.

  • Problem 2.

    Date                     Week  Month  Daily_Type1        Monthly_Type2
    2006-11-29 00:00:00.000  5577  1283   0.176767241809893  NULL
    2006-11-30 00:00:00.000  5578  1283   0.206843646769383  0.467677951418334

    Drop the PK index, then:

     CREATE UNIQUE CLUSTERED INDEX #TmpPK ON #ComplexFilter (AsOfDate DESC) -- Change order

    DECLARE @CurrMonthly_Type2 FLOAT -- NOTE: declaration added; the original post used this variable without declaring it

     UPDATE #ComplexFilter
        SET @CurrMonthly_Type2 = CASE WHEN Monthly_Type2 IS NULL THEN @CurrMonthly_Type2 ELSE Monthly_Type2 END,
            Monthly_Type2      = @CurrMonthly_Type2
       FROM #ComplexFilter WITH (INDEX(#TmpPK))

    -- Apparently you don't need the @Dummy variable (at least in 2k5), nor the HINT, since the default is to go in the order of the CLUSTERED INDEX.

    Now I am in the process of figuring out whether the dropping and creating of the index is FASTER than the old UPDATE, but I'll let you know (a quick way to time it is sketched below).
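    A simple elapsed-time harness for that comparison might look like this (a sketch; run it once around each variant):

    DECLARE @Start DATETIME
        SET @Start = GETDATE()

    -- ... variant under test goes here: either the plain UPDATE against the
    --     existing index, or drop index + UPDATE + recreate index ...

      PRINT DATEDIFF(ms, @Start, GETDATE())  -- elapsed milliseconds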

  • An Update

    The @Dummy= is required. Well, at least if you want the value to be right 🙂 The statement will run without it, but the value will be wrong. Not sure exactly what happens, but it isn't what it should be.

    Just a little warning... you are using an undocumented behaviour, and a service pack could belly-up your project. :hehe:

    Other than that, the use of variables in UPDATE statements has been stable for the last several releases, but it has never been documented...

    Cheers,


    * Noel
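    For readers on later versions: SQL Server 2012 added LAG(), the documented way to reach the previous row's value (it did not exist when this thread was written). A minimal sketch against the JBMTest table from earlier in the thread:

    --===== Documented alternative (SQL Server 2012+): LAG() instead of the
         -- variable-carrying update. LAG(SomeNumber, 1, 0) returns the prior
         -- row's SomeNumber in RowNum order, defaulting to 0 on the first row.
     UPDATE t
        SET Change = s.NewChange
       FROM dbo.JBMTest t
       JOIN (SELECT RowNum,
                    SIGN(SomeNumber - LAG(SomeNumber, 1, 0) OVER (ORDER BY RowNum)) AS NewChange
               FROM dbo.JBMTest) s
         ON s.RowNum = t.RowNum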

  • hmmm. Well, I guess the good news here is that this code isn't controlling a missile launch system or anything like that. However, I will keep that in mind. Thanks.

  • What is it for?

  • TheSQLGuru (10/2/2007)


    Good proof that index maintenance really DOES have a price! I like the update mechanism you used too, Jeff.

    Thank you much!

    --Jeff Moden



  • Bob Fazio (10/2/2007)


    I was able to update 11M+ rows 3x, and update every 20th row (months) 1x, in 210 seconds. I am going to see if I can get this down to no more than one pass per day, week, and month, but at this point that is already an improvement. I am re-running now to see how significant it is, but I am thinking it is about a 600% improvement.

    Now, we're cookin'... nice job, Bob...

    You can fold other variable/formula combinations into the same update if you want to try to get it down to a single pass... it won't cost but a bit, performance-wise.
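    A hypothetical sketch of that single-pass combination (column names from Bob's posts above; carrying just the Daily_D and Monthly_FM values for brevity; untested):

    DECLARE @Dummy INT, @PrevID INT, @PreDaily_D FLOAT, @PreMonthly_FM FLOAT
     SELECT @PrevID = 0

    --===== One pass carrying both the daily and the monthly variables.
         -- The CASE guards leave each change column untouched on rows of the
         -- other type, so both running comparisons survive a single scan.
     UPDATE #ComplexFilter
        SET @PreDaily_D    = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_D    ELSE NULL END,
            @PreMonthly_FM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreMonthly_FM ELSE NULL END,
            @Dummy = Daily_D_Change    = CASE WHEN Daily = 1
                                              THEN SIGN(Daily_D - @PreDaily_D)
                                              ELSE Daily_D_Change END,
            @PreDaily_D    = CASE WHEN Daily = 1 THEN Daily_D ELSE @PreDaily_D END,
            @Dummy = Monthly_FM_Change = CASE WHEN Monthly = 1
                                              THEN SIGN(Monthly_FM - @PreMonthly_FM)
                                              ELSE Monthly_FM_Change END,
            @PreMonthly_FM = CASE WHEN Monthly = 1 THEN Monthly_FM ELSE @PreMonthly_FM END,
            @PrevID = StoredCalcDataID
       FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK))
      WHERE Daily = 1 OR Monthly = 1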

    --Jeff Moden


