October 2, 2007 at 12:17 am
Heh... no... not real painful 😀 Let me explain...
Let's say you have a table that looks like this...
--===== Create and populate a 1,000,000 row test table.
-- Column RowNum has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeNumber" has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Takes about 77 seconds to execute.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME),
Change = CAST(NULL AS INT)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD CONSTRAINT PK_jbmTest_RowNum PRIMARY KEY CLUSTERED (RowNum)
GO
... and, let's say we want to do something similar to what you are doing... in order by RowNum (forgetting account numbers, etc., for now), we want to identify the change in the SomeNumber column... if the next row has a larger value than the previous one, we'll mark it 1... if the same, then 0... if smaller, then -1. We'll do a little setup and then don't blink... it runs nasty fast... notice the handy way we do an update that remembers the previous value without a self-join, any other join, or an explicit loop... that's what makes it so fast...
--===== Do a little setup
DECLARE @PrevVal MONEY
SET @PrevVal = 0
DECLARE @Dummy INT
--===== Do the update to the Change column
UPDATE jbmTest
SET @Dummy = Change = SIGN(SomeNumber-@PrevVal),
@PrevVal = SomeNumber
FROM jbmTest WITH (INDEX(PK_jbmTest_RowNum),TABLOCK)
That update on a million rows takes 7 seconds on my box.
But, add 1 "reporting" index...
CREATE INDEX IDX_DODAH
ON jbmTest (SomeDate,SomeNumber,Change)
GO
... and, suddenly, the same "previous row comparison" update takes over 2 minutes to run instead of 7 seconds.
You can do the same type of update in your temp tables... the clustered primary key (whatever it turns out to be) is what matters most for the blazing speed. Updates on a table with a bazillion reporting indexes are going to continue to kill you... with such a large update, you may want to drop all of the indexes, do your inserts/updates, and then rebuild the indexes. Even that's going to take a while and might not be worth it (although it will optimize the indexes for reporting :hehe: )
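In rough terms, something like this... just a sketch built on the test table and "reporting" index from above (the DROP INDEX syntax shown is the 2005 flavor):
--===== Drop the "reporting" index, do the heavy lifting, then put the index back
DROP INDEX IDX_DODAH ON dbo.JBMTest

--...do the big inserts/updates here (e.g. the previous-row UPDATE above)...

CREATE INDEX IDX_DODAH
ON dbo.JBMTest (SomeDate,SomeNumber,Change)
GO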
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2007 at 6:16 am
I love that trick you did. I tried to do that myself, but was unable to find a way to do it.
@dummy = change = sign(type1 - @previous), @previous = type1
October 2, 2007 at 6:20 am
Heh, yeah... the key was the @Dummy variable in this case. It's needed to get past the mix of column and variable assignments when you don't actually need the value in a variable. The other key is the index hint on an index that sorts in the order you need.
Thanks for the feedback, Bob. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2007 at 6:55 am
Good proof that index maintenance really DOES have a price! I like the update mechanism you used too, Jeff.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 2, 2007 at 7:05 am
It's going to take a little while to fix the code. I should have it done later today and will let you know the results. Now for the other nightmare issue I have.
Since I often mix days and weeks, I need to do something like the following.
Date,Week,Month,Daily_Type1,Monthly_Type2
2006-11-29 00:00:00.000,5577,1283,0.176767241809893,NULL
2006-11-30 00:00:00.000,5578,1283,0.206843646769383,0.467677951418334
Now you see that the value of Monthly_Type2 is NULL for 11/29 because I don't have a value for that date; the 11/30 row carries the monthly value.
Since both rows are in the same month, I need to backfill the NULLs with the value for that month. Any other really cool tricks like the previous one to do that?
UPDATE #ComplexFilter
SET Monthly_FM= MonthlyValue.Monthly_FM,Monthly_FM_Change= MonthlyValue.Monthly_FM_Change
FROM #ComplexFilter
INNER JOIN #ComplexFilter AS MonthlyValue
ON #ComplexFilter.MonthNumber = MonthlyValue.MonthNumber
AND MonthlyValue.StoredCalcDataID = #ComplexFilter.StoredCalcDataID
AND MonthlyValue.Monthly=1
AND MonthlyValue.Monthly_FM IS NOT NULL
AND MonthlyValue.Monthly_FM_Change IS NOT NULL
WHERE #ComplexFilter.Monthly=0
October 2, 2007 at 7:46 am
Yes, I would agree, but not in this case. The process I'm referring to deals with a large volume of data.
Thanks for your reply
Kurt
DBA
RHWI, Inc
Poughkeepsie, NY
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 2, 2007 at 8:38 am
I was able to update 11M+ rows 3x, and update every 20th row (months) 1x, in 210 seconds. I am going to see if I can get this done in no more than one pass per day, week, and month, but at this point that is an improvement. I am re-running now to see how significant it is, but I am thinking it is about a 600% improvement.
October 2, 2007 at 10:05 am
DECLARE @Dummy INT,@PrevID INT
SELECT @PrevID = 0
DECLARE @PreDaily_D FLOAT
DECLARE @PreDaily_M FLOAT
DECLARE @PreDaily_RM FLOAT
UPDATE #ComplexFilter
SET
@PreDaily_D = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_D ELSE NULL END,
@PreDaily_M = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_M ELSE NULL END,
@PreDaily_RM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_RM ELSE NULL END,
@Dummy=Daily_D_Change=SIGN(Daily_D-@PreDaily_D),@PreDaily_D=Daily_D,
@Dummy=Daily_M_Change=SIGN(Daily_M-@PreDaily_M),@PreDaily_M=Daily_M,
@Dummy=Daily_RM_Change=SIGN(Daily_RM-@PreDaily_RM),@PreDaily_RM=Daily_RM,
@PrevID = StoredCalcDataID
FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK))
WHERE DAILY=1
SELECT @PrevID = 0
DECLARE @PreMonthly_FM FLOAT
UPDATE #ComplexFilter
SET
@PreMonthly_FM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreMonthly_FM ELSE NULL END,
@Dummy=Monthly_FM_Change=SIGN(Monthly_FM-@PreMonthly_FM),@PreMonthly_FM=Monthly_FM,
@PrevID = StoredCalcDataID
FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK))
WHERE MONTHLY=1
FYI... it is now down to only 105 seconds. With my past method, the same 4-field, 11M-row update took about 800 seconds.
Thanks Jeff!!! That's an improvement.
October 2, 2007 at 12:31 pm
Problem 2.
Date,Week,Month,Daily_Type1,Monthly_Type2
2006-11-29 00:00:00.000,5577,1283,0.176767241809893,NULL
2006-11-30 00:00:00.000,5578,1283,0.206843646769383,0.467677951418334
Drop PK index
create unique clustered index #TmpPK on #ComplexFilter(AsOfDate DESC) -- Change order
DECLARE @CurrMonthly_Type2 FLOAT -- declaration was missing; FLOAT assumed to match the other monthly variables
UPDATE #ComplexFilter
SET
@CurrMonthly_Type2 = CASE WHEN Monthly_Type2 IS NULL THEN @CurrMonthly_Type2 ELSE Monthly_Type2 END,
Monthly_Type2 = @CurrMonthly_Type2
FROM #ComplexFilter WITH (INDEX (#TmpPK))
-- Apparently you don't need the @Dummy field (at least in 2k5), nor the HINT, since the default is to go in the order of the CLUSTERED INDEX.
Now I am in the process of figuring out whether dropping and recreating the index is FASTER than the old UPDATE; I'll let you know.
October 2, 2007 at 3:09 pm
An Update
The @Dummy= is required. Well, at least if you want the value to be right 🙂 The statement will run without it, but the value will be wrong. Not sure exactly what happens, but it isn't what it should be.
October 2, 2007 at 4:34 pm
Just a little warning... you are using an undocumented behaviour, and a service pack could belly up your project. :hehe:
Other than that, the use of variables in UPDATE statements has been stable for the last several releases, but it has never been documented.
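For what it's worth, on SQL Server 2012 and later there is a documented way to get the same previous-row comparison: the LAG() window function. A sketch against the test table from earlier in the thread:
WITH src AS
(
SELECT Change,
       NewChange = SIGN(SomeNumber - LAG(SomeNumber, 1, 0) OVER (ORDER BY RowNum))
FROM dbo.JBMTest
)
UPDATE src
SET Change = NewChange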
Cheers,
* Noel
October 2, 2007 at 5:20 pm
Hmmm. Well, I guess the good news here is that this code isn't controlling a missile launch system or anything like that. However, I will keep that in mind. Thanks.
October 2, 2007 at 5:23 pm
What is it for?
October 2, 2007 at 5:43 pm
TheSQLGuru (10/2/2007)
Good proof that index maintenance really DOES have a price! I like the update mechanism you used too Jeff.
Thank you much!
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2007 at 5:46 pm
Bob Fazio (10/2/2007)
I was able to update 11M+ rows 3x, and update every 20th row (months) 1x, in 210 seconds. I am going to see if I can get this done in no more than one pass per day, week, and month, but at this point that is an improvement. I am re-running now to see how significant it is, but I am thinking it is about a 600% improvement.
Now, we're cookin'... nice job, Bob...
You can mix other variable/formula combinations into the same update if you want to try to get it down to a single pass... it won't cost more than a bit, performance-wise.
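A rough single-pass sketch along those lines (untested; it assumes the same #ComplexFilter columns, flags, and PK index from the earlier posts, and the remaining Daily_* columns would follow the same pattern):
DECLARE @Dummy INT, @PrevID INT
DECLARE @PreDaily_D FLOAT, @PreMonthly_FM FLOAT
SELECT @PrevID = 0

UPDATE #ComplexFilter
SET
--===== Reset the "previous" values whenever StoredCalcDataID changes
@PreDaily_D    = CASE WHEN @PrevID = StoredCalcDataID THEN @PreDaily_D    ELSE NULL END,
@PreMonthly_FM = CASE WHEN @PrevID = StoredCalcDataID THEN @PreMonthly_FM ELSE NULL END,
--===== Only calculate a daily change on daily rows; otherwise leave the column as-is
@Dummy = Daily_D_Change    = CASE WHEN DAILY = 1   THEN SIGN(Daily_D - @PreDaily_D)       ELSE Daily_D_Change    END,
@PreDaily_D    = CASE WHEN DAILY = 1   THEN Daily_D    ELSE @PreDaily_D    END,
--===== Same idea for the monthly column(s)
@Dummy = Monthly_FM_Change = CASE WHEN MONTHLY = 1 THEN SIGN(Monthly_FM - @PreMonthly_FM) ELSE Monthly_FM_Change END,
@PreMonthly_FM = CASE WHEN MONTHLY = 1 THEN Monthly_FM ELSE @PreMonthly_FM END,
@PrevID = StoredCalcDataID
FROM #ComplexFilter WITH (INDEX(#ComplexFilter_PK), TABLOCK)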
--Jeff Moden
Change is inevitable... Change for the better is not.