June 3, 2009 at 8:31 pm
Heh... in case anyone is wondering if I simply disappeared on this, the answer is "No". It's just that the freakin' cursor is taking me so long to write. I have to wash my mouth out every 5 minutes because I can't stand the taste. 😛
I'll be back with the typical million row solution shortly.... hang in there.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2009 at 9:32 pm
Back in a sec... virus software update kicked in. Wouldn't make for a fair test.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2009 at 10:52 pm
The following code creates a million row multiple account checkbook, does the "Fire Hose" cursor running total thing 5 times and follows that with the "Pseudo Cursor" method 5 times. After that, the graphic I captured from that code running on Profiler on my poor old 7 year old P4 1.8GHz 1GB Ram desktop computer running 2k5 DE sp2. As usual, the details are in the embedded comments in the code.
[font="Courier New"]--===== Create the test table and the necessary indexes
--===== Do this in a nice safe place
USE TempDB
--===== Create a "check book" table with a ratio of deposits
-- to withdrawals and constraints on the amounts for both.
-- This time, we include transaction dates and we don't
-- care about the order of the transactions. We also
-- include different account numbers.
-- (All Versions)
--==============================================================================
--===== Conditionally drop the test table
-- (so we can rerun the test code as needed)
IF OBJECT_ID('TempDB..CheckBook') IS NOT NULL
DROP TABLE CheckBook
--===== Declare some obviously named variables
DECLARE @StartDate DATETIME,
@EndDate DATETIME, --Upto and NOT including this date
@Days INT, --This will be the "range"
@Counter INT --Used in the While Loop to make page splits
--===== Preset the start date, end date, and calculate the range
-- in days. Since we don't want the end date to be
-- included, we DIDN'T add 1 to the range.
SELECT @StartDate = '01 Jan 2000',
@EndDate = '01 Jan 2010',
@Days = DATEDIFF(dd,@StartDate,@EndDate)
--===== Create and populate the table on the fly.
-- Produces approximately 10,000 rows for each of 1,000 accounts
SELECT TOP 1000000
IDENTITY(INT,1,1) AS CheckBookID,
ABS(CHECKSUM(NEWID()))%1000+1 AS AccountNumber, --1000 accounts
CAST(--= Roughly every tenth row (avg) will be a deposit
CASE ABS(CHECKSUM(NEWID())) % 10 -- 0-9 (10 digits)
WHEN 0 --Occurs 1 out of ten times on average
THEN RAND(CHECKSUM(NEWID())) * 500 + 1000 --Deposits
ELSE RAND(CHECKSUM(NEWID())) * (-100) --Withdrawals
END
AS DECIMAL (9,2)) AS Amount,
ABS(CHECKSUM(NEWID())) % @Days + @StartDate --Random DateTime
AS TransactionDate
INTO dbo.CheckBook
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
--===== Create the initial deposit(s) for each account.
-- Everybody gets at least 1 and maybe more on the
-- first day (all first day trans are deposits)
UPDATE CheckBook
SET Amount = RAND(CHECKSUM(NEWID())) * 500 + 1000
FROM CheckBook tt
INNER JOIN
(--==== Find the initial date for each account
SELECT AccountNumber,
MIN(TransactionDate) AS MinTransactionDate
FROM CheckBook
GROUP BY AccountNumber
) firstdate
ON tt.AccountNumber = firstdate.AccountNumber
AND tt.TransactionDate = firstdate.MinTransactionDate
--===== Create the PK like many folks would do it to prevent
-- fragging the table on inserts. Pretty tough on SELECT's, though
ALTER TABLE dbo.CheckBook
ADD CONSTRAINT PK_CheckBook_CheckBookID
PRIMARY KEY CLUSTERED (CheckBookID)
WITH FILLFACTOR = 100 --Actually, this is the default on most systems
--===== Add a nonclustered index to support SELECTs and (ugh!) Cursors
-- that have an ORDER BY
CREATE NONCLUSTERED INDEX IX_CheckBook_AccountNumber_TransactionDate_CheckBookID
ON dbo.CheckBook (AccountNumber, TransactionDate, CheckBookID)
WITH FILLFACTOR = 90
--===== Let's see what the distribution of rows is for each account
-- and what the total number of rows is.
-- This will also see if the index is up to snuff.
SELECT CASE
WHEN GROUPING(cb.AccountNumber) = 0
THEN CAST(cb.AccountNumber AS CHAR(5))
ELSE 'Total'
END AS AccountNumber,
COUNT(*) AS NumberOfRows
FROM dbo.CheckBook cb
GROUP BY cb.AccountNumber WITH ROLLUP
ORDER BY cb.AccountNumber
GO
--==============================================================================
GO 2
--The "Fire Hose" Cursor Running Total Solution
--===============================================================================
--===== Supress the auto-display of rowcounts or the cursor will be even slower.
SET NOCOUNT ON
--===== Create the necessary Temp table to store the result set so we can use
-- the result set as a table if we need to. This will be used for the
-- cursor run. The Psuedo-Cursor run will build one on the fly.
-- The condition drop is just for reruns. It wouldn't be needed
-- in production but it sure helps troubleshooting.
IF OBJECT_ID('TempDB..#cWork','U') IS NOT NULL
DROP TABLE #cWork
--===== Create the cursor result set table using the "zero row" trick
SELECT CAST(CheckBookID AS INT) AS CheckBookID, --Strips of IDENTITY
AccountNumber,
Amount,
TransactionDate,
CAST(0 AS DECIMAL(9,2)) AS AccountBalance
INTO #cWork
FROM dbo.CheckBook
WHERE 1 = 0 --We just want table structure, no rows
--===== Declare local variables for the cursor run
DECLARE @CheckBookID INT,
@AccountNumber INT,
@Amount DECIMAL(9,2),
@TransactionDate DATETIME,
@CurBalance DECIMAL(9,2),
@PrevBalance DECIMAL(9,2),
@PrevAccount INT
--===== Preset the necessary variables
SELECT @CurBalance = 0,
@PrevBalance = 0,
@PrevAccount = 0
--===== Declare and open a cursor
--Haven't confirmed it, but rumor has it that FAST_FORWARD
--isn't quite as fast as ORWARD_ONLY READ_ONLY
DECLARE curRunningTotal CURSOR LOCAL FORWARD_ONLY READ_ONLY
FOR SELECT CheckBookID,
AccountNumber,
Amount,
TransactionDate
FROM dbo.CheckBook WITH(NOLOCK) --Just to help speed on this test
ORDER BY AccountNumber, TransactionDate, CheckBookID
--===== Open the Cursor and Fetch the first row (Lordy I hate saying that!! ;-))
OPEN curRunningTotal
FETCH NEXT FROM curRunningTotal
INTO @CheckBookID,
@AccountNumber,
@Amount,
@TransactionDate
--===== Now, loop through the cursor and insert the rows into
-- the work table including calculations for the
-- running total
WHILE @@FETCH_STATUS = 0
BEGIN
--===== Do the necessary math for balances that restart
-- when the AccountNumber changes. The also preps
-- the @Prev* variables for the next record.
SELECT @CurBalance =
CASE
WHEN @AccountNumber = @PrevAccount
THEN @PrevBalance + @Amount
ELSE @Amount
END,
@PrevBalance = @CurBalance,
@PrevAccount = @AccountNumber
--===== Insert those calculations as a new row in the result set
INSERT INTO #cWork
(
CheckBookID,
AccountNumber,
Amount,
TransactionDate,
AccountBalance
)
SELECT @CheckBookID AS CheckBookID,
@AccountNumber AS AccountNumber,
@Amount AS Amount,
@TransactionDate AS TransactionDate,
@CurBalance AS AccountBalance
--===== Get the next record (I really hate this ;-))
FETCH NEXT FROM curRunningTotal
INTO @CheckBookID,
@AccountNumber,
@Amount,
@TransactionDate
END --Loop back for the next record.
--======== Clean up the mess from the cursor
CLOSE curRunningTotal
DEALLOCATE curRunningTotal
--===== Add a clustered index to the #cWork table to support SELECT's
CREATE CLUSTERED INDEX IX_#cWork_AccountNumber_TransactionDate_CheckBookID
ON #cWork (AccountNumber, TransactionDate, CheckBookID)
WITH FILLFACTOR = 100
--===== Select the first 2 accounts just to wake us up that it actually finished
-- and so it's easier to see that it actually worked when the account
-- number changed.
SELECT *
FROM #cWork
WHERE AccountNumber <= 2
ORDER BY AccountNumber, TransactionDate
GO 5
--==============================================================================
GO 2
--The "Pseudo-Cursor" Running Total Solution ("Quirky" Update as Phil calls it)
--==============================================================================
--===== Turn the auto-display of rowcounts back on just to give the cursor a
-- chance. ;-)
SET NOCOUNT OFF
--===== Create the necessary Temp table to store the result set so we can use
-- the result set as a table if we need to. The Psuedo-Cursor run will
-- build and populate it on the fly.
-- The condition drop is just for reruns. It wouldn't be needed
-- in production but it sure helps troubleshooting.
IF OBJECT_ID('TempDB..#pcWork','U') IS NOT NULL
DROP TABLE #pcWork
--===== Create the cursor result set table using the "zero row" trick
SELECT CAST(CheckBookID AS INT) AS CheckBookID, --Strips off IDENTITY
AccountNumber,
Amount,
TransactionDate,
CAST(0 AS DECIMAL(9,2)) AS AccountBalance --Creates new column
INTO #pcWork
FROM dbo.CheckBook
--===== Add a clustered index to the #psWork table to support SELECT's.
-- This is also the key to the Pseudo-Cursor or "quirky" update
-- method for doing running totals.
CREATE CLUSTERED INDEX IX_#pcWork_AccountNumber_TransactionDate_CheckBookID
ON #pcWork (AccountNumber, TransactionDate, CheckBookID)
WITH FILLFACTOR = 100
--===== Declare local variables for the cursor run
DECLARE @PrevBalance DECIMAL(9,2),
@PrevAccount INT
--===== Preset the necessary variables
SELECT @PrevBalance = 0,
@PrevAccount = 0
--===== Let's rock. Do the running balance using a single
-- update as a Pseudo-Cursor.
UPDATE #pcWork
SET @PrevBalance = AccountBalance = CASE
WHEN AccountNumber = @PrevAccount
THEN @PrevBalance + Amount
ELSE Amount
END,
@PrevAccount = AccountNumber
FROM #pcWork WITH(INDEX(0),TABLOCKX) --Warm fuzzy options
OPTION (MAXDOP 1) --Just to be absolutely safe
--===== Select the first 2 accounts to do the same as the cursor
-- and so it's easier to see that it actually worked when the account
-- number changed.
SELECT *
FROM #pcWork
WHERE AccountNumber <= 2
ORDER BY AccountNumber, TransactionDate
GO 5[/font]
Here's the performance stats from the profiler run I setup to look at the single spid this all ran on...
... and, here's the code that proves they both came up with the same running balances.
[font="Courier New"] SELECT c.*,pc.*
FROM #cWork c
INNER JOIN #pcWork pc
ON c.CheckBookID = pc.CheckBookID
AND c.AccountBalance <> pc.AccountBalance
ORDER BY pc.AccountNumber, pc.TransactionDate[/font]
Like Adam said, I do use these for speed. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2009 at 2:50 am
Hey, Jeff is back! 😀 Great stuff Jeff.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 4, 2009 at 2:53 am
Lynn Pettis (5/28/2009)
Okay! Any one in Germany near Frankfurt? Maybe has a teen age (around 18 year old) son or daughter (preferably)?And yes I am serious. I have a teenage daughter who paid for her own trip to Germany and is currently not having a very good time like she had hoped.
Lynn, if you can get her to London - flights from Frankfurt to London City used to be quite cheap - I'll send my 18-year old daughter down from Manchester. My flat in Leyton is empty at the mo, they could share and explore London together.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 4, 2009 at 4:24 am
Hey Jeff,
Really neat demonstration. Love it. Lots of cool optimizations too.
The ultimate demo is where the original table already happens to have a suitable clustered index (it does happen!) and an existing column for the running total needs to be populated (though adding a NULLable column is fast too). This results in a plan with one clustered index scan, a top operator (for halloween protection), a compute scalar (for the maths), and a clustered index update. It doesn't get much better than that. On my (2GHz single-core Pentium M) laptop, that scenario completes in just 4,348 logical reads and 2,391ms (with a warm cache).
BTW: I use INDEX(1) instead of (INDEX(0)) for the in-place update to ensure I get a linked-list ordered scan (to avoid a sort) rather than the IAM scan that happens with TABLOCK or at READ UNCOMMITTED. For sure, that's not an issue for your script - I think you even optimized for the IAM scan knowing that the table was pretty much guaranteed to be in physical order. Nice one.
Quite shocking to see a cursor authored by Mr RBAR though, even if it was for demonstration purposes. 😉
Paul
June 4, 2009 at 5:01 am
Paul White (6/3/2009)
One thing to mention is that the article is the first of a series
Well, actually it's the second in a series.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 4, 2009 at 5:09 am
Jeff Moden (6/3/2009)
The following code creates a million row multiple account checkbook, does the "Fire Hose" cursor running total thing 5 times and follows that with the "Pseudo Cursor" method 5 times. After that, the graphic I captured from that code running on Profiler on my poor old 7 year old P4 1.8GHz 1GB Ram desktop computer running 2k5 DE sp2. As usual, the details are in the embedded comments in the code.
Jeff, I have a (simple) question. Is this the million-row test you were whipping up for Adam, or your approach for Itzik's article --- or were you able to handle both of these in one?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 4, 2009 at 5:15 am
Paul White (6/4/2009)
BTW: I use INDEX(1) instead of (INDEX(0)) for the in-place update to ensure I get a linked-list ordered scan (to avoid a sort) rather than the IAM scan that happens with TABLOCK or at READ UNCOMMITTED. For sure, that's not an issue for your script - I think you even optimized for the IAM scan knowing that the table was pretty much guaranteed to be in physical order. Nice one.
Jeff's technique depends on not getting an IAM scan, not having the data fetched in any order than that of the clustering key (index order scan) and not having the optimiser use any order-mangling operations. If any of those happen, the 'quirky' update can give incorrect values.
That's why the restrictions are no partitioning, no parallelism.
I asked about the IAM scan a while back and apparently it will not happen on updates. Sorry, no public blog I can refer you to, was a conversation with someone on the optimiser team.
Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scan
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 4, 2009 at 5:41 am
GilaMonster (6/4/2009)
Jeff's technique depends on not getting an IAM scan, not having the data fetched in any order than that of the clustering key (index order scan) and not having the optimiser use any order-mangling operations. If any of those happen, the 'quirky' update can give incorrect values.That's why the restrictions are no partitioning, no parallelism.
I asked about the IAM scan a while back and apparently it will not happen on updates. Sorry, no public blog I can refer you to, was a conversation with someone on the optimiser team.
Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scan
Interesting - knew all that except the no IAM scan in an update plan. Seems odd because I get that sort in my UPDATE plan with an INDEX(0) hint, but no sort with INDEX(1).
Perhaps someone forgot to tell the QO about that behaviour.
I thought TABLOCK/NOLOCK = IAM scan? I wonder why Jeff's hints are as they are then?
BTW - I don't trust the Ordered = No thing, since I read http://www.sqlmag.com/Article/ArticleID/92887/sql_server_92887.html
There's an explicit example there of ordered:false doing a logical order scan.
edit: pesky word 'not' crept in there
June 4, 2009 at 6:00 am
Paul White (6/4/2009)
Interesting - knew all that except the no IAM scan in an update plan.
Think it was that. I know it was a discussion on 'ordering' in updates related to exactly this query and why it seems to work. (in case you haven't figured it out, I don't like this query form and I do have my reasons).
Can't find the conversation, can't completely recall details (was about 8 or so months ago) and even if I could find the conversation, I wouldn't be able to post it here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 4, 2009 at 6:02 am
Jeff, after you install the hand rails, you'll also need one of these:
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 4, 2009 at 6:07 am
Paul White (6/4/2009)
Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scanInteresting - knew all that except the no IAM scan in an update plan. Seems odd because I get that sort in my UPDATE plan with an INDEX(0) hint, but no sort with INDEX(1).
Perhaps someone forgot to tell the QO about that behaviour.
I thought TABLOCK/NOLOCK = IAM scan? I wonder why Jeff's hints are as they are then?
BTW - I don't trust the Ordered = No thing, since I read http://www.sqlmag.com/Article/ArticleID/92887/sql_server_92887.html
There's an explicit example there of ordered:false doing a logical order scan.
edit: pesky word 'not' crept in there
Here's the reason why I use Index(0) instead of Index(1). From BOL:
If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek.
It's overkill, but it makes people feel better about using the method. Just don't try it with any non-clustered index because they can go "merry-go-round" on you... heh... well, almost never. I'm still doing some experiments on that.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2009 at 6:11 am
Grant Fritchey (6/4/2009)
Jeff, after you install the hand rails, you'll also need one of these:
BWAA-HAAA!!! Thanks for the help. Looks like I'm gonna need the hand rails after all, huh? 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2009 at 6:31 am
Jeff, has your avatar something in common with this?
http://www.youtube.com/watch?v=vF4iWIE77Ts
It's incredible I didn't notice it before!!
-- Gianluca Sartori
Viewing 15 posts - 5,311 through 5,325 (of 66,738 total)
You must be logged in to reply to this topic. Login to reply