January 6, 2012 at 5:38 pm
Jeff Moden (1/6/2012)
It would also be interesting if that problem causes a regular update to fail for that many rows... perhaps, just using ROW_NUMBER() over the PK.
Not sure if this is what you had in mind, but a quick test of this in 256MB:
;
WITH cteBS AS
(
SELECT
RowCounter = ROW_NUMBER() OVER(
ORDER BY sm.LocID, sm.ProdID, sm.StkTransDate, sm.TranID),
*
FROM dbo.StockMovements AS sm
)
UPDATE cteBS WITH (TABLOCKX)
SET StkLocProdRunTotal = RowCounter
OPTION (MAXDOP 1)
...runs successfully and correctly, with:
Table 'StockMovements'.
Scan count 1, logical reads 30947773, physical reads 46407, read-ahead reads 46487
Table 'Worktable'.
Scan count 1, logical reads 30523792, physical reads 56203, read-ahead reads 0
(10090000 row(s) affected)
By correctly, I mean this returns no rows after the UPDATE:
WITH cteBS AS
(
SELECT
RowCounter = ROW_NUMBER() OVER(
ORDER BY sm.LocID, sm.ProdID, sm.StkTransDate, sm.TranID),
*
FROM dbo.StockMovements AS sm
)
SELECT *
FROM cteBS
WHERE RowCounter <> StkLocProdRunTotal
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 6, 2012 at 7:10 pm
That's about what I had in mind. Thanks, Paul.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2012 at 7:12 am
SQL Kiwi (1/6/2012)
Peter Brinkhaus (1/6/2012)
Configuration 256MB: the script doesn't terminate. The process is blocking itself.
The way I read your comments makes it sound like this mysterious self-blocking only occurs when TF 652 is OFF, would that be correct? Anything extra you can share about the nature of the blocking (things like wait types and so on)? I'm in the process of setting up a 32-bit VM here so I can install 32-bit SQL Servers and have a go myself.
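A sketch only, not from the thread, of the sort of query that would surface those wait types from a second connection (the spid value is illustrative):
-- run this from a second connection while the update appears hung;
-- replace 54 with the spid running the quirky update
SELECT
    r.session_id,
    r.status,
    r.command,
    r.wait_type,
    r.wait_time,
    r.last_wait_type,
    r.blocking_session_id,
    wt.resource_description
FROM sys.dm_exec_requests AS r
LEFT JOIN sys.dm_os_waiting_tasks AS wt
    ON wt.session_id = r.session_id
WHERE r.session_id = 54;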
I finally got the same results as on SQL Server 2005. Results (all runs with a cold data cache):
run 1: max. server memory 512 MB, traceflag 652=off: division by zero
run 2: max. server memory 512 MB, traceflag 652=on: succeeded
Table 'StockMovements'. Scan count 1, logical reads 30316289, physical reads 71123, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 66376, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 126251 ms, elapsed time = 181798 ms.
run 3: max. server memory 1024 MB, traceflag 652=off: succeeded
Table 'StockMovements'. Scan count 1, logical reads 30947108, physical reads 19, read-ahead reads 46496, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 100355 ms, elapsed time = 101714 ms.
run 4: max. server memory 1024 MB, traceflag 652=on: succeeded
Table 'StockMovements'. Scan count 1, logical reads 30316289, physical reads 5813, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 99685 ms, elapsed time = 106168 ms.
run 5: max. server memory 256 MB, traceflag 652=off: division by zero
run 6: max. server memory 256 MB, traceflag 652=on: succeeded
Table 'StockMovements'. Scan count 1, logical reads 30316289, physical reads 82415, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 66376, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 113818 ms, elapsed time = 161112 ms.
Note: initially I restarted the server after changing 'max. server memory' (according to BOL: to reduce the max server memory you may need to restart SQL Server to release the memory). However, I was not able to create the StockMovements table using the script, because SELECT TOP 8000000 ... INTO StockMovements ... caused a deadlock with both max. server memory 256MB and 1024MB. So I started with 512MB, created the test tables using the script, and then changed memory to 1024MB and 256MB without restarting the server.
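For reference, a minimal sketch of how the memory setting can be flipped between runs without a restart, plus the cold-cache reset (assumes sysadmin rights; DROPCLEANBUFFERS is for test servers only):
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 256;
RECONFIGURE;
CHECKPOINT;
DBCC DROPCLEANBUFFERS;  -- flush clean pages so the next run starts with a cold data cache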
In case it's of interest, the deadlock was caused by an attempt to convert a lock to an exclusive lock on tempdb.sys.syscolpars. Output from DBCC TRACEON(1222, -1):
Date,Source,Severity,Message
01/07/2012 11:57:08,spid27s,Unknown,waiter id=process3612e38 mode=X requestType=convert
01/07/2012 11:57:08,spid27s,Unknown,waiter-list
01/07/2012 11:57:08,spid27s,Unknown,owner id=process3612e38 mode=S
01/07/2012 11:57:08,spid27s,Unknown,owner id=process3612e38 mode=U
01/07/2012 11:57:08,spid27s,Unknown,owner-list
01/07/2012 11:57:08,spid27s,Unknown,keylock hobtid=562949956108288 dbid=2 objectname=tempdb.sys.syscolpars indexname=nc id=lock5da31c0 mode=U associatedObjectId=562949956108288
01/07/2012 11:57:08,spid27s,Unknown,resource-list
01/07/2012 11:57:08,spid27s,Unknown,CROSS JOIN sys.all_columns ac3
01/07/2012 11:57:08,spid27s,Unknown,CROSS JOIN sys.all_columns ac2
01/07/2012 11:57:08,spid27s,Unknown,FROM sys.all_columns ac1
01/07/2012 11:57:08,spid27s,Unknown,INTO StockMovements
01/07/2012 11:57:08,spid27s,Unknown,StkLocProdRunTotal = 0
01/07/2012 11:57:08,spid27s,Unknown,StkTransDate = Dateadd(dd<c/> Abs(Checksum(Newid()))%Datediff(dd<c/> '2009'<c/> '2011')<c/> '2009')<c/>
01/07/2012 11:57:08,spid27s,Unknown,Qty = -10 + CAST(Rand(Checksum(Newid())) * 10 AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,LocID = 1 +CAST(Abs(Checksum(Newid()) % 9 ) AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,ProdID = 1 +CAST(Abs(Checksum(Newid()) % 9000 ) AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,TranID = IDENTITY(INT<c/> 1<c/> 1)<c/>
01/07/2012 11:57:08,spid27s,Unknown,SELECT TOP 8000000---- NOTE 8 MILLION rows
01/07/2012 11:57:08,spid27s,Unknown,--OUTWARD MOVEMENTS
01/07/2012 11:57:08,spid27s,Unknown,inputbuf
01/07/2012 11:57:08,spid27s,Unknown,CROSS JOIN sys.all_columns ac3
01/07/2012 11:57:08,spid27s,Unknown,CROSS JOIN sys.all_columns ac2
01/07/2012 11:57:08,spid27s,Unknown,FROM sys.all_columns ac1
01/07/2012 11:57:08,spid27s,Unknown,INTO StockMovements
01/07/2012 11:57:08,spid27s,Unknown,StkLocProdRunTotal = 0
01/07/2012 11:57:08,spid27s,Unknown,StkTransDate = Dateadd(dd<c/> Abs(Checksum(Newid()))%Datediff(dd<c/> '2009'<c/> '2011')<c/> '2009')<c/>
01/07/2012 11:57:08,spid27s,Unknown,Qty = -10 + CAST(Rand(Checksum(Newid())) * 10 AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,LocID = 1 +CAST(Abs(Checksum(Newid()) % 9 ) AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,ProdID = 1 +CAST(Abs(Checksum(Newid()) % 9000 ) AS INT)<c/>
01/07/2012 11:57:08,spid27s,Unknown,TranID = IDENTITY(INT<c/> 1<c/> 1)<c/>
01/07/2012 11:57:08,spid27s,Unknown,SELECT TOP 8000000---- NOTE 8 MILLION rows
01/07/2012 11:57:08,spid27s,Unknown,frame procname=adhoc line=3 stmtstart=46 sqlhandle=0x02000000a0c08108c99c578ed4a30afece45a28e8b418372
01/07/2012 11:57:08,spid27s,Unknown,executionStack
01/07/2012 11:57:08,spid27s,Unknown,process id=process3612e38 taskpriority=0 logused=2804 waitresource=KEY: 2:562949956108288 (8c02f49baf53) waittime=1071 ownerId=11282 transactionname=droptemp lasttranstarted=2012-01-07T11:57:07.673 XDES=0x6b5c280 lockMode=X schedulerid=4 kpid=3636 status=suspended spid=54 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2012-01-07T11:56:51.790 lastbatchcompleted=2012-01-07T11:56:37.883 clientapp=Microsoft SQL Server Management Studio - Query hostname=HELSINKI hostpid=7952 loginname=HELSINKI\Peter isolationlevel=read committed (2) xactid=10950 currentdb=2 lockTimeout=4294967295 clientoption1=671090784 clientoption2=390200
01/07/2012 11:57:08,spid27s,Unknown,process-list
01/07/2012 11:57:08,spid27s,Unknown,deadlock victim=process3612e38
01/07/2012 11:57:08,spid27s,Unknown,deadlock-list
Looking at the owner-list and waiter-list, the query seems to deadlock itself. Anybody got an idea why?
January 7, 2012 at 7:21 am
So what is the conclusion here? That you shouldn't try to do "enterprise-scale" quirky updates on a box that has less memory than most of today's handheld scientific calculators? 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2012 at 7:47 am
Jeff Moden (1/7/2012)
So what is the conclusion here? That you shouldn't try to do "enterprise-scale" quirky updates on a box that has less memory than most of today's handheld scientific calculators? 😀
I'll leave the conclusion to those considering using the quirky update; I never had the need for it. I am just trying to give Paul some answers he asked for. And, as the deadlock problem shows, things can go well with less memory than most of today's handheld scientific calculators and fail with more.
January 7, 2012 at 8:16 am
Peter Brinkhaus (1/7/2012)
I finally got the same results as on SQL Server 2005. Results (all runs with a cold data cache):
Fantastic work Peter, and luckily no surprises, so great confirmation.
Looking at the owner-list and waiter-list, the query seems to deadlock itself. Anybody got an idea why?
I originally misunderstood you - I thought you meant the quirky update ended up blocking itself. The deadlocked script looks very much like a bug (hopefully long since fixed).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 7, 2012 at 8:38 am
Jeff Moden (1/7/2012)
So what is the conclusion here? That you shouldn't try to do "enterprise-scale" quirky updates on a box that has less memory than most of today's handheld scientific calculators? 😀
I think that conclusion, smiley notwithstanding, risks offending those who have spent significant personal time looking into this.
My reasonably firm belief right now is that SQL Server (since at least 2005, and on both processor architectures) exhibits behaviour that breaks the quirky update when memory is low enough that (a) significant read-ahead is required to bring pages into memory; and (b) there is some kind of contention between what read-ahead is trying to do and other memory-management features.
So, my expectation is that this behaviour is likely to occur on a server with any amount of RAM, if the server as a whole is under memory pressure. The only sensible move at this point seems to be to recommend that the quirky update is only ever used with the safety check in place. Without the safety check, as we have seen, the quirky update can produce wrong results silently, even where all the published rules have been followed exactly.
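For readers who haven't seen the related thread, a minimal sketch of what the safety check looks like; the Seq column and the exact CASE logic are illustrative rather than the script used in these tests:
DECLARE @RunningTotal INT,
        @Counter      INT,
        @PrevLocID    INT,
        @PrevProdID   INT;

SELECT  @RunningTotal = 0,
        @Counter      = 0,
        @PrevLocID    = 0,
        @PrevProdID   = 0;

UPDATE dbo.StockMovements
SET @RunningTotal = StkLocProdRunTotal =
        CASE
            -- Seq is a hypothetical column pre-numbered in clustered-key order
            -- (for example by the ROW_NUMBER update posted earlier in the thread)
            WHEN Seq <> @Counter + 1 THEN 1/0  -- safety check: row arrived out of order, abort the statement
            WHEN LocID = @PrevLocID AND ProdID = @PrevProdID THEN @RunningTotal + Qty
            ELSE Qty                           -- new LocID/ProdID group: restart the running total
        END,
    @Counter    = Seq,
    @PrevLocID  = LocID,
    @PrevProdID = ProdID
FROM dbo.StockMovements WITH (TABLOCKX, INDEX(1))
OPTION (MAXDOP 1);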
I have experienced test failures on 64-bit (where structures and memory use are generally larger than on 32-bit) with as much as 1GB max server memory set, obviously with no concurrent workload. Even if the point about larger-memory systems under memory pressure is disputed, there will be many x64 systems out there running 1GB or so - think VMs, lower SKUs, that sort of thing. If this were my method, I would be taking all this pretty seriously. After all, the test table in question is only a few million rows in about 350MB - hardly up in any exclusive "Enterprise-scale" realm.
To go much deeper, I would need to do some work with a debugger attached to SQL Server - but that is extremely time-intensive. Tests on a larger amount of memory with concurrent workload (memory pressure) should be easier.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 7, 2012 at 10:45 am
SQL Kiwi (1/7/2012)
I think that conclusion, smiley notwithstanding, risks offending those that have spent significant personal time looking into this.
Sorry ol' friend. Didn't mean to come across as snarky. I realize that you good folks have put in a good amount of work on this subject and my comment wasn't meant to devalue that in any way.
However, and without any offense to anyone, there have been a large number of posts on how to do massive deletes and updates and why they should be segmented into smaller batches of rows even on enterprise-class machines. For example, updating a million rows on a given machine might take 3 seconds. Updating 2 million rows on the same machine might take only double that, or 6 seconds. Yet when updating 3 million rows, instead of it taking only triple the time (9 seconds), the single update might take 20 minutes to 2 hours (numbers come from a couple of personal experiences). It just seems incredibly odd to me that the participants of this thread haven't mentioned that, quirky update or not, it's not a good thing to do a single 10-million-row update on a machine that's not much more capable than a toaster, or on a machine that is already under memory pressure.
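For illustration, a minimal sketch of that kind of segmenting; the column, predicate, and batch size are made up, and this applies to ordinary bulk updates rather than the quirky update itself, which carries running state and cannot be chunked this simply:
DECLARE @BatchSize INT;
SET @BatchSize = 1000000;

WHILE 1 = 1
BEGIN
    -- illustrative predicate: the WHERE clause must exclude rows already
    -- updated so each pass makes progress and the loop terminates
    UPDATE TOP (@BatchSize) sm
    SET    sm.StkLocProdRunTotal = 0
    FROM   dbo.StockMovements AS sm
    WHERE  sm.StkLocProdRunTotal <> 0;

    IF @@ROWCOUNT = 0 BREAK;

    CHECKPOINT;  -- in the simple recovery model this lets the log clear between batches
END;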
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2012 at 10:58 am
As I have said earlier...and for the record
I haven't had any issues before now...and this occurrence ONLY materialized on a memory-compromised PC....it has never occurred on our servers.
The reason I raised it is that I was interested in understanding why it was failing on the PC...and I believe that others (Paul in particular) have identified the cause....for which I very much appreciate their efforts and detailed analysis.
I always use the safety check...and this is what will save my ar5e...if and when a server comes under memory pressure. (these updates are only performed out of hours)
@jeff..."it's not a good thing to do a single 10 million row update on a machine that's not much more capable than a toaster".... :-P:-P:-P......I do understand your sentiment and will be using your analogy to ask for a new PC.
Thanks to all.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
January 7, 2012 at 11:21 am
SQL Kiwi (1/7/2012)
The only sensible move at this point seems to be to recommend that the quirky update is only ever used with the safety check in place.
P.S. I almost forgot... I absolutely agree with that statement and have since you made that particular change on the related thread.
Shifting gears... I wonder if the old index hint that I used to use (you were actually a bit disappointed that I dumped it, IIRC) would fix the problems you good folks are running into with shy amounts of memory.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2012 at 11:59 am
as I spawned this thread...thought it only polite to report back
based on my pc (with the memory of a toaster) .......
using traceflag 652=ON.....works every time
using WITH (INDEX(0)) in the cte...works every time
though not surprisingly.....far slower in both cases.
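For anyone reproducing this, roughly where the hint goes, sketched against Paul's earlier ROW_NUMBER test rather than the exact script used here:
;
WITH cteBS AS
(
SELECT
RowCounter = ROW_NUMBER() OVER(
ORDER BY sm.LocID, sm.ProdID, sm.StkTransDate, sm.TranID),
*
FROM dbo.StockMovements AS sm WITH (INDEX(0))  -- force an unordered scan of the clustered index
)
UPDATE cteBS WITH (TABLOCKX)
SET StkLocProdRunTotal = RowCounter
OPTION (MAXDOP 1);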
regards
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
January 7, 2012 at 1:30 pm
Sorry about delay. Results from home lappy, specs posted earlier:
/* maxmem 3072:
Table 'StockMovements'. Scan count 1, logical reads 30947106, physical reads 7, read-ahead reads 70, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 107968 ms, elapsed time = 110757 ms.
*/
/* 1024:
Table 'StockMovements'. Scan count 1, logical reads 30947106, physical reads 7, read-ahead reads 70, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 107890 ms, elapsed time = 110236 ms.
*/
/* 512:
run 1. Msg 8134, Level 16, State 1, Line 104 Divide by zero error encountered.
run 2. Msg 8134, Level 16, State 1, Line 104 Divide by zero error encountered.
run 3. Msg 8134, Level 16, State 1, Line 104 Divide by zero error encountered.
DBCC TRACEON (652, -1)
run 1: Table 'StockMovements'. Scan count 1, logical reads 30316287, physical reads 70767, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 66375, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
run 2: Table 'StockMovements'. Scan count 1, logical reads 30316287, physical reads 70777, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 30596864, physical reads 66370, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
DBCC TRACEOFF (652, -1)
run 1. Msg 8134, Level 16, State 1, Line 105 Divide by zero error encountered.
run 2. Msg 8134, Level 16, State 1, Line 105 Divide by zero error encountered.
*/
No deadlocks were detected.
For better assistance in answering your questions, please read this.
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop / Jeff Moden
January 7, 2012 at 10:18 pm
ChrisM@home (1/7/2012)
Sorry about delay. Results from home lappy, specs posted earlier:
Thanks Chris. I am able to get an error with max memory = 1000MB, but not 1024MB 🙂
Intuitively, the cut-off seems likely to vary with the size of the table, among other things. Thanks again for the test - I think we have enough data points now to say at least that the issue is not limited to a particular architecture or SQL Server version.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 7, 2012 at 11:08 pm
J Livingston SQL (1/7/2012)
as I spawned this thread...thought it only polite to report back based on my pc (with the memory of a toaster) .......
using traceflag 652=ON.....works every time
using WITH (INDEX(0)) in the cte...works every time
though not surprisingly.....far slower in both cases.
Thank you also. With INDEX(0) I get two sorts (no table spool) in the plan; INDEX(0) forces an unordered scan of the clustered index. The first sort gets rows into order for the safety check (LocID, ProdID, StkTransDate, TranID); the second is a minor sort on the uniquifier (LocID, ProdID, StkTransDate, TranID, uniquifier) for the update. Defining the clustered index as unique would avoid this, by the way. I get the safety-check error at both 512MB and 256MB on x64 R2, just to complete the picture.
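To illustrate that last point, a sketch only (the index name is made up, and the existing non-unique clustered index would have to be dropped or rebuilt first):
-- because TranID is an identity column the full key is already unique,
-- so SQL Server no longer needs the hidden uniquifier (and the minor second sort disappears)
CREATE UNIQUE CLUSTERED INDEX IX_StockMovements_RunningTotal  -- hypothetical name
ON dbo.StockMovements (LocID, ProdID, StkTransDate, TranID);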
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 7, 2012 at 11:14 pm
Jeff Moden (1/7/2012)
Sorry ol' friend. Didn't mean to come across as snarky. I realize that you good folks have put in a good amount of work on this subject and my comment wasn't meant to devalue that in any way.
That's ok, I have relaxed about it a bit now, apologies for stating my point a bit too strongly, perhaps.
However, and without any offense to anyone, there have been a large number of posts on how to do massive deletes and updates and why they should be segmented into smaller batches of rows even on enterprise-class machines. For example, updating a million rows on a given machine might take 3 seconds. Updating 2 million rows on the same machine might take only double that, or 6 seconds. Yet when updating 3 million rows, instead of it taking only triple the time (9 seconds), the single update might take 20 minutes to 2 hours (numbers come from a couple of personal experiences). It just seems incredibly odd to me that the participants of this thread haven't mentioned that, quirky update or not, it's not a good thing to do a single 10-million-row update on a machine that's not much more capable than a toaster, or on a machine that is already under memory pressure.
This is a valid point. I'm still curious about the exact mechanism involved, because the thing about the Quirky Update is that we've never before had a really good repro of it failing with all the best practices followed (leaving aside a fairly artificial example Hugo provided once). It is technically interesting to me; that's the biggest part of it. I have no issue with people making an informed decision to use QU.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi