January 24, 2007 at 2:03 pm
Have used NOLOCK for years for batch processing requirements, to attempt to avoid the impact of on-going updates when those updates do not affect the underlying batch process (different columns in the same table(s) being updated).
Have seen a nasty side effect of NOLOCK in a very high transaction environment (multiple updates per second) with queries being executed multiple times per second: the SELECT statement using NOLOCK can fail due to "block read" errors. Apparently the storage engine can get confused about which blocks to read (UNCOMMITTED versus COMMITTED).
January 24, 2007 at 2:14 pm
"I would add that a hint is just that "a hint" not 100% of the time you will get what you asked for"
This seems to be a fairly common misconception. Even though they are called hints, they are not mere suggestions to the optimizer. They OVERRIDE the optimizer's default behavior, absolutely! There's no "maybe" about hints, which is why they should be used with care and only by those who understand the ramifications.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
January 24, 2007 at 2:26 pm
I operate in a Web environment where all read-only functional content is served up out of a SQL Server database and then rendered on the site. All "GET" stored procedures have the SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED statement in them. This has the same effect as putting a (NOLOCK) hint on every table in the SP. If you are using dynamic SQL from the application layer, Java or .NET, the isolation level can be altered at the connection level.
We have found this to be a huge performance benefit and have also seen some unexpected benefits. It makes mass data modifications and content deployments to the site doable while we are up, without ill effects on the performance of the site. This of course has the same dangers as mentioned above: the read uncommitted isolation level can result in a dirty read.
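For illustration, a minimal sketch of the pattern described above; the procedure and table names are made up for the example:
-- Hypothetical "GET" procedure: every read in the proc behaves as if
-- each table carried a (NOLOCK) hint.
CREATE PROCEDURE dbo.GetArticleContent
    @ArticleID INT
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT a.ArticleID, a.Title, a.Body
    FROM dbo.Articles a
    WHERE a.ArticleID = @ArticleID;
END
From the application layer, the equivalent is to set the isolation level on the connection or transaction object instead.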
January 24, 2007 at 3:28 pm
Just wondering if anyone has used the new SNAPSHOT isolation level in SQL 2005 yet, and if anyone has any comments on it's potential as a replacement for nolock/READ UNCOMMITTED.
From BOL: SET TRANSACTION ISOLATION LEVEL (Transact-SQL)
Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction. The transaction can only recognize data modifications that were committed before the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.
Except when a database is being recovered, SNAPSHOT transactions do not request locks when reading data. SNAPSHOT transactions reading data do not block other transactions from writing data. Transactions writing data do not block SNAPSHOT transactions from reading data.
During the roll-back phase of a database recovery, SNAPSHOT transactions will request a lock if an attempt is made to read data that is locked by another transaction that is being rolled back. The SNAPSHOT transaction is blocked until that transaction has been rolled back. The lock is released immediately after it has been granted.
The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start a transaction that uses the SNAPSHOT isolation level. If a transaction using the SNAPSHOT isolation level accesses data in multiple databases, ALLOW_SNAPSHOT_ISOLATION must be set to ON in each database.
A transaction cannot be set to SNAPSHOT isolation level that started with another isolation level; doing so will cause the transaction to abort. If a transaction starts in the SNAPSHOT isolation level, you can change it to another isolation level and then back to SNAPSHOT. A transaction starts at the BEGIN TRANSACTION statement.
A transaction running under SNAPSHOT isolation level can view changes made by that transaction. For example, if the transaction performs an UPDATE on a table and then issues a SELECT statement against the same table, the modified data will be included in the result set.
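For anyone who wants to experiment with it, a minimal sketch; the database and table names are only placeholders:
-- Must be enabled per database before SNAPSHOT transactions can start
ALTER DATABASE MyReportingDB SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- Reads the committed version of the data as of the start of the
    -- transaction, without taking shared locks or blocking writers.
    SELECT COUNT(*) FROM dbo.Orders;
COMMIT TRANSACTION;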
January 24, 2007 at 3:40 pm
If you work with data in a multi-threaded app, you have to synchronize data access. (NOLOCK) turns off half of the synchronization code. using nolock creates a temporal race condition. Since Murphy's law is absolute, the race condition will occur during the worst possible time.
I guess the next topic will about how pinning tables in memory is a good idea.
January 24, 2007 at 4:02 pm
Hi. I agree with the discussion posters to my article that NOLOCK should be used with caution. I attempted to explain this with the following statement, but perhaps I should have explained in more detail:
"One solution to blocking is to place the NOLOCK query optimiser hint on all select statements where a dirty read is not going to cause problems for your data integrity. You would generally avoid this practice in your Finance stored procedures, for example."
I admit I am a newbie compared to your levels of experience, but I think the tool needs to suit the job. I have worked primarily in companies where data is relatively static, or where simply name, address and email is primarily being read. In these scenarios NOLOCK can be used with reasonable safety. I used to work at an Amazon-like online retail company with thousands of shoppers. We used NOLOCK extensively for several years and never had a problem with it.
If you are working at a bank or financial institution, well that is a different story. NOLOCK would probably get you into a lot of trouble.
If you have the luxury of having plenty of money to throw at your hardware, or are able to design a system from the ground up, then NOLOCK probably won't be necessary. I have worked at companies where money was hard to come by, hardware was limited (and outdated) and systems were already in place and badly designed. NOLOCK is often considered a way out of the resultant deadlocks and blocking problems you will have in this type of situation.
For the other newbies - NOLOCK should never be an alternative to a good design. Use it with caution, and only when your data is static or historical.
I'll definitely run the MAXDOP test, and test this out on a production server. My current company is possibly getting a high-end 64-bit machine in the next few weeks, so hopefully I can book some time on it before it goes into the live environment. This test has to be run with no-one else on the box, so we'll see how it goes.
And lastly, NOLOCK is one of those things that people will never agree on, so get used to it. Either you love it or you hate it.
Cheers, and thanks for the debate 🙂
Wayne
January 24, 2007 at 6:38 pm
"There's my 2 cents.... I'm Canadian, so does that make mine less valuable?"
Not if you keep your Loonies and Twonies straight.
ATB
Charles Kincaid
January 24, 2007 at 6:44 pm
Hi,
Here are my personal takes on NOLOCK.
I am working in a reporting environment where data refreshes occur once a day. Thereafter, I see only scores of queries or stored procedure SELECT-type activity, with only one type of exception: the SSIS loads that concurrently populate a single table from multiple source tables.
For the reports and procedures that merely read data, even with multiple joins in each and every query, WITH (NOLOCK) has proved to be a good friend: a speed boost, a lock squasher and a deadlock buster.
Now, when loading a single reporting table from multiple source tables at a time of night when nobody really cares, I think the SSIS data destination's "Table Lock" option should simply be disabled, so any number of simultaneous data sources can populate a single destination table without getting into a useless and disruptive deadlock or timeout situation. All the data reconciliations I have done so far across multiple servers, whether or not they were using this query hint (NOLOCK), were successful. I spend half of my time checking data quality using unbiased control extracts and comparisons. For me, that is the key check and balance for my reporting instance: almost every day I check basic direct ad hoc query results against my complex reporting queries that go through the whole ETL process. In most instances, I know when data is good because I know what it was at the source and it balances with what is in the reporting instance. I also work closely with business people, and this gives me another level of confidence in data quality, based not just on "no error" system operations but on constant validation of data accuracy against the business people's knowledge.
This is of course a scenario that applies only to a fairly static "reporting" situation. In a situation where someone had the idea to report against a highly updated, moving-target OLTP system, I would be very careful. Why would you query such an OLTP system anyway, besides doing some occasional quality-assurance work or trying to save some money on hardware at the expense of your data availability?
Lastly, in a reporting situation you face all the components of SQL: database, SSIS, reports, cubes, jobs, etc. So, to minimize the impact on worldwide business users, you use a couple of strategies: load multiple parallel sources into a staging table and then append the result to the production table, or populate a temporary table and rename it to become the production table (see the sketch below), all while dealing with index drops and re-creates when and if it makes sense.
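A minimal sketch of that populate-and-rename idea, with made-up table names; a real process would also handle indexes, permissions and error handling:
-- Build the new copy off to the side while users keep reading dbo.Orders
SELECT o.*
INTO dbo.Orders_Staging
FROM dbo.Orders_Source o;    -- placeholder for the real ETL query

-- Swap it in: retire the old table, promote the staging copy
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Orders', 'Orders_Old';
    EXEC sp_rename 'dbo.Orders_Staging', 'Orders';
COMMIT TRANSACTION;

DROP TABLE dbo.Orders_Old;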
Guru DBAs, pardon me, I am a reporting guy; you have figured that out by now. I am mostly driven by practical (business) experience, at the risk of missing crucial technical best practices.
Philippe
BI Guy
January 25, 2007 at 5:37 am
Several people have commented that they use NOLOCK or set the isolation level because they're in a read-only environment. Have any of them considered setting their database to read only? You'll get a huge performance benefit without having to do funky things to your queries.
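If you want to try it, it's a one-line change each way; the database name here is just a placeholder:
-- Kick out open transactions and make the database read only
ALTER DATABASE MyReportingDB SET READ_ONLY WITH ROLLBACK IMMEDIATE;

-- Switch it back when it's time to reload data
ALTER DATABASE MyReportingDB SET READ_WRITE WITH ROLLBACK IMMEDIATE;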
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 25, 2007 at 7:04 am
NOLOCK helps with concurrency, but use it carefully, as cautioned earlier, especially if you have a lot of explicit transactions that could be rolled back. Don't use NOLOCK to SELECT when you are going to UPDATE/INSERT/DELETE based on the data you SELECTed later ("later" could be just milliseconds later) -- that's asking for data integrity problems.
Also, I haven't seen anyone mention it, but if you use NOLOCK, your apps should be prepared to handle the transitory "605" error, which basically means "I tried to read a page using NOLOCK, but someone else was in the middle of altering (moving) it". This error DOES happen on busy systems (which coincidentally are the ones where you'll be most tempted to throw NOLOCK everywhere). The solution is to capture the 605 error and resubmit the query; since it's a transitory error, most likely even resubmitting the same query immediately will succeed.
This isn't as straightforward as it seems. If you are handling the 605 at your framework level (e.g. a generic database class that submits a query and retries automatically on 605 errors), your queries need to be written in a way where resubmitting them doesn't have any adverse side effects (i.e. not a batch with some INSERT/UPDATEs where repeating them would be bad). Of course this increases round trips to the database, which can hurt performance, especially if you have network latency. So you could then move the whole thing to a stored proc that has logic to handle catching @@ERROR = 605, but then you run into the nice side effect that even if your stored proc handles the error, your application code may still throw an exception, which actually prevents you from getting data back. Obviously, this is a much more involved discussion to do it properly, but I throw these out just to help you realize that NOLOCK isn't something you mindlessly (or even slightly mindlessly) throw out without considering the impacts.
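A minimal sketch of the retry idea using SQL 2005 TRY/CATCH (the table name and retry count are placeholders; on SQL 2000 you would check @@ERROR after the statement instead):
DECLARE @Retry INT;
SET @Retry = 3;

WHILE @Retry > 0
BEGIN
    BEGIN TRY
        -- The read that occasionally fails with the transitory 605 error
        SELECT OrderID, ProductID, OrderDate
        FROM dbo.Orders WITH (NOLOCK)
        WHERE OrderDate >= '20070101';

        SET @Retry = 0;                  -- success, stop looping
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 605
            SET @Retry = @Retry - 1;     -- transitory, try again
        ELSE
            SET @Retry = 0;              -- anything else: give up (or re-raise)
    END CATCH
END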
READPAST isn't all it's cracked up to be either. READPAST really only works on tables that were specifically designed to be work-queue tables and that are accessed in really specific ways. READPAST doesn't skip ALL locks, only (if I'm remembering correctly) key locks. I had to do a bunch of work on an existing table that was also being used as a work queue and tried to use READPAST to help out, but it didn't work. I can give more details if anyone's interested. Also, unlike NOLOCK, which WILL get you ALL the data (though it may not be committed data...), READPAST skips data that is locked (hence, read PAST), so you would get extremely inconsistent results. READPAST is only for work queues where multiple processes are grabbing some chunk of next available work and they don't want to block each other; a rough sketch of that pattern follows.
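Here is a rough sketch of the kind of work-queue access READPAST is meant for; the table, column and status values are invented for the example:
DECLARE @WorkID INT;

BEGIN TRANSACTION;

    -- Claim the next pending item, skipping any rows other workers have locked
    SELECT TOP 1 @WorkID = WorkID
    FROM dbo.WorkQueue WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE Status = 'Pending'
    ORDER BY WorkID;

    UPDATE dbo.WorkQueue
    SET Status = 'InProgress'
    WHERE WorkID = @WorkID;

COMMIT TRANSACTION;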
Mike Jansen
January 25, 2007 at 7:18 am
Tony Rogerson's blog has code which demonstrates this: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx
That code, once you actually get the data population script on that link to work correctly with the WITH (NOLOCK) example, certainly does prove that if you run over a Martini glass with a gravel-hauler, it will break. The problem is not with NOLOCK in that case... the problem is that, unless it's on an autonumbering column or an immutable "CreatedDate" column, anyone who uses a clustered index on a highly transactional table deserves the certain "Death by SQL" they will receive whether they use NOLOCK or not.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 25, 2007 at 7:57 am
"I would add that a hint is just that "a hint" not 100% of the time you will get what you asked for"
Not all hints are absolute (ROWLOCK, for example), but NOLOCK is certainly not a mere hint; it's a command (unless you are modifying data, of course). As said before, "hint" is a dangerous word choice.
January 25, 2007 at 8:08 am
I HAVE to chime in on this one. From what it appears, yes, putting "WITH (NOLOCK)" on every table in a query SHOULD BE the same as putting the "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED" statement at the top of the stored procedure, but I had a problem that showed that this is NOT the case.
I was dealing with a table that stored security information and was getting the occasional timeout because of all of the blocking going on. I added "NOLOCK" to the 3 tables in the query and the problem went away, but there was still some weirdness going on. I later came across the other command and thought that I would try it out and see if there was any difference; to my surprise, there was a huge difference. When looking at the statistics IO and time, the reads dropped from around a million to a few hundred and the time dropped from around 850 milliseconds to around 15 milliseconds. So I would be interested to know if anyone else has seen this behavior.
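For anyone who wants to compare the two forms on their own workload, a rough sketch; the table names are placeholders:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Form 1: hint every table in the query
SELECT u.UserID, r.RoleName
FROM dbo.Users u WITH (NOLOCK)
JOIN dbo.UserRoles r WITH (NOLOCK) ON r.UserID = u.UserID;

-- Form 2: set the isolation level once for the whole batch or procedure
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT u.UserID, r.RoleName
FROM dbo.Users u
JOIN dbo.UserRoles r ON r.UserID = u.UserID;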
Have Fun!
Eating an elephant is easy, you just need bigger utensils!
January 25, 2007 at 8:22 am
Yes, I've used it and it works great. The Read Committed Snapshot (RCS) isolation level prevents readers from blocking writers and vice versa. It CAN greatly enhance concurrency, but may actually hurt performance due to the overhead involved. Under this isolation level, readers are kept isolated from writers by row versioning: each statement sees a snapshot of the committed data as it existed when the statement started. SQL Server uses the heck out of tempdb for this, so in a high volume environment you'd better give tempdb some fast IO...
The first project I used RCS for was a product that was being developed for both Oracle and SQL Server. Turning on RCS makes SQL Server behave like Oracle in the transaction isolation and concurrency department, greatly simplifying the development effort. We found that for that particular application it improved our concurrency and total throughput by about 25%. That application was an insurance claims processing system and handled up to 1,200 claims per minute. That might not seem like much, but every claim had dozens of rules that applied, which in turn generated multiple database calls... We figured that each claim generated over 275 distinct queries on average.
The current system I'm working on is a credit card processing system. We use RCS to handle as many as 80 authorizations/settlements per second. Each of those generates about 15 - 20 queries. So, that's something on the order of 5 million queries per hour on a 2x Xeon (dual core x64) system with 16Gb of RAM.
The really great thing about RCS is that you don't have to make any code changes to take advantage of it, which also means that it's very easy to test, since your mileage may vary.
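Turning it on is a single database-level switch; a minimal sketch with a placeholder database name (note that the ALTER needs to be the only open connection in that database while it runs):
-- Once this is on, plain READ COMMITTED readers use row versions in tempdb
-- instead of shared locks, so readers and writers stop blocking each other.
ALTER DATABASE MyOLTPDB SET READ_COMMITTED_SNAPSHOT ON;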
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
January 25, 2007 at 8:29 am
Wayne,
Whether I agree with the premise of using WITH (NOLOCK) or not, your post did what all good posts are supposed to do... spark pro/con conversation.
Thought I'd reward you for your efforts especially since you and I share the same view about volumetric testing... try this out...
--===== This takes just over 2 minutes to make 5 million rows instead of an hour
-- and has the side benefit of being highly configurable.
-- It'll make 100K rows in just under 4 seconds and a million in 33 seconds
-- DROP TABLE ORDERS
--===== Declare local control variables
DECLARE @TableSize INT --Number of rows to create in the test table
DECLARE @StartDate DATETIME --Lowest possible date to put in the table (inclusive)
DECLARE @EndDate DATETIME --Highest possible date to put in the table (non inclusive)
DECLARE @ProductLimit FLOAT --How many products there will be
DECLARE @Days FLOAT --Number of days between start and end dates
--===== Preset local control variables
SET @TableSize = 5000000
SET @ProductLimit = 50000
SET @StartDate = '20000101'
SET @EndDate = '20100101'
--===== Calculate the number of days between the start and end dates
SET @Days = DATEDIFF(dd,@StartDate,@EndDate)
--===== Limit size of table to size identified by the appropriate control variable
SET ROWCOUNT @TableSize
--===== Create and populate the test table on the fly
SELECT OrderID = IDENTITY(INT,1,1),
ProductID = CAST(RAND(CAST(NEWID() AS VARBINARY))*@ProductLimit+1 AS INT),
OrderDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*@Days+@StartDate AS DATETIME)
INTO dbo.Orders
FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),
Master.dbo.SysColumns sc2 WITH (NOLOCK)
--===== Restore normal processing of full row counts
SET ROWCOUNT 0
--===== Add the primary key to the table
ALTER TABLE dbo.Orders
ADD CONSTRAINT PK_Orders_OrderID
PRIMARY KEY CLUSTERED (OrderID)
--===== Configure the non-nullable columns to be NOT NULL
ALTER TABLE dbo.Orders
ALTER COLUMN ProductID INT NOT NULL
ALTER TABLE dbo.Orders
ALTER COLUMN OrderDate DATETIME NOT NULL
--===== Verify the limits of the data
SELECT MinOrderID = MIN(OrderID),
MaxOrderID = MAX(OrderID),
CountOrderID = COUNT(DISTINCT OrderID),
MinProductID = MIN(ProductID),
MaxProductID = MAX(ProductID),
CountProductID = COUNT(DISTINCT ProductID),
MinOrderDate = MIN(OrderDate),
MaxOrderDate = MAX(OrderDate)
FROM dbo.Orders
--Jeff Moden
Change is inevitable... Change for the better is not.