October 21, 2009 at 7:04 am
Hi All,
I am trying these queries -
SELECT TOP 10 * FROM Table1
select top 10 * from Table1 WITH (NOLOCK)
Both the queries will give different result set.
But when i used order by with them the result set will be the same.
Is this related to Isolation level in SQL Server or anything else?
I am very much confused.
Waiting for reply.
MJ:cool:
October 21, 2009 at 7:28 am
(NOLOCK) allows dirty reads. Basically, there is a chance that if you are reading data out of the table while it is in the process of being updated, you could read the wrong data. You can also read data that has been modified by transactions that have not been committed yet as well as a slew of other problems.
Best practice is not to use NOLOCK unless you are reading from tables that really don't change (such as a table containing states) or from a data warehouse type DB that is not constantly updated.
October 21, 2009 at 7:38 am
In the case of data that doesn't change, that's better off in a read-only database or read-only file. SQL Server won't even attempt to take locks on read-only data, and thus avoids the processing overhead involved. The performance boost is minor, but in a busy system it can make a difference.
With(nolock) has some uses very occassionally. Inappropriate use of it is the single most common error I've ever seen in database devs. Most don't understand that it can result in all kinds of junk data being used in important business processes.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 21, 2009 at 11:13 pm
It means what i am getting dirty or uncommitted records from second query (select top 10 * from Table1 With(NOLOCK).
Actually i am confused because of order by clause. When i use order by both the queries will give me same result set.
Reason why i am stuck in this situation is that we are improving the performance of our application using WITH (NOLOCK). During this process i got two different result set from the similar queries .
My question is that will it affect my work or not? If i get two different result sets.
Or it is better not to use WITH (NOLOCK) in select statement.
MJ:cool:
October 22, 2009 at 1:22 am
AMJ-458987 (10/21/2009)When i use order by both the queries will give me same result set.
Probably the rows in the dirty read wont be in the top 10 of your ordered results.
As you have discovered nolock can be a very dangerous beast.
Take a look at this link
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx
October 22, 2009 at 7:07 am
Using With(Nolock) can improve performance, at the cost of reliability and integrity.
The question to ask on that one is: If you were a manager, and had an employee who was really, really fast at his job, but who lied to you, broke things, and sabotaged the work of his co-workers, would you keep him around? If the answer is yes, then with(nolock) is for you.
The thing you're seeing with Top without an Order By is that SQL Server (and any normal database) can give you whatever it wants in that case. What you're doing is telling it, "give me the most convenient 10 rows". Since one query honors locks and the other doesn't, they're going to have different definitions of "most convenient". That's not so much an issue with "with(nolock)" as it is an issue with unordered Top commands. You never know what you're going to get with one of those.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 22, 2009 at 7:15 am
Mithilesh,
I tried a similar query. I do agree that NOLOCK gives uncommited data but very strangely when I was trying the queries, there was no other user/activity...so i dont expect any thing uncommited. Can anybody tell the reason for different result set when there is no update/lock on table.
October 22, 2009 at 7:17 am
ranjitrjha (10/22/2009)
Mithilesh,I tried a similar query. I do agree that NOLOCK gives uncommited data but very strangely when I was trying the queries, there was no other user/activity...so i dont expect any thing uncommited. Can anybody tell the reason for different result set when there is no update/lock on table.
Because an unordered Top statement gives you whatever comes up. It doesn't have consistency, because it's not required to.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 22, 2009 at 7:24 am
GSquared,
Your explanation makes sense but when i am running
select top 10 * from table1
select top 10 * from table1
then for both queries, result set is same.
It's only when I put a nolock hint as in queries below, the result set differs.
select top 10 * from table1
select top 10 * from table1 with (nolock)
October 22, 2009 at 8:21 am
That's simply due to the fact that the software is essentially lazy. The first query, run multiple times, is likely to pull the same data, because both the plan and the data are cached. Even if you clear the cache, you're likely to get the same data on multiple runs of the same query, because of the physical implementation of the database and the fact that the devs didn't add in random number generators just for the heck of it.
The point is that you are likely to get the same data. You aren't guaranteed to get the same data.
Thus, a minor, seemingly immaterial change in the query, is likely to get different results. Again, it might, or it might not.
The point is that you don't know what you'll get. It might give you the same results 10,000 times in a row, and then on the 10,001st, give you something different. No way to know.
Since most businesses don't want to get that kind of unpredictability out of their data, it's better to force the issue. That means ordering your query if you want the top X rows. It means avoiding "nolock" unless you have a real business reason to allow dirty reads. Has a lot of other ramifications, but those are definitely two of them.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 22, 2009 at 10:10 am
You should forget about using NOLOCK.
With SQL Server 2005, set the database to read_committed_snapshot, and your queries will not take locks or be blocked in most situations.
You should read about it in SQL Server 2005 Books Online first, to make sure you understand all the implications.
-- kill all connections to MyDatabase first, then run
use master
alter database [MyDatabase] set allow_snapshot_isolation on
alter database [MyDatabase] set read_committed_snapshot on
October 22, 2009 at 9:12 pm
One fun thing to know about ORDER BY is that w/o a ORDER BY statement in the ENTERPRISE version of SQL you can't rely on the fact that SELECT * FROM tablename will return back the record names based on the order in the clustered index. SQL 2005 EE / Dev Edition will actually look at any other queries that are running concurrently with the same query and it will hop on the read to return back the same rows that the 2nd query is running and will complete the batch with the read through to the key that started the query. So say you have spid 1 saying SELECT * FROM tableA.. and tableA has 1,000,000 rows in it ordered by Col1. Well if you started a query on a new SPID that was saying SELECT * FROM tableA while the first SPID was executing it wouldn't actualy give you ROW 1 as the first record returned. It could actually return row 500,000- 1mill first and then give you 0 - 499,999.
The I/O from an instance of the SQL Server Database Engine includes logical and physical reads. A logical read occurs every time the Database Engine requests a page from the buffer cache. If the page is not currently in the buffer cache, a physical read first copies the page from disk into the cache.
The read requests generated by an instance of the Database Engine are controlled by the relational engine and optimized by the storage engine. The relational engine determines the most effective access method (such as a table scan, an index scan, or a keyed read); the access methods and buffer manager components of the storage engine determine the general pattern of reads to perform, and optimize the reads required to implement the access method. The thread executing the batch schedules the reads.
Read-Ahead
The Database Engine supports a performance optimization mechanism called read-ahead. Read-ahead anticipates the data and index pages needed to fulfill a query execution plan and brings the pages into the buffer cache before they are actually used by the query. This allows computation and I/O to overlap, taking full advantage of both the CPU and the disk.
The read-ahead mechanism allows the Database Engine to read up to 64 contiguous pages (512KB) from one file. The read is performed as a single scatter-gather read to the appropriate number of (probably non-contiguous) buffers in the buffer cache. If any of the pages in the range are already present in the buffer cache, the corresponding page from the read will be discarded when the read completes. The range of pages may also be "trimmed" from either end if the corresponding pages are already present in the cache.
There are two kinds of read-ahead: one for data pages and one for index pages.
Reading Data Pages
Table scans used to read data pages are very efficient in the Database Engine. The index allocation map (IAM) pages in a SQL Server database list the extents used by a table or index. The storage engine can read the IAM to build a sorted list of the disk addresses that must be read. This allows the storage engine to optimize its I/Os as large sequential reads that are performed in sequence, based on their location on the disk. For more information about IAM pages, see Managing Space Used by Objects.
Reading Index Pages
The storage engine reads index pages serially in key order. For example, this illustration shows a simplified representation of a set of leaf pages that contains a set of keys and the intermediate index node mapping the leaf pages. For more information about the structure of pages in an index, see Clustered Index Structures.
The storage engine uses the information in the intermediate index page above the leaf level to schedule serial read-aheads for the pages that contain the keys. If a request is made for all the keys from ABC to DEF, the storage engine first reads the index page above the leaf page. However, it does not just read each data page in sequence from page 504 to page 556 (the last page with keys in the specified range). Instead, the storage engine scans the intermediate index page and builds a list of the leaf pages that must be read. The storage engine then schedules all the reads in key order. The storage engine also recognizes that pages 504/505 and 527/528 are contiguous and performs a single scatter read to retrieve the adjacent pages in a single operation. When there are many pages to be retrieved in a serial operation, the storage engine schedules a block of reads at a time. When a subset of these reads is completed, the storage engine schedules an equal number of new reads until all the required reads have been scheduled.
The storage engine uses prefetching to speed base table lookups from nonclustered indexes. The leaf rows of a nonclustered index contain pointers to the data rows that contain each specific key value. As the storage engine reads through the leaf pages of the nonclustered index, it also starts scheduling asynchronous reads for the data rows whose pointers have already been retrieved. This allows the storage engine to retrieve data rows from the underlying table before it has completed the scan of the nonclustered index. Prefetching is used regardless of whether the table has a clustered index. SQL Server Enterprise uses more prefetching than other editions of SQL Server, allowing more pages to be read ahead. The level of prefetching is not configurable in any edition. For more information about nonclustered indexes, see Nonclustered Index Structures.
PER BOL..
Advanced Scanning
In SQL Server Enterprise, the advanced scan feature allows multiple tasks to share full table scans. If the execution plan of a Transact-SQL statement requires a scan of the data pages in a table and the Database Engine detects that the table is already being scanned for another execution plan, the Database Engine joins the second scan to the first, at the current location of the second scan. The Database Engine reads each page one time and passes the rows from each page to both execution plans. This continues until the end of the table is reached.
At that point, the first execution plan has the complete results of a scan, but the second execution plan must still retrieve the data pages that were read before it joined the in-progress scan. The scan for the second execution plan then wraps back to the first data page of the table and scans forward to where it joined the first scan. Any number of scans can be combined like this. The Database Engine will keep looping through the data pages until it has completed all the scans. This mechanism is also called "merry-go-round scanning" and demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.
For example, assume that you have a table with 500,000 pages. UserA executes a Transact-SQL statement that requires a scan of the table. When that scan has processed 100,000 pages, UserB executes another Transact-SQL statement that scans the same table. The Database Engine schedules one set of read requests for pages after 100,001, and passes the rows from each page back to both scans. When the scan reaches the 200,000th page, UserC executes another Transact-SQL statement that scans the same table. Starting with page 200,001, the Database Engine passes the rows from each page it reads back to all three scans. After it reads the 500,000th row, the scan for UserA is complete, and the scans for UserB and UserC wrap back and start to read the pages starting with page 1. When the Database Engine gets to page 100,000, the scan for UserB is completed. The scan for UserC then keeps going alone until it reads page 200,000. At this point, all the scans have been completed.
Without advanced scanning, each user would have to compete for buffer space and cause disk arm contention. The same pages would then be read once for each user, instead of read one time and shared by multiple users, slowing down performance and taxing resources
ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_0evalplan/html/fb83d250-4252-4322-b57c-3c72de8407c8.htm
October 25, 2009 at 5:30 pm
-- kill all connections to MyDatabase first, then run
use master
alter database [MyDatabase] set allow_snapshot_isolation on
alter database [MyDatabase] set read_committed_snapshot on
Before changing this and hoping that you get no more blocks...if this is a vendor database check with them that this has been tested.
October 26, 2009 at 1:31 am
ranjitrjha (10/22/2009)
GSquared,Your explanation makes sense but when i am running
select top 10 * from table1
select top 10 * from table1
then for both queries, result set is same.
It's only when I put a nolock hint as in queries below, the result set differs.
select top 10 * from table1
select top 10 * from table1 with (nolock)
Assuming that a clustered index exists on table1:
The first query is likely to produce a query plan which forward scans the clustered index and therefore produces results in logical clustered index order (since it scans the linked list of clustered index pages). This behaviour isn't guaranteed as such, but has always behaved this way, at least as far as I recall. The decision about how to return data to satisfy the query is made by the Storage Engine.
With NOLOCK (or READUNCOMMITTED as I much prefer it), the Storage Engine might choose to perform an IAM-ordered scan rather than following the clustered index chain. From memory, I think this is true if table1 is at least 64 pages in size. Using an IAM scan, data will be returned in the order pages were allocated. The same effect occurs if you specify TABLOCK - the Storage Engine can use the (potentially more efficient) IAM scan if it can guarantee that row movement from things like page splits won't screw things up. With READUNCOMMITTED/NOLOCK you are saying that you don't care about consistency, so the Storage Engine may choose the IAM scan. With TABLOCK, it knows that there can't be any concurrent modifications, so again an IAM scan is safe.
The Storage Engine always uses the IAM method when scanning a heap (though obviously not for any non-clustered indexes!) so that's why I qualified the above to restrict it to the case where table1 is not a heap.
Paul
October 26, 2009 at 7:26 am
I think Paul is the first to get this right: it is almost certainly due to NOLOCK allowing an allocation-order scan. if you want the same rows back on a top repeatably you MUST include an order by that guarantees uniqueness of the returned rows. Otherwise SQL Server can return ANY rows - there is NO 'ordering' in set-based returns other than that given by explicit ORDER BY.
To address a comment someone else made "Using With(Nolock) can improve performance . . .": actually it doesn't technically improve performance as such (other than the tiny fraction gained from not taking/checking some locks). What it really does is improve CONCURRENCY, meaning your read isn't blocked by other DML activity. 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply