Use of NoLock in Select Distinct yield duplicate

Question

Use of NoLock in Select Distinct yield duplicate

jpdp101

Old Hand

Points: 399
More actions
January 12, 2012 at 4:23 am

#253437

Hi,
We have this statement
SELECT Distinct Col_A, Col_B
FROM Table_Main WITH(nolock)
the result have duplicates.. also when checking Table_Main, there is only 1 row.
but if run w/o the "WITH(NoLock)"
SELECT Distinct Col_A, Col_B
FROM Table_Main
the result is unique, there's no duplicates..
can anyone help me what's the impact of using "WITH(nolock)" in distinct ?
why is it returning duplicate rows ? and why is it returning more than 1 row when the actual Table only contains 1 row ?
Thanks 🙂

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

padhis SSCommitted Points: 1762 More actions · Answer 1

can anyone help me what's the impact of using "WITH(nolock)" in distinct ?
why is it returning duplicate rows ? and why is it returning more than 1 row when the actual Table only contains 1 row ?
Thanks 🙂

you are might be facing dirty read issue. when you are using nolock hint on Select query. it can read uncommited transaction or the transaction which are in middle of rollback. so you can check any open transaction in your database and when you are not using the nolock then result is with respect to default isolation i. e. read commited.

azdzn SSCarpal Tunnel Points: 4128 More actions · Answer 2

Sometimes page split can lead to duplicate rows.

Have a look at this article :

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx

spaghettidba SSC Guru Points: 105732 More actions · Answer 3

azdzn (1/12/2012)

Sometimes page split can lead to duplicate rows.

Have a look at this article :

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx%5B/quote%5D

+1

This is the case most likely.

Another good reason not to use NOLOCK.

-- Gianluca Sartori

jpdp101 Old Hand Points: 399 More actions · Answer 4

Thanks for replies..

I think i forgot to mention one important thing, Table_Main has an unique index on for Col_A and Col_B..

does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLICK hint) ?

is my understanding correct ?

LutzM SSC Guru Points: 107049 More actions · Answer 5

jpdp101 (1/14/2012)
Thanks for replies..
I think i forgot to mention one important thing, Table_Main has an unique index on for Col_A and Col_B..
does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLICK hint) ?
is my understanding correct ?

As per my understanding it doesn't really matter if it's an index seek or a scan (as long as the seek would only affect a single page). As soon as it is required to read more than one page, you're in the risk of dirty reads.

From my understanding the DISTINCT is removed by the query optimizer due to the uniqueness of the clustered index. Would be interesting to see what happens if the clustered index is not defined as unique. From my understanding the query optimizer would then need to leave the final aggregation in place. Please note that I don't recommend to physically do it. It's more a theoretical question for the gurus around here. Instead optimize the query to get rid of the NOLOCK hint (e.g. add a separate index with just the two columns - but that'S just guessing without more detailed onfirmation available).

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

jpdp101 (1/14/2012)
does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLOCK hint) ?

Yes and yes. If the index is unique and the key columns are part of the select clause, SQL knows there cannot be duplicates and hence ignores the distinct.

It would have to have done a scan because a seek is done in key order not allocation order and the nolock 'feature' of reading pages twice (and not reading others at all) requires allocation order scans. In your case, it would have to be a scan because your query has no where clause and hence there's no predicate that could be seeked on.

Get rid of that nolock. It's not just duplicates, you could also be entirely missing pages of data. Consider one of the snapshot isolation levels if you want to avoid blocking.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

jpdp101 Old Hand Points: 399 More actions · Answer 7

Cool..

Removing the NOLOCK did return unique data from the DISTINCT.

Thanks everyone for the help and great informations. 🙂

Paul White SSC Guru Points: 150468 More actions · Answer 8

READUNCOMMITTED (aka NOLOCK) is indeed inappropriate if you are looking for anything other than a best-effort quick view of the data. Beware of falling into the trap of thinking that READCOMMITTED (the default) or even REPEATABLEREAD are foolproof though:

http://blogs.msdn.com/b/craigfr/archive/2007/04/25/read-committed-isolation-level.aspx

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi