January 12, 2012 at 4:23 am
Hi,
We have this statement
SELECT Distinct Col_A, Col_B
FROM Table_Main WITH(nolock)
the result have duplicates.. also when checking Table_Main, there is only 1 row.
but if run w/o the "WITH(NoLock)"
SELECT Distinct Col_A, Col_B
FROM Table_Main
the result is unique, there's no duplicates..
can anyone help me what's the impact of using "WITH(nolock)" in distinct ?
why is it returning duplicate rows ? and why is it returning more than 1 row when the actual Table only contains 1 row ?
Thanks 🙂
January 12, 2012 at 4:38 am
can anyone help me what's the impact of using "WITH(nolock)" in distinct ?
why is it returning duplicate rows ? and why is it returning more than 1 row when the actual Table only contains 1 row ?
Thanks 🙂
you are might be facing dirty read issue. when you are using nolock hint on Select query. it can read uncommited transaction or the transaction which are in middle of rollback. so you can check any open transaction in your database and when you are not using the nolock then result is with respect to default isolation i. e. read commited.
January 12, 2012 at 4:42 am
Sometimes page split can lead to duplicate rows.
Have a look at this article :
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx
January 12, 2012 at 6:04 am
azdzn (1/12/2012)
Have a look at this article :
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx%5B/quote%5D
+1
This is the case most likely.
Another good reason not to use NOLOCK.
-- Gianluca Sartori
January 14, 2012 at 8:09 pm
Thanks for replies..
I think i forgot to mention one important thing, Table_Main has an unique index on for Col_A and Col_B..
does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLICK hint) ?
is my understanding correct ?
January 15, 2012 at 3:33 am
jpdp101 (1/14/2012)
Thanks for replies..I think i forgot to mention one important thing, Table_Main has an unique index on for Col_A and Col_B..
does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLICK hint) ?
is my understanding correct ?
As per my understanding it doesn't really matter if it's an index seek or a scan (as long as the seek would only affect a single page). As soon as it is required to read more than one page, you're in the risk of dirty reads.
From my understanding the DISTINCT is removed by the query optimizer due to the uniqueness of the clustered index. Would be interesting to see what happens if the clustered index is not defined as unique. From my understanding the query optimizer would then need to leave the final aggregation in place. Please note that I don't recommend to physically do it. It's more a theoretical question for the gurus around here. Instead optimize the query to get rid of the NOLOCK hint (e.g. add a separate index with just the two columns - but that'S just guessing without more detailed onfirmation available).
January 15, 2012 at 3:47 am
jpdp101 (1/14/2012)
does this mean the query did an Index Scan ? and since it knows that the involve columns in the SELECT DISTINCT are Unique Index, it just disregard the DISTINCT and grab all rows and the duplicates came from Page Splits (due to NOLOCK hint) ?
Yes and yes. If the index is unique and the key columns are part of the select clause, SQL knows there cannot be duplicates and hence ignores the distinct.
It would have to have done a scan because a seek is done in key order not allocation order and the nolock 'feature' of reading pages twice (and not reading others at all) requires allocation order scans. In your case, it would have to be a scan because your query has no where clause and hence there's no predicate that could be seeked on.
Get rid of that nolock. It's not just duplicates, you could also be entirely missing pages of data. Consider one of the snapshot isolation levels if you want to avoid blocking.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 15, 2012 at 4:59 am
Cool..
Removing the NOLOCK did return unique data from the DISTINCT.
Thanks everyone for the help and great informations. 🙂
January 16, 2012 at 1:26 am
READUNCOMMITTED (aka NOLOCK) is indeed inappropriate if you are looking for anything other than a best-effort quick view of the data. Beware of falling into the trap of thinking that READCOMMITTED (the default) or even REPEATABLEREAD are foolproof though:
http://blogs.msdn.com/b/craigfr/archive/2007/04/25/read-committed-isolation-level.aspx
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply