The Paradox of NOLOCK: How Dirty Reads Can Sometimes Be Cleaner

  • Comments posted to this topic are about the item The Paradox of NOLOCK: How Dirty Reads Can Sometimes Be Cleaner

  • With regard to the following code from the article...

     

    CREATE PROCEDURE dbo.GetDashboardStats
    AS
    BEGIN
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        SELECT
            (SELECT COUNT(*) FROM Orders WITH (NOLOCK)
             WHERE OrderDate = CAST(GETDATE() AS DATE)) AS TodayOrders,
            (SELECT SUM(OrderTotal) FROM Orders WITH (NOLOCK)
             WHERE OrderDate = CAST(GETDATE() AS DATE)) AS TodayRevenue,
            (SELECT COUNT(DISTINCT CustomerID) FROM Orders WITH (NOLOCK)
             WHERE OrderDate = CAST(GETDATE() AS DATE)) AS UniqueCustomers;
    END;

    Gosh... You do realize that if you start a proc with "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED", there's no need whatsoever for WITH (NOLOCK), correct?

    As a bit of a sidebar: if anyone writes code like that, where the source, action, and criteria are all the same for every column, you have larger performance issues to worry about and need to train your people at the 100 level on aggregates (a single-pass sketch follows below).
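
    A minimal single-pass sketch, reusing the table and column names from the article's procedure; since the isolation level at the top already gives dirty reads, the per-table WITH (NOLOCK) hints are dropped:

    CREATE PROCEDURE dbo.GetDashboardStats
    AS
    BEGIN
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        -- One pass over Orders instead of three correlated subqueries.
        SELECT  COUNT(*)                   AS TodayOrders,
                SUM(OrderTotal)            AS TodayRevenue,
                COUNT(DISTINCT CustomerID) AS UniqueCustomers
        FROM    dbo.Orders
        WHERE   OrderDate = CAST(GETDATE() AS DATE);
    END;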

    As for the use of WITH (NOLOCK) or the Isolation Level, I agree... as with all else, "It Depends".  But just like the knight said in the cave of the Holy Grail in the "Indiana Jones and the Last Crusade" movie, "Choose wisely".

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Or the paradox is an enigma wrapped in a riddle. The article is based on a musing which is factually unconfirmed. Or maybe I missed the convincing part(s).

    No one shall be able to drive us out of the paradise that Cantor has created for us.

  • The day my bank starts using NOLOCK on its SQL Server queries is the day I start ROLLBACK TRANSACTION on my account and JOIN a new bank.

  • This article contains several inaccuracies and incorrect assumptions about what it means to read uncommitted data (which is what NOLOCK offers). I strongly advise against following the instructions in this article as currently written.

    I have to add that if blocking and query speed are an issue, there are deeper problems going on with the system in use. Using NOLOCK as a one-size-fits-all solution is a band-aid at best. I've seen blocking caused by too many page splits from poorly designed databases and indexes, not to mention bad query plans from stale statistics or bad SQL coding patterns (parameter sniffing issues). If you want to see the blocking itself, there's a quick DMV sketch below.

    That's not even to mention that this is of no use in an AG environment, where a synchronous-commit secondary has to harden a transaction before the primary commits it on its end. The snapshot isolation used for reads on Availability Group readable secondaries causes NOLOCK to be ignored on the secondary node.
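
    A minimal sketch of that kind of check, using only the standard DMVs (no hints involved), to see who is blocking whom right now:

    SELECT  r.session_id,
            r.blocking_session_id,
            r.wait_type,
            r.wait_time  AS wait_time_ms,
            r.command,
            t.text       AS blocked_statement
    FROM    sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE   r.blocking_session_id <> 0;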


  • It's true that users care more about performance when doing their routine tasks, but that's because they take accuracy for granted as the standard. When they stumble on a case where a data error led to consequences, you'll be the first to be blamed. We have no right to assume what purpose the data we return, even in reports, serves now or may serve in the future.

    Here is a test question to verify your approach. Ask your analyst: "Is it OK if we name a column in the report 'Total sales, approximately'?" Then you will see the reaction. Without documented confirmation, don't take it on yourself to sacrifice consistency.

  • Good use of common sense. Ignore the obvious comments from people who don't understand the needs of, say, a live production reporting system where 1,000 units a minute can be produced (give or take 100) versus a financial system where absolute accuracy matters more than speed.

  • UPDATE myTable WITH (NOLOCK) is not permitted by SQL Server, so it's impossible to try.

    In any case, I prefer SET TRANSACTION ISOLATION LEVEL SNAPSHOT, so I can read the last committed version of the data without waiting for in-flight transactions to commit.
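
    For reference, snapshot isolation has to be switched on at the database level before that SET statement will work; a minimal sketch, with a placeholder database name:

    -- One-time setup (SalesDb is a placeholder name).
    ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Per-session usage: readers see the last committed row versions
    -- and are not blocked by writers' open transactions.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    SELECT COUNT(*) AS TodayOrders
    FROM   dbo.Orders
    WHERE  OrderDate = CAST(GETDATE() AS DATE);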

  • Ignore the obvious comments from people who don’t understand the needs of say a live, production reporting system where 1000 units a minute can be produced

    1,000 units a minute is only 17 per second. I’ve managed systems with 50,000 transactions per second on a four-core CPU with 64 GB RAM. NOLOCK has no place in a production environment.

  • There is an aspect covered in your article that I think is praise-worthy - the "correctness" of data can only be decided by stakeholders.  If the stakeholders make an informed decision that approximate is suitable for their purposes, everything you say is acceptable.  But having data of unknown integrity is NEVER better than having accurate data.  To sacrifice this cardinal virtue for performance is simply misguided.  As others have pointed out, there might be design solutions that could help, but simply getting rid of locks is wrong.

  • It's sort of like denormalization for performance: wait for a performance issue before considering it. The primary issue I've experienced with WITH (NOLOCK) is developers using it as the first or "go-to" solution for blocking or deadlock issues. I remember teaching developers about temp tables; temp tables showed up everywhere. The same for triggers. I forgot the "when not to use" part. I did not make that mistake with NOLOCK, but developers discovered it for themselves.

    I think WITH (NOLOCK) is fine for checking production issues (e.g., how many records are queued for processing). I did once use it in production code, when two BizTalk servers (the 2005 era) would instantly deadlock processing the same queue table. The records had to be processed in order of warrant issues and recalls. A query using WITH (NOLOCK) detected the next record to process; the processing status needed to be visible anywhere, even when it was set inside a transaction on the other BizTalk server. If the record needed processing (for the first time or again), an update lock was acquired. (A shared read lock would instantly get you a lock-conversion deadlock.) Otherwise, it polled until the next record was safe to process. It worked, but what a nightmare. I've learned that redundancy sometimes gets you 8s, not 9s.
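
    Roughly the shape of that pattern, with entirely hypothetical table and column names (the original schema isn't shown); the UPDLOCK on the re-check is what avoids the shared-to-exclusive deadlock described above:

    DECLARE @QueueID int;

    -- Dirty peek: a cheap look at what is probably next, taking no locks,
    -- and seeing status changes made in the other server's open transaction.
    SELECT TOP (1) @QueueID = QueueID
    FROM   dbo.WorkQueue WITH (NOLOCK)
    WHERE  Status = 'Pending'
    ORDER BY QueueID;

    IF @QueueID IS NOT NULL
    BEGIN
        BEGIN TRAN;

        -- Re-check under an update lock so two pollers cannot both read the
        -- row with shared locks and then deadlock converting to exclusive.
        IF EXISTS (SELECT 1
                   FROM   dbo.WorkQueue WITH (UPDLOCK, ROWLOCK)
                   WHERE  QueueID = @QueueID
                   AND    Status = 'Pending')
        BEGIN
            UPDATE dbo.WorkQueue
            SET    Status = 'Processing'
            WHERE  QueueID = @QueueID;
        END;

        COMMIT;
    END;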


    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

  • With NOLOCK the data returned can be inconsistent with itself!

    This is not acceptable.

    The correct solution to this problem is snapshot isolation - no locking and a consistent result as of the start of the transaction.

    Do not follow the advice given in this article.

  • Actually, if you are talking about a database enforced technology solution, the answer is multi-version read consistency, pioneered by Oracle and used by others.  Snapshot isolation will give you a consistent view of data at a point in time, but it may not be the most recent point in time with full data integrity.

  • "If the stakeholders make an informed decision that approximate is suitable for their purposes, everything you say is acceptable."

    Agreed. Often, summary reporting only needs to be approximate.  It's important to understand that.  (Detailed reporting can be different).

    Consider how many significant figures are used. If your BI system puts out "1.34M", but the average transaction is around 5-15 cents, stakeholders already know it's approximate and using NOLOCK is probably quite acceptable.   (Chances are they don't want to see 1341549.58 anyway!)

    One must know the situation - and NOLOCK is fine for approximate, real-time-ish summary reporting without affecting operational use.

    Besides, data is often fundamentally approximate anyway. If there are ten transactions per second, will any executive care that the "cumulative today" dashboard (run many times a day) is from 10:38:02.520 am, or 200 milliseconds later? (For context: the median reaction time on humanbenchmark.com is 273 ms.)

    Heck, the 'current balance' of an account is affected by when the A.R. person takes their lunch break!

    So, in my opinion:

    • OLAP needs (in my opinion) about three significant figures of accuracy but needs fast response across "large" data sets.
    • OLTP needs perfect accuracy and decent response across "small" data sets.

    Having said that, snapshot isolation might be viable. (Even if the OLTP side is using serializable, I think you can run the OLAP queries under snapshot, and since OLAP is typically read-only, there shouldn't be any 'failed to commit' issues.)
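
    A rough two-session illustration of that last point, assuming snapshot isolation is already enabled for the database and using a placeholder OrderID column:

    -- Session 1 (OLTP writer, strict isolation, transaction still open):
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRAN;
    UPDATE dbo.Orders SET OrderTotal = OrderTotal + 10 WHERE OrderID = 42;
    -- ...not yet committed...

    -- Session 2 (OLAP reader): not blocked by session 1, sees the last
    -- committed row versions; a read-only snapshot transaction never
    -- writes, so it cannot hit an update-conflict ("failed to commit") error.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;
    SELECT SUM(OrderTotal) AS TodayRevenue
    FROM   dbo.Orders
    WHERE  OrderDate = CAST(GETDATE() AS DATE);
    COMMIT;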
