The Paradox of NOLOCK: How Dirty Reads Can Sometimes Be Cleaner

  • Hi Rick,

    The MVCC thing is fairly incidental to the book, which is great because it is such a clear exposition of the math behind the relational model.

    My point was about how you achieve serialisable isolation with adequate performance in principle, independent of platform; I am well aware of the drawbacks of current methods.

    My other point is that concurrency is a difficult subject and one which ideally those involved with applications should not need to know about - the onus is on the software industry to address this problem.

     

  • MMartin1 wrote:

    Jeff. I'll take a crack at your question:

    In your example, are you assuming the date column is indexed and would be used?  What if that is not the case, the rows for the past month share pages with the current month's rows, and updates occur on other rows that cause page splits?  I believe you might get something called a ghost read in these cases.

    And all subsequent replies were oblivious to what I mentioned earlier: NOLOCK is not even acted on if you have an AG environment.

    Also, if the business case is that items can be returned, your counts for previous month(s) can change. If you have a program where you get a discount from the manufacturer of a product if you sell over (I don't know) 2,000 items in the month, then accuracy starts to matter before you get to the point where you close the books and no longer accept refunds for some month.

    Let's ask the question... you gather the data without using WITH (NOLOCK).  As soon as you're done, someone makes a return like what you were talking about.  Your report is now inaccurate for the current time.   As for the return itself, "It Depends".  Are you handling your inventory like a checkbook where you never modify or erase a transaction so that you can actually track the time between the sale and the return?  In other words, in such a situation (reporting for previous month), you wouldn't be using any rows that are updated or newly inserted because the rows are frozen at the time of entry.
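    A minimal sketch of that "checkbook" style, with table and column names invented purely for illustration: sales and returns are both new rows, nothing is ever updated, so a prior month's numbers can always be reproduced from the posting date alone.

    CREATE TABLE dbo.InventoryLedger
    (
        LedgerID    BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
       ,ProductID   INT          NOT NULL
       ,Quantity    INT          NOT NULL  -- positive = sale, negative = return
       ,PostingDate DATETIME2(3) NOT NULL CONSTRAINT DF_InventoryLedger_PostingDate DEFAULT SYSDATETIME()
    );

    -- A return is just another row; the original sale row is never modified or erased.
    INSERT dbo.InventoryLedger (ProductID, Quantity) VALUES (42, -1);

    -- Last month's report only touches rows posted in that month, which are frozen at the time of entry.
    SELECT ProductID, SUM(Quantity) AS NetUnits
      FROM dbo.InventoryLedger
     WHERE PostingDate >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0)
       AND PostingDate <  DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)
     GROUP BY ProductID;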

    As for when a page split occurs, I'm pretty sure those pages are locked tighter than a drum with a system lock. I can't say for sure if you'd be able to do a read of either page using WITH (NOLOCK) or not.  And then, I'm not sure that the "ghost record" you speak of would be there or not unless it were a Heap, in which case, it probably would be.  But, again... it would depend on whether or not you're modifying the rows or treating them as immutable upon entry (which I recommend because compression works much better on such things, as well).
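    For what it's worth, whether ghost records are actually present can be checked rather than guessed at; something along these lines, pointed at whatever table is in question (the table name below is just a placeholder), would show the counts:

    -- ghost_record_count is NULL in LIMITED mode, hence DETAILED here.
    SELECT index_id, index_level, ghost_record_count, version_ghost_record_count
      FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.YourTable'), NULL, NULL, 'DETAILED');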

    If the table is a Heap, though, there are probably bigger problems to worry about all the way around.

    And, just to say it out loud again, I'm NOT advocating for use of WITH (NOLOCK) or setting the transaction isolation level to READ UNCOMMITTED everywhere.  We've seen where it doesn't work and must not be used.  The bottom line is, "It Depends" on what the situation is.
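    For anyone following along, the two forms under discussion look like this (dbo.Sales is just a stand-in table name); the hint applies per table reference, the SET applies to the whole session until it is changed back:

    -- Per-table hint: dirty reads for this reference only.
    SELECT COUNT(*) FROM dbo.Sales WITH (NOLOCK);

    -- Session-level setting: dirty reads for every table in every statement that follows.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM dbo.Sales;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;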

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)



  • Yeah,

    if you think you're going to get a delighted response from a production engineer when you're providing them with random, crap data, I hope you can duck and weave well.  And if you think 1,000 rows a minute is at all high volume: really, it isn't; it's trivial.  If you're getting poor performance at that level, your design is abysmal.

    Try fixing things, rather than breaking them further.  This is generally what's ACTUALLY required with any production system I've ever worked with.

     

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • andrew gothard wrote:

    Try fixing things, rather than breaking them further.

    Now that, I agree with!

    A lot of people immediately respond with "Well, just turn on RCSI".  Folks really need to be careful there.
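    For anyone who does go down that road, a minimal sketch of the switch itself (the database name is a placeholder); note that it changes read-committed behaviour database-wide and pushes row versions into tempdb, so test before flipping it in production:

    -- Enable Read Committed Snapshot Isolation; the ALTER needs exclusive access,
    -- hence WITH ROLLBACK IMMEDIATE to roll back everyone else's open transactions.
    ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

    -- Confirm the setting took effect.
    SELECT name, is_read_committed_snapshot_on FROM sys.databases WHERE name = N'YourDatabase';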

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Any change to the locking can cause an app to fail.  I would not consider NOLOCK "clean"; it actually stinks.  However, it depends upon the cost of the other options.  For example, one of our systems has a stored procedure that uses xp_cmdshell -> cscript -> vbs -> batch and more vbs -> COBOL -> new connections to the same db -> nested cursors -> file output -> xp_cmdshell read file -> result.  If a NOLOCK was required to avoid an undetected deadlock, I would say damn the locks, full speed ahead.  I'm holding my nose, not my breath, until $$$$ is found to replace it.
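    (For what it's worth, that kind of cross-connection wait never registers as a deadlock, but it is visible while it's happening; a quick check like this, no particular table assumed, shows who is blocked by whom:)

    -- Requests currently blocked by another session.
    SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, r.command
      FROM sys.dm_exec_requests AS r
     WHERE r.blocking_session_id <> 0;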

    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

  • There is an interesting mention about NOLOCK in

    https://learn.microsoft.com/en-us/power-apps/developer/data-platform/dataverse-sql-query

    Don't use NOLOCK

    When building queries, don't use the table hint NOLOCK. This hint prevents Dataverse from optimizing queries.

    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

    Well, I admit I've never used NOLOCK in a query.  There hasn't been any project where NOLOCK was clearly required.  The article doesn't actually put forward any performance metrics or specific projects.  There seems to be an assumed performance benefit; maybe it's real, but the cost in accuracy seems prohibitive in most scenarios, as far as I know.
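    If anyone did want actual metrics instead of an assumed benefit, a crude but honest test is to run the same query both ways with timing and I/O statistics on and compare the output (dbo.Sales is only an example name):

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT COUNT(*) FROM dbo.Sales;                -- default READ COMMITTED
    SELECT COUNT(*) FROM dbo.Sales WITH (NOLOCK);  -- dirty read

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;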

    No one shall drive us from the paradise that Cantor has created for us.

  • Jeff Moden wrote:

    MMartin1 wrote:

    Jeff. I'll take a crack at your question:

    In your example, are you assuming the date column is indexed and would be used?  What if that is not the case, the rows for the past month share pages with the current month's rows, and updates occur on other rows that cause page splits?  I believe you might get something called a ghost read in these cases.

    And all subsequent replies were oblivious to what I mentioned earlier: NOLOCK is not even acted on if you have an AG environment.

    Also, if the business case is that items can be returned, your counts for previous month(s) can change. If you have a program where you get a discount from the manufacturer of a product if you sell over (I don't know) 2,000 items in the month, then accuracy starts to matter before you get to the point where you close the books and no longer accept refunds for some month.

    Let's ask the question... you gather the data without using WITH (NOLOCK).  As soon as you're done, someone makes a return like what you were talking about.  Your report is now inaccurate for the current time.   As for the return itself, "It Depends".  Are you handling your inventory like a checkbook where you never modify or erase a transaction so that you can actually track the time between the sale and the return?  In other words, in such a situation (reporting for previous month), you wouldn't be using any rows that are updated or newly inserted because the rows are frozen at the time of entry.

    As for when a page split occurs, I'm pretty sure those pages are locked tighter than a drum with a system lock. I can't say for sure if you'd be able to do a read of either page using WITH (NOLOCK) or not.  And then, I'm not sure that the "ghost record" you speak of would be there or not unless it were a Heap, in which case, it probably would be.  But, again... it would depend on whether or not you're modifying the rows or treating them as immutable upon entry (which I recommend because compression works much better on such things, as well).

    If the table is a Heap, though, there are probably bigger problems to worry about all the way around.

    And, just to say it out loud again, I'm NOT advocating for use of WITH (NOLOCK) or setting the transaction isolation level to READ UNCOMMITTED everywhere.  We've seen where it doesn't work and must not be used.  The bottom line is, "It Depends" on what the situation is.

    Hey Jeff, I was simply responding to the question on the scenario you posed. If accounting is dishing out refunds, let us say every Friday at a certain time, for transactions refunded now but originally made some time in the past, then any audit needs to be able to obtain the same results for that time frame. You don't want a situation where a refund was recorded but then rolled back because of a timeout error, and posted successfully later (for instance).

    Though I get your point that the odds of anything going wrong are slim, transactions dealing with money make people nervous when the same query can produce different results.
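    One way to get that kind of repeatable, auditable view, if the schema can be changed, is a system-versioned temporal table; the close-of-month report can then be re-run as of the moment the books were closed, regardless of what was rolled back or re-posted afterwards (the names and the date below are invented for illustration):

    CREATE TABLE dbo.Refunds
    (
        RefundID  INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
       ,OrderID   INT           NOT NULL
       ,Amount    DECIMAL(10,2) NOT NULL
       ,ValidFrom DATETIME2(3) GENERATED ALWAYS AS ROW START NOT NULL
       ,ValidTo   DATETIME2(3) GENERATED ALWAYS AS ROW END   NOT NULL
       ,PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.RefundsHistory));

    -- Re-run the month-end report exactly as the data stood when the books were closed.
    SELECT OrderID, SUM(Amount) AS TotalRefunded
      FROM dbo.Refunds FOR SYSTEM_TIME AS OF '2024-02-01T00:00:00'
     GROUP BY OrderID;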

     

    ----------------------------------------------------



  • Randolph West wrote:

    1,000 units a minute is only 17 per second. I’ve managed systems with 50,000 transactions per second on a four-core CPU with 64 GB RAM. NOLOCK has no place in a production environment.

    My system is bigger than yours! 😀

  • rstone wrote:

    There is an interesting mention about NOLOCK in

    https://learn.microsoft.com/en-us/power-apps/developer/data-platform/dataverse-sql-query

    Don't use NOLOCK When building queries, don't use the table hint NOLOCK. This hint prevents Dataverse from optimizing queries.

    So, don't use NOLOCK with Dataverse. This article was not telling people to use NOLOCK for everything, everywhere.
