Is this correct use of WITH (NOLOCK, READUNCOMMITTED) in delete?

  • Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    Jared
    CE - Microsoft

  • p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    Ask the users about the data: what would happen if X or Y occurred with bad data? How would it affect the business?

    Present this as research for school (or anything else you can easily pass it off as). They shouldn't be made aware (yet) that there might be a problem with the system.

    There's little worse than a user who doesn't trust you or the system.

  • There are two types of transactions: explicit and implicit. Explicit transactions are the ones your boss is discussing; they are instantiated when T-SQL in a proc (or wherever) issues a BEGIN TRANSACTION statement. The implicit ones are where you can REALLY get into trouble with dirty reads. This is where rows can be counted more than once or simply get lost, because SQL Server is performing transactions internally for things like page splits. Of course, transactions are not the only things that can cause locks either. It sounds like your boss has drunk the Kool-Aid of NOLOCK being the easy "go fast" button. As I said in an earlier post, if your data is such that accuracy is totally unimportant, then it is probably not a big deal. If business decisions are made as a result of the data, then dirty reads simply are not good enough.
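
    A minimal sketch of the dirty-read risk under an explicit transaction (the table dbo.Accounts is hypothetical; run the two halves in separate sessions):

        -- Session 1: an explicit transaction that will eventually be rolled back
        BEGIN TRANSACTION;
        UPDATE dbo.Accounts SET Balance = Balance + 1000000 WHERE AccountID = 42;
        -- ...still open, not yet committed...

        -- Session 2: a NOLOCK/READUNCOMMITTED read sees the in-flight change (a dirty read)
        SELECT Balance FROM dbo.Accounts WITH (NOLOCK) WHERE AccountID = 42;

        -- Session 1: the change never actually happens, but session 2 already reported it
        ROLLBACK TRANSACTION;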

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I've always had this in mind.

    If you realllllly don't care about the data then why do you bother with it? You save it, you maintain it, you look at it, you don't want to delete it.

    You're either a massive resource waster and should be fired, or you need the data.

    If you really NEED the data, then it needs to be accurate. Unless the form or report says "estimates with X margin of error".

  • p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    My response to this is "How can you guarantee the data is committed and does not leave open transactions? What are you basing this 'nature of the data' on?"

    Data is not naturally committed. That's why SQL Server has commands to commit it in an explicit transaction and code to commit it in implicit transactions.
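
    If you want to test the "no open transactions" claim rather than take it on faith, a quick (and rough) check like this will show whatever transactions are open right now:

        -- Oldest active transaction in the current database, if any
        DBCC OPENTRAN;

        -- Sessions that currently have an open transaction
        SELECT st.session_id, at.transaction_id, at.transaction_begin_time
        FROM sys.dm_tran_session_transactions AS st
        JOIN sys.dm_tran_active_transactions AS at
            ON st.transaction_id = at.transaction_id;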

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (11/17/2011)


    p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    My response to this is "How can you guarantee the data is committed and does not leave open transactions? What are you basing this 'nature of the data' on?"

    Data is not naturally committed. That's why SQL Server has commands to commit it in an explicit transaction and code to commit it in implicit transactions.

    +1. It's fast, but there's a window between when a change starts and when it commits.

    That window is when you run into bad data, and with hundreds of users it's bound to happen a significant number of times.

  • p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    It has nothing to do with transactions being left open. Unless the data is completely static (in which case NoLock is indeed doing nothing at all), it will have transactions going on in it.

    If all the transactions are inserts (only applicable if you have no non-clustered indexes and the clustered index is insert-at-the-end, or an unindexed heap) and deletes, or if the only updates are ones that cannot possibly change data length or any index sequence, then NoLock won't result in issues caused by page splits. Only if the only transactions ever done are inserts, and queries never need the most recent data, will NoLock never cause query issues. In all other cases, NoLock can cause incorrect information to enter processes.

    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    So, dirty reads won't be generated in some very unusual databases, and won't matter in other even more unusual databases. In the majority of cases where it could be applicable, snapshot isolation is still a better solution, but there are a few cases where the load on tempdb would be prohibitive. It's marginally possible your manager is correct about ALL of that. But, with odds being something like 999 to 1, the most likely scenario is that she doesn't actually understand the interactions between index reading, page splits, and row/page/table locks. Most of the DBA population, including many good DBAs, don't know all those details. It's a very, very common misunderstanding, even for people with years of experience, top-notch educations, and so on.
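
    For reference, a minimal sketch of the snapshot isolation alternative (database and table names here are hypothetical; note the extra tempdb/version-store load mentioned above):

        -- One-time setting on the database
        ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- A reporting session can then opt in and read a consistent, committed view
        -- without blocking writers and without dirty reads
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        SELECT COUNT(*) FROM dbo.YourTable;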

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (11/17/2011)


    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    I have to take issue even with this. If your transactional data has a high degree of variability (our policies could range from $79 to several million in premium), even screwing up one or two of the wrong records can have huge implications for your trends and averages. Like Gail pointed out, if the "lucky record" that either gets counted twice or missed entirely is the $8.7M policy that month, you'd better believe that your averages and trends will be off. If you make actual decisions based on those kinds of data (such as: do I need to reprice things because product xxx isn't selling well, etc.), you'd better be darn sure that you're doing it on solid data.

    Even if some fault tolerance were allowable, the question still comes back to: if you have a better way to do it, why settle for the "this might have issues" solution?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user... All right - what was my emergency again?

  • GSquared (11/17/2011)


    p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    It has nothing to do with transactions being left open. Unless the data is completely static (in which case NoLock is indeed doing nothing at all), it will have transactions going on in it.

    If all the transactions are inserts (only applicable if you have no non-clustered indexes and the clustered index is insert-at-the-end, or an unindexed heap) and deletes, or if the only updates are ones that cannot possibly change data length or any index sequence, then NoLock won't result in issues caused by page splits. Only if the only transactions ever done are inserts, and queries never need the most recent data, will NoLock never cause query issues. In all other cases, NoLock can cause incorrect information to enter processes.

    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    So, dirty reads won't be generated in some very unusual databases, and won't matter in other even more unusual databases. In the majority of cases where it could be applicable, snapshot isolation is still a better solution, but there are a few cases where the load on tempdb would be prohibitive. It's marginally possible your manager is correct about ALL of that. But, with odds being something like 999 to 1, the most likely scenario is that she doesn't actually understand the interactions between index reading, page splits, and row/page/table locks. Most of the DBA population, including many good DBAs, don't know all those details. It's a very, very common misunderstanding, even for people with years of experience, top-notch educations, and so on.

    Maybe I can provide a business case to help you to help me 🙂 We route phone calls. Calls come in and a row is inserted here and there (header, detail, etc.). Any routing reads a static table and then inserts a history record into a history table. The call ends, and one field in the header is updated and one row is updated in the detail. Millions of these calls happen in a day. Reporting looks at intervals of 100K to 1M rows, not even at the thousands level. So the reports are written WITH (NOLOCK) to allow concurrency. So basically our business has static routing and application tables and extremely active incoming/outgoing call log tables. Developers started the practice of WITH (NOLOCK) before DBAs came to the company, but my boss has not seen a justifiable reason to change this practice given how the data is used.

    Does this help explain my case so that advice can be offered?

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Matt Miller (#4) (11/17/2011)


    GSquared (11/17/2011)


    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    I have to take issue even with this. If your transactional data has a high degree of variability (our policies could range from $79 to several million in premium), even screwing up one or two of the wrong records can have huge implications for your trends and averages. Like Gail pointed out, if the "lucky record" that either gets counted twice or missed entirely is the $8.7M policy that month, you'd better believe that your averages and trends will be off. If you make actual decisions based on those kinds of data (such as: do I need to reprice things because product xxx isn't selling well, etc.), you'd better be darn sure that you're doing it on solid data.

    Even if some fault tolerance were allowable, the question still comes back to: if you have a better way to do it, why settle for the "this might have issues" solution?

    Of course, most situations require clean reads. I was coming up with a few where dirty reads might not cause a problem. It takes a bit of a stretch to reach anything that matches that criterion.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • p-nut (11/17/2011)


    GSquared (11/17/2011)


    p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    It has nothing to do with transactions being left open. Unless the data is completely static (in which case NoLock is indeed doing nothing at all), it will have transactions going on in it.

    If all the transactions are inserts (only applicable if you have no non-clustered indexes and the clustered index is insert-at-the-end, or an unindexed heap) and deletes, or if the only updates are ones that cannot possibly change data length or any index sequence, then NoLock won't result in issues caused by page splits. Only if the only transactions ever done are inserts, and queries never need the most recent data, will NoLock never cause query issues. In all other cases, NoLock can cause incorrect information to enter processes.

    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    So, dirty reads won't be generated in some very unusual databases, and won't matter in other even more unusual databases. In the majority of cases where it could be applicable, snapshot isolation is still a better solution, but there are a few cases where the load on tempdb would be prohibitive. It's marginally possible your manager is correct about ALL of that. But, with odds being something like 999 to 1, the most likely scenario is that she doesn't actually understand the interactions between index reading, page splits, and row/page/table locks. Most of the DBA population, including many good DBAs, don't know all those details. It's a very, very common misunderstanding, even for people with years of experience, top-notch educations, and so on.

    Maybe I can provide a business case to help you to help me 🙂 We route phone calls. Calls come in and a row is inserted here and there (header, detail, etc.). Any routing reads a static table and then inserts a history record into a history table. The call ends, and one field in the header is updated and one row is updated in the detail. Millions of these calls happen in a day. Reporting looks at intervals of 100K to 1M rows, not even at the thousands level. So the reports are written WITH (NOLOCK) to allow concurrency. So basically our business has static routing and application tables and extremely active incoming/outgoing call log tables. Developers started the practice of WITH (NOLOCK) before DBAs came to the company, but my boss has not seen a justifiable reason to change this practice given how the data is used.

    Does this help explain my case so that advice can be offered?

    Thanks,

    Jared

    Do the reports work off of up-to-the-second data, or are they okay with data that's, to pick a number, 5 minutes old?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (11/18/2011)


    p-nut (11/17/2011)


    GSquared (11/17/2011)


    p-nut (11/17/2011)


    Ok, as it turns out she believes the reason is 2-fold. Her words: The first is that the nature of our data does not leave open transactions. Everything is committed. The second is that if a dirty read occurred, it would not affect the application because of the nature of the data.

    I'm racking my brain here... I still feel that something is not right, but I don't know how to explain it because I am ignorant on the subject. My thoughts are that if everything is committed, why would we worry about locks? Then the hint is not doing anything anyway... Right? It's too early for this... 🙂

    Jared

    It has nothing to do with transactions being left open. Unless the data is completely static (in which case NoLock is indeed doing nothing at all), it will have transactions going on in it.

    If all the transactions are inserts (only applicable if you have no non-clustered indexes and the clustered index is insert-at-the-end, or an unindexed heap) and deletes, or if the only updates are ones that cannot possibly change data length or any index sequence, then NoLock won't result in issues caused by page splits. Only if the only transactions ever done are inserts, and queries never need the most recent data, will NoLock never cause query issues. In all other cases, NoLock can cause incorrect information to enter processes.

    If all uses of the data are fault tolerant because they do not ever rely on specific values, then the incorrect data from NoLock issues is probably okay. This would include data used only to generate trends, averages, and so on. If any end user is depending on getting specific rows or values, then dirty reads are absolutely not okay. Examples of needing specific rows are ERP, CRM, accounting, and other transaction-dependent systems. Examples of not needing specific rows/values are data mining, inventory prediction algorithms, and trend-based heuristic systems (mainly applicable in neural nets, robotics, and AI). If you support both, you need to either isolate them and use different rules in each, or use the most pessimistic rules applicable.

    So, dirty reads won't be generated in some very unusual databases, and won't matter in other even more unusual databases. In the majority of cases where it could be applicable, snapshot isolation is still a better solution, but there are a few cases where the load on tempdb would be prohibitive. It's marginally possible your manager is correct about ALL of that. But, with odds being something like 999 to 1, the most likely scenario is that she doesn't actually understand the interactions between index reading, page splits, and row/page/table locks. Most of the DBA population, including many good DBAs, don't know all those details. It's a very, very common misunderstanding, even for people with years of experience, top-notch educations, and so on.

    Maybe I can provide a business case to help you to help me 🙂 We route phone calls. Calls come in and a row is inserted here and there (header, detail, etc.). Any routing reads a static table and then inserts a history record into a history table. The call ends, and one field in the header is updated and one row is updated in the detail. Millions of these calls happen in a day. Reporting looks at intervals of 100K to 1M rows, not even at the thousands level. So the reports are written WITH (NOLOCK) to allow concurrency. So basically our business has static routing and application tables and extremely active incoming/outgoing call log tables. Developers started the practice of WITH (NOLOCK) before DBAs came to the company, but my boss has not seen a justifiable reason to change this practice given how the data is used.

    Does this help explain my case so that advice can be offered?

    Thanks,

    Jared

    Do the reports work off of up-to-the-second data, or are they okay with data that's, to pick a number, 5 minutes old?

    They are ok with estimates, so 5 minutes is fine. Hell, a day is fine. They are really looking at the macro level.

    Jared

    Jared
    CE - Microsoft

  • In that case, I'd start looking into using snapshots for the reports to operate off of. You're almost certainly on Enterprise Edition, which means that's a real option for you. Look into them.
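
    Roughly, the idea looks like this (database, logical file name, and path are all hypothetical; the logical name must match the source database's data file):

        -- Create a point-in-time, read-only snapshot of the live database
        CREATE DATABASE CallDB_Snapshot_AM
        ON ( NAME = CallDB_Data,  -- logical name of the source data file
             FILENAME = 'E:\Snapshots\CallDB_Snapshot_AM.ss' )
        AS SNAPSHOT OF CallDB;

        -- Reports then query the snapshot instead of the live database:
        -- SELECT ... FROM CallDB_Snapshot_AM.dbo.CallHeader ...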

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (11/21/2011)


    In that case, I'd start looking into using snapshots for the reports to operate off of. You're almost certainly on Enterprise Edition, which means that's a real option for you. Look into them.

    Great, thanks! Hopefully our setup with our tempdb files will handle it.

    Jared

    Jared
    CE - Microsoft

  • p-nut (11/21/2011)


    GSquared (11/21/2011)


    In that case, I'd start looking into using snapshots for the reports to operate off of. You're almost certainly on Enterprise Edition, which means that's a real option for you. Look into them.

    Great, thanks! Hopefully our setup with our tempdb files will handle it.

    Jared

    Snapshots (not snapshot isolation; similar but not the same) don't use tempdb. But they do give some options for querying data without lock/block issues.

    If taken too infrequently on high-volatility data, or held onto too long, they can eat up a lot of disk space, but they can be managed to avoid that. They don't hit tempdb like snapshot isolation does, which can be a performance issue.

    They do require more I/O than using NoLock does, but they don't allow dirty reads. You'll need to balance those needs against each other.

    The other potential solution I'd look into, given that slightly older data is acceptable, would be partitioning. If you partition off data that's gone static (won't be updated any more) and just query that, then you don't need NoLock, snapshots, snapshot isolation, et al. Just make the queries work on the static partitions and you'll be fine. That option might be the best bet overall. A little (or a lot) rougher to implement, but definitely worth it if done correctly.
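
    A very rough sketch of the partitioning idea (all names are hypothetical; real partitioning needs planning around the clustered index, filegroups, and a sliding-window maintenance job):

        -- Partition call data by day so that prior days become effectively static
        CREATE PARTITION FUNCTION pfCallDate (date)
            AS RANGE RIGHT FOR VALUES ('2011-11-20', '2011-11-21', '2011-11-22');

        CREATE PARTITION SCHEME psCallDate
            AS PARTITION pfCallDate ALL TO ([PRIMARY]);

        -- Reports that filter on a prior day only touch partitions no longer being written to
        SELECT COUNT(*)
        FROM dbo.CallHeader   -- hypothetical table built on psCallDate(CallDate)
        WHERE CallDate >= '2011-11-20' AND CallDate < '2011-11-21';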

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
