How to figure out whether data is selected from dirty pages?

  • We have a procedure that selects data for a report. Sometimes it is blocked by an ETL process that loads data concurrently into the same table. We set the transaction isolation level to read uncommitted.

    But now we face another problem: users want the data read from dirty pages to be isolated. That is, we need to find out dynamically which records (or sets of records) came from dirty pages.
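
    For context, the report currently does something along these lines (the table, column, and variable names below are made up for illustration, not the real ones):

        DECLARE @FromDate date = '20160101';

        -- The report reads the same table the ETL process loads into.
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        SELECT OrderID, OrderDate, Amount
        FROM   dbo.SalesFact
        WHERE  OrderDate >= @FromDate;
        -- Uncommitted ("dirty") rows can appear in this result set, and there is
        -- no built-in column or flag that identifies them.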

    Is it possible to do?

    Thanks

  • I don't think that's possible.

    Also, even if you know what data is from dirty pages, you still do not have the full picture. Using read uncommitted can also cause data to be processed twice, or to be skipped.

    But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?
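
    Purely for illustration (it may or may not be the right choice for your system), enabling snapshot isolation and using it just for the report session would look roughly like this; YourDatabase and the query are placeholders:

        -- One-off database setting; row versions are then kept in tempdb.
        ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- Inside the report procedure / session:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

        SELECT OrderID, OrderDate, Amount
        FROM   dbo.SalesFact;
        -- Sees only data committed as of the start of the transaction,
        -- and is not blocked by the ETL's exclusive locks.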


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/21/2016)


    I don't think that's possible.

    Also, even if you know what data is from dirty pages, you still do not have the full picture. Using read uncommitted can also cause data to be processed twice, or to be skipped.

    But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?

    +10000000000000000000000000000000000000000000000000000

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Is the ETL process working in such a way that it gets a table lock? If so, you may be able to remove the table lock and accept that you must then give up the advantages of minimal logging, depending on specifically how the ETL is being done.

    Another thing to check is the clustered key on the table. It may be possible to adjust the clustering key on the table and avoid the blocking, but again it is very dependent on the specific details of your situation.

    As noted, you could try some type of snapshot isolation, but be aware of the relatively high overhead of implementing it.
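
    A quick way to check what level of locks the load actually takes (a sketch, not something from the original post) is to query sys.dm_tran_locks while the ETL is running, filtered on the ETL's session id:

        -- Replace 123 with the session_id (SPID) of the ETL connection.
        SELECT resource_type,      -- OBJECT = whole table, PAGE / KEY = finer grained
               request_mode,       -- e.g. X, IX, BU (bulk update)
               request_status,
               COUNT(*) AS lock_count
        FROM   sys.dm_tran_locks
        WHERE  request_session_id = 123
        GROUP BY resource_type, request_mode, request_status;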

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Is there some reason you've decided not to use read committed snapshot or snapshot isolation?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hugo Kornelis (1/21/2016)


    But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?

    According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.

    They agreed to the second option, but on the condition that the "dirty" records are marked somehow in the report. So they conditionally accept it.

    ScottPletcher (1/21/2016)


    Is the ETL process working in such a way that it gets a table lock? If so, you may be able to remove the table lock and accept that you must then give up the advantages of minimal logging, depending on specifically how the ETL is being done.

    I don't think it locks the entire table; it's a large table. I did not check, but I suspect it issues page or extent locks.

    ScottPletcher (1/21/2016)


    As noted, you could try some type of snapshot isolation, but be aware of the relatively very high overhead to implement it.

    GilaMonster (1/22/2016)


    Is there some reason you've decided not to use read committed snapshot or snapshot isolation?

    I also thought about snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test a lot and dedicate a lot of additional space to tempdb.

    ScottPletcher (1/21/2016)


    Another thing to check is the clustered key on the table. It may be possible to adjust the clustering key on the table and avoid the blocking, but again it is very dependent on the specific details of your situation.

    This is probably what I want to try in this situation. But what exactly should I adjust?

  • SQL Guy 1 (1/22/2016)


    They agreed to the second option, but on the condition that the "dirty" records are marked somehow in the report. So they conditionally accept it.

    Nice idea, but how would you mark 4 KB worth of rows that the report has missed due to the isolation level? It's not that you'll get different values, it's that you will be able to read rows twice (or even more) and miss reading rows that exist.

    I also thought about snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test a lot and dedicate a lot of additional space to tempdb.

    t.b.h. Snapshot/read committed snapshot is made for your problem. If you enable snapshot and only use it for this report initially, then the testing is minimised, and you won't need 400GB in TempDB. You may need very little. Row versions are only held as long as they're needed, as determined by the length of transactions that use those rows.
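
    If it helps, you can check how much tempdb space the version store actually uses once snapshot isolation is on; a rough check (nothing here is specific to your system):

        -- Current size of the row version store in tempdb, in KB.
        SELECT counter_name, cntr_value AS size_kb
        FROM   sys.dm_os_performance_counters
        WHERE  counter_name LIKE 'Version Store Size%';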

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/22/2016)


    SQL Guy 1 (1/22/2016)


    They agreed to the second option, but on the condition that the "dirty" records are marked somehow in the report. So they conditionally accept it.

    Nice idea, but how would you mark 4 KB worth of rows that the report has missed due to the isolation level? It's not that you'll get different values, it's that you will be able to read rows twice (or even more) and miss reading rows that exist.

    I also thought about snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test a lot and dedicate a lot of additional space to tempdb.

    t.b.h. Snapshot/read committed snapshot is made for your problem. If you enable snapshot and only use it for this report initially, then the testing is minimised, and you won't need 400GB in TempDB. You may need very little. Row versions are only held as long as they're needed, as determined by the length of transactions that use those rows.

    But the rows in the original tables also get 14 bytes per row added. That can cause significant or even severe page splitting.
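
    One possible way to soften that (just a sketch, and the index and table names here are placeholders) is to rebuild the busiest indexes with a lower fill factor, so pages keep some free space for the extra 14 bytes per row:

        -- Leave ~10% free space per page; ONLINE = ON requires Enterprise Edition.
        ALTER INDEX PK_SalesFact ON dbo.SalesFact
        REBUILD WITH (FILLFACTOR = 90, ONLINE = ON);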

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

    A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction. Normally (when the default READ COMMITTED is in effect) readers will attempt to take out shared locks on pages to be read, and the lock manager will inform the reader of pages that are currently locked exclusively, so it can then wait. However, with READ UNCOMMITTED in effect (the same thing as the NOLOCK hint), readers do not bother taking out shared locks. If the 'read uncommitted' reader knew which pages were dirty, it could exclude them for you, but it probably has no knowledge of that in the first place.

    If your SSIS package is using the 'fast load' option, then it's taking out a table lock, but depending on circumstances that may only make a marginal improvement to the load time. So, consider disabling 'fast load' if it's enabled. Also, consider keeping the isolation level of the readers at the default level of 'read committed'. That should result in only short periods of blocking during a table load, provided your table's clustering key is sequentially incrementing, meaning that newly written (locked) pages are appended sequentially to the end of the table.

    SQL Server doesn't need to be told when to expand locks, when to ignore locks, or how to cluster rows in some special non-sequential fashion. When you do that, it will cost you in terms of performance and reliability.
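
    As a rough sketch of what a sequentially incrementing clustering key looks like (the table definition is purely illustrative), new rows then land at the logical end of the table, so readers of older ranges are far less likely to hit the pages being loaded:

        CREATE TABLE dbo.SalesFact
        (
            SalesFactID bigint IDENTITY(1,1) NOT NULL,
            OrderDate   date          NOT NULL,
            Amount      decimal(18,2) NOT NULL,
            CONSTRAINT PK_SalesFact PRIMARY KEY CLUSTERED (SalesFactID)
        );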

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • SQL Guy 1 (1/22/2016)


    Hugo Kornelis (1/21/2016)


    But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?

    According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.

    They agreed to the second option, but on the condition that the "dirty" records are marked somehow in the report. So they conditionally accept it.

    If you are not prepared or not able to use snapshot isolation, then this is simply impossible. There may be ways to minimize the impact, but there is no way to do a data load without getting locks. And without snapshot, locks will always block reports that do not use dirty reads.

    My recommendation is to tell "them" (management, I guess) that this is not possible. They have to choose to either get possibly incorrect data, or wait.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.

    how long does this "blocking" last for? (and how many times a day are you doing this ETL?).....

    what is the impact on management reporting of waiting for a "true" set of results....minutes/hours?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Eric M Russell (1/22/2016)


    A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.

    Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.
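
    You can see which in-memory pages are currently dirty in that sense from the buffer pool DMV (note this tells you about pages in memory, not which rows in a report came from them):

        -- Count of modified-but-not-yet-written pages per database.
        SELECT DB_NAME(database_id) AS database_name,
               COUNT(*)             AS dirty_pages
        FROM   sys.dm_os_buffer_descriptors
        WHERE  is_modified = 1
        GROUP BY DB_NAME(database_id);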

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/22/2016)


    Eric M Russell (1/22/2016)


    A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.

    Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.

    However, these committed (but non-hardened) pages are not transactionally inconsistent, are they, meaning they can be consistently used for a logical read?

    In 2014, the delayed durability or in-memory settings will defer hardening of committed transactions.
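
    For reference, the database-level option looks like this (YourDatabase is a placeholder):

        ALTER DATABASE YourDatabase SET DELAYED_DURABILITY = ALLOWED;   -- or FORCED / DISABLED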

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/22/2016)


    GilaMonster (1/22/2016)


    Eric M Russell (1/22/2016)


    A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.

    Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.

    However, these committed (but non-hardened) pages are not transactionally inconsistent, are they, meaning they can be consistently used for a logical read?

    Yes, they'll be consistent, but SQL still considers them as dirty pages (flag in the page header)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
