January 21, 2016 at 3:19 pm
We have a procedure that selects data for a report. Sometimes it is blocked by an ETL process that loads data concurrently into the same table. We set the transaction isolation level to read uncommitted.
But now we face another problem: users want the data read from dirty pages to be identified. That is, we need to determine dynamically which records (or sets of records) came from dirty pages.
Is it possible to do?
Thanks
January 21, 2016 at 3:41 pm
I don't think that's possible.
Also, even if you know what data is from dirty pages, you still do not have the full picture. Using read uncommitted can also cause data to be processed twice, or to be skipped.
But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?
January 21, 2016 at 4:01 pm
Hugo Kornelis (1/21/2016)
I don't think that's possible. Also, even if you know what data is from dirty pages, you still do not have the full picture. Using read uncommitted can also cause data to be processed twice, or to be skipped.
But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?
+10000000000000000000000000000000000000000000000000000
--Jeff Moden
Change is inevitable... Change for the better is not.
January 21, 2016 at 6:54 pm
Is the ETL process working in such a way that it gets a table lock? If so, you may be able to remove the table lock and accept that you must then give up the advantages of minimal logging, depending on specifically how the ETL is being done.
Another thing to check is the clustered key on the table. It may be possible to adjust the clustering key on the table and avoid the blocking, but again it is very dependent on the specific details of your situation.
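For example, if the loads are mostly appends, an ever-increasing clustering key keeps the newly written (locked) pages at the end of the table, away from most of the pages the report reads. This is only a rough sketch; the table and column names are made up, so adjust them to your schema:
-- Hypothetical table: cluster on an ever-increasing load date plus identity
-- so that concurrent inserts land at the logical end of the table.
CREATE TABLE dbo.FactSales
(
    SalesID    bigint IDENTITY(1,1) NOT NULL,
    LoadDate   datetime2(0)         NOT NULL,
    CustomerID int                  NOT NULL,
    Amount     decimal(18,2)        NOT NULL,
    CONSTRAINT PK_FactSales PRIMARY KEY CLUSTERED (LoadDate, SalesID)
);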
As noted, you could try some type of snapshot isolation, but be aware of the relatively very high overhead to implement it.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 22, 2016 at 1:52 am
Is there some reason you've decided not to use read committed snapshot or snapshot isolation?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 22, 2016 at 8:05 am
Hugo Kornelis (1/21/2016)
But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?
According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.
They agreed to the second option, but on the condition that the "dirty" records will be marked somehow in the report. So they conditionally accept it.
ScottPletcher (1/21/2016)
Is the ETL process working in such a way that it gets a table lock? If so, you may be able to remove the table lock and accept that you must then give up the advantages of minimal logging, depending on specifically how the ETL is being done.
I don't think it locks the entire table; it's a large table. I did not check, but I suspect that it takes page or extent locks.
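Next time the load runs I'll try to verify it with something like this (a rough sketch against the lock DMVs; dbo.MyLargeTable is just a placeholder for our real table name):
-- Hypothetical check: what locks does the ETL session hold on the target table?
SELECT l.request_session_id,
       l.resource_type,       -- OBJECT, PAGE, KEY, EXTENT, ...
       l.request_mode,        -- S, X, IX, BU, ...
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks AS l
LEFT JOIN sys.partitions AS p
       ON p.hobt_id = l.resource_associated_entity_id
WHERE l.resource_database_id = DB_ID()
  AND (l.resource_associated_entity_id = OBJECT_ID(N'dbo.MyLargeTable')   -- OBJECT-level locks
       OR p.object_id = OBJECT_ID(N'dbo.MyLargeTable'))                   -- PAGE/KEY locks via hobt_id
GROUP BY l.request_session_id, l.resource_type, l.request_mode;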
ScottPletcher (1/21/2016)
As noted, you could try some type of snapshot isolation, but be aware of the relatively very high overhead to implement it.
GilaMonster (1/22/2016)
Is there some reason you've decided not to use read committed snapshot or snapshot isolation?
I also thought about the snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test it a lot and dedicate a lot of additional space to tempdb.
ScottPletcher (1/21/2016)
Another thing to check is the clustered key on the table. It may be possible to adjust the clustering key on the table and avoid the blocking, but again it is very dependent on the specific details of your situation.
This is probably what I want to try now in this situation. But what exactly should I adjust?
January 22, 2016 at 8:40 am
SQL Guy 1 (1/22/2016)
They agreed to the second option, but on the condition that the "dirty" records will be marked somehow in the report. So they conditionally accept it.
Nice idea, but how would you mark 4kb worth of rows that the report has missed due to the isolation level? It's not that you'll get different values, it's that you will be able to read rows twice (or even more) and miss reading rows that exist.
I also thought about the snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test it a lot and dedicate a lot of additional space to tempdb.
t.b.h. Snapshot/read committed snapshot is made for your problem. If you enable snapshot and only use it for this report initially, then the testing is minimised, and you won't need 400GB in TempDB. You may need very little. Row versions are only held as long as they're needed, as determined by the length of transactions that use those rows.
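Roughly something like this; the database, procedure and table names below are placeholders:
-- Allow snapshot isolation on the database (this does not change the behaviour
-- of existing queries; it only makes the SNAPSHOT level available).
ALTER DATABASE YourReportDB SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
-- Only the report opts in; the ETL and everything else stay as they are.
CREATE PROCEDURE dbo.usp_ReportSketch    -- hypothetical report procedure
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    SELECT LoadDate, SUM(Amount) AS TotalAmount    -- stand-in for the real report query
    FROM dbo.MyLargeTable
    GROUP BY LoadDate;
END;
GO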
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 22, 2016 at 10:35 am
GilaMonster (1/22/2016)
SQL Guy 1 (1/22/2016)
They agreed to the second option, but on the condition that the "dirty" records will be marked somehow in the report. So they conditionally accept it.
Nice idea, but how would you mark 4kb worth of rows that the report has missed due to the isolation level? It's not that you'll get different values, it's that you will be able to read rows twice (or even more) and miss reading rows that exist.
I also thought about the snapshot and read committed snapshot isolation levels. But this database is large, over 400 GB; we would have to test it a lot and dedicate a lot of additional space to tempdb.
t.b.h. Snapshot/read committed snapshot is made for your problem. If you enable snapshot and only use it for this report initially, then the testing is minimised, and you won't need 400GB in TempDB. You may need very little. Row versions are only held as long as they're needed, as determined by the length of transactions that use those rows.
But the rows in the original tables also get 14 bytes per row added (the row-versioning pointer). That can cause significant or even severe page splitting.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 22, 2016 at 11:47 am
A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction. Normally (when the default READ COMMITTED is in effect) readers will attempt to take out shared locks on the pages to be read, and the lock manager will inform the reader of pages that are currently locked exclusively, so it can then wait. However, with READ UNCOMMITTED in effect (the same thing as the NOLOCK hint), readers do not bother taking out shared locks. If the 'read uncommitted' reader knew which pages were dirty, it could exclude them for you, but it probably has no knowledge of that in the first place.
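To illustrate, these two reads behave the same way, and neither gives the reader any way of knowing which pages were dirty (the table name below is just a placeholder):
-- Session-level read uncommitted...
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.MyLargeTable;               -- placeholder table name

-- ...is equivalent to the per-table NOLOCK hint: no shared locks are taken,
-- and in-flight (dirty) data is read with nothing to identify it as dirty.
SELECT COUNT(*) FROM dbo.MyLargeTable WITH (NOLOCK);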
If your SSIS package is using the 'fast load' option, then it's taking out a table lock, but depending on circumstances that may only make a marginal improvement to the load time. So consider disabling 'fast load' if it's enabled. Also, consider keeping the readers at the default isolation level of 'read committed'. That should result in only short periods of blocking during a table load, provided your table's clustering key is sequentially incrementing, meaning that newly written (locked) pages are appended to the end of the table.
SQL Server doesn't need to be told when to expand locks, when to ignore locks, or how to cluster rows in some special non-sequential fashion. When you do that, it will cost you in terms of performance and reliability.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 22, 2016 at 12:08 pm
SQL Guy 1 (1/22/2016)
Hugo Kornelis (1/21/2016)
But a way more important question is what you/they are trying to achieve. If the report is such that some margin of error in the numbers is acceptable, then why bother? And if the report is such that the details need to be known exactly (and any errors need to be known and corrected for), then you should simply not use read uncommitted. Perhaps using one of the snapshot isolation models is a better idea in your case?
According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.
They agreed to the second option, but on the condition that the "dirty" records will be marked somehow in the report. So they conditionally accept it.
If you are not prepared or not able to use snapshot isolation, then this is simply impossible. There may be ways to minimize the impact, but there is no way to do a data load without getting locks. And without snapshot, locks will always block reports that do not use dirty reads.
My recommendation is to tell "them" (management, I guess) that this is not possible. They have to choose to either get possibly incorrect data, or wait.
January 22, 2016 at 12:16 pm
According to them, the report's details should be known exactly. But I explained to them that while the ETL process is loading, we either have to wait (because we are blocked) or accept that some portions of the data coming from "dirty" pages may turn out to be wrong when the ETL process finishes.
how long does this "blocking" last? (and how many times a day are you doing this ETL?).....
what is the impact on management reporting of waiting for a "true" set of results....minutes/hours?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
January 22, 2016 at 2:44 pm
Eric M Russell (1/22/2016)
A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.
Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 22, 2016 at 7:43 pm
GilaMonster (1/22/2016)
Eric M Russell (1/22/2016)
A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.
Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.
However, these committed (but non-hardened) pages are not transactionally inconsistent, are they? Meaning they can be consistently used for a logical read?
In SQL Server 2014, the delayed durability or In-Memory OLTP settings will defer hardening of committed transactions.
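For example, with 2014 syntax (the database and table names below are placeholders):
-- Allow (but don't force) delayed durability at the database level.
ALTER DATABASE YourReportDB SET DELAYED_DURABILITY = ALLOWED;
GO
-- A transaction can then opt in: the commit returns before its log records
-- are hardened to disk.
BEGIN TRANSACTION;
    UPDATE dbo.MyLargeTable SET Amount = Amount WHERE SalesID = 1;   -- placeholder work
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);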
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 23, 2016 at 12:21 am
Eric M Russell (1/22/2016)
GilaMonster (1/22/2016)
Eric M Russell (1/22/2016)
A dirty page is simply a page that has been written to during an active (i.e., uncommitted) transaction.
Or a committed transaction. It's a page that has been modified since it was read into memory and has not been subject to a checkpoint (or lazy writer) since the last change.
However, these committed (but non-hardened) pages are not transactionally inconsistent, are they? Meaning they can be consistently used for a logical read?
Yes, they'll be consistent, but SQL still considers them as dirty pages (flag in the page header)
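If you're curious, you can see how many of a database's buffer pool pages are currently dirty with something like:
-- Count clean vs. dirty pages in the buffer pool for the current database.
SELECT is_modified,          -- 1 = dirty: changed since last written to disk
       COUNT(*) AS page_count
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
GROUP BY is_modified;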
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability