March 27, 2011 at 11:10 am
GilaMonster (3/25/2011)
No.SQL writes data modifications to data pages in the data cache. It writes the log records to disk at the point the transaction is committed (or sometimes before). At some point (maybe immediately, maybe not) the changed data pages are written to disk.
They can have been written before the transaction commits, that's perfectly valid and acceptable. The only unbreakable rule is log before data. Log records must be hardened in the log file before the data changes that the log records describe are written from the data cache to disk.
The primary purpose of the log is durability (once transaction completes the data changes must remain non matter what) and consistency. There are other processes that take advantage of the log to do other stuff.
Edit: A little clarification
But what happens if a checkpoint occurs in middle of a very long running transaction and then the transaction rolls back.
The dirty pages now have been written to disk which is undesirable in the above scenario but log entries aren't made to the log files as the transaction has rolled back.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 27, 2011 at 11:17 am
Sachin Nandanwar (3/27/2011)
GilaMonster (3/25/2011)
No.SQL writes data modifications to data pages in the data cache. It writes the log records to disk at the point the transaction is committed (or sometimes before). At some point (maybe immediately, maybe not) the changed data pages are written to disk.
They can have been written before the transaction commits, that's perfectly valid and acceptable. The only unbreakable rule is log before data. Log records must be hardened in the log file before the data changes that the log records describe are written from the data cache to disk.
The primary purpose of the log is durability (once transaction completes the data changes must remain non matter what) and consistency. There are other processes that take advantage of the log to do other stuff.
Edit: A little clarification
But what happens if a checkpoint occurs in middle of a very long running transaction and then the transaction rolls back.
The dirty pages now have been written to disk which is undesirable in the above scenario but log entries aren't made to the log files as the transaction has rolled back.
And then the pages will be updated again with the "undo" operations for that transaction, and eventually flushed to disk again.
The log entries are still in the log file. They don't "vanish" after the transaction has been rolled back.
Please watch the videos I posted in my prior post- they explain it all, from the guy that was in charge of how it worked for many years.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 27, 2011 at 11:31 am
WayneS (3/27/2011)
And then the pages will be updated again with the "undo" operations for that transaction, and eventually flushed to disk again.
The log entries are still in the log file. They don't "vanish" after the transaction has been rolled back.
Please watch the videos I posted in my prior post- they explain it all, from the guy that was in charge of how it worked for many years.
How is that done ? By the next Checkpoint ? If yes then one will have to burden themselves with wrong set of data.
Downloading the videos....
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 27, 2011 at 11:37 am
Sachin Nandanwar (3/27/2011)
The dirty pages now have been written to disk which is undesirable in the above scenario but log entries aren't made to the log files as the transaction has rolled back.
As I said in the piece that you quoted:
The only unbreakable rule is log before data. Log records must be hardened in the log file before the data changes that the log records describe are written from the data cache to disk.
If a checkpoint occurs and the dirty pages are written to disk, the associated log records are written first. They MUST be. The protocol is called WAL (write ahead logging), I can refer you to the original academic paper if you wish.
When the transaction rolls back, SQL reads the log to see what was done, modifies the pages to reverse the effects then logs the undo operations (you can see a brief example of that at the end of my blog post http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/), then those undo log records will be hardened into the log file and the pages (which have now had the changes undone) will follow at a later point. A rollback does not remove log records from the log file.
This is one reason why the log file is essential, the mdf cannot be considered consistent at any point during SQL's operation because there can be changes that have been written into the data file that are later undone, changes written to the data file mid transaction, etc.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 27, 2011 at 12:12 pm
The only unbreakable rule is log before data.
It writes the log records to disk at the point the transaction is committed (or sometimes before)
Does it hold true for a checkpoint occurring before the end of a transaction no matter what the outcome of that transaction is ?The checkpoint will still flush dirty pages before the end of transaction and it might happen that log may write the log records after the end of the transaction the way you stated.
The above 2 quotes really seems to be contradicting if it does happen that the log records are written after the transaction has ended and a checkpoint occurs in middle of the transaction
It writes the log records to disk at the point the transaction is committed (or sometimes before)
So if it writes after the point transaction is committed ideally nothing should be written to the log after the point transaction is rolled back.If it does write something then what exactly it writes.In layman terms "10 records were being inserted and now the those 10 records will have to be rolled back."
But the user will have to still deal with those 10 records because they have been already flushed to the disk.
When the transaction rolls back, SQL reads the log to see what was done.SQL reads the log to see what was done
So what exactly it reads in layman's terms? Does the checkpoint does all this reading when it occurs next time ?
EDIT:Still haven't read your article in the blog.So may have missed if the above confusion has already been addressed in them.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 27, 2011 at 12:23 pm
Sachin Nandanwar (3/27/2011)
Does it hold true for a checkpoint occurring before the end of a transaction no matter what the outcome of that transaction is ?The checkpoint will still flush dirty pages before the end of transaction and it might happen that log may write the log records after the end of the transaction the way you stated.The above 2 quotes really seems to be contradicting if it does happen that the log records are written after the transaction has ended and a checkpoint occurs in middle of the transaction
Um,
It writes the log records to disk at the point the transaction is committed (or sometimes before)
Emphasis. "or sometimes before"
At the point of commit is the latest possible point the log records can be written. They may be written to the transaction log earlier, there are a number of things, including a checkpoint, that will cause the log records to be flushed earlier.
The unbreakable rule is Log Before Data. If the data pages are going to disk, the log records MUST go first. Otherwise the database is inconsistent and suspect.
So if it writes after the point transaction is committed ideally nothing should be written to the log after the point transaction is rolled back.If it does write something then what exactly it writes.In layman terms "10 records were being inserted and now the those 10 records will have to be rolled back."
Did you go and look at the blog post? Right at the end I look at exactly what's in the transaction log for a 3-record insert where the third row failed and the whole lot gets rolled back.
As I stated before "A rollback does not remove log records from the log file."
But the user will have to still deal with those 10 records because they have been already flushed to the disk.
Huh? Why does the user have to deal with anything? The user doesn't read the data file. The user reads data from memory, and in memory the pages are correct as relating to any ongoing transaction.
When the transaction rolls back, SQL reads the log to see what was done.
So what exactly it reads in layman's terms? Does the checkpoint does all this reading when it occurs next time ?
No, the transaction in rollback reads the log to see what was done so that it can see what it needs to undo.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 27, 2011 at 12:44 pm
Those videos will really explain how, internally, the log file works.
I'm going to attempt to explain it here, with the caveat that if what I say happens to be different than what is in those videos, then those videos are correct and I'm wrong (again...)... but let me know so that I can learn.
1. A transaction (either implicit or explicit) begins.
2. Data records are modified in memory. The data page is also modified with the last LSN that completes this page.
3. At some point, either by a CHECKPOINT or by the "lazy writer" process, the data pages are flushed from memory to the physical data files. However, before this can be done, the transaction log entries that describe this change have to be flushed out to the physical transaction log file.
4. When a rollback occurs, the transactions are looked at, and "undo" operations that reverse your transaction are created. (i.e. if you were inserting records, a delete is performed. If deleting, an insert. If updating, then the old values are put back.) These undo operations modify the data page again, and are logged in the transaction log. The data page is updated with the latest LSN again. (If you think about a busy system, another process could have modified other records on this same page since your process wrote it's initial changes, so you can't just throw the dirty page out and reload it from disk - you have to write out what you're reversing to the data page again.)
5. At some point, these data pages are again flushed to disk again. (and again, after the log entries have been pushed to disk in the transaction log).
If you were to have a crash after the dirty pages had been initially written to disk, but before the transaction had completed, it just means that the undo portion of crash recovery has a little bit more work to do to reverse those changes.
So, if you were to have a db in full recovery model, perform a full backup, and then run a transaction that updates 100,000 rows and rolls them back, and then run a differential backup, and nothing else was going on in the database while you were doing this, then the differential backup would still backup all of the data pages for those 100,000 records because, as far as SQL is concerned, they have been updated twice. (The net effect of them is that they are back to their original values.)
Note that nothing has been said about deleting the log records. The log records are never deleted. Once the log records that describe that transaction are no longer needed (t-log backup has occurred in full/bulk-logged mode / checkpoint has occurred in simple, and not needed for mirroring / replication), then it's marked as not needed. When all of the log entries that make up a VLF are no longer needed, the VLF is marked as not needed. When the transaction log gets to the end of the file, it looks to see if it can wrap around and reuse a VLF (VLF marked as not needed). If so, it will start using that VLF again, and at some point it will overwrite the previous log entry.
I think that this is all correct - I'm sure that if it's not, I'll hear about it.
Hope this helps!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 27, 2011 at 12:53 pm
You might also want to take a look at this BOL entry: Checkpoints and the Active Portion of the Log
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 27, 2011 at 12:54 pm
Wayne now your explanation is exactly SPOT ON.Now the whole process has become crystal clear to me just like the cookie analogy Brandie had posted earlier.
Thanks to you Wayne and Gail.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 27, 2011 at 1:22 pm
One minor addition to Wayne's notes.
At the point that the commit or rollback completes, the log records for that transaction are guaranteed to be in the log file. They may be written at the point of the commit/rollback, they may be written earlier if a checkpoint runs or if the log buffer is flushed by another transaction, but they must be in the log file at the point that the transaction completes (regardless of whether it committed or rolled back)
That's what gives SQL the durability requirement of ACID, once a transaction's chances are hardened in the log, it's permanent regardless of what happens to the data pages. The server can crash an instant later and the data changes still in memory lost, it doesn't matter, as long as they are hardened in the log file they will be replayed upon restart.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 27, 2011 at 2:42 pm
Sachin Nandanwar (3/27/2011)
Wayne now your explanation is exactly SPOT ON.Now the whole process has become crystal clear to me just like the cookie analogy Brandie had posted earlier.Thanks to you Wayne and Gail.
Great, glad to hear it. Now, please do go view those videos! They really do help you understand things. (I'd recommend getting all of them - I'm listening to the MP3's of them on my daily commute to/from work, and I'm really learning a lot. If you start with the first ones, they pretty much all build off of that.) And you just can't get a better teacher of how the SQL Server Storage Engine works than from the guy that was in charge of it for almost a decade.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 28, 2011 at 2:50 am
Wayne,
Surely I will view the videos. Due to limited access I am unable to do it right now in my office 🙁
But just one last confusion if you can clear me on that.When the data is written to the disk in middle of a transaction and a checkpoint occurs then how is the old value then again restored on the data pages which are already flushed to the disk ?
Writting to the log and again rewriting the deletes on the log files in case of rollbacks is fine but just not sure which process will update the pages on the disk with the old values in case of rollbacks.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 28, 2011 at 2:56 am
Sachin Nandanwar (3/28/2011)
Writting to the log and again rewriting the deletes on the log files in case of rollbacks is fine but just not sure which process will update the pages on the disk with the old values in case of rollbacks.
The rollback changes the data pages in memory to their original values. A later checkpoint will flush them to disk again.
Remember all data reads and writes are done solely to pages in memory, only the lazy writer and checkpoint will ever write pages to disk.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 28, 2011 at 3:07 am
The rollback changes the data pages in memory to their original values. A later checkpoint will flush them to disk again.
This is where I am getting confused.Just consider the scenario where the checkpoint occurs in middle of an inflight transaction.
So the data pages are already on to the disk due to the checkpoint.It means now they are not in the memory for the rollback to change them to its original values.
Now the transaction rollbacks.So how will the data pages on the disk(not in the buffer memory since they are absent due to the checkpoint occuring during transaction) restored to the rollbacked values ?
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
March 28, 2011 at 3:12 am
Sachin Nandanwar (3/28/2011)
So the data pages are already on to the disk due to the checkpoint.It means now they are not in the memory for the rollback to change them to its original values.
Why do you say that?
Checkpoint writes the changed pages to disk. It does not flush the data cache, that would be extremely sub-optimal if the data cache was completely cleared every couple minutes.
Write data pages to disk != remove pages from memory.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 46 through 60 (of 69 total)
You must be logged in to reply to this topic. Login to reply