Full backup Scenario

Question

Post reply

Full backup Scenario

The LSN of the last checkpoint (note the backup operation starts by issuing a checkpoint)
The LSN of the start of the oldest active transaction
The LSN of the last replicated transaction

See http://sqlkpi.com/BLOGS/PAUL/post/Debunking-a-couple-of-myths-around-full-database-backups.aspx

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

suja.jamesgeorge Valued Member Points: 53 More actions · Answer 1

Hello everyone,

I am delighted to see a good response to the question I posted long back.;-)

Yeah, first of all let me admit that the explanation for the answer was not at all upto the technical standards. Because, I am new to SQL Server and this was a question which I faced in one of my interviews. From the posts in response to the question, I could make out that there was an ambiguity whether a t-log backup was restored, after restoring the full backup with no recovery. Actually what I meant was, whether the full backup after restoring with recovery (which is by default) would insert all the 10000 rows or just a few or not at all.

I just wanted to know the consequence after restoring a full backup. T-log backup was out of question.

Thanks once again

5uj@ J@me5

🙂

vk-kirov SSCertifiable Points: 7686 More actions · Answer 2

Tom.Thomson (3/19/2010)
vk-kirov (3/16/2010)
Hugo Kornelis (3/16/2010)
the backup first copies all the data pages, then adds all the log pages starting from the start of the oldest uncommitted transaction right up until the time the backup ends.
I would like to add: "from the start of the oldest uncommitted transaction (which was in an uncommitted state when the backup started)", because without this remark one may think "the oldest uncommitted transaction which was in an uncommitted state when the backup finished to copy data pages" 🙂
If I undertand it rightly, the correct wording is more like "from the start of the oldest transaction that was uncommitted at the start of the backup, or from the first log entry after the start of the backup if there is no such transaction".

Yes, this is much better. I forgot about the case "there's no uncommitted transaction at all". Thank you for pointing it out 🙂

I just wanted to say that the phrase "the oldest uncommitted transaction" might be ambiguous.

Tom.Thomson (3/19/2010)
But a single select statement may write more pages than can be held in RAM and this means that dirty pages get to disc

"Insert statement", I believe?

There can be other reasons for writing dirty pages to disk (http://msdn.microsoft.com/en-us/library/aa337560.aspx):

Writing Pages
A dirty page is written to disk in one of three ways.
•Lazy writing
The lazy writer is a system process that keeps free buffers available by removing infrequently used pages from the buffer cache. Dirty pages are first written to disk.
•Eager writing
The eager write process writes dirty data pages associated with nonlogged operations such as bulk insert and select into. (OK, there's a logged operation in the question)
•Checkpoint

TomThomson SSC Guru Points: 104773 More actions · Answer 3

vk-kirov (3/20/2010)
Tom.Thomson (3/19/2010)
But a single select statement may write more pages than can be held in RAM and this means that dirty pages get to disc
"Insert statement", I believe?

Yes, of course. Silly me! :blush:

Tom

Chris Houghton SSCrazy Points: 2557 More actions · Answer 4

I got this wrong, i went for A. Excellent question and one that I should have answered correctly since I have been reading up on this topic recently. The important thing is that I learnt something. My thanks to the author.

Festeron SSC Eights! Points: 927 More actions · Answer 5

Tom.Thomson (3/20/2010)
vk-kirov (3/20/2010)
Tom.Thomson (3/19/2010)
But a single select statement may write more pages than can be held in RAM and this means that dirty pages get to disc
"Insert statement", I believe?
Yes, of course. Silly me! :blush:

Can I ask you about the other end of the backup?

At what point does the backup decide it's time to stop backing up, even though there may be transactions still in flight? Does there come a time when backup decides to stop chasing its tail?

Or does it not back up anything in the log past the point at which the backup began?

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 6

Festeron (3/29/2010)
Tom.Thomson (3/20/2010)
vk-kirov (3/20/2010)
Tom.Thomson (3/19/2010)
But a single select statement may write more pages than can be held in RAM and this means that dirty pages get to disc
"Insert statement", I believe?
Yes, of course. Silly me! :blush:
Can I ask you about the other end of the backup?
At what point does the backup decide it's time to stop backing up, even though there may be transactions still in flight? Does there come a time when backup decides to stop chasing its tail?
Or does it not back up anything in the log past the point at which the backup began?

As far as I know, it stops as soon as it reaches the end of the log file. That means that, in theory, you could envisage a scenario where the log file grows faster than the backup can read it, and the backup won't terminate. Though I am pretty sure that in practice, you'll never be able to make that happen.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

TomThomson SSC Guru Points: 104773 More actions · Answer 7

Festeron (3/29/2010)
Can I ask you about the other end of the backup?
At what point does the backup decide it's time to stop backing up, even though there may be transactions still in flight? Does there come a time when backup decides to stop chasing its tail?
Or does it not back up anything in the log past the point at which the backup began?

Logically teh minimum that has to happen is that at the beginning of the backup it notes the point in the log where the oldest currently active transaction began, or the current position if there is no currently active transaction - call that the start position; then it copies all the database pages from disc to the backup media, and notes the current position in the log as the end position; then it writed all the log entries from the start position to the end position to the backup.

There are a couple of additional things backup could do to improvve performance: at some point before starting to write pages to backup media, it could write all dirty pages from cache to disc - this could be done either before or after (preferably before) working out the log start position; and it could write extra pages of the log to disc after the log stop position - either to the end of the log, or up to the last log entry that has commited a transaction. But there are a couple of issues with these "performance enhancements": (a) if pages are being dirtied fast enough the first trick will slow backup by more than in speeds recovery, so not much point to it; (b) extra log data after the stop point is all very well but theoretically with the end point defined either of the ways suggested it's possible that the log is growing fast enough that the backup will never terminate (and there's certainly some practical risk that it will take an excessive time to terminate). So MS may have kept it simple or may have done calculations which tell them which way will generally be best to go; only someone who knows their implementation of backup inside out will know what they have actually chosen to do (I certainly don't).

Tom

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 8

Tom.Thomson (3/29/2010)
(...)then it copies all the database pages from disc to the backup media

As far as I know, the backup process copies pages from cache to the backup file, not from disk. Pages not in cache will be read in first, just as for any other database operation.

There are a couple of additional things backup could do to improvve performance: at some point before starting to write pages to backup media, it could write all dirty pages from cache to disc

And how would that enhance performance? Either the backup copies pages from cache (in which case flushing dirty pages first doesn't change anything, except for the extra time taken to flush them), or it copies pages from disk (in which case flushing the dirty pages first influences which version of the data is in the backup(*), but not the backup time -again, except for the time taken to flush the dirty pages).

(*) ... and since the restore process will use the log portion of the backup to roll forward or roll back all transactions that were in process during the backup, it does not really matter which version of a dirty page is backed up.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Paul White SSC Guru Points: 150468 More actions · Answer 9

Can't believe I got this one wrong :blush: :blush: :blush:

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TomThomson SSC Guru Points: 104773 More actions · Answer 10

Hugo Kornelis (3/30/2010)
Tom.Thomson (3/29/2010)
(...)There are a couple of additional things backup could do to improvve performance: at some point before starting to write pages to backup media, it could write all dirty pages from cache to disc
And how would that enhance performance? Either the backup copies pages from cache (in which case flushing dirty pages first doesn't change anything, except for the extra time taken to flush them), or it copies pages from disk (in which case flushing the dirty pages first influences which version of the data is in the backup(*), but not the backup time -again, except for the time taken to flush the dirty pages).

Obviously it depends on what exactly the backup mechanism chosen by the designers of SQL Server is. On platforms which provide media to media transfer without passing through the computer's RAM copying from disc to cache and then from cache to backup media is inefficient - - but I have no idea whether any version of windows can use that feature even when hardware provides it so it may be that pushing stuff out to disc so that the copy process doesn't have to alternate between disc and RAM as source won't help by maximising the benefit of autonomous copy, and even if Windows can support this there's no guarantee that SQLS would support it. It is however true that even without autonomous copy it is faster, on some platforms, to arrange things so that very large chunks are written rather than a larger number of small ones, and in that case it's still possible that making it possible to transfer everything from disc and not refer to cache may give a performance gain; but again SQLS may have chosen not to do this as it may not be worth the effort (it may even be counterproductive on some platforms).

Tom

Paul White SSC Guru Points: 150468 More actions · Answer 11

Hugo Kornelis (3/29/2010)
As far as I know, it stops as soon as it reaches the end of the log file. That means that, in theory, you could envisage a scenario where the log file grows faster than the backup can read it, and the backup won't terminate. Though I am pretty sure that in practice, you'll never be able to make that happen.

That is not possible, no. The backup operation only requires log records up to the point that the backup data read completes.

See Understanding SQL Server Backups by Paul Randal for a neat diagram to explain that statement visually.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 12

Tom.Thomson (3/19/2010)
If I undertand it rightly, the correct wording is more like "from the start of the oldest transaction that was uncommitted at the start of the backup, or from the first log entry after the start of the backup if there is no such transaction".

The start LSN of the backup is the minimum of:

Paul White SSC Guru Points: 150468 More actions · Answer 13

Tom.Thomson (3/29/2010)
There are a couple of additional things backup could do to improve performance: at some point before starting to write pages to backup media, it could write all dirty pages from cache to disc - this could be done either before or after (preferably before) working out the log start position; and it could write extra pages of the log to disc after the log stop position - either to the end of the log, or up to the last log entry that has commited a transaction. But there are a couple of issues with these "performance enhancements": (a) if pages are being dirtied fast enough the first trick will slow backup by more than in speeds recovery, so not much point to it; (b) extra log data after the stop point is all very well but theoretically with the end point defined either of the ways suggested it's possible that the log is growing fast enough that the backup will never terminate (and there's certainly some practical risk that it will take an excessive time to terminate). So MS may have kept it simple or may have done calculations which tell them which way will generally be best to go; only someone who knows their implementation of backup inside out will know what they have actually chosen to do (I certainly don't).

The whole backup operation is conceptually quite simple:

Force a database checkpoint (flush all updated-in-memory pages to disk before anything is read by the backup)

Record the minimum recovery LSN (LSN1)

Database data read begins

Database data read ends

Record the current LSN (LSN2)

Read the transaction log from LSN1 to LSN2

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Peter Trast SSCarpal Tunnel Points: 4392 More actions · Answer 14

Paul White NZ (3/31/2010)
Tom.Thomson (3/29/2010)
There are a couple of additional things backup could do to improve performance: at some point before starting to write pages to backup media, it could write all dirty pages from cache to disc - this could be done either before or after (preferably before) working out the log start position; and it could write extra pages of the log to disc after the log stop position - either to the end of the log, or up to the last log entry that has commited a transaction. But there are a couple of issues with these "performance enhancements": (a) if pages are being dirtied fast enough the first trick will slow backup by more than in speeds recovery, so not much point to it; (b) extra log data after the stop point is all very well but theoretically with the end point defined either of the ways suggested it's possible that the log is growing fast enough that the backup will never terminate (and there's certainly some practical risk that it will take an excessive time to terminate). So MS may have kept it simple or may have done calculations which tell them which way will generally be best to go; only someone who knows their implementation of backup inside out will know what they have actually chosen to do (I certainly don't).
The whole backup operation is conceptually quite simple:
Force a database checkpoint (flush all updated-in-memory pages to disk before anything is read by the backup)
Record the minimum recovery LSN (LSN1)
Database data read begins
Database data read ends
Record the current LSN (LSN2)
Read the transaction log from LSN1 to LSN2

Wow, simple explanation. Thanks!

Peter Trast
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems

Paul White SSC Guru Points: 150468 More actions · Answer 15

Thank you, Peter. Of course, there are some edge cases and subtleties, but unless Mr Randal appears and tells me I know nothing about nothing...I'm happy with it.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi