October 17, 2016 at 9:02 am
Since two weeks, my differential backups are getting too big. I have a 4 TB database. Our backup schedule: full backup every week, differential backup every night, log backup every 5 minutes. The full backup is 800 GB (compressed). After a few days of small differential backups, suddenly the differential backups become around 700 GB. This has happened this week, and last week. This database is not very active. All transaction logs are around 200kB, and the amount of changed extents is just above a thousand (according to a script from Paul Randal), less than 0.01%. So I'd expect a tiny differential backup.
There are two more things that are worth mentioning: the database is a subscriber in transactional replication, and CDC is used on these articles. Both replication and CDC jobs appear to be working fine.
At this moment, I have no idea where to look. Can anybody help?
October 17, 2016 at 9:18 am
Was the TR and CDC in place for a good period of time before the large DIF problem occurred?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 17, 2016 at 9:51 am
1) What is the exact script you used to identify changed extents?
2) What changed between the period where DIFFs were always small and the recent issues?
3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 17, 2016 at 10:36 am
Another good bit of info to get would be whether most of the data backed up for the large differentials is from the data files or the log file.
Something like the following query should point that out:
SELECT bs.database_name,
bs.backup_start_date,
backup_MB=SUM(bf.backed_up_page_count)/128,
bf.file_type
FROM msdb.dbo.backupset bs
INNER JOIN
msdb.dbo.backupfile bf ON bs.backup_set_id=bf.backup_set_id
WHERE bs.type='I' --differential backups
AND
bs.database_name='your_database_name'
GROUP BY bs.database_name,
bs.backup_start_date,
bf.file_type
ORDER BY bs.backup_start_date DESC,
backup_MB DESC
Cheers!
October 17, 2016 at 11:08 am
@jeff: yes, the TR and CDC were in place several weeks before we've been having this problem.
1) What is the exact script you used to identify changed extents?
2) What changed between the period where DIFFs were always small and the recent issues?
The backup share filled up. Otherwise, I have no idea. All the "usual suspects" claim they did nothing, and I have no logging to suggest otherwise.
3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?
The only thing I know of, is the database reorganisation. But since this finished in zero seconds, and didn't cause the transaction log backups to grow above 200 kB, I'm guessing that's not it.
@jacob: thanks for the script, I'll run that tomorrow morning.
October 17, 2016 at 11:23 am
Robert vd Berg (10/17/2016)
@Jeff: yes, the TR and CDC were in place several weeks before we've been having this problem.1) What is the exact script you used to identify changed extents?
2) What changed between the period where DIFFs were always small and the recent issues?
The backup share filled up. Otherwise, I have no idea. All the "usual suspects" claim they did nothing, and I have no logging to suggest otherwise.
3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?
The only thing I know of, is the database reorganisation. But since this finished in zero seconds, and didn't cause the transaction log backups to grow above 200 kB, I'm guessing that's not it.
@jacob: thanks for the script, I'll run that tomorrow morning.
When posting links, please use the [ url] and [ /url] (no spaces inside the square brackets) to make the link "clickable".
October 17, 2016 at 11:37 am
Thinking outside the box:
1) Got filestream by any chance?
2) In-Memory OLTP?
3) Columnstore indexes?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 17, 2016 at 11:42 am
October 17, 2016 at 12:37 pm
Index reorgs / rebuilds?
October 17, 2016 at 2:58 pm
Robert vd Berg (10/17/2016)
Hello Kevin,"No" to all three.
Greetings,
Robert
Next guess is perhaps the algorithm Paul posted in 2008 is off for 2014 databases for some reason. I do note he specifically mentions large databases as a potential issue.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 17, 2016 at 6:52 pm
Right now, I'm thinking that either someone made a code change that updates more of the database than before or someone "got the religion" about index maintenance or similar. I also think that the code that Jacob Wilkins posted above might give you a clue about that. Have you tried running his code? And are you sure that the weekly backup actually took place? Lot's of folks have backup code that won't fail if something goes wrong with a particular database backup but (hopefully) does log the failure. You might want to check MSDB for backup failures.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2016 at 12:42 am
I ran the code from Jacob. This is the result for last nights backup:
backup_start_datebackup_MBfile_type
2016-10-17 21:40:38.0001723566.125000D
2016-10-17 21:40:38.0001.609375L
That is what I would expect, since the log file is practically empty, and all log backups are small (these small log backups are also why I have no doubt about the script from Paul Randal).
@jeff: yes, I did double check that we in fact do have a full backup, both on disk and using the following code:
SELECTdatabase_name, type, backup_finish_date, backup_size
FROMmsdb..backupset
WHEREbackup_finish_date > '2016-10-01'
and database_name = 'my database'
and type <> 'L';
My next move will be to make a native backup (instead of using Red Gate). I doubt this will solve the problem, but it's worth a try.
October 19, 2016 at 2:34 am
I found the cause, so I thought I'd let you know. Between the small diffs and the larger diffs was a failed full backup (due to the backup share being full). I replicated that, by forcing a full backup and, while that was running, killing the backup service. The next diff is currently running, and it will be large, even though the number of changed extents was lower after the crashed full backup than before.
October 19, 2016 at 8:41 am
Robert vd Berg (10/19/2016)
I found the cause, so I thought I'd let you know. Between the small diffs and the larger diffs was a failed full backup (due to the backup share being full). I replicated that, by forcing a full backup and, while that was running, killing the backup service. The next diff is currently running, and it will be large, even though the number of changed extents was lower after the crashed full backup than before.
Please open a ticket with Microsoft about this, or at least report it on Connect. I wonder if it should work that way and/or if there is something that can be done to make it not do that. I suspect not, but can't hurt to ask. At first blush it sounds like all the bit-mapped DIFF pages are wiped clean at the start of the FULL backup but I would think perhaps this shouldn't be done until the very last thing to avoid this very scenario.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 19, 2016 at 9:54 am
Thanks for that update!
From looking at the log it looks like it probably is something along the lines of what Kevin suggested.
For both failed and successful full backups, there's a transaction "Backup:InvalidateDiffMaps" that occurs very early in the backup, and Paul Randal suggests at http://www.sqlskills.com/blogs/paul/debunking-a-couple-of-myths-around-full-database-backups/ is the clearing of the DCM, which makes sense.
The failed backup just seems to be missing the commit of the differential base, and some writes to the file headers and boot pages, which is expected.
It seems then that SQL Server knows it can't rely on the diff maps for subsequent differentials, and does differentials differently in that case.
I can reproduce the increased size easily enough by cancelling a full backup and then running a diff.
Differential backups after a failed full also switch to processing every page in the DB, even though they doesn't necessarily include all of them in the backup.
This behavior was also pointed out in the blog post here: https://sqlscope.wordpress.com/2014/12/06/failed-full-backup-can-impact-next-differential-backup/
It definitely seems like something intentional on the part of MS, but it would be nice if it were better (at all) documented.
Cheers!
EDIT: Added what detail I was able to get from the log.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply