March 13, 2017 at 8:45 am
You say you have 18,000 data files. are those all on different disks? If not, you will not really get a performance benefit and I can see that causing a lot of slowdown in the restore as you will need to jump around a LOT on disk with each write to the datafile.
Is your backup and log on the same disk as your datafiles? This will slow them down as well.
As for that "PostRestoreContainerFixups: fixups complete" message, I am not 100% sure what that does and information online is a bit lacking. I tried looking for information on it as well and came up empty. Except in all cases that I did find mention of it, it looked like it completed rather quickly (2-5 seconds in a 3700 second restore).
I have worked on databases that have multiple data files, but never anything with the numbers that you are, and am thinking that might be your issue?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 13, 2017 at 9:52 am
bmg002 - Monday, March 13, 2017 8:45 AMYou say you have 18,000 data files. are those all on different disks? If not, you will not really get a performance benefit and I can see that causing a lot of slowdown in the restore as you will need to jump around a LOT on disk with each write to the datafile.
Is your backup and log on the same disk as your datafiles? This will slow them down as well.As for that "PostRestoreContainerFixups: fixups complete" message, I am not 100% sure what that does and information online is a bit lacking. I tried looking for information on it as well and came up empty. Except in all cases that I did find mention of it, it looked like it completed rather quickly (2-5 seconds in a 3700 second restore).
I have worked on databases that have multiple data files, but never anything with the numbers that you are, and am thinking that might be your issue?
yes, dfs and backup file are on the same disk, like said before, this is a test server, but restore/recovery phase ends in about 6-7h, I dnot understand what he is doing in the other 7 hours.
Maybe the number of df cause this behavior
March 13, 2017 at 10:20 am
That'd be my next guess. I would think that SQL would need to do some form of sanity check to make sure that the data pages are all properly in the data files and that it knows how to link the different data files.
Since this is a test DB, can you dump all of the data from the extra data files into the 1 data file, do a backup and then restore and see if there is any improvement?
Sucky part about doing that is that you are going to have to do 17,999 SHRINKFILE with the EMPTYFILE option and ALTER DATABASE to remove the extra mdf's which may not be easy to code up. I'd also look at pre-growing the primary mdf as you go. You won't be able to pre-grow it to the full 9 TB at once (as you don't have disk space for that), but maybe grow it by 1 TB, then kill off however many secondary mdf's as you can until you free up 1 TB then pre-grow the primary again and repeat until you are at 1 mdf.
For replicated code that has an easy counter variable, it is likely easiest to use dynamic SQL to generate the code. Otherwise you are going to have (17,999 * 3) + 1 lines of code to type up.
Since everything is all on the same disk, there isn't as much benefit having multiple data files. I believe if they are SSD's you get a little bit of a performance increase having more data files, but it is a trial and error thing. And I highly doubt that 18,000 data files would give you any performance increase. Especially since the max threads would be limited by the number of CPU's minus ones that are in use by other processes.
I suppose if you were running this on a FAT32 partition, you'd need a LOT of data files as I believe that has a 4 GB per file limitation (with tricks to get around it), but if you are running it on NFTS (which you should be if not exFAT), I can't think of any benefit of having 18,000 data files. Plus if you are running it on FAT32, you are going to have a whole slew of problems (slowness on the DB due to the page/extent size vs the block size of the disk for example). Although I think FAT32 has a max size of 2 TB, so pretty sure you are running it on NTFS or exFAT or some unsupported version of FAT32.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 13, 2017 at 11:43 am
sgt500 - Monday, March 13, 2017 9:52 AMyes, dfs and backup file are on the same disk, like said before, this is a test server, but restore/recovery phase ends in about 6-7h, I dnot understand what he is doing in the other 7 hours.
Maybe the number of df cause this behavior
Interestingly enough, there is one reference I found to PostContainerStoreFixups. It states that this step resolves the post-restore configuration of sysaltfiles/sysdbfiles:
When doing a database restore, what actions is the engine taking after it gets past the 100%?
Sue
March 14, 2017 at 7:49 am
Sue_H - Monday, March 13, 2017 11:43 AMsgt500 - Monday, March 13, 2017 9:52 AMyes, dfs and backup file are on the same disk, like said before, this is a test server, but restore/recovery phase ends in about 6-7h, I dnot understand what he is doing in the other 7 hours.
Maybe the number of df cause this behaviorInterestingly enough, there is one reference I found to PostContainerStoreFixups. It states that this step resolves the post-restore configuration of sysaltfiles/sysdbfiles:
When doing a database restore, what actions is the engine taking after it gets past the 100%?Sue
Nice!
This confirms the hypothesis the number of datafiles affects the timing of this step. I may have further confirmation restoring anothe dbr of about 8TB , but with fewer datafiles (5K)
This may also explain the lock that I get in this phase by running "select * from sys.sysdatabases" (see 1th post)
I update you in the next few days on the second db result
ty all
March 14, 2017 at 9:34 am
select from sysaltfile blocked too 🙂
session_id,wait_type,sql_text,blocked_by
52 LCK_M_S select * from sys.sysaltfil 55
55 NULL RESTORE DATABASE [dbxxxx] FROM DISK... 0
March 14, 2017 at 10:15 am
sgt500 - Tuesday, March 14, 2017 9:34 AMselect from sysaltfile blocked too 🙂
session_id,wait_type,sql_text,blocked_by
52 LCK_M_S select * from sys.sysaltfil 55
55 NULL RESTORE DATABASE [dbxxxx] FROM DISK... 0
For getting that data, try adding in a WITH (NOLOCK) option on the table. That should get you data from the tables that are locked.
That is:SELECT *
FROM sys.sysaltfil WITH(NOLOCK)
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 14, 2017 at 10:21 am
sgt500 - Tuesday, March 14, 2017 9:34 AMselect from sysaltfile blocked too 🙂
session_id,wait_type,sql_text,blocked_by
52 LCK_M_S select * from sys.sysaltfil 55
55 NULL RESTORE DATABASE [dbxxxx] FROM DISK... 0
Yup...so that does seem to explain the time spent in PostContainerStoreFixups. I'd leave it as it is for now and keep checking the locks on that table. I bet the locks are there for the duration. Did you do the restore with the verbose trace flag again?
Sue
March 14, 2017 at 10:36 am
sgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)
Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
March 14, 2017 at 10:53 am
Perry Whittle - Tuesday, March 14, 2017 10:36 AMsgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
I asked that previously. He means repeat for all 18,000 data files! I can't imagine why a system that would need that many data files... his "smaller" system has 5,000 data files... I think our largest is 2.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 14, 2017 at 11:13 am
bmg002 - Tuesday, March 14, 2017 10:53 AMPerry Whittle - Tuesday, March 14, 2017 10:36 AMsgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
I asked that previously. He means repeat for all 18,000 data files! I can't imagine why a system that would need that many data files... his "smaller" system has 5,000 data files... I think our largest is 2.
That's what i thought :w00t:
Dare I ask how many filegroups? :Whistling:
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
March 15, 2017 at 2:20 am
Perry Whittle - Tuesday, March 14, 2017 11:13 AMbmg002 - Tuesday, March 14, 2017 10:53 AMPerry Whittle - Tuesday, March 14, 2017 10:36 AMsgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
I asked that previously. He means repeat for all 18,000 data files! I can't imagine why a system that would need that many data files... his "smaller" system has 5,000 data files... I think our largest is 2.
That's what i thought :w00t:
Dare I ask how many filegroups? :Whistling:
🙂
is the app works by design. It is multitenant. and each tenant creates x FileGroups.
I inherited this situation, there are no real reasons. I asked to change this behavior, but the old continuranno tenants to have all these fg
March 15, 2017 at 4:40 am
sgt500 - Wednesday, March 15, 2017 2:20 AMPerry Whittle - Tuesday, March 14, 2017 11:13 AMbmg002 - Tuesday, March 14, 2017 10:53 AMPerry Whittle - Tuesday, March 14, 2017 10:36 AMsgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
I asked that previously. He means repeat for all 18,000 data files! I can't imagine why a system that would need that many data files... his "smaller" system has 5,000 data files... I think our largest is 2.
That's what i thought :w00t:
Dare I ask how many filegroups? :Whistling:🙂
is the app works by design. It is multitenant. and each tenant creates x FileGroups.
I inherited this situation, there are no real reasons. I asked to change this behavior, but the old continuranno tenants to have all these fg
Expect any restore to take considerable amount of time, such as the timeframe you are seeing.
Are you taking filegroup backups?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
March 15, 2017 at 6:07 am
Perry Whittle - Wednesday, March 15, 2017 4:40 AMsgt500 - Wednesday, March 15, 2017 2:20 AMPerry Whittle - Tuesday, March 14, 2017 11:13 AMbmg002 - Tuesday, March 14, 2017 10:53 AMPerry Whittle - Tuesday, March 14, 2017 10:36 AMsgt500 - Wednesday, March 8, 2017 7:57 AMMOVE ....
.... (18K+)Just so i know i'm not misreading, could you please clarify what is meent by the 18K+
I asked that previously. He means repeat for all 18,000 data files! I can't imagine why a system that would need that many data files... his "smaller" system has 5,000 data files... I think our largest is 2.
That's what i thought :w00t:
Dare I ask how many filegroups? :Whistling:🙂
is the app works by design. It is multitenant. and each tenant creates x FileGroups.
I inherited this situation, there are no real reasons. I asked to change this behavior, but the old continuranno tenants to have all these fgExpect any restore to take considerable amount of time, such as the timeframe you are seeing.
Are you taking filegroup backups?
you're right, these tests have been useful to have proof of this fact.
Now the backup policy is a weekly full and daily diff, before was scheduled only a backup full every two days
March 28, 2017 at 2:26 am
hi,
I would like to give you an update about the restore tests. I restored the smallest one,7TB with 4600 files.
It took a lot less time. the high number of files definitely impacts on the timing.
2017-03-28 09:45:45.210 spid58 RestoreDatabase: Finished
2017-03-28 09:45:45.180 Backup RESTORE DATABASE successfully processed 62086684 pages in 3003.526 seconds (161.494 MB/sec).
2017-03-28 09:45:45.180 spid58 Restore: Done with MSDB maintenance
2017-03-28 09:44:14.340 spid58 Writing backup history records
2017-03-28 09:44:07.600 Backup Database changes were restored. Database: db_xxxxxx2, creation date(time): 2011/12/19(12:47:05), first LSN: 11396148:9851828:16, last LSN: 11396148:10901616:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'X:\xxxxxxx.diff'}). This is an informational message. No user action is required.
2017-03-28 09:44:07.600 spid58 Restore: Writing history records
2017-03-28 09:44:07.350 spid58 Resuming any halted fulltext crawls
2017-03-28 09:44:07.090 Backup Restore is complete on database 'db_xxxxxx2'. The database is now available.
2017-03-28 09:44:07.090 spid58 Restore: Database is restarted
2017-03-28 09:44:06.850 spid58 PostRestoreReplicationFixup for db_xxxxxx2 complete
2017-03-28 09:44:04.660 spid58 PostRestoreReplicationFixup for db_xxxxxx2 starts
2017-03-28 09:44:04.630 spid58 CHECKDB for database 'db_xxxxxx2' finished without errors on 2017-02-18 08:04:33.693 (local time). This is an informational message only; no user action is required.
2017-03-28 09:44:04.200 spid58 PostRestoreContainerFixups: fixups complete
2017-03-28 09:43:50.980 spid58 Recovery completed for database db_xxxxxx2 (database ID 5) in 7 second(s) (analysis 429 ms, redo 1812 ms, undo 4387 ms.) This is an informational message only. No user action is required.
2017-03-28 09:43:50.820 spid58 Recovery is writing a checkpoint in database 'db_xxxxxx2' (5). This is an informational message only. No user action is required.
2017-03-28 09:43:45.490 spid58 Zeroing completed on X:\Data\db_xxxxxx2\db_xxxxxx2_log.ldf
2017-03-28 09:43:45.480 spid58 Zeroing X:\Data\db_xxxxxx2\db_xxxxxx2_log.ldf from page 60514567 to 60515047 (0x736c20e000 to 0x736c5ce000)
2017-03-28 09:43:22.950 spid58 Starting up database 'db_xxxxxx2'.
2017-03-28 09:43:19.540 spid58 Restore: Restarting database for ONLINE
2017-03-28 09:43:19.480 spid58 Restore: Transitioning database to ONLINE
2017-03-28 09:43:19.430 spid58 Restore: Done with fixups
2017-03-28 09:43:18.950 spid58 Rollforward complete on database db_xxxxxx2
2017-03-28 09:43:08.630 spid58 OfflineRollforward: StopLsn/LastLsn(PruId: 0): 0xade434:0xa65870:0x1
2017-03-28 09:42:46.480 spid58 RunOfflineRedo logIter.Init(): LastLsn(PruId: 0): 0xade434:0xa65870:0x1
2017-03-28 09:42:46.480 spid58 RunOfflineRedo logIter.Init(): FirstLsn(PruId: 0): 0xade434:0x9653b4:0x10
2017-03-28 09:42:46.380 spid58 Restore VLF headers completed on database db_xxxxxx2
2017-03-28 09:42:45.020 spid58 Restore VLF headers for 209 headers on database db_xxxxxx2
2017-03-28 09:42:42.240 spid58 Restore-Redo begins on database db_xxxxxx2
2017-03-28 09:42:42.040 spid58 The database 'db_xxxxxx2' is marked RESTORING and is in a state that does not allow recovery to be run.
2017-03-28 09:42:28.130 spid58 Starting up database 'db_xxxxxx2'.
2017-03-28 09:42:25.050 spid58 Restore: Backup set restored
2017-03-28 09:42:24.130 spid58 Restore: Data transfer complete on db_xxxxxx2
2017-03-28 09:42:24.130 spid58 FileHandleCache: 0 files opened. CacheSize: 140
2017-03-28 09:41:45.800 spid58 SetTargetRestoreAge: 0
2017-03-28 09:41:41.050 spid58 Restore: LogZero complete
2017-03-28 09:41:41.040 spid58 Zeroing completed on X:\Data\db_xxxxxx2\db_xxxxxx2_log.ldf
2017-03-28 09:12:06.050 spid58 Restore: Waiting for log zero on db_xxxxxx2
2017-03-28 09:12:06.050 spid58 Completed MSDA in stream 0, thread 6436
2017-03-28 08:52:20.310 spid58 Starting MSDA in stream 0, thread 6436, length 0x764bc00000
2017-03-28 08:52:20.310 spid58 Restore: Transferring data to db_xxxxxx2
2017-03-28 08:52:20.310 spid58 Restore: Restoring backup set
2017-03-28 08:52:20.010 spid58 Restore: Containers are ready
2017-03-28 08:52:19.990 spid58 Zeroing X:\Data\db_xxxxxx2\db_xxxxxx2_log.ldf from page 1 to 91290496 (0x2000 to 0xae1f700000)
2017-03-28 08:51:29.000 spid58 Restore: PreparingContainers
2017-03-28 08:51:24.910 spid58 Restore: BeginRestore (offline) on db_xxxxxx2
2017-03-28 08:51:24.910 spid58 Restore: Planning complete
2017-03-28 08:51:24.910 spid58 Dismounting FullText catalogs
2017-03-28 08:51:12.230 spid58 Restore: Planning begins
2017-03-28 08:51:12.230 spid58 Restore: Backup set is open
2017-03-28 08:51:12.230 spid58 Restore: Configuration section loaded
2017-03-28 08:51:11.850 spid58 SetTargetRestoreAge: 0
2017-03-28 08:51:11.510 spid58 Opening backup set
2017-03-28 08:51:11.460 spid58 X-locking database: db_xxxxxx2
2017-03-28 08:51:11.310 spid58 RestoreDatabase: Database db_xxxxxx2
2017-03-28 08:51:11.310 spid58 DBCC TRACEON 3605, server process ID (SPID) 58. This is an informational message only; no user action is required.
2017-03-28 08:51:11.310 spid58 DBCC TRACEON 3604, server process ID (SPID) 58. This is an informational message only; no user action is required.
2017-03-28 08:51:11.310 spid58 DBCC TRACEON 3014, server process ID (SPID) 58. This is an informational message only; no user action is required.
2017-03-28 08:51:11.310 spid58 DBCC TRACEON 3004, server process ID (SPID) 58. This is an informational message only; no user action is required.
(this is differential restore log only)
this confirms what previously said about number of files,
ty all
bye bye
Viewing 15 posts - 16 through 30 (of 32 total)
You must be logged in to reply to this topic. Login to reply