Big/huge database restore taking a lot of time

  • Hi

    We have a 500 GB database backup file which, while restoring, shows that it will take more than 48 hours to complete. The database size is around 4 TB. The server we are restoring to is basically a standby server where most of the databases are in standby/read-only state. What metrics should we capture to iron out issues? I mean, from Perfmon, counters like Avg. Disk sec/Write (or Read/Transfer) or Avg. Disk Queue Length. Are there any other counters to check that could point to a problem with the disk storage system, and what would be the ideal threshold values for these counters?

    Even restoring log backup files of around 5 to 15 GB takes more than 2-3 hours.

    Is there any other way we can reduce the restore time, as we have a limited maintenance window for it?

    Regards

    Anoop


  • More, bigger, faster hardware. That's the deal. There are no other magic "run faster" switches for a restore operation... although in SQL Server 2022 the recovery aspect of a restore is radically improved, so an upgrade may help.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I partially disagree - splitting the backup into 4 or 8 files, and changing the restore BUFFERCOUNT and MAXTRANSFERSIZE, do affect restore performance.
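
    For illustration, a rough sketch of that kind of restore - the database name, file paths and option values below are placeholders to adapt and test for your own hardware (MOVE clauses omitted for brevity):

    -- restore from a backup that was split across four files
    RESTORE DATABASE YourDb
    FROM DISK = N'X:\Backups\YourDb_1.bak',
         DISK = N'X:\Backups\YourDb_2.bak',
         DISK = N'X:\Backups\YourDb_3.bak',
         DISK = N'X:\Backups\YourDb_4.bak'
    WITH NORECOVERY,                 -- or STANDBY = N'<undo file>' for a log shipping secondary
         BUFFERCOUNT = 50,           -- example value only; test what works for you
         MAXTRANSFERSIZE = 4194304,  -- 4 MB, the largest allowed
         STATS = 10;                 -- progress message every 10%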

    But one thing that hugely affects restore is the number of VLFs in the log file - if this is too big, it is a major performance hit on restore.

    To the OP - can you give us the output of the following? If the number is high, the best advice is to do a one-off shrink of the log (see the sketch after the query).

    -- VLF count and size per database (a high VLF count slows down restore/recovery)
    select [name]
         , s.database_id
         , count(l.database_id) as [VLF Count]
         , sum(vlf_size_mb) as [VLF Size (MB)]
         , sum(cast(vlf_active as int)) as [Active VLF]
         , sum(vlf_active * vlf_size_mb) as [Active VLF Size (MB)]
         , count(l.database_id) - sum(cast(vlf_active as int)) as [In-active VLF]
         , sum(vlf_size_mb) - sum(vlf_active * vlf_size_mb) as [In-active VLF Size (MB)]
    from sys.databases s
    cross apply sys.dm_db_log_info(s.database_id) l
    where s.name = 'yourdbname' -- CHANGE THIS
    group by [name]
           , s.database_id
    order by [VLF Count] desc
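
    If the count comes back very high, here is a rough sketch of the one-off fix, to be run on the source database the backups come from - the logical log file name and the sizes are placeholders:

    USE yourdbname;   -- CHANGE THIS
    -- shrink the log to throw away the excess VLFs...
    DBCC SHRINKFILE (N'yourdbname_log', 1024);   -- target size in MB
    -- ...then grow it back in one step, so the log is rebuilt with far fewer, larger VLFs
    ALTER DATABASE yourdbname
        MODIFY FILE (NAME = N'yourdbname_log', SIZE = 64000MB);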

    and the output of this one as well

    -- Find all database files with percent growth or a 1 MB growth increment
    -- (sys.master_files.growth is in 8 KB pages when is_percent_growth = 0, so 128 pages = 1 MB)
    select d.name as database_name
         , mf.name as file_name
         , mf.physical_name as file_path
         , mf.type_desc as file_type
         , convert(decimal(20, 2), (convert(decimal, mf.size) / 128)) as filesizeMB
         , mf.growth as growth
         , 'Percent' as growth_increment
    from sys.master_files mf
    join sys.databases d
      on mf.database_id = d.database_id
    where is_percent_growth = 1
      and d.name = 'yourdbname' -- CHANGE THIS

    union

    select d.name as database_name
         , mf.name as file_name
         , mf.physical_name as file_path
         , mf.type_desc as file_type
         , convert(decimal(20, 2), (convert(decimal, mf.size) / 128)) as filesizeMB
         , case when mf.growth = 128 then 1 end as growth
         , 'MB' as growth_increment
    from sys.master_files mf
    join sys.databases d
      on mf.database_id = d.database_id
    where is_percent_growth = 0
      and d.name = 'yourdbname' -- CHANGE THIS
      -- and mf.growth = 128
    order by database_name
           , file_name
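
    And if that second query flags percent growth (or a tiny fixed increment), something like this - again, the logical file name is a placeholder - sets a sensible fixed increment on the source database:

    ALTER DATABASE yourdbname   -- CHANGE THIS
        MODIFY FILE (NAME = N'yourdbname_log', FILEGROWTH = 512MB);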

  • As Grant noted, hardware, disk or network. You have something slowing down.

    I would check the throughput you are getting, the peak reads and writes/s on the disks. Likely somewhere you will find hardware that is going slow. Are you reading directly from disk? Is this from some de-dupe tech? Restores from those are sometimes very sllllllloooooooowwwwwwwwww
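
    If it helps, a quick way to see per-file latency and volume from inside SQL Server is sys.dm_io_virtual_file_stats - this is just a sketch, and the numbers are cumulative since the last restart, so sample them before and during a restore and compare. A commonly quoted rule of thumb is that sustained averages much above 20-30 ms mean the storage is struggling.

    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms,
           vfs.num_of_bytes_read    / 1048576 AS mb_read,
           vfs.num_of_bytes_written / 1048576 AS mb_written
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_latency_ms DESC;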

  • For moving the files faster, use your network backbone.

    500GB is huge... Is it a compressed backup?

  • We do nightly restores of two of our primary databases.

    The large one is a 2TB database and has an 80GB log file with a compressed backup size of 500GB.

    The smaller one is a 300GB database and has a 67GB log file with a compressed backup size of 70GB.

    It takes almost exactly 1 hour to restore both, and the backups are stored on a NAS.  I do not set the number of buffers or the buffer size for the restores.  Unlike during backups, I've found such settings just slow the restores down.

    One of the KEYs to this performance is that the NAS must be on the same domain as the server.  When it's not, I get comparatively very slow restores.  Not the tens of hours you're seeing, though.

    We've also excluded MDF, LDF, and NDF files from any form of virus checking, which can be a huge problem if you don't.

    We've also been careful to ensure we don't have runaway growth of the number of VLFs in our log files, which could really slow things down.

    Of course, the speed of your network makes a lot of difference.

    Also, many NAS devices actually have a backup battery in them for cache.  If it has reached end of life and can no longer charge, that won't hurt restores but it'll seriously slow down backups.

    The big thing here will be to test the NAS for throughput, test the network for throughput, and check the server disks for throughput.
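
    If it's useful, here are two crude T-SQL-only throughput checks - the paths below are placeholders, and both read a lot of data, so run them outside of the restore window:

    -- Read throughput of the local data drives: back up to the NUL device.
    -- Nothing is written, and COPY_ONLY leaves the backup chain untouched.
    -- (Needs a read/write database on those drives; a database sitting in STANDBY/RESTORING can't be backed up.)
    BACKUP DATABASE YourDb TO DISK = N'NUL'
    WITH COPY_ONLY, STATS = 10;

    -- Read throughput of the NAS plus the network path: verify an existing backup.
    -- This reads every byte of the file across the wire and writes nothing locally.
    RESTORE VERIFYONLY FROM DISK = N'\\nas\backups\YourDb.bak'
    WITH STATS = 10;

    The backup-to-NUL reports MB/sec in its completion message; for the verify, just time it against the size of the backup file. Between the two you can see which leg is the slow one.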


    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • FWIW: I've read on multiple threads that, for "backup database", providing the parameters BUFFERCOUNT = 17 and MAXTRANSFERSIZE = 1048576 gives better performance in many cases.

    After some tests, I have indeed modified all our backup jobs to use those parameters.

    With regards to your problem restoring databases: "Restore Database" also has these parameters.

    Are you using them?

    Have you tested them?

    If not, I suggest you give it a try.
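
    For reference, this is the shape of the backup command with those two parameters - the option values are the ones mentioned above, everything else (database name, path) is a placeholder:

    BACKUP DATABASE YourDb
    TO DISK = N'X:\Backups\YourDb.bak'
    WITH BUFFERCOUNT = 17, MAXTRANSFERSIZE = 1048576, STATS = 10;

    The same two options can be appended to the WITH clause of RESTORE DATABASE / RESTORE LOG to try them on the restore side.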

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I can vouch for the settings that Johan mentions because those are the ones that changed my life for backups years ago.  However, I've found that they actually make restores worse.  YMMV.

    What I'm really concerned about is from the original post where the OP stated...

    Even restoring log backup files of around 5 to 15 GB takes more than 2-3 hours.

    That's worse than trying to do a restore from an old USB 1.0 thumb drive and that's going to be either a serious network issue or a storage device issue or both.  Even without any special settings, that should take only seconds to complete.  See my previous post above for more info.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - SSC Editor wrote:

    As Grant noted, hardware, disk or network. You have something slowing down.

    I would check the throughput you are getting, the peak reads and writes/s on the disks. Likely somewhere you will find hardware that is going slow. Are you reading directly from disk? Is this from some de-dupe tech? Restores from those are sometimes very sllllllloooooooowwwwwwwwww

     

    Sorry for asking, but how do I check the disk throughput and the peak reads and writes/s on the disks? I will check and post it here.

  • Emperor100 wrote:

    For moving the files faster, use your network backbone.

    500GB is huge... Is it a compressed backup?

     

    Most of our backup files are compressed.

  • Jeff Moden wrote:

    We do nightly restores of two of our primary databases.

    The large one is a 2TB database and has an 80GB log file with a compressed backup size of 500GB. The smaller one is a 300GB database and has a 67GB log file with a compressed backup size of 70GB.

    It takes almost exactly 1 hour to restore both, and the backups are stored on a NAS.  I do not set the number of buffers or the buffer size for the restores.  Unlike during backups, I've found such settings just slow the restores down.

    One of the KEYs to this performance is that the NAS must be on the same domain as the server.  When it's not, I get comparatively very slow restores.  Not the tens of hours you're seeing, though.

    We've also excluded MDF, LDF, and NDF files from any form of virus checking, which can be a huge problem if you don't.

    We've also been careful to ensure we don't have runaway growth of the number of VLFs in our log files, which could really slow things down.

    Of course, the speed of your network makes a lot of difference.

    Also, many NAS devices actually have a backup battery in them for cache.  If it has reached end of life and can no longer charge, that won't hurt restores but it'll seriously slow down backups.

    The big thing here will be to test the NAS for throughput, test the network for throughput, and check the server disks for throughput.

    Hi Jeff, please let me know how to test server disk throughput and network throughput. By the way, I think all database files are already exempted from antivirus scanning, so that shouldn't be the issue, I guess.

    Let me also give an update: this server acts as a subscriber for one database, and there are around 30+ databases which are log shipping secondaries. The size of these log-shipped databases varies from a few MB to several TB, sometimes around 4-5 TB.

    These log-shipped databases come from different primaries.

