Large db recovery times? Split into two databases?

  • Hi,

    I have a problem with recovery times on a largish db. The most important thing on my system is the steady insertion of incoming data within x seconds. The volume of these inserts has swelled the db to 300GB. When I fail over to a passive node (SQL 2005 Standard 64-bit cluster) the integrity check can take 20 - 50 mins. Hardware spec is reasonable: iSCSI, RAID 10, 32GB RAM, dual quad-core CPUs (working to upgrade the SAN to Fibre Channel).

    My question is about splitting up the database to improve recovery times etc. I would like to create a month db and an archive db. The archive db would contain annual tables. The month db would allow quicker queries on that month, quicker maintenance times and a quicker recovery time (most important is to get the db online quickly). I cannot risk a 20 - 50 minute restore time. (The transition method from db to db and the time-scale of data transfer have not been decided, but replication and writing our own app or an SSIS package are being examined.) The app obviously has to know where to find the data, so it does create a development overhead.

    If using SQL Enterprise, can the same logic above be achieved using filegroups? I'm sure I heard you can recover partial filegroups, i.e. month = small filegroup table set, reports = large filegroups. Can someone explain?
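
    (From what I've gathered so far, the sort of thing I'm imagining is below - the filegroup and path names are made up and I'm not at all sure the syntax is right:)

    -- Bring PRIMARY and the current month's filegroup online first (piecemeal restore)
    RESTORE DATABASE MyDb
        FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_CurrentMonth'
        FROM DISK = 'D:\Backups\MyDb_full.bak'
        WITH PARTIAL, NORECOVERY;
    RESTORE LOG MyDb FROM DISK = 'D:\Backups\MyDb_log.trn' WITH RECOVERY;

    -- The large report/archive filegroups could then be restored later,
    -- while the rest of the database is already in use (Enterprise only, as I understand it)
    RESTORE DATABASE MyDb
        FILEGROUP = 'FG_Reports'
        FROM DISK = 'D:\Backups\MyDb_full.bak'
        WITH RECOVERY;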

    Any other methods of splitting up the data that we should consider ?

    Thanks for any help, I hope this is enough info for an educated guess.

    Scott

  • Integrity check? SQL doesn't do an integrity check on startup. The checkDB messages in the log on startup just report the last time checkDB ran successfully.

    If you're talking about slow restart-recovery (SQL is recovering database X...), you could manually set the recovery interval option to force more frequent checkpoint operations. Also check your IO throughput; most cases I've seen of long recovery times were either excessively long open transactions or poor IO performance (log drive).
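
    A rough sketch of checking/setting it (the value is in minutes and 0 is the automatic default - verify the exact behaviour in Books Online before relying on this):

    -- 'recovery interval' is an advanced option; lower values mean more frequent
    -- checkpoints and therefore less log to roll forward on restart
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval';      -- show the current setting
    EXEC sp_configure 'recovery interval', 1;   -- example: target roughly one minute of recovery work
    RECONFIGURE;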

    If it is restart-recovery we're talking about, which phase is taking the time, redo or undo? If it's the undo phase, then consider moving to Enterprise Edition if you can. In Enterprise Edition, the DB comes online after redo is finished and the undo phase proceeds while the DB is in use.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • We have databases from a few MB to almost 2TB. We had a problem with a database in the 700GB range that would take 2 hours to recover. We ended up creating a new archive database for old data, but I think the schema was the real culprit. A lot of the tables were heaps, and since it was BI the processes would drop and recreate indexes, and I think the indexes weren't created to best practices. But since it's developed by another team whose specialty is coding and not DB architecture, we never did anything about it.

  • Re: GilaMonster's reply:

    Interesting, the checkpoint idea may help. I still have the problem of re-indexing a large table, or restoring, or transferring the DB to R&D. Much of my 300GB is in a single table. I'm not saying my indexes are perfect, but tables of 20 million rows have got to be faster than 200 million no matter how much you index. Anything I do on this table takes a while. Much of the data is not used by web users, but it's still required for statistics, so I don't want to drop it if possible. It seems to me there is no better argument for creating an archive DB. The idea was to add the archive db to a new SQL Server, keeping large reports away from the incoming data.

  • SQL Noob (3/9/2010)


    We have databases from a few MB to almost 2TB. We had a problem with a database in the 700GB range that would take 2 hours to recover. We ended up creating a new archive database for old data, but I think the schema was the real culprit. A lot of the tables were heaps, and since it was BI the processes would drop and recreate indexes, and I think the indexes weren't created to best practices. But since it's developed by another team whose specialty is coding and not DB architecture, we never did anything about it.

    How do I know if a table is a heap, and if so, what do I do about it?

    Sorry for all the questions.

    Thanks for the reply by the way.

  • Our 2TB databases have tables with over a billion rows that recover faster than the 700GB database we ended up splitting up, where the tables were only a fraction of that size.

    I forget the exact definition, but I think a heap is a table with no clustered index. At a minimum you want a clustered index on tables over 1000 rows (I forget the exact threshold), and then you want a unique index or PK. In most cases, if we can't find a suitable key column, we add a generic int or bigint identity column to use as a PK.
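
    Something like this should list the heaps (from memory, so double-check it); the ALTERs at the end just illustrate the generic identity-PK approach with made-up names:

    -- Tables with no clustered index (index_id = 0 means heap)
    SELECT SCHEMA_NAME(t.schema_id) AS schema_name, t.name AS table_name
    FROM sys.tables AS t
    JOIN sys.indexes AS i ON i.object_id = t.object_id
    WHERE i.index_id = 0
    ORDER BY schema_name, table_name;

    -- Turning a heap into a clustered table with a surrogate key
    ALTER TABLE dbo.SomeHeapTable ADD SomeId BIGINT IDENTITY(1,1) NOT NULL;
    ALTER TABLE dbo.SomeHeapTable
        ADD CONSTRAINT PK_SomeHeapTable PRIMARY KEY CLUSTERED (SomeId);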

  • Have you checked the transaction log for a large number of Virtual Log Files by running DBCC LOGINFO? A large number of VLFs will cause slow SQL Server db start-up.
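
    For example (the logical log file name and sizes below are made up; a common remediation is to shrink the log and then regrow it in a few large increments so it is rebuilt with far fewer VLFs):

    USE YourDatabase;
    DBCC LOGINFO;    -- one row per virtual log file; thousands of rows = too many VLFs

    DBCC SHRINKFILE (YourDatabase_log, 1024);
    ALTER DATABASE YourDatabase
        MODIFY FILE (NAME = YourDatabase_log, SIZE = 32GB);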

    For reference:

    Kalen Delaney's blog "Exploring the Transaction Log Structure" at http://sqlblog.com/blogs/kalen_delaney/archive/2009/12/21/exploring-the-transaction-log-structure.aspx

    Paul S. Randal's article "Understanding Logging and Recovery in SQL Server" at http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx

    SQL = Scarcely Qualifies as a Language

  • SQL Noob (3/9/2010)


    Our 2TB databases have tables with over a billion rows that recover faster than the 700GB database we ended up splitting up, where the tables were only a fraction of that size.

    I forget the exact definition, but I think a heap is a table with no clustered index. At a minimum you want a clustered index on tables over 1000 rows (I forget the exact threshold), and then you want a unique index or PK. In most cases, if we can't find a suitable key column, we add a generic int or bigint identity column to use as a PK.

    Thanks for the reply. Our tables appear to have a clustered index by default, probably due to the PK, I guess.

    Can you clarify what you mean in the first point, please? I.e. you split up your 2TB db that had billions of rows into lots of tables?

    (I need to dash, sorry; I will catch up on this thread later. Thank you for the replies.)

  • We have data going back over 10 years. For the 700GB database that had the long recovery time, we told our BI people and they split it into 2 databases: one with data from 2008 and older, and the other with recent data. We went from a 2-3 hour recovery time to a few minutes.

  • SQL Noob (3/9/2010)


    We have data going back over 10 years. For the 700GB database that had the long recovery time, we told our BI people and they split it into 2 databases: one with data from 2008 and older, and the other with recent data. We went from a 2-3 hour recovery time to a few minutes.

    This is how we do it in our environment as well, and it is also easy to manage.

    EnjoY!
  • Thanks for the reply. So splitting up the data into 2 or more databases is a "done" thing then. Can I ask how you both managed the transfer of data between your databases?

    I have looked at replication, but I'm worried about the manageability of it, especially when it fails; my ISP is warning me against this. SSIS packages are another option, as is writing our own WCF-based service.
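
    A plain T-SQL batch job is also on the table - very roughly something like the below (table, column and database names are made up), moving old rows across in small batches so each transaction stays short:

    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        BEGIN TRAN;

        -- copy a batch of old rows into the archive db (same instance)
        INSERT INTO ArchiveDb.dbo.IncomingData (Id, CreatedDate, Payload)
        SELECT TOP (10000) Id, CreatedDate, Payload
        FROM LiveDb.dbo.IncomingData
        WHERE CreatedDate < '20090101'
        ORDER BY Id;

        SET @rows = @@ROWCOUNT;

        -- then remove the rows that have just been archived from the live db
        DELETE l
        FROM LiveDb.dbo.IncomingData AS l
        WHERE l.CreatedDate < '20090101'
          AND l.Id IN (SELECT Id FROM ArchiveDb.dbo.IncomingData);

        COMMIT TRAN;
    END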

  • Carl Federl (3/9/2010)


    Have you checked the transaction log for a large number of Virtual Log Files by running DBCC LOGINFO? A large number of VLFs will cause slow SQL Server db start-up.

    For reference:

    Kalen Delaney's blog "Exploring the Transaction Log Structure" at http://sqlblog.com/blogs/kalen_delaney/archive/2009/12/21/exploring-the-transaction-log-structure.aspx

    Paul S. Randal's article "Understanding Logging and Recovery in SQL Server" at http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx

    Thank you for the links, I will have a read.
