db design - daily and report db question + use of replication for this aim ?

  • Hi,

    First post so please go easy.

    We have a large database which collects data 1000s of nodes (Node Type 1).

    The current data count = 130 million records. We are currently inserting

    records into a single database at 10 data records per second. This year we

    are to launch a new type of node (Type 2) which uses a different front end

    application and will be used by a number of different customers across

    Europe and the US.

    We cannot decide wheather we should deploy a new DB for Type 2 i.e

    Database 1) a central repository of all types and firmwares (estimated size = 1 to 10 GB)

    Database 2) Type 1 data (estimated size = 500 Gb to 2 TB)

    Database 3) Type 2 data (estimated size = 10 to 100 GB)

    OR

    Database 1) a central repository of all types and firmwares and keep all the

    data for both types in the same database (estimated size = 600 Gb - 2 TB)

    The types do share commonality, they send data, they have firmware, they

    have contacts and customers so table structure looks the same.

    1. Im worried that if we keep the data in one database and something goes

    wrong both products go down. But a single DB creates less matainace. For

    example if a big single DB goes into recovery mode it could be down for

    30mins, a central repository DB would be down for seconds and its data db

    maybe an hour but we can still operate / web app still works. Does anyone

    have a view or experience of this type of design problem?

    2. We are also looking into SQL replication to keep a small db "a central

    repository of all types and firmwares" including daily data. We would then

    have another DB with historic data for reporting. Can replication allow one

    DB to contain a small amount of data but "top up" the reports DB with

    additonal data (bearing in mind the small db would need a DELETE routine to

    remove data more than a day old) ?

    Thanks for any help.

    Scott

    (2005 Standard Edition 64bit Build 3790 SP2 in an Active/Passive

    cluster setup)

  • Planning this kind of thing isn't an easy task.

    Rather than worrying about a single database being a single-point-of-failure (where if it goes down, the whole thing is down), you need to look into things like log shipping, mirroring, etc. That's where you get your real high availability.

    If the data structures are the same, I'd be inclined towards keeping the data in the same database. But that's my first instinct. I'd have to know a lot more about the business requirements, budget, etc., before I made that decision.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • thank you for the reply.

    One additional point. The database needs to be highly available otherwise it creates a bottleneck for our nodes. The bigger the DB we are finding the slower the response to our nodes. Also large customer reports can effect our response times to our nodes.

    By splitting the database into DAILY DB (for quick response times to nodes and quick recovery times so response times are not down for long) ...

    ... and REPORTS DB for large customer querys that will not effect our daily running response times we could solve this problem. The trouble is figuring out which technolgy to use to achive this aim.

    For example we have a DATA table that has 10000 rows added a minute. We cannot tell our nodes to stop sending until we can confirm insertion to table. If a customer runs a large report then this insert is slowed down. By taking archived data (data more than 1 day old) to a reports DB this problem is solved. Could use a REPORTS table but then DATABASE recovery time is longer.

    Thanks again

    Scott

  • GSquared (2/2/2009)


    Planning this kind of thing isn't an easy task.

    Rather than worrying about a single database being a single-point-of-failure (where if it goes down, the whole thing is down), you need to look into things like log shipping, mirroring, etc. That's where you get your real high availability.

    quote]

    I dont know much about LOG SHIPPING , can you explain please ?

    Regarding mirroring. The A/P cluster is great for SQL availability. Would MIRRORING solve the problem if the active mirror goes into recovery mode ? i.e the passive mirror would pickup the slack ? i.e whats to stop this DB jumping into recovery mode too (say if the disk storage array goes down).

    Thanks

    Scott

  • It can be a good thing to split data into current and older tables. I've used that idea a few times very successfully.

    Log shipping is a way of keeping two databases synched up, with a delay. Basically, you have two servers, and one sends its log files to the other, which then applies the same changes. This means the second server is usually only a few minutes behind the main server, and can be brought fully up-to-date if a final log is sent to it. That means you might have a few minutes downtime if the primary server goes down, but not much more than that. (There's more to it, and this is a bit of an over-simplification, but it's the basic idea.)

    In mirroring, no, you can't guarantee that both servers won't go down at the same time. It's highly unlikely, but it's not impossible. If you want a 100% guarantee that your database will always be available, no matter what, you need to operate in a universe other than this one. Mirroring is a way to get really close to guaranteed uptime, but nothing will get you to 100%.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • thanks again for the reply.

    So whats the difference between LOG SHIPPING and TRANSACTIONAL REPLICATION then if both use the transaction log to keep the second database up to date ?

    Both appear very simaler to me.

    Thank you.

    Scott

  • scott_lotus (2/3/2009)


    thanks again for the reply.

    So whats the difference between LOG SHIPPING and TRANSACTIONAL REPLICATION then if both use the transaction log to keep the second database up to date ?

    Both appear very simaler to me.

    Log shipping use transaction log backups which are then restored to the stand by server. The stand-by database is read-only and when you're primary database crashes you will loose all the data since the last backup.

    Transactional replication reads the transaction from the log and applies the changes either continously or on a schedule to the secondary server. The subscriber database is read/write although updating the subscriber database can create new chanllenges. If you synchronisecontinously the dataloss should be very limited, at least if you use a remote distributor.

    I know this is a very limited comparison of the both. For more look here:

    http://www.sqlservercentral.com/articles/Replication/logshippingvsreplication/1399/ and

    here http://msdn.microsoft.com/en-us/library/ms151224.aspx

    [font="Verdana"]Markus Bohse[/font]

  • Thanks for the reply.

    If the stand by server is read only then I dont think its an option for us. Our incoming data needs to read/insert in order to talk back to our nodes as quickly as possible.

    Having a quick read about mirroring:

    "Synchronous mirroring with a witness server that provides the highest availability of the database. A drawback in this type of configuration is the need to log transactions on the mirror before such transactions being committed to the principal server may retard performance.

    "Asynchronous mirroring with a witness server provides high availability and good performance. Transactions are committed to the principal server immediately. This configuration is useful when there is latency or distance between the principal server and the mirror".

    "Synchronous mirroring without the witness server. This guarantees that data on both servers is always concurrent and data integrity is of a very high order. However, automatic failover cannot occur as there are not enough servers to form a quorum decision on which server is to take the role of the principal server and which should be the mirror server."

    i.e think our options would be:

    1. A/P cluster with ArchiveDB and DailyDB using REPLICATION to create reports DB. Daily DB is small and can recover quickly. ArchiveDB get hit for heavy user queries allowing Daily DB be quick and nibble to talk back to our nodes.

    2. A/P cluster with mirroring (not sure which type). I keep my database together , and mirror it for high availability. If it goes into recovery mode hopefully the passive mirror wont do the same thing. I can create a Archive table for large users queries.

    Thanks

  • Quick follow up. I got this from my ISP:

    "I would suggest using SQL 2008 - this gives you a tool called resource governor - this allows you to create groups or specify individual users/stored procedures and limit the amount of resources they can consume - this should stop reports from killing your system and is a much cheaper way than building a new reporting server and running seperate queries.(here you should use dts to extract just the reporting data you need and not copy the whole databawse as that is a lot slower)

    In terms of replication/log shipping /database mirroring - please bear this in mind.....

    mirroring is the most stable, does auto failover and auto corrects itself most of the time if it fails - it even re-syncs once the failed node is fixed.

    log shipping is a manual re-sync after failure and has no auto failover.

    replication again has no auto failover but does not have the recovery problems of mirroring and log shipping. However replication breaks a lot and requires quite a bit of maintenance and manual intervention. I usually only adopt replication as a strategy as a last resort.

    If you do go with replication, I would suggest anything but snapshot - snapshot can leave you with the same restore times as the other two."

    Does anyone agree, disagree with the following in the context of the arguements in this thread ?

    Thank you.

    Scott

  • MarkusB (2/3/2009)


    scott_lotus (2/3/2009)


    thanks again for the reply.

    So whats the difference between LOG SHIPPING and TRANSACTIONAL REPLICATION then if both use the transaction log to keep the second database up to date ?

    Both appear very simaler to me.

    Log shipping use transaction log backups which are then restored to the stand by server. The stand-by database is read-only and when you're primary database crashes you will loose all the data since the last backup.

    Transactional replication reads the transaction from the log and applies the changes either continously or on a schedule to the secondary server. The subscriber database is read/write although updating the subscriber database can create new chanllenges. If you synchronisecontinously the dataloss should be very limited, at least if you use a remote distributor.

    I know this is a very limited comparison of the both. For more look here:

    http://www.sqlservercentral.com/articles/Replication/logshippingvsreplication/1399/ and

    here http://msdn.microsoft.com/en-us/library/ms151224.aspx

    One correction on this:

    If you run a last-backup up to the point of failure, log shipping doesn't lose anything at all. This assumes that the reason for the failure is hard-drive failure on the primary server and that the affected drive(s) are where the data files are kept.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • mirroring does seem sensible to me. but 2 points (basically my inital 2 problems)L

    1. im using an a/p cluster, if one db can do into recovery mode for 30 mins then surly a primary db and its mirror can have the same problem at the same time.

    2. db still same size so reporting still slow , still effect my insert times.

    -------

    A small dailyDB and a large reportsDB solves both problems above .... altough gives app developers more of a headache.

    Even mirroring a single db and having a arvhive table still leave me open to long recovery times ... unless a single table can be recovered later (i.e 90% of data is in single table).

    Scott

  • scott_lotus (2/3/2009)


    mirroring does seem sensible to me. but 2 points (basically my inital 2 problems)L

    1. im using an a/p cluster, if one db can do into recovery mode for 30 mins then surly a primary db and its mirror can have the same problem at the same time.

    2. db still same size so reporting still slow , still effect my insert times.

    -------

    A small dailyDB and a large reportsDB solves both problems above .... altough gives app developers more of a headache.

    Even mirroring a single db and having a arvhive table still leave me open to long recovery times ... unless a single table can be recovered later (i.e 90% of data is in single table).

    Scott

    No, recovering a mirror database is a matter of seconds not minutes.

    About speeding up the reports you can either place the archive tables on a separate filegroup or maybe table partitioning can solve your problem.

    [font="Verdana"]Markus Bohse[/font]

  • Scott,

    You've got quite a problem here and I'd really suggest engaging a consultant that can help you with more detailed designs for your environment.

    You are tackling a few issues here and it's hard to address them all in a short span.

    Mirroring helps with recovery, but it doesn't give you a second reporting instance. Replication is probably the best choice for reporting.

    For speeding things up, writing better SQL, better indexes, etc., can help things work better, but it's more work than just a few quick answers on the forum.

    There has been good advice here, and I'd recommend you read up on log shipping and mirroring, but then enagage a consultant to help you get the final details done.

  • Steve Jones - Editor (2/3/2009)


    Scott,

    You've got quite a problem here and I'd really suggest engaging a consultant that can help you with more detailed designs for your environment.

    You are tackling a few issues here and it's hard to address them all in a short span.

    Mirroring helps with recovery, but it doesn't give you a second reporting instance. Replication is probably the best choice for reporting.

    For speeding things up, writing better SQL, better indexes, etc., can help things work better, but it's more work than just a few quick answers on the forum.

    There has been good advice here, and I'd recommend you read up on log shipping and mirroring, but then enagage a consultant to help you get the final details done.

    Thank you for the reply Steve. I think you are right. Have spent some time with the Microsoft Advisory service but im not sure they undertood or we comunicated the problem well enough, (language barrier doesnt help). They suggested replication. Consulted my ISP and they suggested avoiding replication in favor of 2008 workload govonor. Best course of action for me is to consult them both again i think. If I need to stick with 2005 i may adpot mirroring of normalDB and archiveDB.

    Thank you for taking the time to reply everyone

    Scott

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply