November 13, 2009 at 7:58 am
I've been asked for a solution that will allow two separate data centers to keep two separate copies of a database (or many databases), both of which can be written to and read from at the same time.
These physically separate sites would also serve as disaster recovery and high availability sites.
I work for a large international retail chain. Writes to this database will come from many sources (Point of Sale, Web, Mobile, Kiosk, Support Center, etc.). This database records transactions in real time, and with around 8,000 POS terminals there will obviously be high traffic. Mostly inserts of transactions but also a whole lot of customer lookups and profile changes. The idea is that there will be an appliance that routes all traffic to the "best" server in terms of latency and resources at the time. For this reason I need the entire database in both locations since any transaction can go to either location.
I'm not convinced that keeping multiple input points for this database is the best solution, but I want to explore all options.
I'm going to test merge replication, but I've not used it before and I'm concerned with the performance. Since there are so many touch points, the same record will likely get updated at both locations at the same time and cause a conflict. That is why I'm going to test merge replication instead of peer-to-peer or transactional replication. I'll need the conflict resolution.
PKs are all ints or bigints, so a GUID would need to be added for merge replication, which I'm not terribly fond of either. The database is about 150 GB now and will grow quickly next year to probably 500 GB or so.
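From what I've read, merge doesn't need the GUID to replace the existing keys -- it just wants a uniqueidentifier column flagged ROWGUIDCOL on each published table (the snapshot agent will add one itself if it's missing). Something like this is what I'd expect to add ahead of time (table and constraint names are just examples):

    -- Sketch only: dbo.SalesTransaction is a made-up table name.
    -- Merge replication wants a ROWGUIDCOL column; the existing bigint
    -- primary key stays in place and stays clustered.
    ALTER TABLE dbo.SalesTransaction
        ADD rowguid uniqueidentifier ROWGUIDCOL NOT NULL
            CONSTRAINT DF_SalesTransaction_rowguid DEFAULT NEWSEQUENTIALID();

    -- Keep it out of the clustered key; a plain unique index is enough.
    CREATE UNIQUE NONCLUSTERED INDEX UX_SalesTransaction_rowguid
        ON dbo.SalesTransaction (rowguid);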
The development team has also threatened to handle it on the application side, making sure both databases are written to. I don't want that to happen, as synchronous database writes seem to just be asking for trouble. There has also been talk of using BizTalk to handle the transaction. I don't have any experience there, having only witnessed a poor application of it in another part of the company.
Can anybody contribute some ideas or best practices? Maybe someone has the same setup? I would prefer to write to one database and replicate/mirror to the second location, personally.
Thanks for any input,
Brett
November 13, 2009 at 11:04 am
I've been asked for a solution that will allow two separate data centers to keep two separate copies of a database (or many databases), both of which can be written to and read from at the same time.
Ouch..
These physically separate sites would also serve as disaster recovery and high availability sites.
HA is good.
there will obviously be high traffic. Mostly inserts of transactions but also a whole lot of customer lookups and profile changes.
Lots of concurrency
The idea is that there will be an appliance that routes all traffic to the "best" server in terms of latency and resources at the time. For this reason I need the entire database in both locations since any transaction can go to either location.
This is WHOLLY dependent on the solution, so it shouldn't even be in view at this time..
I'm not convinced that keeping multiple input points for this database is the best solution, but I want to explore all options.
Neither am I..
I'm going to test merge replication, but I've not used it before and I'm concerned with the performance. Since there are so many touch points, the same record will likely get updated at both locations at the same time and cause a conflict. That is why I'm going to test merge replication instead of peer-to-peer or transactional replication. I'll need the conflict resolution.
At first glance I was thinking transactional replication but not so sure now..
PKs are all ints or bigints, so a GUID would need to be added for merge replication, which I'm not terribly fond of either. The database is about 150 GB now and will grow quickly next year to probably 500 GB or so.
Ints and bigints are good. I have a paper somewhere that flat out says that GUIDs as primary keys are not much shy of stupid (my words, not theirs)..
The development team has also threatened to handle it on the application side, making sure both databases are written to. I don't want that to happen, as synchronous database writes seem to just be asking for trouble.
By all means let them handle it. Explain to them the difficulties and the performance impacts of what they are proposing, and that when things go to hell it will be THEM who get the call.. I think they'll get the point; it is not as easy as it sounds, and synchronous writes have issues they haven't even considered..
There has also been talk of using BizTalk to handle the transaction. I don't have any experience there, having only witnessed a poor application of it in another part of the company.
Biztalk may be an idea.
Can anybody contribute some ideas or best practices? Maybe someone has the same setup? I would prefer to write to one database and replicate/mirror to the second location, personally.
Not a bad thought..
Ok, first, I haven't built something quite this big, just wanted to let you know..
A common model is that data relating to a particular store is held locally and replicated to a master location at an interval; I believe Wal-Mart does it this way. Effectively a store OWNS a piece of data until another store owns it.
Data that all stores need is replicated down to them, and it tends to be read-only.
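Just to sketch the idea (table and column names are made up): the store owns its rows by keying them on the store plus a locally generated number, so two stores can never step on each other, and the read-only reference data is a separate one-way feed from the center. Something like:

    -- Sketch of the "store owns its data" model; all names invented.
    -- Each store generates SaleID locally, so keys never collide when
    -- rows roll up to the central server.
    CREATE TABLE dbo.StoreSale
    (
        StoreID      int      NOT NULL,  -- owning store
        SaleID       bigint   NOT NULL,  -- generated at the store
        SaleDateTime datetime NOT NULL,
        TotalAmount  money    NOT NULL,
        CONSTRAINT PK_StoreSale PRIMARY KEY CLUSTERED (StoreID, SaleID)
    );

    -- Reference data pushed down to every store is read-only there,
    -- so it can be a simple one-way publication from the center.
    CREATE TABLE dbo.Product
    (
        ProductID   int           NOT NULL CONSTRAINT PK_Product PRIMARY KEY,
        Description nvarchar(200) NOT NULL,
        Price       money         NOT NULL
    );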
It is usually a bad idea, without an incredibly good business case, to have two different locations be able to write against the same data.
Also, if you need to do a lot of lookups, you can have your main site do something like log shipping to your DR site, and all of those lookups could occur there..
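If you went the log shipping route, restoring each shipped log WITH STANDBY instead of NORECOVERY is what keeps the DR copy readable between restores (readers get bumped while a log is being applied). Roughly, with made-up names and paths:

    -- On the DR server, each shipped log backup gets applied like this.
    -- WITH STANDBY leaves the database read-only between restores;
    -- the database name, share, and undo file path are placeholders.
    RESTORE LOG RetailDB
        FROM DISK = N'\\drserver\logship\RetailDB_20091113_0800.trn'
        WITH STANDBY = N'D:\LogShip\RetailDB_undo.tuf';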
I was also thinking you might want to segregate your lookup data from your transactional data, especially if you are going to have the volumes you are talking about.
Also, a consideration: if the main site is down you have to redirect to the backup, and if the app has to write to one or the other it has to know about both. If there is a local store resource, it only needs to know about the local server, and the local server handles all communication to the master.
My first thought is that either a pair of servers or a master/slave setup that all end-points write to is a bad idea. You may have some things that need that, but based on the store model you have a lot of resources and data that are specific to each store. That store ends up being a single end-point as far as the central server(s) are concerned; it doesn't matter what is in the store, because what is in the store only cares about its local server.
You have other things that haven't even been discussed yet, like backup and maintenance strategy.
I probably have more to say but I would like to hear a bit more from you..
CEWII
November 13, 2009 at 11:37 am
I'm mostly with Elliot here. I've heard the issues with GUIDs, and they can be a problem. The upside is that the client can generate them, which can reduce contention. I'd make sure you cluster on some other value if you use these as keys.
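As a rough sketch of what I mean (names made up): keep the GUID as the key for merge but make it a nonclustered primary key, and put the clustered index on something ever-increasing so inserts go to the end of the table instead of fragmenting it:

    -- Sketch only; table and column names are invented.
    -- The GUID stays as the key, but nonclustered; the clustered index
    -- sits on an identity so new rows land at the end of the table.
    CREATE TABLE dbo.CustomerProfile
    (
        CustomerGuid uniqueidentifier ROWGUIDCOL NOT NULL
                     CONSTRAINT DF_CustomerProfile_CustomerGuid
                     DEFAULT NEWSEQUENTIALID(),  -- only used when the client doesn't supply one
        CustomerID   bigint IDENTITY(1,1) NOT NULL,
        CustomerName nvarchar(200) NOT NULL,
        CONSTRAINT PK_CustomerProfile PRIMARY KEY NONCLUSTERED (CustomerGuid)
    );

    CREATE UNIQUE CLUSTERED INDEX CX_CustomerProfile_CustomerID
        ON dbo.CustomerProfile (CustomerID);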
I'd also be wary of writing to multiple locations. There are definitely times you'll be out of sync, and that can be a big problem in retail. If you do want separate servers you can write to, I think you'd want to be sure that you segregate out what can write where. You might not need complete copies of all information writeable on both sides.
The other thing to think about is what happens if you lose connectivity. The issues with reconciling data might be huge compared to the downtime. I'd be really careful here about what happens if merge is down for an extended time, say a couple of busy hours of transactions on both sides.
You might want to really look at having a single write source, but then replicate transactionally, as Elliot mentioned. The latency could be low, especially with good hardware, and if ALL reads came from the copies, the load on the write server could be low.
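Very roughly, the moving parts for that look something like this (publication, database, server, and table names are placeholders, and it assumes a distributor is already configured and the database is enabled for publication):

    -- Sketch only: single writable publisher, read-only subscriber at DR.
    USE RetailDB;

    EXEC sp_addpublication
        @publication = N'RetailDB_Tran',
        @repl_freq   = N'continuous',
        @status      = N'active';

    EXEC sp_addarticle
        @publication   = N'RetailDB_Tran',
        @article       = N'StoreSale',
        @source_owner  = N'dbo',
        @source_object = N'StoreSale';

    EXEC sp_addsubscription
        @publication       = N'RetailDB_Tran',
        @subscriber        = N'DRSQL01',
        @destination_db    = N'RetailDB',
        @subscription_type = N'Push';

Point the lookups and reports at the subscriber and the write server mostly just takes inserts.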
November 20, 2009 at 9:06 am
Thanks for all of the feedback! It's been useful in meetings this past week. Once we can convince the non-technical higher-ups that a single insert point is the way to go, we should be set. All of us mortals are convinced and on the same page.
November 20, 2009 at 9:32 am
That is good to hear. The reconciliation part of the multi-insert model is the deal breaker for me.. there is just too much to go wrong..
CEWII