April 20, 2015 at 9:20 am
We are using SQL Server 2008 Standard Edition. Every night we update our master databases with information gathered from our clients' databases. We do not copy the entire database, just a set of tables. These databases are not your typical OLTP databases. Each of our clients has a database stored locally. Once a day (at a time of their choosing), they run a process that ships their data back up to a clone of their client DB on our server. We then combine this data into a "master" DB on our side. Aside from that once-a-day shipping, the databases on our side are not touched.
Currently, we do this using an SSIS package (created long before I joined the company). It truncates the tables in our master DB and then simply pulls in everything from each client (for the applicable tables, of course). There are other steps in the process, but they are outside the scope of this thread. The full run takes an average of 4 hours for anywhere from 50 to 100 million records. The time isn't the big issue; rather, we are seeing random failures and memory overflows (admittedly, our servers are not as beefy as they should be, but we are stuck with them). The stability of the process is very low, and it needs to be refactored.
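To be concrete, the package effectively does something like this for each applicable table (the database and table names here are made up for illustration):

-- Rough T-SQL equivalent of the existing truncate-and-reload step.
-- MasterDB, ClientClone_CL1, and dbo.Orders are hypothetical names.
TRUNCATE TABLE MasterDB.dbo.Orders;

INSERT INTO MasterDB.dbo.Orders (OrderID, ClientID, OrderDate, Amount)
SELECT OrderID, ClientID, OrderDate, Amount
FROM ClientClone_CL1.dbo.Orders;
-- ...and so on for each client clone DB and each applicable table.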
Ideally, this is not what we want to do, since we ultimately only care about data changes, which would be nowhere near even 50k records a day (I'd estimate it at less than 10k a day). We're moving millions of records for nothing.
So I altered the SSIS package to process only the delta, using a Lookup and a CHECKSUM on source and destination. The process works, but it is proving to be a huge strain. Initial tests show it taking much longer than the original truncate-and-load-everything approach, even with proper buffer/cache/index tuning.
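For reference, here is roughly the T-SQL equivalent of what my Lookup/CHECKSUM logic computes (column and table names are made up; 2008 does support MERGE):

-- Insert rows that are new, update rows whose checksum differs.
-- MasterDB, ClientClone_CL1, and the columns are hypothetical names.
MERGE MasterDB.dbo.Orders AS tgt
USING ClientClone_CL1.dbo.Orders AS src
    ON tgt.OrderID = src.OrderID
WHEN MATCHED AND CHECKSUM(tgt.ClientID, tgt.OrderDate, tgt.Amount)
              <> CHECKSUM(src.ClientID, src.OrderDate, src.Amount) THEN
    UPDATE SET ClientID  = src.ClientID,
               OrderDate = src.OrderDate,
               Amount    = src.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderID, ClientID, OrderDate, Amount)
    VALUES (src.OrderID, src.ClientID, src.OrderDate, src.Amount);

I'm aware that CHECKSUM can produce collisions and silently miss some changes; HASHBYTES would be safer but slower, which is part of why I'm questioning the whole approach.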
My next thought is to give replication a try. I don't have a lot of experience with it, though, so I don't know if it is the best tool for the job. We only care about one-way replication: we want data from client A to push up to "main" and data from client B to push up to "main", but we do NOT want client A to be updated with client B's data, or vice versa. It should be a one-way, bottom-up push. Can this be done with replication?
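If it helps clarify the direction I mean, here is the rough shape of what I'm picturing with transactional replication, run once per client clone DB on our server (all names are made up, and I know the Distributor and Snapshot Agent setup are omitted):

-- Sketch only: one-way push from one client clone up to the master.
-- Assumes the Distributor is already configured; names are hypothetical.
USE ClientClone_CL1;
EXEC sp_replicationdboption
     @dbname = N'ClientClone_CL1', @optname = N'publish', @value = N'true';

EXEC sp_addpublication
     @publication = N'CL1_to_Master', @repl_freq = N'continuous';

EXEC sp_addarticle
     @publication = N'CL1_to_Master',
     @article = N'Orders', @source_object = N'Orders';

-- Push subscription pointing at the master DB; nothing subscribes back,
-- so no data should ever flow down to the client clones.
EXEC sp_addsubscription
     @publication = N'CL1_to_Master',
     @subscriber = N'OURSERVER', @destination_db = N'MasterDB',
     @subscription_type = N'push';

From what I've read, transactional replication is one-way by default (publisher to subscriber) unless you explicitly configure merge or peer-to-peer, which is what makes me think it might fit. Am I on the right track?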
I am open to any and all suggestions!
TIA!
April 20, 2015 at 10:43 am
What's the purpose of the masters? Is it for redundancy? Reporting? Or something else?
April 20, 2015 at 11:21 am
I'm still learning the purpose myself (I've only been here for six months, and this is a new process for me). Our company was kind of a garage startup many years ago, so things were done with limited experience. They built the software to run against a locally installed DB instead of a web/cloud-based client/server model. I don't care much for the way things are set up, but they have hopes of changing it for the better. 😀
Anyway, we bring all of their data back up to us for many purposes (mainly for another application to use, and also for federal regulations and auditing). The process of getting the data from the clients' local databases up to our servers is NOT the part I am looking at replication for.
We then combine all of these client clone DBs into the "master", which is used by another application (a true web/cloud-based solution), for data retention, for some sort of reporting (I haven't seen that portion yet, so I don't know whether it is canned reports, ad hoc queries, etc.), and for other reasons that I have yet to make sense of.
I'm not good at ASCII art, but here's the general look of the setup.
         --MY SERVER LAYER--
              "MASTER"       <-- Web product uses this data
             /    |    \
          CL1    CL2    CL3
        -----------------------
         --LOCAL DB LAYER--
        -----------------------
           |      |      |
          CL1    CL2    CL3   <-- Original product uses this data