Use Transactional Replication to build a data warehouse

  • Well, I guess it is my opinion that replication is not the place for archiving (data warehousing), as far as writes to the disk go. Yes, it would be the same as the production system, which is exactly the point. Short writes to the disk can leave it fragmented, causing all the issues that go with fragmentation. In a smaller, highly transactional database that is fine, but massive queries spanning large sets of data would have to span all of those fragments. Reading that data is better when it is written to the disk in long, contiguous segments.

  • JDixon-385826 (2/24/2010)


    Well, I guess it is my opinion that replication is not the place for archiving (data warehousing), as far as writes to the disk go. Yes, it would be the same as the production system, which is exactly the point. Short writes to the disk can leave it fragmented, causing all the issues that go with fragmentation. In a smaller, highly transactional database that is fine, but massive queries spanning large sets of data would have to span all of those fragments. Reading that data is better when it is written to the disk in long, contiguous segments.

    Ah, now I see what you were getting at. Good point.

    Rob Schripsema
    Propack, Inc.

  • Ram,

    You are right, we are not able to regenerate the snapshot; it would blow away the archived data. I guess your question is whether they have gotten out of sync. Yes, this has happened in the past, and it could happen with any solution where changes occur on the publisher and do not make it to the subscriber.

    This can be handled manually using SELECTs across the linked servers, since most of our tables have auto-incrementing IDs or some other value to key on. Yes, that can be painstaking, or you can do what I did and purchase one of the tools that do data compares. I found a very low-priced one that works fine for me; it reports back what data is missing at the subscriber and even generates the script to resync the changes. Of course you have to be careful with the changes, but otherwise it works great. I have not had to do anything like that in several years now (knock on wood). With the improvements in 2005, I can use stored procedures or system tables to see which transactions are in the replication queue that have not been written to the subscriber, and respond accordingly when there is trouble; a sketch of the kind of check I mean is below.
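    For example, on the distributor you can ask how many commands are still queued for a subscription. A minimal sketch; the server, database, and publication names here are placeholders, not our real ones:

    -- Run on the distributor, in the distribution database (SQL Server 2005+).
    USE distribution;
    GO
    EXEC sp_replmonitorsubscriptionpendingcmds
        @publisher         = N'PRODSERVER',       -- publisher instance (placeholder)
        @publisher_db      = N'Production',       -- published database (placeholder)
        @publication       = N'ProdToReporting',  -- publication name (placeholder)
        @subscriber        = N'REPORTSERVER',     -- subscriber instance (placeholder)
        @subscriber_db     = N'Reporting',        -- subscription database (placeholder)
        @subscription_type = 0;                   -- 0 = push, 1 = pull
    -- Returns pendingcmdcount (commands not yet delivered to the subscriber)
    -- and estimatedprocesstime (estimated seconds to deliver them).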

    Replication can be a headache, as JDixon said, and I agree with that. But once you understand what to do and what not to do, you make sure the things that need to be done get done so that it does not fail.

    For DR, we have daily and transaction log backups on both reporting and production in a clustered environment, and we also use log shipping or disk mirroring to DR servers at another site; we have tested failover to it.

    I am certainly not saying this is the best solution, but it is a working solution.

    JDixon,

    I am curious to hear your solution as well. Do you write an SSIS package that pulls the data over to a DW using transfer objects or INSERT/SELECT, or how?

    How are you able to do this on a real-time schedule, and how do you pull over the incremental changes?

    I understand what you mean about one long write as opposed to short writes, but my server engineers have always assured me that, since we are on a SAN, defragmenting would not be needed.

    However, about once a year I have still shut everything down and defragmented the drives. Of course they show as fragmented to all heck, but on the other hand, the I/O tests and checks I ran prior to defragging came out as good as that SAN and its drive speeds are marketed to be. It is my understanding that SQL Server handles the fragmentation of the data at the data file level?

    The articles I have read about whether defragmenting drives is necessary when using SQL Server have all been unclear as to whether it is or is not needed; some say yes, some say no. I would like to hear a straight answer on that myself.

  • Thanks, Mike, for the detailed explanation. This definitely appears to be an option for us. While talking to some of my colleagues, we came up with another option as well.

    For each publisher table, create two subscriber tables and connect them with a view. For example:

    Move data from TableA -----> TableA_History

    Establish transactional replication from TableA --------> TableA_Prod

    Create view TableA in the subscriber as:

    SELECT * FROM TableA_Prod
    UNION ALL  -- the two tables hold disjoint rows, so UNION ALL avoids UNION's de-duplication sort
    SELECT * FROM TableA_History

    This way we need not worry about regenerating the snapshot. We would create an archival process to move data from production to the history tables; a sketch of what I have in mind is below. I am not sure whether using views will introduce any performance complexities.
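    Something like this, run as a scheduled job (a rough sketch only; the CreatedDate column, the 30-day retention window, and the database names are placeholder assumptions, not our actual schema):

    -- Step 1 (run on the subscriber): copy rows that are about to age out
    -- into the history table BEFORE they are deleted at the publisher,
    -- because the delete will replicate to TableA_Prod and remove them there.
    INSERT INTO Historical..TableA_History
    SELECT *
    FROM Historical..TableA_Prod
    WHERE CreatedDate < DATEADD(DAY, -30, GETDATE());

    -- Step 2 (run on the publisher): delete the archived rows in small
    -- batches so the transaction log and replication queue stay manageable.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000)
        FROM Production..TableA
        WHERE CreatedDate < DATEADD(DAY, -30, GETDATE());

        IF @@ROWCOUNT = 0 BREAK;  -- nothing left to archive
    END

    Note that between step 1 and the delete replicating, a row exists in both subscriber tables, so the view could briefly show duplicates.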

    I would truly appreciate everybody's thoughts...

    Thanks,

    Ram

  • I am thinking you are saying that you are moving TableA to a history table and then replicating both tables to the subscriber? The only problem I foresee with that is the storage requirement. I might have one table that consumes, let's say, 30 GB, while the reporting table with five years of data is 150 GB.

    You are going to double the amount of storage needed on both sides to hold both sets of data (and quadruple it once you account for DR). I hope that makes sense. If you have the storage for it, I agree it sounds like it adds an additional safety measure, which would be good.

  • Hi Mike,

    See below:

    1) Move the historical data:

    Production..TABLE_A -> move historical data -> Historical..TABLE_A_HISTORY

    2) Then establish replication:

    Production..TABLE_A -> replicate -> Historical..TABLE_A_PROD

    You can create a view named Historical..TABLE_A which combines the data from Historical..TABLE_A_HISTORY and Historical..TABLE_A_PROD for reporting purposes, as sketched below.
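    A minimal sketch of that view, assuming both subscriber tables live in dbo and have identical column lists (an assumption for illustration):

    USE Historical;
    GO
    CREATE VIEW dbo.TABLE_A
    AS
    SELECT * FROM dbo.TABLE_A_PROD
    UNION ALL  -- the production and history rows are disjoint, so UNION ALL skips a de-duplication sort
    SELECT * FROM dbo.TABLE_A_HISTORY;
    GO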

    Since the historical data is removed from production, there won't be a storage problem. If we ever have to rebuild the snapshot, it will affect only TABLE_A_PROD. Does this sound right?

    Thanks,

    Ram

  • Ah yes, I understand what you mean now. It sounds like a good solution, though here is some food for thought.

    In my example we are using the same table names on both sides for the archived data, rather than moving it into a history_tablename table. Let's say your reporting cluster dies for some obscure reason: the cluster service dies, replication is broken for a long stretch of time, or Windows patches break the cluster (that has happened to me).

    On occasion I have simply changed the XML file for the reports to point to the production system as a data source and used it for "reporting" while the reporting instance was fixed (the business initially only cares about the last 30 days of information unless auditing or researching data). After the system is fixed, I simply change the XML to point back to the actual reporting environment.

    In your scenario, since your historical data is in a differently named table, this would not work without modifying the reports to use the new table names, or failing over to a DR box and then moving back. Although DR is easy to fail over to, it takes a little longer to move back, as you basically have to set up the process in the reverse order, and the initial setup process can be time consuming.

    Certainly not a show stopper, and there are other possible solutions depending on how you report on the data. Let me know how things work out for you, though; I always like to hear actual real-world solutions from others.

    Thanks,

    Mike

  • 1. I set the "Do Not Replicate DELETE Statements" option under the Statement Delivery section of the article properties for a particular table.

    2. Set "Keep existing object unchanged" under the Destination Object properties.

    3. Modified the Distribution Agent profile to skip error 2627.
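    (For reference, the article settings in steps 1 and 2 correspond to these sp_addarticle parameters; this is only a sketch, and the publication and table names are placeholders. Error 2627, the primary key violation, is handled separately via the agent profile or the -SkipErrors parameter, not here.)

    EXEC sp_addarticle
        @publication      = N'MyPublication',  -- placeholder
        @article          = N'MyTable',        -- placeholder
        @source_object    = N'MyTable',        -- placeholder
        @del_cmd          = N'NONE',           -- do not replicate DELETE statements
        @pre_creation_cmd = N'none';           -- keep existing object unchanged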

    But if I add a new column to that particular table, say "TestColumn", and then reinitialize the subscription, during replication I get an error saying "Invalid column TestColumn".

    Can anyone suggest anything on this?
