July 14, 2014 at 8:34 am
We have a SQL Server 2008 source(compatibility mode 80 ) and we are looking to capture the incremental changes in the source by implementing CDC(Change Data Capture).
Implementing CDC on the source itself would be a risky option due to the additional I/O overhead.
Also many of the tables in the source do not have a primary key and some of them do not have a field which can be used to uniquely identify a record. Since CDC requires at least a column to uniquely identify a record in order to apply the incremental changes, we would require to add constraints or new columns to the existing table definitions to be able to do CDC. Now making these DDL changes at the source wouldn't be a good idea.
So the only wise option from would be to avoid implementing CDC on the source for the above two reasons. Instead we are looking at somehow replicating the source at another database and then implementing CDC on the destination database
I just learnt that any form of replication from primary to secondary database viz., log shipping, backup and restore would need the secondary database to be read-only.
In our case, we may need to add a constraint/identity column for all those tables which do not have any existing unique field. So would definitely need to alter the table definition at the secondary database, to be able to do CDC there.
Is there a way we can do replication whilst still being able to alter the table definition on the secondary database? Any suggestions on this would be greatly appreciated. Thanks
July 14, 2014 at 9:54 am
Here are a couple of somewhat kludgy things you could do, which might be enough, depending on your overall requirements.
The first is set up replication, and then copy the data in to a third database/set of tables. This doubles the storage overhead, but would have no impact on the existing processing. An SSIS package can be set up to run frequently to check for updates, but there may be high overhead without the indexes on the replicated tables.
It is true that replicated databases should stay read only, but you might be able to work around that, depending on your requirements. If you do snapshot replication, the data will not be touched by the replication agent until the next snapshot, and the tables will be truncated at that time. So you could run a job after the subscriber to add the indexes you need, and another one before to delete the indexes prior to the subscriber being run. ETL jobs should generally drop and recreate indexes for efficiency anyway, so no big deal. This will only work if you don't need real time access to the changed data.
The bottom line is that a lot depends on the requirements, and how much the business is willing to spend.
July 14, 2014 at 10:15 am
sujainarayanan (7/14/2014)
So the only wise option from would be to avoid implementing CDC on the source for the above two reasons. Instead we are looking at somehow replicating the source at another database and then implementing CDC on the destination database
The only wise idea would be to add the required Primary Keys ;-). You shouldn't repeat the same mistake that the previous DBAs did over there.
Just make the changes slowly i.e. start with few tables (order by importance) & see how it goes.
sujainarayanan (7/14/2014)
Is there a way we can do replication whilst still being able to alter the table definition on the secondary database? Any suggestions on this would be greatly appreciated. Thanks
Once you have added Primary Key, you can easily use the Transactional Replication & implement CDC on subscriber.
Note: Transactional Replication also requires the tables to have the Primary Key.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply