Does Change Data Capture require a table have a primary key

Question

Does Change Data Capture require a table have a primary key

drew.georgopulos

SSCarpal Tunnel

Points: 4931
More actions
January 13, 2013 at 7:21 pm

#266976

or can it record before and after column changes based on the LSN only?
An extract from a file based legacy accounting system is performed every night. The system does not have a primary key because transactions are managed through program code. (the more things change...). The extract is copied to text in Unix and FTP'd to Windows, where the file is loaded into SQL Server by kill & fill. Because of the expense of modifying the source system, there is enormous inertia/resistance to injecting a primary key at the source, so kill & fill it stays.
In reading about Change Data Capture, it seemed to me that column level insert update and delete are stored in helper tables that remember the before and after content of each column tracked. In my reading I have seen many references to the LSN to decide when and what to record as changed, but I have not seen any refereference to the necessity of a primary key for Change Data Capture to work. This is in contrast to replication, where the requirement for the existence of a primary key is made plain.
Is it possible to use Change Data Capture against a table without a primary key?
I would love to use it to change the extract from kill and fill to incremental.
thanks very much for your help
drew

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 1

To my knowledge the inbuilt CDC component in SQL server doesnt require a Primary Key column.

The reason is this is simply a log of the activity, so if you insert, update then delete a record you will get 4 rows in the CDC table, the Original Insert, the Update Before, the Update After and finally the Delete.

From a data gathering perspecive you also have the opyion of either retrieving all the different changes or just the last change.

Hope this clarifies things.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

ScottPletcher SSC Guru Points: 101238 More actions · Answer 2

CDC requires a unique index. That makes sense given what it has to be able to do.

I think by default it uses the PK as the unique index, but you can specify a different unique index when you enable CDC on the table and SQL will use that instead of the PK. So a PK should not be required, although a unique index will be.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 3

Scott,

Do you have a link to that, as I didnt think that either a unique or PK were required for CDC as I cant find anything on Technet or in the CDC documentation that states one of these must be present.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

A unique index is required if the table is to support net changes as well as all changes

http://msdn.microsoft.com/en-us/library/cc627369%28v=sql.105%29.aspx

A capture instance will always include a table valued function for returning all change table entries that occurred within a defined interval. This function is named by appending the capture instance name to "cdc.fn_cdc_get_all_changes_".
If the parameter @supports_net_changes is set to 1, a net changes function is also generated for the capture instance. This function returns only one change for each distinct row changed in the interval specified in the call.
To support net changes queries, the source table must have a primary key or unique index to uniquely identify rows. If a unique index is used, the name of the index must be specified using the @index_name parameter. The columns defined in the primary key or unique index must be included in the list of source columns to be captured.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

drew.georgopulos SSCarpal Tunnel Points: 4931 More actions · Answer 5

First, thank you Jason, Scott and Gail for your attention, time and advice.

Second, NUTS! the requirement of a pk or unique index torpedo's my intention, as the source does not create uniqueness and attempts to create the minimum requirement, a unique index, consistently fail on the existence of duplicates.

The stewards of the accounting system (Finance and Operations) assert that it is not possible to have either constraint on the table in the accounting system.

When we get the source data on the SQL side, trying to uniqueify the table using the most important columns (claim number, date of service, date submitted, member, provider, diagnosis, procedure) consistently fails on duplicates because a provider can decide to rebill for the same exact event. Adjustments and reversals also complicate the uniqueification.

At the risk of the lash, I could try to create a 'covering' index to uniqueify the table, but the table is twelve million rows deep and 138 columns wide.

Third, I must be dumber than dirt because in tinkering with CDC I came to realize that the kill and fill method leaves me no where...CDC does not permit truncation and deletion indeed marks the target rows as deleted, but that means the source data has nothing in the target to compare it to and that the new source data is all there is!

This makes me to think we must use two tables for source and target, target holding yesterday's extract, and source holding today's, but since there is no key and no index we'd need to compare each source row and column to each target row and column with something like a Merge When Not Matched, Outer Join or Not In but against 138 columns?...sounds like a non-starter.

I do not want to give up replacing kill and fill with something more rational, but my pea brain is not big enough to wrap itself around the problem.

thanks again

drew

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 6

add a new column int identity and add the key on it , or create a checksum value for each roe

Jayanth Kurup[/url]

drew.georgopulos SSCarpal Tunnel Points: 4931 More actions · Answer 7

My initial reaction is while that would create uniqueness based on the id, the content of the row is still duplicate.

I will try it out and see if that gets me home.

thanks very much

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 8

drew.georgopulos (1/15/2013)
Second, NUTS! the requirement of a pk or unique index torpedo's my intention, as the source does not create uniqueness and attempts to create the minimum requirement, a unique index, consistently fail on the existence of duplicates.

Maybe read what I said again.

A unique index is needed if you want to enable the table for the net change function as well as the all changes function. If all you want is all changes, then the requirement for a unique index (for the net changes) falls away

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

drew.georgopulos SSCarpal Tunnel Points: 4931 More actions · Answer 9

i missed that...sorry to have been so dense.

i'll try again at work

thanks very much

dtquach SSC Rookie Points: 35 More actions · Answer 10

Hello,

I'm looking use CDC in my ETL process and want to make sure I understand things correctly. So CDC does not need a Primary Key or Unique Key to identify changed data as my source do not have Primary keys or Unique keys nor can I add them.

Please advise, thank you.

Data

inspi Newbie Points: 9 More actions · Answer 11

so if the source table doesn't have either pk or unique index.. what will be syntax to enable the table?

is it

USE Db

GO

EXEC sys.sp_cdc_enable_table

@source_schema = N'DBO',

@source_name = N'tbl_Name',

@role_name = NULL;

GO

Or we need to specify some value for @supports_net_changes

USE Db

GO

EXEC sys.sp_cdc_enable_table

@source_schema = N'DBO',

@source_name = N'tbl_Name',

@role_name = NULL

@supports_net_changes = ??;

GO

Am attempting CDC for the first time.. please point out if am missing something.

TIA