August 9, 2017 at 10:33 am
I have an active CDC enabled SQL Server database with 4 tracked tables that I need to load into Amazon Redshift.
To do this I’m using the native SSIS CDC components to create an initial load, followed by a daily incremental load.
I think I understand the basic mechanics – for the initial load, because this is an active database, I need to mark the initial load start, add a data flow to do the actual load (in this case will be a text file that gets pushed to Amazon S3), then mark the initial load end, so that I can capture any changes that may have occurred during the initial load period.
However, what happens when the source database is read-only? In this case it is a readable secondary replica. I’m unable to mark the start of the initial load as I’m getting an error:
[CDC Control Task] Error: "Failed to update database because the database is read-only.".
However I still need to capture any changes that may occur during the initial load. Do I now need to capture the start and end LSNs manually? Can I use sys.fn_cdc_get_max_lsn() to get the LSN values before and after the initial load and build the state variable manually? Which leads me to my next question – I’d like to use the CDC components to do the subsequent incremental loads, which does seem to work, so assuming I can build the state variable, would it be this variable that is used by the CDC Source component as the starting point for the first incremental load?
Any advice would be appreciated.
Thanks
September 19, 2017 at 9:57 am
Can you write the start and end of the load to another table?
You can capture the LSN values manually but my concern is that you may miss data that way.
for example
You run CDC process
Data inserted to table - this data would be lost
You find MAX LSN number to use as beginning of next load
September 21, 2017 at 8:20 am
bradwilliams51 - Tuesday, September 19, 2017 9:57 AMCan you write the start and end of the load to another table?You can capture the LSN values manually but my concern is that you may miss data that way.
for example
You run CDC process
Data inserted to table - this data would be lost
You find MAX LSN number to use as beginning of next load
Thanks for your reply.
I'm assuming that because of the read-only error I am getting in the CDC control task, I won't be able to use this at all for the initial load (which I don't quite understand, as I thought this component just calls the sys.fn_cdc_get_max_lsn function? So what is it attempting to write?), and so I will have to capture the LSN values before and after the initial load in another table.
So assuming initial load is done and I have the two LSN values (start / end) stored in a table, when it comes to the incremental load, if I want to use the CDC components, the CDC source component requires the State variable, which (a) will be empty, as I haven't used the CDC Control Task to capture it, and (b) needs to be in a specific format, e.g. TFSTART/CS/0x0000030D000000AE0003/CE/0x0000159D1E0F01000000/TS/2011-08-09T05:30:43.9344900/
Is it possible to manually build the State variable in the required format, so that the source component knows from where to start? And presumably the first load would need to cover the LSN range that was captured during the initial load?
September 25, 2017 at 9:16 am
I think have a working solution now.
I've built a table to capture the lsn. In my initial load package, the first thing it does is to construct the CDC_state value from the lsn and write it to the table:
state = 'TFEND/CS/' + CONVERT(VARCHAR, sys.fn_sqlvarbasetostr(sys.fn_cdc_get_max_lsn())) + '/TS/' + CAST(CAST(SYSUTCDATETIME() AS DATE) AS VARCHAR) + 'T' + CAST(CAST(SYSUTCDATETIME() AS TIME(7)) AS VARCHAR) + '/'
So 0x0000035a0003f83a0004 becomes TFEND/CS/0x0000035a0003f83a0004/TS/2017-09-25T14:21:34.2973152/
It also captures the end lsn after the initial load is complete, although I'm not sure that this is really necessary.
When it comes to the incremental load, I'm just adding an extra step to assign the state value from the table to the CDC_state variable in the CDC control task (get processing range). Once the control task has executed the CDC_State variable then becomes:
TFSTART/CS/0x0000035A0003F83A0004/CE/0x0000035A0008CA0F0003/TS/2017-09-25T15:11:49.3382000/
Once the incremental load is complete, and the processing range has been marked, the CDC_State variable becomes:
{TFEND/CS/0x0000035A0008CA0F0003/TS/2017-09-25T15:12:58.9861921/}
Which then gets assigned back to the state field in the table ready for the next incremental load.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply