Blog Post

Common Issues When Using Change Tracking for Data Warehouse Incremental Loads

,

I have a few clients that incrementally load tables from a SQL Server source into their data warehouse or lakehouse by using change tracking. Lately, they encountered some issues with changes to the configuration and the data in the source database, so I decided to share some things you can check before using change tracking as part of your ETL load or when troubleshooting your data load.

A visual depiction of incremental loads. The first three rows were inserted in one load. The next three rows were inserted in the second load. The last 3 rows were inserted in the third load. The fourth load updated rows from the second adn third
Incremental loads import and apply only changes (inserts, updates, and deletes) rather than all the data in a table.

Common change tracking + incremental load issues to check before implementation:

  1. Understand how your source tables are loaded in the source database. We had tables that showed inserts in change tracking and were supposed to be only additive data. That looked like a good prospect for our incremental loads. But another team was responsible for loading the source tables, and they were actually truncating and reloading the data a couple of times per day. Typically, I write ETL so that duplicate rows cannot be inserted into staging tables. So if the row already existed in a prior load, any changes to the row would be missed. If you run into this scenario, you can either change the way the source table is loaded or choose not to include the table in an incremental load.
  2. Ensure the timing of your data loads in all environments is compatible with the retention period set for change tracking. I had to track down a problem in a test environment where some inserts and updates were missing. It turned out that the test environment was only loaded daily on week nights, but the retention period change tracking was set to the default 2 days. So when the load ran again on Monday, the changes from the end of the work week were no longer available. Something similar could also happen if you had ETL failures for multiple days that prevented loading the tables using change tracking past the retention period. Assuming your database can handle the extra data, you can increase the retention period to cover the maximum amount of time needed by the ETL process.
  3. Coordinate with the dev ops team to understand if/when tables are being dropped and recreated. Ideally, a deployment process would not drop a table, but sometimes we don’t get to control that. After initially implementing change tracking at one client, everything was looking good. But then we started to get errors that a table did not have change tracking enabled. We knew that table had change tracking enabled previously because our logging showed that the incremental load ran fine. After some investigation, it turned out that the developers had deployed database changes, and that deployment process dropped and recreated the table but did not re-enable change tracking. If this happens, you would need to re-enable change tracking and do a full reload of the table to ensure you didn’t miss any changes before the table was dropped. Then you can re-enable your incremental load. The change tracking version number is used across the database and is not specific to a table, so you can keep the previous change tracking version as the high water mark.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating