September 24, 2020 at 12:00 am
Comments posted to this topic are about the item Incremental Data Loading using Azure Data Factory
October 21, 2020 at 4:43 am
Hi Sucharita,
Thank you for this article!
I have some comments, though:
Best regards,
René
October 22, 2020 at 8:05 am
Thank you for your feedback.
My responses to your questions/remarks:
Then the source data is copied into another type of Corporate / Enterprise (transaction) data model, depending on the implemented Data Warehouse methodology --
and then the data in the Corporate / Enterprise (transaction) data model is copied into a Corporate / Enterprise information data model (the BUS Matrix of Dimensional Modeling) consisting of Facts and Dimensions -- yes. Once the data is transferred to the destination, many activities can be performed on it.
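As a minimal illustration of that final copy step into Facts and Dimensions, here is a hypothetical T-SQL upsert (none of these table or column names come from the article; they are placeholders for the pattern):

```sql
-- Hypothetical example: load a dimension from the enterprise-model table.
-- Rows already present are updated; new rows are inserted.
MERGE dbo.DimCustomer AS tgt
USING dbo.EnterpriseCustomer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.LastModified = src.LastModified
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName, LastModified)
    VALUES (src.CustomerKey, src.CustomerName, src.LastModified);
```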
October 28, 2020 at 7:20 pm
Hi Sucharita Das, thanks for the blog, I really appreciate it. I have a few questions, listed below:
Appreciate your response here, thanks again!
October 29, 2020 at 7:03 am
Hi Sucharita Das,
I have a question: how can we do the same thing using an incrementing key instead of a timestamp column? Also, is it possible to handle delete operations?
Thanks.
November 2, 2020 at 9:57 am
Thank you for your feedback.
Please refer to the article https://www.sqlservercentral.com/articles/incremental-data-loading-through-adf-using-change-tracking.
Let me know if you have any more questions on this.
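For a quick illustration of both ideas, here is a minimal T-SQL sketch (table, column, and watermark names are hypothetical, not from the article):

```sql
-- Incrementing-key watermark: persist the highest key copied so far,
-- then pull only rows above it on the next run.
DECLARE @LastKey bigint =
    (SELECT WatermarkValue
     FROM dbo.WatermarkTable
     WHERE TableName = 'dbo.SourceTable');

SELECT *
FROM dbo.SourceTable
WHERE Id > @LastKey;

-- A plain incrementing key cannot detect deletes. SQL Server Change
-- Tracking can: CHANGETABLE returns inserts, updates, and deletes made
-- since a stored version (change tracking must be enabled on the table).
DECLARE @LastVersion bigint = 0;  -- persisted from the previous run

SELECT ct.Id,
       ct.SYS_CHANGE_OPERATION  -- 'I' = insert, 'U' = update, 'D' = delete
FROM CHANGETABLE(CHANGES dbo.SourceTable, @LastVersion) AS ct;
```

The linked article walks through the Change Tracking approach end to end, including delete handling.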
March 14, 2022 at 7:01 pm
Hi Sucharita, many thanks for this extensive and well-explained article.
I had a question regarding the strategy with multiple source tables and one target table. What would be your preferred option?
I have this scenario:
- Two delta tables A and B in an Az Data Lake.
- One target table in Synapse Analytics.
- LastModified timestamp in tables A and B.
At the moment, using a data flow (since the source data is in delta format), I retrieve the MAX LastModified timestamp for table A and for table B, and then take the MIN of these two. That becomes the new watermark value. I could take the MAX of the two instead of the MIN, but we may want to rerun a failed pipeline and pick up LastModified timestamps prior to the MAX of the two tables.
The caveat of this design is that you always grab some data that was already processed, and if the update frequencies of the two tables (A and B) are completely different (say table A is updated daily but table B only monthly), then the current month of table A gets reloaded every day until table B has a new LastModified date.
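A minimal sketch of that watermark computation, written as T-SQL for illustration (in the actual pipeline the delta sources are read in the data flow, and the watermark value is persisted between runs):

```sql
-- New watermark = the MIN of the two per-table MAX(LastModified) values,
-- so the slower-moving source never causes rows to be skipped.
DECLARE @wmA datetime2 = (SELECT MAX(LastModified) FROM dbo.TableA);
DECLARE @wmB datetime2 = (SELECT MAX(LastModified) FROM dbo.TableB);

SELECT NewWatermark =
    CASE WHEN @wmA <= @wmB THEN @wmA ELSE @wmB END;
```

Because rows between that MIN and the larger MAX get re-read on later runs, the load into the Synapse target needs to be idempotent (an upsert rather than a plain insert).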
I look forward to getting your thoughts on that 🙂
Best regards,
Paul
May 17, 2022 at 2:00 pm
So, how do you scale this solution?
Thanks,
Chris
Learning something new on every visit to SSC. Hoping to pass it on to someone else.