November 21, 2022 at 5:22 pm
We have a vendor application that is no longer under support. We need to extract the data from this source and move it to Snowflake via Fivetran (our ETL tool). Fivetran needs Change Tracking enabled on the source so that it can pick up only the changes since the last cycle. The issue we are facing is that there are no PKs on the source.
That vendor database has clustered indexes (not unique, and not designated as PKs) on some of the tables, and when I check, they are actually unique, so I can make them PKs. But there are some tables that don't have a unique index or a clustered index, let alone a PK :-(. One option is for me to look at all the indexes on each table and see whether any are unique.
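Something like this is what I had in mind to find the tables with no unique index at all (just the standard catalog views; the table names come back from whatever the vendor created, nothing here is specific to their schema):

-- Tables that have no unique index of any kind (clustered or not)
SELECT t.name AS TableName
FROM sys.tables AS t
WHERE NOT EXISTS (SELECT 1
                  FROM sys.indexes AS i
                  WHERE i.object_id = t.object_id
                    AND i.is_unique = 1);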
The other option I was thinking of was to add an Identity column to all the tables and make that the PK. But if the vendor wrote the SQL the following way, then we have an issue.
declare @mytest table (col1 int identity, col2 varchar(10))
declare @mytest2 table (rowid int identity, name varchar(10))

-- Fails: with no column list, SELECT * tries to supply a value for the
-- identity column col1, and IDENTITY_INSERT can't be turned on here.
insert into @mytest select * from @mytest2
I am lost. Any ideas or thoughts? I greatly appreciate your help and time.
November 22, 2022 at 1:44 pm
I'm not sure I understand the example that's giving you problems. Yeah, if the column names are different, you have to explicitly state them on an INSERT ... SELECT. That's how T-SQL works. No big deal. Further, if you're migrating from @mytest2 to @mytest, why even bother giving @mytest2 an IDENTITY? Inserting into @mytest will generate the identity values; no need to do it twice.
And yeah, if you need a PK, you'll have to create one. No choice there.
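Just to sketch what I mean using your own table variables: with an explicit column list, the identity column on the target is simply left out and takes care of itself.

declare @mytest table (col1 int identity, col2 varchar(10))
declare @mytest2 table (rowid int identity, name varchar(10))

insert into @mytest2 (name) values ('abc'), ('def')

-- Works: col1 isn't in the column list, so it gets its own identity values
insert into @mytest (col2)
select name
from @mytest2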
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2022 at 2:46 pm
Thanks very much for your time, Grant. The reason I had the Identity on the second table as well is this: if I add a surrogate Identity column to all the tables to use as the PK, and the vendor SQL inserts from Table2 into Table1 (with a lot more joins, of course) written as INSERT INTO Table1 SELECT * FROM Table2, then we are in trouble.
Unfortunately, as this is a vendor application, we don't have visibility into the code. I started a trace on the DB to see what kind of statements come in.
BTW, I forgot to mention: the size of the DB is 1.5 GB, if that gives you any other ideas.
Thanks again.
November 22, 2022 at 2:53 pm
Maybe this is a better example of why the approach of adding an Identity column to all the tables to make them the PKs could get us in trouble. Just thinking it through.
-- With new surrogate Identity columns added, this breaks: Table1 now has a
-- column the SELECT doesn't supply, and B.* drags Table2's new Identity along too.
INSERT INTO Table1
SELECT B.*, C.Col1
FROM Table2 AS B
INNER JOIN Table3 AS C
ON B.<some column> = C.<some column>
WHERE <some condition>
November 22, 2022 at 5:00 pm
The "application" is out of support and yet the database is separate and updatable? If at all possible I wouldn't touch a single thing; if something breaks... better to treat the whole thing like a static artifact. Companies like Red Gate (and their competitors) have "data diff" products which compare tables and generate change scripts. These tools work best with unique integer primary keys in all tables. However, afaik they're alleged to work without that too. These tools have CLIs, which means the comparisons are scriptable. Keep staging tables and run the comparisons on a powerful VM or PC.
No one shall expel us from the paradise that Cantor has created for us
November 22, 2022 at 5:14 pm
Thanks, Steve. Let me explore the diff tools option. The issue is that Fivetran (our ETL tool) goes after the SQL Server CT system tables to get the changes. I am sure there must be another way for it to get to the changes, but maybe we are unable to find the right technical resource, I guess. Will keep you all posted.
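For context, this is roughly the kind of query a tool runs against CT (dbo.Orders and OrderID are just stand-ins, not the vendor's schema). Note that CT only hands back the primary key columns, which is exactly the piece we are missing:

-- Requires Change Tracking enabled on the database and on the table
DECLARE @last_sync_version bigint = 0;   -- version saved from the previous cycle

SELECT ct.SYS_CHANGE_VERSION,
       ct.SYS_CHANGE_OPERATION,          -- I, U, or D
       ct.OrderID                        -- CT returns only the PK columns
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct;

SELECT CHANGE_TRACKING_CURRENT_VERSION() AS VersionForNextCycle;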
Thanks for your time Steve
November 22, 2022 at 6:09 pm
However, afaik they're alleged to work without that too.
For Redgate Compare, you don't have to have a primary key. You do have to have a column, or set of columns, that uniquely identifies the row. It won't work without that.
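If you're not sure whether a candidate set of columns really is unique, a quick check like this will tell you (plain T-SQL; the table and column names are only examples, nothing Redgate-specific):

-- Any rows returned means the candidate key has duplicates and won't work
SELECT OrderDate, CustomerCode, COUNT(*) AS DuplicateCount
FROM dbo.Orders
GROUP BY OrderDate, CustomerCode
HAVING COUNT(*) > 1;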
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2022 at 10:39 pm
BTW and just to be sure, if you're enabling change tracking for some legal reason, it won't hold up in court.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2022 at 8:51 pm
I'm thinking that with only 1.5 GB of data, you could create a series of triggers that insert or update the data into a second database that has primary keys and change tracking defined. You could then run the ETL tool against the second database.
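A very rough sketch of the idea, purely for illustration (the table, its columns, and the StagingDB database are all made up here, and without a real key on the source the delete-side matching is only approximate):

CREATE TRIGGER trg_Orders_Mirror
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove the "before" image of changed/deleted rows from the copy...
    DELETE s
    FROM StagingDB.dbo.Orders AS s
    INNER JOIN deleted AS d
        ON  s.OrderDate    = d.OrderDate
        AND s.CustomerCode = d.CustomerCode
        AND s.Amount       = d.Amount;

    -- ...then re-add the "after" image. The copy has its own Identity PK,
    -- so Change Tracking can be enabled on it for the ETL tool to use.
    INSERT INTO StagingDB.dbo.Orders (OrderDate, CustomerCode, Amount)
    SELECT OrderDate, CustomerCode, Amount
    FROM inserted;
END;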
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
December 6, 2022 at 6:32 am
I appreciate your time, Steve.
December 11, 2022 at 5:50 pm
Thanks, Michael. The issue is I don't want to touch the existing vendor DB. It is out of support anyway, but it still works, and we don't want to change anything on the primary DB.
Thanks for your attention and time.
December 11, 2022 at 5:59 pm
Jeff- This is not for legal reasons. We have multiple manufacturing sites and we are trying to build an enterprise platform to do analytics across all the sites. We use an ETL tool (Fivetran) to pick up the data from the sites, land it in a landing zone in Snowflake, and then transform the data into a dimensional model. We use Tableau for dashboarding.
The issue we have with one of the sites is that Fivetran (our ETL/ELT tool) needs either CT or CDC to pick up the changes from the source, and the source (a vendor DB) does not have any PKs defined. We do not want to change anything on the vendor DB. I know I am asking to eat my cake and still have it, but that is the predicament I am in 🙁
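For reference, this is roughly what enabling CT looks like. The syntax is standard, but the table-level step only works when the table has a PK, which is exactly where we're stuck (the database and table names below are just placeholders):

ALTER DATABASE StagingDB
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- This part fails unless the table has a primary key
ALTER TABLE dbo.Orders
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);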
Thanks again for your time
December 11, 2022 at 7:57 pm
Taking into consideration that the database is tiny, doing as Jonathan mentioned is likely your best choice. Or, alternatively, just do an initial ETL from that main DB onto another one: compare using hash keys and the combination of the unique fields you identified, and build the tables used for the ETL with a properly defined PK. This would prevent the need to add triggers (which may not be supported by your vendor, and which could potentially have "issues" with the vendor app).
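A rough sketch of the hash-key idea below -- the table, its columns, and the Staging database are only examples, as I don't know the vendor schema (CONCAT_WS needs SQL Server 2017+; on older versions use CONCAT or explicit casts):

-- Hash the whole row, then load only rows whose hash isn't in staging yet.
-- The staging copy carries its own Identity PK, so CT can be enabled on it.
;WITH src AS
(
    SELECT OrderDate, CustomerCode, Amount,
           HASHBYTES('SHA2_256',
                     CONCAT_WS('|', OrderDate, CustomerCode, Amount)) AS RowHash
    FROM dbo.Orders
)
INSERT INTO Staging.dbo.Orders (OrderDate, CustomerCode, Amount, RowHash)
SELECT s.OrderDate, s.CustomerCode, s.Amount, s.RowHash
FROM src AS s
WHERE NOT EXISTS (SELECT 1
                  FROM Staging.dbo.Orders AS t
                  WHERE t.RowHash = s.RowHash);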
December 11, 2022 at 9:23 pm
Thanks, Michael. The issue is I don't want to touch the existing vendor DB. It is out of support anyway, but it still works, and we don't want to change anything on the primary DB.
Thanks for your attention and time.
??? In your original post, you were talking about changing a whole bunch of stuff like Clustered Indexes, PKs, adding Identity columns, etc. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2022 at 12:42 pm
Jeff- I agree. I was too hasty in my initial thought process. Change is inevitable. Change of thought for better is not 🙂