November 3, 2019 at 12:47 am
So, I have what I believe will be a massive undertaking. I’ve asked a related question in a different forum, but this topic is a bit different.
I have 4 databases with similar schemas but some differences — for example, Schema X exists in Database A but not in Databases B, C, and D. These 4 DBs need ETL performed on a regular basis; they are production databases and their schemas change a couple of times a month. So, I need to figure out a way to perform ETL on these databases with feedback to the admins.
Requirements: The databases must remain production-ready at all times, meaning customers will be using them while development is happening, for various reasons. The process must also provide some sort of notification of ETL changes that require manual intervention.
November 3, 2019 at 1:36 am
So read the header row from your data, split it, and compare it to your table columns. If they're not named the same, then you'll need a mapping table to help but it's just going to boil down to a full outer join between the two sets of column names. Then you can apply whatever business logic you want for what constitutes a change.
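The full outer join between the two column-name sets might look something like this — a minimal sketch, assuming the header row has already been split into a #FileColumns temp table, and using placeholder schema/table names:

```sql
-- Sketch only: #FileColumns holds one row per column name from the file header;
-- 'dbo' and 'TargetTable' are placeholders for your actual target.
SELECT  fc.ColumnName   AS FileColumn
       ,tc.COLUMN_NAME  AS TableColumn
       ,CASE WHEN tc.COLUMN_NAME IS NULL THEN 'New in file'
             WHEN fc.ColumnName  IS NULL THEN 'Missing from file'
             ELSE 'Match'
        END             AS Status
  FROM #FileColumns fc
  FULL OUTER JOIN (SELECT COLUMN_NAME
                     FROM INFORMATION_SCHEMA.COLUMNS
                    WHERE TABLE_SCHEMA = 'dbo'
                      AND TABLE_NAME   = 'TargetTable') tc
    ON tc.COLUMN_NAME = fc.ColumnName
;
```

Anything that comes back other than 'Match' is a candidate for your "requires manual intervention" notification.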
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2019 at 1:46 am
Thank you for the response. So basically, I could create a .NET application to provide additional functionality, but read the headers and split them as you’ve suggested? Then I could repeat a similar process for stored procs, etc.? I’m not sure that SSIS will do everything I need, which is why I brought up .NET. These databases are several TB in size (~10 TB).
November 3, 2019 at 2:02 am
No .NET application required. Just use BULK INSERT to read the first row of the files. And I wouldn't go near SSIS for this. Splitting the headers you read is easy with a string splitter in T-SQL. If you want to read file names, you could do that with xp_CmdShell or, if you can't convince people of the mistake of forbidding its use, you could always use xp_DirTree (with all 3 parameters) to get a listing of the files and dump that into a TempTable.
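Both pieces of that can be sketched in a few lines of T-SQL. This is an illustration only — the file path, table names, and delimiter are placeholder assumptions:

```sql
-- Read just the header row of a file into a one-column staging table.
-- FIRSTROW/LASTROW = 1 means BULK INSERT reads only the first line.
CREATE TABLE #Header (HeaderRow VARCHAR(8000));

BULK INSERT #Header
  FROM 'C:\Import\SomeFile.csv'          -- placeholder path
  WITH (FIRSTROW = 1, LASTROW = 1, ROWTERMINATOR = '\n');

-- Get a file listing without xp_CmdShell:
-- xp_DirTree @path, @depth, @file — the 3rd parameter (1) includes files.
CREATE TABLE #Files (SubDirectory VARCHAR(500), Depth INT, IsFile INT);

INSERT INTO #Files (SubDirectory, Depth, IsFile)
 EXEC master.sys.xp_DirTree 'C:\Import', 1, 1;
```

From there, split #Header.HeaderRow with your string splitter of choice and compare the result to the target table's columns.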
And I'm not sure what size has to do with any of this. You're not loading 10TB of data in one go, right?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2019 at 2:12 am
So, will this work for moving data from SQL 2012 to SQL 2017? Would this need to be set up as a SQL Agent job?
I’m not trying to be a pain, but my boss is a die-hard developer and codes everything. I’d rather not reinvent the wheel if we don’t have to. As mentioned, I just need to be sure that if manual intervention is required, there’s a notification. Manual intervention would be for changes in stored procs, triggers, and functions. Right now he’s planning a .NET application with a UI. This will work with the data as well, correct?
November 3, 2019 at 3:10 am
My apologies. I normally don't associate the term "ETL" with moving data between databases, even when one may be a source and another a data warehouse. I normally associate ETL with importing data from non-SQL Server sources such as files.
However, the same principles can apply to the requirement to check for changing schemas. Instead of reading the header from a file, you can read the schema (column names, and datatypes if you need that level of detail) from the tables and do just about everything the same way I suggested with files.
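Cross-database, that schema check is the same full outer join pattern against INFORMATION_SCHEMA.COLUMNS — a sketch, with DatabaseA/DatabaseB as placeholder names:

```sql
-- Sketch only: flags columns that exist in one database but not the other,
-- or whose datatypes have drifted apart.
SELECT  COALESCE(a.TABLE_NAME,  b.TABLE_NAME)  AS TableName
       ,COALESCE(a.COLUMN_NAME, b.COLUMN_NAME) AS ColumnName
       ,a.DATA_TYPE AS A_Type
       ,b.DATA_TYPE AS B_Type
  FROM DatabaseA.INFORMATION_SCHEMA.COLUMNS a
  FULL OUTER JOIN DatabaseB.INFORMATION_SCHEMA.COLUMNS b
    ON b.TABLE_SCHEMA = a.TABLE_SCHEMA
   AND b.TABLE_NAME   = a.TABLE_NAME
   AND b.COLUMN_NAME  = a.COLUMN_NAME
 WHERE a.COLUMN_NAME IS NULL              -- only in B
    OR b.COLUMN_NAME IS NULL              -- only in A
    OR a.DATA_TYPE <> b.DATA_TYPE         -- type drift
;
```

Any rows returned are exactly the "manual intervention" cases you'd feed into your notification.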
We do this with about 50 tables each night, reading the data from IBM PowerSystems/DB2 into SQL Server. We maintain 2 sets of tables on the receiver side. We load one set while having the other set online. When the set being loaded has completed, we just repoint some synonyms. The total downtime is measured in milliseconds and users never see it.
It's all done with T-SQL... no SSIS needed. You could certainly use SSIS for such a thing, but it's overkill and brings little else to the table. It can help you avoid linked servers, but I don't find those to be a performance problem simply because of the way we use them. SSIS can have similar performance issues if you don't use it right, so we just stuck with the linked servers.
We also use this same technique of having duplicate tables and repointing a synonym to the new/active data tables in a large number of places between SQL Servers, as well.
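The repointing step itself is just a drop-and-recreate of the synonym — a minimal sketch, with placeholder object names:

```sql
-- Sketch only: dbo.Customer_A and dbo.Customer_B are the two duplicate
-- table sets; dbo.Customer is the synonym all user code references.
-- After the offline set (here, Customer_B) finishes loading, swap:
BEGIN TRANSACTION;
    DROP SYNONYM dbo.Customer;
    CREATE SYNONYM dbo.Customer FOR dbo.Customer_B;  -- now the live set
COMMIT;
```

The drop/create pair is what makes the "downtime measured in milliseconds" possible: users querying dbo.Customer simply start hitting the freshly loaded table on their next statement.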
--Jeff Moden
Change is inevitable... Change for the better is not.