May 8, 2009 at 10:21 am
Note: I've reposted this from the "Working with Oracle" forum as no one seems to go there! I hope I haven't broken any rules.
We're in the process of building a new ASP.NET/SQL Server website to replace an old Oracle/PHP site. I'm in charge of all the database aspects, and as such am designing a relational model to eliminate redundancy and increase flexibility over the old system, which is driven from a few large reporting tables (the old system was built to enable searching, whereas we're using a 3rd party search tool that we will feed with data changes via views).
What I will need to do, eventually, is take data from these big, wide Oracle tables a couple of times per day (both systems will run in parallel for a while, with the data loads continuing into the existing system and the new one taking data from there) and split it up across the new normalised structure, looking up foreign key references etc. as the data is inserted. That is, the existing data has make, model, type, various flags etc. all as text values; I have those things in separate tables as foreign key references, and need to look up the Ids for the values and put those Ids in my core tables.
My question is: what's the best way to approach this? I see two main approaches: carry out the logic in SSIS, or dump the raw data to a copy of the original tables and process it with stored procedures and/or triggers.
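To make the second option concrete, the kind of thing I have in mind is roughly this - all table and column names are made up for illustration, and the real staging tables would mirror the full width of the Oracle sources:

-- Staging copy of one wide Oracle table (hypothetical names;
-- the real thing has 100-200 columns).
CREATE TABLE staging.Vehicle (
    SourceId  INT          NOT NULL,
    MakeName  VARCHAR(100) NULL,
    ModelName VARCHAR(100) NULL,
    TypeName  VARCHAR(100) NULL
    -- ...plus the rest of the wide row
);

-- Populate a core table, resolving the text values to Ids
-- via the normalised lookup tables.
INSERT INTO dbo.Vehicle (SourceId, MakeId, ModelId, TypeId)
SELECT s.SourceId, mk.MakeId, md.ModelId, ty.TypeId
FROM staging.Vehicle AS s
LEFT JOIN dbo.Make        AS mk ON mk.MakeName  = s.MakeName
LEFT JOIN dbo.Model       AS md ON md.ModelName = s.ModelName
LEFT JOIN dbo.VehicleType AS ty ON ty.TypeName  = s.TypeName;

The LEFT JOINs are deliberate: rows whose text values don't match anything still get loaded, with NULL Ids I can chase up afterwards.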
We're under a bit of time pressure on the project, and while I can take the time to get my chosen approach right, I don't want to spend significant amounts of time trying one thing only to have to abandon it as too complex and start again with another approach, so any advice will be gratefully received.
To give an idea of the complexity, the three main tables I will be processing have between 100 and 200 columns (though I can probably ignore half of them), and my normalised structure currently has around 50 tables (there will be a few more by the time I'm done but I'll probably end up with no more than 70).
Oh, and the existing data quality isn't great, so there will be some cleaning of data involved.
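I'm expecting most of that cleaning to be mechanical stuff along these lines (made-up names as before, just a sketch):

-- Tidy the staging text before the lookups: trim whitespace
-- and turn empty strings into NULLs.
UPDATE staging.Vehicle
SET MakeName  = NULLIF(LTRIM(RTRIM(MakeName)), ''),
    ModelName = NULLIF(LTRIM(RTRIM(ModelName)), ''),
    TypeName  = NULLIF(LTRIM(RTRIM(TypeName)), '');

-- Report values that still don't resolve, for manual clean-up.
SELECT DISTINCT s.MakeName
FROM staging.Vehicle AS s
LEFT JOIN dbo.Make AS mk ON mk.MakeName = s.MakeName
WHERE s.MakeName IS NOT NULL AND mk.MakeId IS NULL;

Anything that still doesn't resolve after that gets flagged for a human to look at.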
Whaddya reckon?
Cheers,
Dave.
May 11, 2009 at 5:49 pm
This kind of thing is really easy to whip up with SSIS, so the question you seem to be asking is which method you're most comfortable with for this temporary process. I'd say if you're under time pressure and don't have time to learn SSIS but you already know how to do it with procedures, then go ahead and use them. Otherwise, a set of SSIS packages will be easier to maintain in case you are unavailable (because a temp project probably won't have a complete set of support documentation, am I right?) and easier to clean up after the migration. A bunch of procedures might lurk in corners of the system forever, but there's nothing wrong with using them if that's faster and easier for you to get this thing going.
May 18, 2009 at 4:01 am
magarity kerns (5/11/2009)
This kind of thing is really easy to whip up with ssis so the question you seem to be asking is what method you are most comfortable with for this temporary process.
Thanks for the response,
It's not so temporary - I will be building regular update processes (to run multiple times per day) based on these initial load processes.
I've decided to go with SSIS as it ought to be the right tool (I'm extracting, transforming and loading, after all!) - though it's not quite proving "easy to whip up", in so far as SSIS demands a different approach to the necessary logic than the more familiar SQL constructs, so there's a fair bit of learning and trial and error involved before I get it right.
Dave.
May 18, 2009 at 9:50 am
dave.farmer (5/18/2009)
there's a fair bit of learning and trial and error involved before I get it right.
In lousy economic times it's more critical than ever for IT people to constantly learn new skills.
I sometimes consult on Informatica, another ETL tool, and here is my advice: when you write (proper) SQL procedures, they deal with complete sets of data. When you use an ETL tool (SSIS included), it deals with a flow of data. This isn't the dreaded RBAR of a cursor, but neither is it the entire set in a single bash. Once you get used to thinking of the ETL workflow as a flow, you've got a good handle on it. Just remember that data is moving through the workflow as a set, one row at a time. Got it? 😉
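To put the contrast in code terms (hypothetical names, and only a sketch):

-- Set-based thinking: one statement operates on the whole set at once.
INSERT INTO dbo.Vehicle (SourceId, MakeId)
SELECT s.SourceId, mk.MakeId
FROM staging.Vehicle AS s
JOIN dbo.Make AS mk ON mk.MakeName = s.MakeName;

-- Flow-based thinking: an SSIS data flow doing the same job is
-- designed as a pipeline that each row passes through:
--   OLE DB Source (staging.Vehicle)
--     -> Lookup transform (match MakeName against dbo.Make)
--       -> OLE DB Destination (dbo.Vehicle)
-- Rows move through in buffered batches, so it isn't cursor-style
-- RBAR, but you reason about the logic per row, not per set.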