May 6, 2009 at 6:05 am
Hello helpful and friendly people!
We're in the process of building a new ASP.NET/SQL Server website to replace an old Oracle/PHP site. I'm charged with all the database aspects, and as such am designing a relational model to eliminate redundancy and increase flexibility over the old system which is driven from a few large reporting tables (the old system was built to enable searching whereas we're using a 3rd party search tool that we will feed with data changes via views).
What I will need to do, eventually, is take data from these big, wide Oracle tables a couple of times per day (both systems will run in parallel for a while, with the data loads going into the existing system and the new one taking data from there) and split it up across the new normalised structure, looking up foreign key references etc. as the data is inserted. That is, the existing data will have make, model, type, various flags etc. all as text values; I have those things in separate tables as foreign key references, and need to look up the Ids for the values and put those Ids in my core tables.
My question is what's the best way to approach this? I see two main approaches: Carry out the logic in SSIS or dump the raw data to a table and process it with stored procedures and/or triggers.
We're under a bit of time pressure on the project, and while I can take the time to get my chosen approach right, I don't want to spend significant amounts of time trying one thing only to have to abandon it as too complex and start again with another approach, so any advice will be gratefully received.
To give an idea of the complexity, the three main tables I will be processing have between 100 and 200 columns (though I can probably ignore half of them), and my normalised structure currently has around 50 tables (there will be a few more by the time I'm done but I'll probably end up with no more than 70).
Oh, and the existing data quality isn't great so there will be some cleaning of data involved.
Whaddya reckon?
Cheers,
Dave.
September 9, 2009 at 6:24 am
This was removed by the editor as SPAM
March 16, 2010 at 1:00 pm
Well, this answer is a bit late for you (I hope!!), but it might help someone else.
The biggest variable you gave us no info on is the network load aspects of your environment.
1. Are Oracle and SQL Server both local databases (as opposed to one being accessed over a remote connection)?
2. Is your network load too high (or likely to become too high) anytime soon?
In addition to the above concerns:
3. How constrained are you for disk space on your SQL Server machine?
4. How constrained are you for overall load run time?
If your network is overloaded (or is expected to become so), then you need a design that minimizes network traffic. One pull of the flat tables across the network may put less load on it than 50 data streams into your new tables. Once across the network, then split it out. In effect, if network load constrained, you are trading disk space usage against network bandwidth usage.
June 22, 2010 at 7:21 am
Hello All
I am new to working with the Oracle programming language, though I have some basics in C, C++ and Java. Could anyone tell me how to approach using SSIS to get from a big, flat Oracle DB to a normalised SQL 2008 DB?
June 22, 2010 at 11:25 am
daniel.paul1986 (6/22/2010)
Hello All. I am new to working with the Oracle programming language, though I have some basics in C, C++ and Java. Could anyone tell me how to approach using SSIS to get from a big, flat Oracle DB to a normalised SQL 2008 DB?
Hi Daniel,
I wrote all the logic into a nested structure of SSIS jobs, and having lived with that, adding new functionality, nesting it within wider loading schedules, bug-fixing and testing for around a year now, I'd say DON'T DO WHAT I DID!
What I have constructed is clever and intricate, and I'm certainly proud of the issues I've solved and the knowledge I've gained, but the problems outweigh the advantages many times over.
If I were doing it again, I would stick with very simple SSIS jobs - dumping whole tables to copies in a staging SQL database perhaps, or performing minimal splitting and/or sorting only. I would do all the processing with stored procedures on the SQL end.
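To make that concrete, here is a minimal sketch of the lookup step done on the SQL end rather than in SSIS. All table and column names here are hypothetical — the idea is simply that the staging copy still holds make and model as text, and a set-based INSERT resolves them to Ids against the lookup tables:

```sql
-- Hypothetical staging table (dumped verbatim from Oracle by a simple
-- SSIS data flow) and hypothetical normalised lookup tables.
INSERT INTO dbo.Vehicle (StockNumber, MakeId, ModelId)
SELECT s.StockNumber,
       mk.MakeId,
       md.ModelId
FROM   staging.OracleVehicles AS s
JOIN   dbo.Make  AS mk ON mk.MakeName  = LTRIM(RTRIM(s.Make))
JOIN   dbo.Model AS md ON md.ModelName = LTRIM(RTRIM(s.Model))
                      AND md.MakeId    = mk.MakeId;
```

Given the data quality issues mentioned earlier, a LEFT JOIN variant of the same query is a cheap way to surface staged rows whose text values match no lookup entry, so they can be cleaned rather than silently dropped.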
The advantage of that approach is that it gives you much more control over the processing in a variety of ways. The SSIS collection I have all runs as one, and takes over an hour at 4am. Running it to test during the day can take 3 hours or more, and as each package talks to the others (there is a strict order necessary, meaning no package can be run in isolation) the whole suite has to be run together.

If instead you simply dump the data to local copies and process it from there, you can test individual stored procedures in isolation as you develop them, and switch parts on and off in your controlling procedures. Also, when testing or developing, you have the local data copy available to process and reprocess to your heart's content, without constantly loading the network by re-grabbing the data, as I have to because the extract and the logic are combined in each SSIS package. You can even add new data locally to develop the logic side before amending the SSIS to grab that new data.
We have a sequence of loading tasks: data from customers is loaded to Oracle and processed there, we then extract with my suite of SSIS packages, then we have to copy data to other SQL environments, then load the search engines (this last step involves taking our site offline). All these steps must complete before the next one can start, but due to restrictions in our setup and control of each environment, they are all scheduled tasks, so if one is not finished when the next one starts, the whole process fails. All this has a strict time limit between the earliest we can start taking the data and the latest we can have the site offline. With a quicker extract (i.e. performing no logic) we would be less at the mercy of network conditions for our timings, and more able to be flexible with the logic when things were running slow.
Finally, when something goes wrong in the logic (due to programming errors or corrupt or unexpected data), you don't want that to be in an SSIS package. Error reporting and tracking there is a nightmare, whereas in stored procedures, using try/catch and error handling blocks, you can record just about any information you need. Add the ready repeatability and easy editing of stored procedures, and tracking and fixing errors will be orders of magnitude easier than finding them in the middle of a cascade of SSIS jobs that take hours to run every time and don't like to tell you what is going wrong. In my case it's even worse, as we use package variables supplied to the parent package by the SQL Agent job that runs them, so we can't even run our packages interactively without major surgery which may itself invalidate the results!
Worse still, with everything in SSIS, once you've solved all the problems and got it working, no-one else will understand a word of it. With your logic in stored procedures it's a lot more accessible to the generalist database or even front end developers, whereas SSIS is a whole new tool most of them will run a mile from.
So, in summary: if you want to overcome massive impediments, solve what seem huge, insoluble problems as an academic exercise, and feel proud of your towering intellect (and be stuck with it as long as you remain at the company), then do it all in SSIS.
On the other hand, if you want testable, controllable, debuggable code that not only does the job but allows other developers to understand and join in without you spending days at a time explaining the intricacies of SSIS to them, then use SSIS in the most simple way possible and conduct the complex logic in nice, accessible stored procedures.
I hope that helps?!? 🙂
Dave.
June 28, 2010 at 1:57 am
This was removed by the editor as SPAM
June 30, 2010 at 5:14 am
Hello,
I'm new to this programming language called Oracle, so could anyone please tell me how I can go about learning it?
June 30, 2010 at 7:22 am
This was removed by the editor as SPAM
July 10, 2010 at 4:28 pm
If you're familiar with T-SQL (Microsoft) then PL/SQL (Oracle) shouldn't be too much of a stretch. If you're not familiar with either, then, well, you'll need to pick up a book, get some mentoring, or google what you're trying to do. If you could be more specific, I'd be happy to help you with the syntax and point out any gotchas.
________________________________________
Get some SQLey goodness at Dirt McStain's SQL Blog