June 19, 2020 at 3:03 pm
I have numerous SSIS pkgs that I’ve written that run every 15 min to compare records in SQL Server tables with records in equivalently named tables in Oracle to keep both tables 100% identical in content.
I have currently been using just standard “Select...” statements as the sources in the SQL Server and Oracle OLE DB Source transformations.
I am curious if performance with these OLE DB Source transformations would be improved if I switched them out to call SQL Server and Oracle stored procedures instead of the direct “Select...” statements?
I have read plenty of opinions for and against use of stored procedures being called in an OLE DB Source, with some stating it’s a very wise thing to call stored procedures to get data in an OLE DB Source transformation, and others stating that doing so is just asking for problems.
I would appreciate any opinions / recommendations / personal experience anyone has had, positive or negative, with this.
Thank you!
June 19, 2020 at 4:25 pm
For anything beyond a simple
SELECT Col1, Col2 FROM Tab1
I use a stored proc. I want as much SQL logic as is practical to be kept within the database, as this makes future maintenance easier, because everything is in one place and, potentially, it performs better than an ad hoc query submitted by a SSIS package.
That 'asking for problems' comment ... I'd like to know more detail. I've been using procs as data sources for years and don't remember having anything weird happen.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 19, 2020 at 4:58 pm
Thank you very much for the insight.
Most of the queries are straightforward "Select Col1, Col2, Col3 from Table" in nature. There are, however, several of the SSIS pkgs that have derived columns setup in them and I think w/a fairly high # of records being retrieved, these derived columns would be more quickly generated and served-up to the SSIS pkg via a stored procedure along with the rest of the columns selected than the way I'm deriving those columns directly within the SSIS pkg. Wouldn't you think that as well?
Thank you again!
June 19, 2020 at 5:25 pm
Wouldn't you think that as well?
Yes, 100%. Makes your package cleaner too, if you'll pardon the expression. I hate complex SSIS packages (especially those inherited from some other 'one package to rule them all' genius) & try to keep the flow through them as clean and uncluttered as possible. This tends to help both performance and maintainability.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 19, 2020 at 9:06 pm
I agree with Phil here. 100%. More than 100% if that was possible!
Originally we had a lot of "one-off" SSIS packages that were created for moving data for a very specific process or report. This resulted in several "staging" tables getting populated multiple times as the individual packages were the full ETL for a single report. Our thought (being quite new to SSIS as a team) was to take these large number of packages and group them together by data source. So all SSIS pacakges that pulled data from SQL Instance A went into 1 massive SSIS package. Eventually the runtime got too large and testing for adding simple ETL job steps became a nightmare.
So we are about to go onto attempt 3 of our SSIS design process this year. The previous manager had insisted on a "NO TSQL OR STORED PROCEDURES IN SSIS" rule for the team. This resulted in a lot of very messy SSIS packkages. If we needed to move data from one instance to another, it was by SSIS. So SSIS with no TSQL or stored procedures got messy. We got an exception to that rule because we needed to prune our staging tables prior to populating them so we were allowed to run TRUNCATE statements in them.
Our new approach is "pick the right tool for the job". Test things, experiment with things, find the right tool for the job. If a stored procedure is faster than using SSIS, use that. If a MERGE is faster than SSIS, use that. Make things efficient, but also keep them manageable and testable. Plus, since that manager is gone, we can now do things like TSQL to get the data which is an incredible time saver (only grabbing the columns we need rather than having a table/view as our data source... almost as bad as SELECT *).
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply