June 13, 2011 at 6:50 pm
My boss has created an SSIS package that calls a view within the source (SQL Server 2008 staging database). I just looked at that view and discovered that it calls another view, which in turn calls a combination of views and tables joined together. Although I understand that he is using views as a way to re-use code (I think his background is in C++ and Java), I don't know if this is the best approach for transforming data, especially if we are using SSIS packages.
Am I correct? If so, I'm trying to get my argument together so I can politely and respectfully share with the boss. I think the disadvantage of having an SSIS package call views that call other views are:
1. Performance. With large datasets I believe this approach is going to bog down.
2. Debugging. If we put the steps within an SSIS dataflow, we can better see the point of failure while the SSIS package is running. With the boss' approach, debugging views is going to be more challenging.
3. Maintainability/Learning Curve. As our organization grows, we will probably be adding more database people. It will be a challenge for a new person to dig through views of views to figure out what they are supposed to do.
Again, am I right about this? Am I making a good argument to help the boss?
June 13, 2011 at 8:46 pm
imani_technology (6/13/2011)
My boss has created an SSIS package that calls a view within the source (SQL Server 2008 staging database). I just looked at that view and discovered that it calls another view, which in turn calls a combination of views and tables joined together. Although I understand that he is using views as a way to re-use code (I think his background is in C++ and Java), I don't know if this is the best approach for transforming data, especially if we are using SSIS packages.Am I correct? If so, I'm trying to get my argument together so I can politely and respectfully share with the boss. I think the disadvantage of having an SSIS package call views that call other views are:
1. Performance. With large datasets I believe this approach is going to bog down.
It depends on what the VIEW is doing. Just because a VIEW is calling another VIEW which is then JOINing to tables does not mean it will perform poorly. If done improperly (poor indexing, poor schema, poor query structure), then yes, the type of design you're describing can perform extremely poorly and be very difficult to debug, tune and refactor however that can be said of almost any data access method you choose.
2. Debugging. If we put the steps within an SSIS dataflow, we can better see the point of failure while the SSIS package is running. With the boss' approach, debugging views is going to be more challenging.
I find a better approach is to place all data access logic in a stored procedure and call that from SSIS. If your boss still wants to use nested VIEWs, that's OK, abstract their use into a stored procedure. Then, if the views start causing performance issues down the line you can refactor the proc while still maintaining the input and output interfaces and your SSIS packages will not be impacted. It's a good practice in and of itself, whether you're using views or base tables, to do all data access through stored procedures. It insulates your SSIS from schema changes and you get a performance boost from having cached execution plans.
3. Maintainability/Learning Curve. As our organization grows, we will probably be adding more database people. It will be a challenge for a new person to dig through views of views to figure out what they are supposed to do.
Stress to your boss that what has been constructed has a step learning curve and will be difficult to maintain. Having multiple people maintain nested VIEWs is a challenge and explain that there are better, cheaper to maintain, ways to implement data access...namely stored procedures. Stored procedures offer code re-use and have the benefits over views mentioned earlier.
Again, am I right about this? Am I making a good argument to help the boss?
You are right to be concerned. The two projects I have seen that have employed this technique in a multi-user shop both failed to achieve the desired effect: code re-use while maintaining a high level of performance. Both ended in a rewrite to move all data access logic into stored procedures. That said, there is no technical right or wrong in this instance. The design can, and I'm sure has worked in some implementations.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 14, 2011 at 3:13 pm
imani_technology (6/13/2011)
Again, am I right about this? Am I making a good argument to help the boss?
yes. I wouldn't bring up the point on performance..but I think you have plenty of ammo just from a maintainability aspect.
fwiw, we have a lot of this crap too...and it is a major PITA when the related processes need to be updated for whatever reason.
I've noticed this design pattern [nested nested nested views] created by people who come from an Access background (where you see a lot of query on query on query).
Please update with results...I'm very curious to see if changes are permitted.
Data Enthusiast | @SQLbyoBI | www.opifexsolutions.com
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply