October 9, 2018 at 6:55 am
Hi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks
October 9, 2018 at 7:22 am
Data Cruncher - Tuesday, October 9, 2018 6:55 AMHi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks
You don't merge DTSX files. Treat them like executables.
Shared SSIS projects are completely fine, however – but somehow you need to find a way of ensuring that the same package is not being worked on by more than one person at a time.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 9, 2018 at 7:38 am
Data Cruncher - Tuesday, October 9, 2018 6:55 AMHi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks
Easy, we have two data warehouses. 😀
One data warehouse for the BI team and one for everything else. This is because the BI team typically needed more granular and more rigorous work done versus the other teams that were primarily using the data warehouse for general reporting. Essentially, separating analytics from operational reporting. This evolved into the BI team using more of a cloud based solution (Azure DW) versus the operation reporting (SQL Server) on top of the BI team using other tools like machine learning, data lakes, etc that are more suited for analytical processing than the other.
The other go to is not having BI touch the data warehouse and just focus on the data marts. One team for data warehouse and one for data mart/OLAP creation. Anything the BI needs from the data warehouse is handled by ONE data warehouse team.
October 11, 2018 at 6:35 am
As has been mentioned, you don't merge SSIS packages. There's no tool that's ever going to be able to accomplish such a task. How could one possibly know what to do when pieces of two SSIS package edits conflict? Plus, who would even want to make an attempt to do that? You'd have to be able to manually choose what parts of the conflicting XML to save and thus you'd have to be able to know exactly what the rules are the exact same way that SSIS does, and you'd have to be able to recognize the problem in 100% of all such cases. Not likely in our lifetimes. The level of AI needed is well beyond what AI is likely to accomplish any time soon, and even then, might still be totally impractical. You have to use a source control system that allows you to check out the file such that no one else is allowed to make changes until after you check the code back in.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 17, 2018 at 4:42 am
Uhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
Thank you folks 🙂
October 17, 2018 at 5:23 am
Data Cruncher - Wednesday, October 17, 2018 4:42 AMUhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
Thank you folks 🙂
How big is your development team? If you're 'doing' Agile, it should be easy enough to know in advance what others are working on and thus avoid such collisions. I can't imagine any practical situations where you would want more than one person working on the same package, anyway.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 17, 2018 at 6:59 am
I agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).
October 17, 2018 at 7:15 am
Data Cruncher - Wednesday, October 17, 2018 6:59 AMI agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).
OK, I think I understand.
I avoid anything like this because I have built a framework which allows me to specify, in metadata, which packages should be called as part of a job, and in what order (also allowing for parallelism).
Adopting this means that packages in jobs can be modified in isolation. We have a rule that packages in jobs do not reference any other packages.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 17, 2018 at 7:48 am
Data Cruncher - Tuesday, October 9, 2018 6:55 AMHow do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 17, 2018 at 8:15 am
Jeff Moden - Wednesday, October 17, 2018 7:48 AMData Cruncher - Tuesday, October 9, 2018 6:55 AMHow do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
Can we set your username to SSIS Troll? 😛
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 17, 2018 at 9:37 am
Phil Parkin - Wednesday, October 17, 2018 8:15 AMJeff Moden - Wednesday, October 17, 2018 7:48 AMData Cruncher - Tuesday, October 9, 2018 6:55 AMHow do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
Can we set your username to SSIS Troll? 😛
😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2018 at 6:47 am
Jeff Moden - Wednesday, October 17, 2018 9:37 AMPhil Parkin - Wednesday, October 17, 2018 8:15 AMJeff Moden - Wednesday, October 17, 2018 7:48 AMData Cruncher - Tuesday, October 9, 2018 6:55 AMHow do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
Can we set your username to SSIS Troll? 😛
😛
I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀
October 18, 2018 at 7:13 am
xsevensinzx - Thursday, October 18, 2018 6:47 AMI mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀
OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 18, 2018 at 7:36 am
Phil Parkin - Thursday, October 18, 2018 7:13 AMxsevensinzx - Thursday, October 18, 2018 6:47 AMI mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?
I don't know off the top of my head because I don't use Google Analytics data. But, if you've written code that can do it, then it can be done without the use of SSIS. I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.
Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2018 at 7:41 am
Jeff Moden - Thursday, October 18, 2018 7:36 AMI don't know off the top of my head because I don't use Google Analytics data. But, if you've written code that can do it, then it can be done without the use of SSIS. I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.
Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).
I never intended to suggest that there was anything SSIS could do that no other product can do. Instead, I was responding to this assertion:
Anything you can do in SSIS, I can likely do the exact same with TSQL
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply