How do you handle SSIS package merging between two BI teams?

Question

How do you handle SSIS package merging between two BI teams?

Data Cruncher

SSC-Addicted

Points: 499
More actions
October 9, 2018 at 6:55 am

#365468

Hi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246987 More actions · Answer 1

Data Cruncher - Tuesday, October 9, 2018 6:55 AM
Hi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks

You don't merge DTSX files. Treat them like executables.
Shared SSIS projects are completely fine, however â€“ but somehow you need to find a way of ensuring that the same package is not being worked on by more than one person at a time.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

xsevensinzx One Orange Chip Points: 25560 More actions · Answer 2

Data Cruncher - Tuesday, October 9, 2018 6:55 AM
Hi!!
Here's the thing - I have the following scenario:
Base points
- Multiple teams work on the same Data Warehouse (DW)
- The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure
- The leaves are 'unit of work' (i.e. 1 table = 1 flow)
- Everything is tracked on GIT, with different branches
Issue
When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.
I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.
How do you handle shared SSIS projects?
Thanks

Easy, we have two data warehouses. 😀

One data warehouse for the BI team and one for everything else. This is because the BI team typically needed more granular and more rigorous work done versus the other teams that were primarily using the data warehouse for general reporting. Essentially, separating analytics from operational reporting. This evolved into the BI team using more of a cloud based solution (Azure DW) versus the operation reporting (SQL Server) on top of the BI team using other tools like machine learning, data lakes, etc that are more suited for analytical processing than the other.

The other go to is not having BI touch the data warehouse and just focus on the data marts. One team for data warehouse and one for data mart/OLAP creation. Anything the BI needs from the data warehouse is handled by ONE data warehouse team.

sgmunson SSC Guru Points: 110640 More actions · Answer 3

As has been mentioned, you don't merge SSIS packages. There's no tool that's ever going to be able to accomplish such a task. How could one possibly know what to do when pieces of two SSIS package edits conflict? Plus, who would even want to make an attempt to do that? You'd have to be able to manually choose what parts of the conflicting XML to save and thus you'd have to be able to know exactly what the rules are the exact same way that SSIS does, and you'd have to be able to recognize the problem in 100% of all such cases. Not likely in our lifetimes. The level of AI needed is well beyond what AI is likely to accomplish any time soon, and even then, might still be totally impractical. You have to use a source control system that allows you to check out the file such that no one else is allowed to make changes until after you check the code back in.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Data Cruncher SSC-Addicted Points: 499 More actions · Answer 4

Uhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
Thank you folks 🙂

Phil Parkin SSC Guru Points: 246987 More actions · Answer 5

Data Cruncher - Wednesday, October 17, 2018 4:42 AM
Uhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
Thank you folks 🙂

How big is your development team? If you're 'doing' Agile, it should be easy enough to know in advance what others are working on and thus avoid such collisions. I can't imagine any practical situations where you would want more than one person working on the same package, anyway.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Data Cruncher SSC-Addicted Points: 499 More actions · Answer 6

I agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).

Phil Parkin SSC Guru Points: 246987 More actions · Answer 7

Data Cruncher - Wednesday, October 17, 2018 6:59 AM
I agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).

OK, I think I understand.

I avoid anything like this because I have built a framework which allows me to specify, in metadata, which packages should be called as part of a job, and in what order (also allowing for parallelism).

Adopting this means that packages in jobs can be modified in isolation. We have a rule that packages in jobs do not reference any other packages.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 8

Data Cruncher - Tuesday, October 9, 2018 6:55 AM
How do you handle shared SSIS projects?

It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 246987 More actions · Answer 9

Jeff Moden - Wednesday, October 17, 2018 7:48 AM
Data Cruncher - Tuesday, October 9, 2018 6:55 AM
How do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.

Can we set your username to SSIS Troll? 😛

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 10

Phil Parkin - Wednesday, October 17, 2018 8:15 AM
Jeff Moden - Wednesday, October 17, 2018 7:48 AM
Data Cruncher - Tuesday, October 9, 2018 6:55 AM
How do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
Can we set your username to SSIS Troll? 😛

😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

xsevensinzx One Orange Chip Points: 25560 More actions · Answer 11

Jeff Moden - Wednesday, October 17, 2018 9:37 AM
Phil Parkin - Wednesday, October 17, 2018 8:15 AM
Jeff Moden - Wednesday, October 17, 2018 7:48 AM
Data Cruncher - Tuesday, October 9, 2018 6:55 AM
How do you handle shared SSIS projects?
It's easy for me. I write stored procedures to replace them. 😉 Only 2 more to go and then we can turn off SSIS.
Can we set your username to SSIS Troll? 😛
😛

I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀

Phil Parkin SSC Guru Points: 246987 More actions · Answer 12

xsevensinzx - Thursday, October 18, 2018 6:47 AM
I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀

OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 13

Phil Parkin - Thursday, October 18, 2018 7:13 AM
xsevensinzx - Thursday, October 18, 2018 6:47 AM
I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀
OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?

I don't know off the top of my head because I don't use Google Analytics data. But, if you've written code that can do it, then it can be done without the use of SSIS. I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.

Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 246987 More actions · Answer 14

Jeff Moden - Thursday, October 18, 2018 7:36 AM
I don't know off the top of my head because I don't use Google Analytics data. But, if you've written code that can do it, then it can be done without the use of SSIS. I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.
Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).

I never intended to suggest that there was anything SSIS could do that no other product can do. Instead, I was responding to this assertion:

Anything you can do in SSIS, I can likely do the exact same with TSQL

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel