SSIS and GIT structure

  • My new gig uses Git for source control (Bitbucket, to be exact), which I'm not really all that familiar with (that's changing daily).

    What are your best practices for managing SSIS projects with Git? The options seem to be a single repository with multiple SSIS projects underneath, or a Bitbucket project with one repository per SSIS project. I would think a single repository with multiple SSIS projects underneath might be troublesome when it comes to branching. I know of a concept called submodules but haven't really looked into it yet.

    With TFS I used this folder structure:
    Database
    - DB1
    - DB2
    - etc
    SSAS
    - sales
    - etc
    SSIS
    - DBM1
    - ETL1
    - ETL2
    - etc
    SSRS
    - Sales
    - etc

    Thanks.

  • In reply to Tom_Hogan (Wednesday, July 12, 2017 7:14 AM):

    If you have SSIS solutions containing multiple projects, you really don't want to be splitting those projects up across repos, IMO.
    What problems are you envisaging with having a single repo? Branching in Git is way faster and easier than branching in TFS.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • In reply to Phil Parkin (Wednesday, July 12, 2017 7:41 AM):

    I was thinking about multiple developers working on the different projects in different branches, but, talking it through, it doesn't really seem like a problem given the way Git's branching works. Most likely, they'd be doing development on local branches rather than directly against the central repository, as with TFS. Sometimes I overthink things :).

    Thanks.

  • In reply to Tom_Hogan (Wednesday, July 12, 2017 8:05 AM):

    While it is true that devs should work on local branches, you do need to give a good deal of thought to your entire development process and how it works with Git.
    I try to keep it simple, having two main 'server based' branches, one for QA and one for Production. Our CI processes use only these branches when maintaining QA and Production environments.
    Other branches are feature branches (for development, branched from QA) and hotfix branches (branched from Production). You'll need to get savvy with merging, rebasing and conflict resolution so that you'll know what to do when your local branch changes conflict with someone else's.
    If you are using JIRA as well as Bitbucket, you may like to tag your Git commits with JIRA issue keys (e.g. ABC-123), which creates a two-way link between the systems.
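    The branch model above can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual CI configuration: the branch names `qa` and `production` and the `feature/` and `hotfix/` prefixes are hypothetical, since the post does not name them. It encodes only two rules: which long-lived branch a new branch is cut from, and which branches CI deploys from.

```python
from typing import Optional

# Hypothetical names for the two 'server based' branches.
QA_BRANCH = "qa"
PRODUCTION_BRANCH = "production"

# Which long-lived branch each kind of short-lived branch is created from.
BASE_BRANCH_BY_PREFIX = {
    "feature/": QA_BRANCH,         # development work starts from QA
    "hotfix/": PRODUCTION_BRANCH,  # urgent fixes start from Production
}

# CI only maintains environments from the two server-based branches.
CI_ENVIRONMENT_BY_BRANCH = {
    QA_BRANCH: "QA",
    PRODUCTION_BRANCH: "Production",
}


def base_branch_for(branch: str) -> str:
    """Return the long-lived branch a new short-lived branch should be created from."""
    for prefix, base in BASE_BRANCH_BY_PREFIX.items():
        if branch.startswith(prefix):
            return base
    raise ValueError(f"unrecognised branch type: {branch!r}")


def ci_environment_for(branch: str) -> Optional[str]:
    """Return the environment CI maintains from this branch, or None if CI ignores it."""
    return CI_ENVIRONMENT_BY_BRANCH.get(branch)


if __name__ == "__main__":
    print(base_branch_for("feature/ABC-123-new-sales-load"))     # qa
    print(base_branch_for("hotfix/ABC-130-fix-null-dates"))      # production
    print(ci_environment_for("feature/ABC-123-new-sales-load"))  # None
```

    Keeping these rules in one small table makes it easy for a CI job, or a reviewer, to reject a hotfix that was accidentally branched from QA.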

  • In reply to Phil Parkin (Wednesday, July 12, 2017 1:14 PM):

    Thanks Phil.  Good advice here.  I always prefer to keep things simple.  We're waiting on a Git plug-in to be installed into our defect / enhancement tracking software.  We know it'll be extremely useful in the future.

    Time to do some research.

  • Hi Tom & Phil,

    I am currently setting up version control and workflow / processes within a DevOps MI / data warehouse team I have just joined.

    We will be moving from SSDT with no version control to Visual Studio 2015 Pro, Git and Bitbucket, plus Jira & Confluence.

    I have been researching and experimenting for a few weeks now and am really impressed with GIT. A great tool and expletive all in one 3 letter word. Anyway, thanks for your recommendations Phil, all makes good sense. We have settled on feature branching driven from Jira and in my testing that works really well even though I haven't unleashed anything on our team yet. However, I did have some questions about how you ask developers to manage their repos on their local drives.

    We have a large number of database and ETL projects and will keep them in separate repos, but, based on recommendations about CI testing, we are thinking we may need to go even further and define one database per repo (to reduce unwanted build overhead in CI).

    So Q1 - is CI really that practical or beneficial within an MI estate, and is one repo per database sensible (i.e. do the pros outweigh the cons)? We have about 100 databases 🙁
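    If CI build overhead is the main reason for one repo per database, one alternative is to keep several database projects in a single repo and have CI rebuild only the projects a commit actually touched. A minimal Python sketch, assuming a hypothetical layout of `Database/<ProjectName>/...` and a list of changed file paths such as `git diff --name-only` prints for the previous and current commit:

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Hypothetical top-level folder holding one subfolder per database project.
DATABASE_ROOT = "Database"


def databases_to_build(changed_paths: Iterable[str]) -> Set[str]:
    """Map the files a commit changed to the database projects CI should rebuild."""
    to_build: Set[str] = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Only Database/<ProjectName>/<file...> counts as a database change.
        if len(parts) >= 3 and parts[0] == DATABASE_ROOT:
            to_build.add(parts[1])
    return to_build


if __name__ == "__main__":
    changed = [
        "Database/Sales/Tables/Customer.sql",
        "Database/Sales/Views/vCustomer.sql",
        "Database/HR/StoredProcedures/usp_LoadStaff.sql",
        "SSIS/ETL1/LoadCustomer.dtsx",
    ]
    print(sorted(databases_to_build(changed)))  # ['HR', 'Sales']
```

    With roughly 100 databases, a filter like this keeps the per-commit build small without multiplying the number of repos developers have to clone and keep in sync.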
    -------------
    I have been simulating multiple developers on my laptop by trashing repos and re-cloning and also having multiple repos on my C drive. I have separate repo folders for database and SSIS projects, with multiple SSIS projects nested within their folder.
    Nothing special here, but when we seed Bitbucket with our existing software folders (currently on fileshares) we will need to undertake quite a lot of massaging to get the structures right for uploading. 

    So Q2 - what have you tried and settled on? When one clones a repo, I gather you can only get the entire repo, so developers working on SSIS projects will end up with the entire SSIS project repo (this will probably be split into 5-6 repos by business function or data warehouse area, but each repo will still contain hundreds of packages).

    Thanks
    Eric

  • Not trying to answer for those two, but here is my two cents.

    Ideally, you're going to create two main lines to start: one for dev and one for master. When you create a new task in JIRA to do something to an SSIS package, the team should be able to create a branch right from that task in JIRA with the help of Bitbucket. The developer then checks that branch out in their local clone of the repo, which is hopefully their developer environment.
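    Branching from a JIRA task works because the integration matches JIRA issue keys that appear in branch names and commit messages, which is also what links the tagged commits mentioned earlier in the thread. A small Python sketch of that idea; the `feature/<KEY>-<summary>` naming convention is a hypothetical example, not necessarily what Bitbucket generates:

```python
import re
from typing import List

# JIRA issue keys look like PROJECT-123: an uppercase project key, a hyphen, a number.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def branch_name(issue_key: str, summary: str, kind: str = "feature") -> str:
    """Build a branch name such as feature/ABC-123-fix-customer-load (hypothetical convention)."""
    # Lower-case the summary and collapse anything non-alphanumeric into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return f"{kind}/{issue_key}-{slug}"


def issue_keys(text: str) -> List[str]:
    """Extract JIRA issue keys from a branch name or commit message."""
    return ISSUE_KEY.findall(text)


if __name__ == "__main__":
    name = branch_name("ABC-123", "Fix customer load!")
    print(name)              # feature/ABC-123-fix-customer-load
    print(issue_keys(name))  # ['ABC-123']
```

    A check like `issue_keys` can also sit in a commit-message hook so that commits without a key are flagged before they reach Bitbucket.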

    In this sense, the developer environment is a private instance only for that developer and no one else. Thus, it does make sense to have separate database environments for each developer, and anything else they need, so they do not conflict with other developers. You're not doing this for its own sake; you're doing it to address a problem you may have in development: that conflict. If you feel a conflict can happen with shared databases, then address that problem, not just the idea. No one says you can't have one database environment for multiple developers if there is no conflict. But in most cases, if two developers have separate tasks against the same table or something, it will likely conflict.
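    The "share a database only if there is no conflict" rule can be checked mechanically if you know roughly which tables each open task touches. A toy Python sketch; the developer names, table names, and the idea of listing tables per task are hypothetical illustrations:

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_table_clashes(tasks: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (developer, developer, shared tables) for every pair whose tasks touch a common table."""
    clashes = []
    # Sorting by developer name keeps the output order stable.
    for (dev_a, tables_a), (dev_b, tables_b) in combinations(sorted(tasks.items()), 2):
        shared = tables_a & tables_b
        if shared:
            clashes.append((dev_a, dev_b, shared))
    return clashes


if __name__ == "__main__":
    open_tasks = {
        "ann": {"dbo.Customer", "dbo.Orders"},
        "bob": {"dbo.Orders"},
        "cat": {"dbo.Product"},
    }
    print(shared_table_clashes(open_tasks))  # [('ann', 'bob', {'dbo.Orders'})]
```

    An empty result suggests a shared development database is probably safe for that set of tasks; any pair it reports is a candidate for separate databases.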

    That being said, the structure of the application you are working with matters to Git too. If you're changing that structure to stack repos, then you may want to rethink your dev environment and process. It should be straightforward: create a branch from the repo for a new feature or bug, check it out anywhere you like in your environment, do the work, commit those changes, review, merge, and move on.

  • In reply to xsevensinzx (Friday, March 23, 2018 6:12 AM):

    Thanks for your response.

    I understand the point of local repos and keeping developer activity separated. My point about one DB per repo was only about automated CI tools that rebuild all databases in a commit (my understanding is that if a database is in the repo, it is implied in the commit).

    We won't be repurposing the application; the current source code folders use a purely arbitrary structure, with a Function ID at the top and database and ETL code nested in separate subfolders. These need to be pivoted so that ETL and Database are at the top, with the Function ID repeated within each of those new top-level folders, containing the relevant objects. That's the bulk of what we have.
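    The pivot described above is mechanical enough to script before seeding Bitbucket. A minimal Python sketch, assuming hypothetical component folder names `Database` and `ETL` and paths of the form `<FunctionID>/<Component>/...`; it only computes the new paths and collects anything that doesn't fit (the outliers), leaving the actual copying, or `git mv` once the files are in a repo, to you:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple

# Hypothetical names for the second-level folders under each Function ID.
COMPONENT_FOLDERS = {"Database", "ETL"}


def pivot_path(old_path: str) -> str:
    """Rewrite <FunctionID>/<Component>/<rest> as <Component>/<FunctionID>/<rest>."""
    parts = PurePosixPath(old_path).parts
    if len(parts) < 3 or parts[1] not in COMPONENT_FOLDERS:
        raise ValueError(f"not in <FunctionID>/<Component>/... form: {old_path!r}")
    function_id, component, *rest = parts
    return str(PurePosixPath(component, function_id, *rest))


def plan_moves(paths: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split paths into an old->new move plan and a list of outliers needing manual attention."""
    moves: Dict[str, str] = {}
    outliers: List[str] = []
    for path in paths:
        try:
            moves[path] = pivot_path(path)
        except ValueError:
            outliers.append(path)
    return moves, outliers


if __name__ == "__main__":
    moves, outliers = plan_moves([
        "F042/Database/Tables/Customer.sql",  # F042 is a made-up Function ID
        "F042/ETL/LoadCustomer.dtsx",
        "F042/Docs/notes.txt",
    ])
    for old, new in moves.items():
        print(f"{old} -> {new}")
    print("outliers:", outliers)
```

    Running the plan as a dry run first gives you the outlier list to review before anything is moved.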

    However, I'm sure I will discover the outlier software soon enough, and also the gnarly facts of seeding a large codebase into Bitbucket from scratch.

  • How would you configure Bitbucket if two or more developers are working on the same SSIS project? My vision is for developers to work on separate packages and then commit to Bitbucket. My issue is that there are additional files (e.g., *.dtproj, *.conmgr, *.params) that would be problematic because they get updated in both environments. Should these files be ignored? How do we coalesce separate packages into a consolidated build?

  • In reply to pgjerde:

    You are right to be concerned about these things. The main problem you will face is with edits to existing files (adding and removing files tends not to be much of an issue in VCSs), and the question boils down to:

    "Can I reliably perform a line by line textual merge of two separately updated versions of the same file?"

    In practice, the answer depends on the developers' knowledge of the XML structures of the files and their competence with whatever merging tool you have chosen, as well as the volume and complexity of the changes.

    A git rebase (of developer B's changes, on top of developer A's) might also get the job done.

    The developers need to know what they are doing, and even then it might end up being easier for dev A to save their changes somewhere, then undo/revert, then pull dev B's changes, then reapply their changes over the top.
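    To make the "line by line textual merge" question concrete, here is a rough Python approximation of when a line-based three-way merge clashes: both sides changed overlapping or adjacent lines of the common base version. It is a simplification for illustration, not Git's actual merge algorithm, and the XML lines are a made-up stand-in for a project file's package list, not the real .dtproj schema.

```python
import difflib
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) span of base-file line indices


def changed_ranges(base: List[str], edited: List[str]) -> List[Range]:
    """Base-file line ranges that `edited` replaced or deleted, or inserted new lines at."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes() if tag != "equal"]


def ranges_clash(a: Range, b: Range) -> bool:
    """Conservative rule: overlapping or touching ranges clash, so two
    insertions at the same base position always clash."""
    return a[0] <= b[1] and b[0] <= a[1]


def textual_conflicts(base: List[str], ours: List[str], theirs: List[str]) -> List[Tuple[Range, Range]]:
    """Pairs of (ours, theirs) changes that a line-based merge would likely refuse to combine."""
    return [
        (a, b)
        for a in changed_ranges(base, ours)
        for b in changed_ranges(base, theirs)
        if ranges_clash(a, b)
    ]


# Made-up, simplified stand-in for a package list (not the real .dtproj schema).
BASE = [
    "<Packages>",
    '  <Package Name="LoadCustomers.dtsx" />',
    "</Packages>",
]
DEV_A = BASE[:2] + ['  <Package Name="LoadOrders.dtsx" />'] + BASE[2:]
DEV_B = BASE[:2] + ['  <Package Name="LoadProducts.dtsx" />'] + BASE[2:]

if __name__ == "__main__":
    # Both developers appended a package at the same place, so the edits clash.
    print(textual_conflicts(BASE, DEV_A, DEV_B))  # [((2, 2), (2, 2))]
```

    If the project file lists every package, two developers each adding a package to it is exactly this case, which is why such shared files tend to need a manual merge even when the packages themselves are separate files.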
