Easy Question: Organizing Data Flows

  • Since I'm new to this and data warehousing, I'm not sure what is the best way to do what I'm wanting. It just seems best and more impressive to move my different packages into one package.

    What I'm doing is pulling information from various sources into a Data Mart. My Data Mart has a set of 6 dimensional tables and one fact table. Each data flow task is in its own package (currently).

    So the way I'm thinking of organizing this is this:

    1) In Control Flow, get two sequence containers. The first container will contain all the dimension tables' ETL processes or Data flows.

    2) The second container would have the fact table.

    I don't know if this is a good design or if I will run into problems, but logically it seems I should make sure my dimensions are loaded first and then my fact table. So why not put them all in the same package?

    I know that the info doesn't have to be placed in the dimensions first, but i don't really know how to do that yet. I'm just familiar.

    Thanks!

  • I don't use SSIS but, speaking from an old dude's perspective, I'd likely make the loading of each table a separate package for reusability and then have a control package (dunno if it's possible... like I said, I don't use SSIS) that defined the order of execution. That could also allow parallel execution if you needed it and the system could tolerate it. Makes for a hellova performance boost when it works right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That really sounds like a good point. I'd still like to get some feedback from someone who uses SSIS.

    But like you said from a re-usability standpoint, these dimension tables will be re-used in other data marts. It is probably not wise to presume we will load each dimension before loading all the different fact tables that use them.

  • Jeff Moden (12/1/2013)


    I don't use SSIS but, speaking from an old dude's perspective, I'd likely make the loading of each table a separate package for reusability and then have a control package (dunno if it's possible... like I said, I don't use SSIS) that defined the order of execution. That could also allow parallel execution if you needed it and the system could tolerate it. Makes for a hellova performance boost when it works right.

    Yes, master and child packages are possible. There is an execute package task.

    From the sound of things that would be my approach here, especially since the child packages are already built.

  • Nevyn (12/1/2013)


    Yes, master and child packages are possible. There is an execute package task.

    Thanks for the confirmation on that, Nevyn.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/1/2013)


    I don't use SSIS but, speaking from an old dude's perspective, I'd likely make the loading of each table a separate package for reusability and then have a control package (dunno if it's possible... like I said, I don't use SSIS) that defined the order of execution. That could also allow parallel execution if you needed it and the system could tolerate it. Makes for a hellova performance boost when it works right.

    Confirmed that this is completely feasible and sounds like a good idea to me.

    Bear in mind that you'll have to tune the level of parallelism you introduce & keep an eye on locks to get the best solution.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (12/2/2013)


    Jeff Moden (12/1/2013)


    I don't use SSIS but, speaking from an old dude's perspective, I'd likely make the loading of each table a separate package for reusability and then have a control package (dunno if it's possible... like I said, I don't use SSIS) that defined the order of execution. That could also allow parallel execution if you needed it and the system could tolerate it. Makes for a hellova performance boost when it works right.

    Confirmed that this is completely feasible and sounds like a good idea to me.

    Bear in mind that you'll have to tune the level of parallelism you introduce & keep an eye on locks to get the best solution.

    When I've done similar in T-SQL, I generally load to a "staging" table that looks and smells exactly the same as the original. When it's done, I simply repoint a synonym and there's virtually no contention or locking issues. Of course, I've also had the luxury of having enough disk space to do such a thing (I don't believe that disk space is as cheap as some would think but I do press for it for such things).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks guys! I really appreciate this feedback! I'll do some more studying on this. Greatly appreciated.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply