Best strategy for ETL processes - 1 ODS and many OLTP instances.

  • Hi there

    I'm new to SQL Server 2005 and trying to get to grips with Integration Services in parallel. Your input to configuring the best strategy for one of our clients would be greatfully received.

    We will be creating an OLTP database per country for a client who have offices around the globe. These will feed into a central Operational Data Store nightly. The OLTP/ODS structures will be almost the same except for the usual ODS differences, dates, country_ids, effective_to, effective_from etc. From here the data will be extracted to an OLAP database.

    Proposed solution (ignoring the OLAP stuff for now):

    - Central ODS database pulls the data from each data source instead of all data sources pushing the data. Centralised management should anything go wrong and less maintenance for upgrades etc.

    - From what I have gathered so far in SSIS, a package can't contain more than 1 data source. If correct, have one ETL package (on the ODS server) for each OLTP data source.

    - A single job running on the ODS Server calling each package in turn.

    Is that about right?

    Regarding the above - is it possible to have session variables available globally, i.e. to consider the ETL process as a single process and to share data, e.g. the datetime the ETL process starts will be identical for each package being called?

    Would appreciate your input here, if this strategy isn't the best or if there's anything i have missed off etc.

    Many thanks in advance! Cheers, Jon

  • jonathan.strauss (11/20/2008)


    Hi there

    - From what I have gathered so far in SSIS, a package can't contain more than 1 data source. If correct, have one ETL package (on the ODS server) for each OLTP data source.

    - A single job running on the ODS Server calling each package in turn.

    Is that about right?

    Regarding the above - is it possible to have session variables available globally, i.e. to consider the ETL process as a single process and to share data, e.g. the datetime the ETL process starts will be identical for each package being called?

    Would appreciate your input here, if this strategy isn't the best or if there's anything i have missed off etc.

    Many thanks in advance! Cheers, Jon

    Your strategy seems sound to me.

    A package can contain multiple data sources and destinations, you just need to define the connection managers for each one. I use a hierarchical parent/child package structure, with one master "conveyor" package which sets the connection string properties for the child package connection managers via package-scope variables which are passed to the various child packages through configuration settings. The parent package also controls the execution of the child packages, enforcing dependencies, and making logging easier.

    If you want to centralize your process, a hierarchical approach will facilitate the type of variable sharing that you are looking for.

  • Many thanks SSC Journeyman...

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply