ssis package help needed

  • The requirement is to load a table (in server A, database A) with a query which involves selecting data from three different servers, three different databases and 3 different tables. The query is very simple as select, from, where, group by and order by. They are asking me to design an SSIS package for this. Daily data gets loaded. What I did is in my package, I added an execute sql task and run it as a job. Though this package gets the job done, I know it wouldn't be that simple. Any idea of overall ssis package would be really helpful.

  • If I understand your requirement, you have a server, we'll call A that receives the data. You have 3 other servers, we'll call C, D, and E that we get data from using simple queries. That about right..

    Basically that sounds like 4 connection managers, and depending on whether all three servers put their data in one table or not, 1 or 3 data-flows.

    Does this make sense? Is this what yo were looking for..

    CEWII

  • Thanks for replying. The requirement is

    INSERT INTO Server A. Database A. Table A

    SELECT col1, col2, col3....

    FROM Server B.Database B. Table B JOIN

    Server C.Database C. Table C ON ........

    WHERE DateID = @DateKey

    GROUP BY Col1, Col2

    ORDER BY Col1, Col2.

    In my package, I created 4 connection managers and only one Execute SQL Task. But, I don't think the package would be that simple. Please let me know your thoughts.

  • Also, I wanted to mention that in my query source data is coming from three different servers and the destination is only one table. If I choose data flow, then do you think I should be putting three OLE DB sources, one OLE DB command transformation and one OLE DB destination???

  • First, the the EXEC SQL task can only take actions against a single server at a time. So you can't pull data from one connection and put it in another in this task. That is what a data-flow task is for..

    Now, you clarified that there are 3 inputs that all go into a single table. Do you need to have the data from all three sources added at the same time or can there be a little lag between them. The reason is that is defines the data-flow structure. If there can be a little lag, I'd probably do three data-flows, each coming from a different source and each going to the same destination and put precedence constraints between them so they go in a row. If not then you can have 1 data-flow, three sources in the dataflow to a Union ALL in the data-flow and into a the destination.

    Does that make sense?

    CEWII

  • You understood my requirement correctly. Apart from data flow task, is the package not going to have anything??? is it going to be that simple??? Please reply.

  • I still see a couple issues..

    1. You still need to handle DateKey

    2. Does the table get cleaned before it gets filled? If so then add an exec SQL task to truncate it first.

    CEWII

  • I need to have the data from all three sources added at the same time and there shouldn't be any lag between them. You mentioned that the data flow should have three sources to a union all, but for union all the tables should have same number of columns and same data types. But the three tables involved in the query have different number of columns with different data types. Kindly let me know as how I should take it. Also, apart from data flow, is the package not going to have anything else..

  • Any idea how to handle this date key. Also, the table doesn't get cleaned before it gets filled. So, I don't have to truncate the table.

  • Ok, that makes it a bit complex..

    You'll have to pull the data needed from each source, make sure it is sorted, and make sure you set the source to indicate that. You will need to use a Merge Join or two to join them together..

    CEWII

  • pjrpjr7 (2/24/2010)


    Any idea how to handle this date key. Also, the table doesn't get cleaned before it gets filled. So, I don't have to truncate the table.

    How does date key get determined? that will determine a lot..

    CEWII

  • The data will get loaded once every week. Job that runs this SSIS package is set in such a way that the job will run once in a week and hence ssis package runs once every week and datekey will be getdate on the day the job and package runs. Let me know if you understand the req after reading this.

  • What happens if you run it twice in a row? Does it die?

    Also, is the datekey a date ONLY or a date/time. The field it is matching has a date/time? The method needs to take into account multiple run. It should be restartable..

    Listen, how do *you* think this should work? I don't want to code this for you, I am happy to help you with it. But there is way to much about the tables and structures that I can't know..

    CEWII

  • Thanks for replying. Datekey and the field it is matching is date only column (no time is being mentioned).

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply