Educational question

  • Hi all, I have been learning SSIS recently and getting some hands-on practice, I have a scenario below and my proposed resolution, but I am a little stuck on how to finish the package:

    1) Load data from a csv file into 3 tables - Cust, Addr, Total_Spend.

    2) Split data accordingly, ensuring common fields for ease of reference to the data from the Customer table.

    3) Autogenerate Customer ID's as CustID for the Customer table, AddressID for Address table etc.;

    4) Cleanse data;

    5) Output cleansed data into 3 csv files in accordance with table names i.e. Customer.csv, etc along with column titles

    My solution (so far)

    In the Control Flow tab:

    Bulk Insert Task - this copies everything from a text file into a new staging table;

    Execute SQL - On success of above, this performs data cleansing (I've yet to add code to it)

    In the Data Flow tab:

    Ole DB Source - Connects to the newly cleansed data;

    Multicast - On success of above, splits out the appropriate columns to three destination tables (which already exist)

    SQL Server Destination x 3 -

    However, the latter doesn't allow me then pipe out to the necessary csv files, nor does the OLE DB destination. My questions are:

    1) Do I need to create a 2nd Data Flow task further downstream in the Control tab to output to csv?

    2) I found when creating the Multicast that it creates the destination tables, my table design itself needs a refresher course, do I need to manually alter the tables to add a PK/ID column that autogenerates an ID, or is there a way to achieve this within SSIS?

    Many thanks,

    Jake.

  • 1) Yes. Although I wouldn't use the SQL Server Destination. The performance gain is neglible and it has certains downsides: it only works if the package runs on the same server as the destination, which makes your entire solution less portable.

    2) It's not the multicast that generates the tables, it's normally the SQL Server or OLE DB Destination. SSIS will generate destination tables for you based on it's metadata in the dataflow. But it only has columnnames and datatypes. It's up to you to add nullability, constraints, indexes et ceterea. I normally create the tables in advance using SSMS instead of using SSIS.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (2/27/2012)


    1) Yes. Although I wouldn't use the SQL Server Destination. The performance gain is neglible and it has certains downsides: it only works if the package runs on the same server as the destination, which makes your entire solution less portable.

    2) It's not the multicast that generates the tables, it's normally the SQL Server or OLE DB Destination. SSIS will generate destination tables for you based on it's metadata in the dataflow. But it only has columnnames and datatypes. It's up to you to add nullability, constraints, indexes et ceterea. I normally create the tables in advance using SSMS instead of using SSIS.

    Thanks for the replies Koen,

    1) Is that to say you use the OLEDB destination wherever possible in place of the SQL Server destination?

    2) Like I said above, my table design could use a refresh, any good articles for 2K8?

  • Yes, use the OLE DB Destination with the Fast Load option in favor of the SQL Server Destination.

    What do you mean with "any good articles for 2K"?

    Do you mean for SSIS, database design, data warehouse design, ...?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (2/28/2012)


    Yes, use the OLE DB Destination with the Fast Load option in favor of the SQL Server Destination.

    What do you mean with "any good articles for 2K"?

    Do you mean for SSIS, database design, data warehouse design, ...?

    Oh just for the very elementary stuff like datatypes, PK/FK constraints etc, as a production DBA I had very little exposure to designing new tables but clearly this will be an issue in SSIS.

  • Also of interest is that I had to go back to the Control Flow tab to finish the flow of data out to csv, I think someone at Redmond has an interesting sense of humour...

    😀

  • Jake Shelton (2/28/2012)


    Koen Verbeeck (2/28/2012)


    Yes, use the OLE DB Destination with the Fast Load option in favor of the SQL Server Destination.

    What do you mean with "any good articles for 2K"?

    Do you mean for SSIS, database design, data warehouse design, ...?

    Oh just for the very elementary stuff like datatypes, PK/FK constraints etc, as a production DBA I had very little exposure to designing new tables but clearly this will be an issue in SSIS.

    If you're going to design data warehouses you should read the Ralph Kimball's Data Warehouse Toolkit. For the rest, a good Google search should provide you with plenty of material, just don't don't everything out there immediately as a "best practice". 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply