Importing an excel sheet with scattered data

  • Hi All

    Not sure if I am posting the question in the right forum or shall I find some excel sheet help!

    I have an excel file with master data in the rows and child data below in in different columns.

    for e.g (claim#, insuredname is in rows, followed by the child records -----)

    Claim#1

    InsuredName1

    ----------------

    -----------------

    Claim#2

    InsuredName2

    ------------------

    ------------------

    I have to import this into a single table , Do I need to fix the excel first??

    There are thousands of claim# (records) ...

    without fixing the excel is there a way I can import it into sql-server table using ssis.

    Thanks!

    SJanki

  • For those not interested in opening the Excel file here it is:

    Just a couple WAGs, you could try importing the entire sheet into one table and then using T-SQL to send the data in the initial table into properly formed tables that represent your data.

    Or, you could process the Excel in a Script Component setup as a Data Source where you process the data line by line and push the rows onto an appropriate output per the data format/type.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Another possibility would be to write some code in Excel which adds the two 'group' columns (claim number and insured) to the table data & then throw away all non-data rows.

    I say 'some code' - it will require a fair bit - but once it's in that format you'll find it should import nicely.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi SJanki

    I do not know if you resolved your issue already, if you haven't the following package handles the problem you described, the data flow picture below drives the explanation.

    PROCEDURE

    a. Create a brand new package.

    b. Add a Data Flow Task into its Flow Control; switch to the Data Flow Task

    c. Add an Excel Source component

  • Configure its Connection Manager by adding a new one
  • (1) Select Microsoft Excel 2007 if it is available, otherwise pickup Microsoft Excel 97-2003, then

    (2) click on the browse button to locate your Excel file

    (3) un-check the option First row has columns headers; it is not visible on the picture.

    (4) Now assign the columns names as used by the application, as shown by the picture.

    d. Add a Conditional Split transform, its condition DataRow will exclude the heading and non-data rows with the following condition:

    ISNULL(Status) == FALSE && ISNULL([Pay Ind]) == FALSE && Status != "Status" && [Pay Ind] != "Pay Ind"

  • WARNING This is a hard-coded, data depending filter; if the source Excel file ever change its heading, the package will return unexpected results
  • e. Add a Derived Column transform; this transform introduces the Claim Number and Insured columns with their initial values.

  • The Claim Number column expressions is:
  • Status == "Claim Number :" ? [Pay Ind] : ""

  • The Insured column expression is:
  • Status == "Insured:" ? [Pay Ind] : ""

    f. Add an Script transform; this is the critical step, the script will assign the Claim Number and Insured values to the children; the script is written in C# and listed below (you can replace the code below with the default code for the script, as long as you are using the same variable names I used):

    public class ScriptMain : UserComponent

    {

    private string _ClaimNumber = string.Empty;

    private string _Insured = string.Empty;

    public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

    /*

    Add your code here

    */

    if (Row.ClaimNumber.Trim().Length > 0)

    {

    _ClaimNumber = Row.ClaimNumber;

    }

    else

    {

    Row.ClaimNumber = _ClaimNumber;

    }

    if (Row.Insured.Trim().Length > 0)

    {

    _Insured = Row.Insured;

    }

    else

    {

    Row.Insured = _Insured;

    }

    }

    }

    g. Now add another Conditional Split transformation, this one select the real data, as we no longer need those header rows containing the Claim Number and Insured information. Its filter name is: Real Row and its condition shown below:

    Status != "Claim Number :" && Status != "Insured:"

  • WARNING this is another condition driven by data found inside the Excel file, if the user or anybody change the 'labels' (or header) in these Excel files, the package will fails to select the data properly.
  • h. Finally a Row Count transform was added with a data viewer to validate the Data Flow was parsing the Excel file as expected; you need to add a package level variable for this Row Count mine was named RowCount

    I tested with the Excel file you supplied and it seems to be doing the job.

    Please let us know if you go for this solution.

    Cheers,

    Hope this helps,
    Rock from VbCity

  • SJanki

    The solution listed on the previous post has the following dependencies:

  • Its logic is based on information found within the Excel file; the user should not replace columns headers or labels required by the package's filters on its Split Transformation.
  • The solution is using the Excel 2007 driver, allowing it to support any version of Excel files.
  • The solution can't work with password protected Excel files; this limitation is imposed by the components, nothing we could do about it.
  • The package must be run in 32bits mode; this is an Integration Services limitation, nothing we could do about it.
  • If you want to know how to handle Excel 2007 files, please refer to the following post: How to pass excel file name using a variable

    Cheers,

    Hope this helps,
    Rock from VbCity

  • Rock from VbCity (6/21/2012)


    SJanki

    The solution listed on the previous post has the following dependencies:

  • Its logic is based on information found within the Excel file; the user should not replace columns headers or labels required by the package's filters on its Split Transformation.
  • The solution is using the Excel 2007 driver, allowing it to support any version of Excel files.
  • The solution can't work with password protected Excel files; this limitation is imposed by the components, nothing we could do about it.
  • The package must be run in 32bits mode; this is an Integration Services limitation, nothing we could do about it.
  • If you want to know how to handle Excel 2007 files, please refer to the following post: How to pass excel file name using a variable

    Cheers,

    Hi,

    Thank you so much for the proposed solution. This looks good and I'll try it. For now due to limited time I requested the client to give me file in vertical format. So it helped!

    Thanks again,VBCity Rocks!

  • Viewing 6 posts - 1 through 5 (of 5 total)

    You must be logged in to reply to this topic. Login to reply