Reading Excel column headers and their data into SQL

  • I'm a relative novice with SSIS, but I'm trying to get my head around the best way to do this.

    I have a number of CSV files that always contain two rows. The first row is a header/descriptor, and the second row contains the data. I.E.

    Row1: Date,Time,MachineName,Shift,Test1Output,Test2Output

    Row2: 03/05/16,23:00,Machine1,Shift1,10,20

    I need to be able to read the data into a SQL Server where the table is simple three columns: FileName, Descriptor,Value.

    FileName = the csv filename

    Descriptor = the Row1 header names.

    Value = the Row2 Values.

    The issue is, the csv files dont always have a set number of headers/descriptors. Sometime it's 100, sometimes it's 200.

    I'm not sure on the best way to import this data, as it will need to read the first columns, then add the second columns data.

    Any tips would be great on how to approach this.

  • Maddave (5/3/2016)


    I'm a relative novice with SSIS, but I'm trying to get my head around the best way to do this.

    I have a number of CSV files that always contain two rows. The first row is a header/descriptor, and the second row contains the data. I.E.

    Row1: Date,Time,MachineName,Shift,Test1Output,Test2Output

    Row2: 03/05/16,23:00,Machine1,Shift1,10,20

    I need to be able to read the data into a SQL Server where the table is simple three columns: FileName, Descriptor,Value.

    FileName = the csv filename

    Descriptor = the Row1 header names.

    Value = the Row2 Values.

    The issue is, the csv files dont always have a set number of headers/descriptors. Sometime it's 100, sometimes it's 200.

    I'm not sure on the best way to import this data, as it will need to read the first columns, then add the second columns data.

    Any tips would be great on how to approach this.

    I suggest that you define the CSVs as having a single column. That way, the number of headers does not matter.

    I am assuming that you want to then expand this out such that you have as many rows in your table as you have descriptors.

    Use an asynchronous script component to parse the rows and output multiple rows per input row. There are plenty of examples online – do some investigation and see whether it makes sense, then post back with any questions.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Excellent. Thank you. That makes sense. I'll look at the asynchronous scripting and see if I can get it working. Thanks.

  • I asked a question, but I think I've found the solution... ignore me!

  • If anyone is interested, this is the script I wrote to read the data. This reads the input of one column, reads the first row of the data and splits out the individual elements to a string array, then reads the second row of the input and splits the elements to a second string array, then it loops through the first string array and writes both elements to a new output row. Happy to be told on better ways to do this, but this works for me at the minute.

    //declare some global variables:

    int RowNumber = 0;

    string[] FirstRow;

    string[] SecondRow;

    /// <summary>

    /// Parses the input buffer and writes the individual row elements to the output buffers.

    /// </summary>

    /// <param name="Buffer"></param>

    public override void Input0_ProcessInput(Input0Buffer Buffer)

    {

    while (Buffer.NextRow())

    {

    RowNumber = RowNumber + 1;

    Input0_ProcessInputRow(Buffer);

    }

    // Counter var used the loop through the string array

    int i = 0;

    // Loop through string array with First Row elemente

    while (i < FirstRow.Length)

    {

    // Start a new row in the output

    Output0Buffer.AddRow();

    // This is the splitted column. Take the [n] element from the array

    // and put it in the new column.

    Output0Buffer.OutputDescription = FirstRow;

    Output0Buffer.OutputValue = SecondRow;

    // Increase counter to go the next value

    i++;

    }

    if (Buffer.EndOfRowset())

    {

    Output0Buffer.SetEndOfRowset();

    }

    }

    /// <summary>

    /// Get the first and second rows and save to an array to be used later.

    /// </summary>

    /// <param name="Row">The row that is currently passing through the component</param>

    public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

    if (RowNumber == 1)

    { FirstRow = Row.InputData.ToString().Split(new char[] { ',' }, StringSplitOptions.None); }

    else

    { SecondRow = Row.InputData.ToString().Split(new char[] { ',' }, StringSplitOptions.None); }

    RowNumber++;

    }

  • Looks tidy to me.

    It's satisfying how you can do so much in C# with so few lines of code.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply