Appending values from second row to first row

  • I have .csv source file that looks like this:

    Type,ColA,ColB

    Test,ValueA,ValueB

    Test,ValueA,ValueB

    AP,ValueA,ValueB

    Test,ValueA,ValueB

    For every row where Type="Test", I want to ignore it if it's followed by a row where Type<>"Test". So in my example, I'd want to ignore rows 2 and 3. My plan was to create a script component where for each row in the file, it would grab the first value from the subsequent row and populate a new output column (i.e. NextRowTypeValue). The output would look something like this:

    Type,ColA,ColB,NextRowTypeValue

    Test,ValueA,ValueB,Test

    Test,ValueA,ValueB,AP

    AP,ValueA,ValueB,Test

    Test,ValueA,ValueB...

    I've found many script examples for capturing previous row values (i.e. http://ericwisdahl.wordpress.com/2009/08/02/ssis-retain-values-from-previous-rows/), but not for capturing subsequent row values. Any ideas? Thanks.

  • I suspect that this requirement would be way easier to do loading the data into a staging table and then using T-SQL.

    Maybe some more veteran SSIS users have a quick script solution but if I had to do this without a staging table I'd be applying row numbers, then sorting descending, then loading previous, then sorting ascending. And T-SQL would be way better than that.

  • Nevyn (10/23/2013)


    I suspect that this requirement would be way easier to do loading the data into a staging table and then using T-SQL.

    Maybe some more veteran SSIS users have a quick script solution but if I had to do this without a staging table I'd be applying row numbers, then sorting descending, then loading previous, then sorting ascending. And T-SQL would be way better than that.

    Thanks for your reply. That makes sense, was just hoping there was a more efficient way to do it via a script component. I was able to get it working by doing as you suggested...apply row number, sort descending, etc.

  • There might be a better way. Im new to script transforms.

    As I said, I'd load it into a staging table and then do it via tsql. Easier and would probably run faster.

  • This can definitely be done using a Script Component. It should be asynchronous, because not every input row will have a corresponding output row.

    I did not reply sooner because your logic puzzled me.

    For every row where Type="Test", I want to ignore it if it's followed by a row where Type<>"Test"

    Applying this logic to your sample data (with added row numbers for clarity):

    Input

    1 Test,ValueA,ValueB

    2 Test,ValueA,ValueB

    3 AP,ValueA,ValueB

    4 Test,ValueA,ValueB

    Output

    1 Test,ValueA,ValueB

    3 AP,ValueA,ValueB

    4 Test,ValueA,ValueB

    Only row 2 is ignored, because it is the only one which matches your exclusion rule.

    You have said that rows 2 and 3 will be ignored (why row 3, its type <> 'Test'?). And why does your sample output include all of the rows?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Only row 2 is ignored, because it is the only one which matches your exclusion rule.

    You have said that rows 2 and 3 will be ignored (why row 3, its type <> 'Test'?). And why does your sample output include all of the rows?

    You are correct, I was wrong in saying both 2 and 3 will be ignored. My sample output included all the rows because I had planned on using a conditional split on that output to exclude the appropriate rows. In other words, my sample output was not the "final output", it was just getting the data in a state where I could perform my conditional logic. I apologize if I did not word my question properly.

    Anyway, I'd be interested in hearing your approach with the script component. Thanks.

  • Some rough pseudo-code:

    rowA = readRow;

    rowB = readRow;

    do

    if (rowA.Type != "Test" || rowB.Type != "Test")

    ignore

    else

    output rowA;

    rowA = rowB;

    rowB = readRow;

    until (EOF)

    -Edit: fix logic error in pseudo-code & simplify

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply