October 23, 2013 at 9:43 am
I have .csv source file that looks like this:
Type,ColA,ColB
Test,ValueA,ValueB
Test,ValueA,ValueB
AP,ValueA,ValueB
Test,ValueA,ValueB
For every row where Type="Test", I want to ignore it if it's followed by a row where Type<>"Test". So in my example, I'd want to ignore rows 2 and 3. My plan was to create a script component where for each row in the file, it would grab the first value from the subsequent row and populate a new output column (i.e. NextRowTypeValue). The output would look something like this:
Type,ColA,ColB,NextRowTypeValue
Test,ValueA,ValueB,Test
Test,ValueA,ValueB,AP
AP,ValueA,ValueB,Test
Test,ValueA,ValueB...
I've found many script examples for capturing previous row values (i.e. http://ericwisdahl.wordpress.com/2009/08/02/ssis-retain-values-from-previous-rows/), but not for capturing subsequent row values. Any ideas? Thanks.
October 23, 2013 at 1:24 pm
I suspect that this requirement would be way easier to do loading the data into a staging table and then using T-SQL.
Maybe some more veteran SSIS users have a quick script solution but if I had to do this without a staging table I'd be applying row numbers, then sorting descending, then loading previous, then sorting ascending. And T-SQL would be way better than that.
October 23, 2013 at 3:27 pm
Nevyn (10/23/2013)
I suspect that this requirement would be way easier to do loading the data into a staging table and then using T-SQL.Maybe some more veteran SSIS users have a quick script solution but if I had to do this without a staging table I'd be applying row numbers, then sorting descending, then loading previous, then sorting ascending. And T-SQL would be way better than that.
Thanks for your reply. That makes sense, was just hoping there was a more efficient way to do it via a script component. I was able to get it working by doing as you suggested...apply row number, sort descending, etc.
October 23, 2013 at 5:29 pm
There might be a better way. Im new to script transforms.
As I said, I'd load it into a staging table and then do it via tsql. Easier and would probably run faster.
October 23, 2013 at 8:11 pm
This can definitely be done using a Script Component. It should be asynchronous, because not every input row will have a corresponding output row.
I did not reply sooner because your logic puzzled me.
For every row where Type="Test", I want to ignore it if it's followed by a row where Type<>"Test"
Applying this logic to your sample data (with added row numbers for clarity):
Input
1 Test,ValueA,ValueB
2 Test,ValueA,ValueB
3 AP,ValueA,ValueB
4 Test,ValueA,ValueB
Output
1 Test,ValueA,ValueB
3 AP,ValueA,ValueB
4 Test,ValueA,ValueB
Only row 2 is ignored, because it is the only one which matches your exclusion rule.
You have said that rows 2 and 3 will be ignored (why row 3, its type <> 'Test'?). And why does your sample output include all of the rows?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 23, 2013 at 8:39 pm
Only row 2 is ignored, because it is the only one which matches your exclusion rule.
You have said that rows 2 and 3 will be ignored (why row 3, its type <> 'Test'?). And why does your sample output include all of the rows?
You are correct, I was wrong in saying both 2 and 3 will be ignored. My sample output included all the rows because I had planned on using a conditional split on that output to exclude the appropriate rows. In other words, my sample output was not the "final output", it was just getting the data in a state where I could perform my conditional logic. I apologize if I did not word my question properly.
Anyway, I'd be interested in hearing your approach with the script component. Thanks.
October 24, 2013 at 6:27 am
Some rough pseudo-code:
rowA = readRow;
rowB = readRow;
do
if (rowA.Type != "Test" || rowB.Type != "Test")
ignore
else
output rowA;
rowA = rowB;
rowB = readRow;
until (EOF)
-Edit: fix logic error in pseudo-code & simplify
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply