SSIS Solution pls

  • Hi,

    I have records like below

    VendorId,CustId,Names,AddressLine1,Town,PostCode,County

    1,1,Mr Xyx,NULL,NULL,NULL,NULL

    2,1,Mr Xyx, London Road,NULL,NULL,NULL

    3,1,Mr Xyx,NULL,London,NULL,NULL

    4,1,Mr Xyx,NULL,NULL,EC12 8ER,NULL

    5,1,Mr Xyx,NULL,NULL,NULL,UK

    Need to get out like below.

    1,1,Mr Xyx,London Road,London,EC12 8ER,UK

    The SSIS package needs look into vendor hierarchy and pick the highest vendor heirarchy record as source then keep looking down(lookup) into hierarchy and pick valid(non null) fields to complete record.It should only pick missing fields from down the vendor hierarchy records.

    Thanks

    Nick

  • You really don't want to do this in SSIS. I hate saying that but you will not be well served by the number of multicasts and loops you'll be forced to do.

    Dump this to a staging table, and do it in three passes. First pass determines core record. Second pass determines highest priority/level vendor with a non-null in each field. Third pass connects itself up to the necessary items and dumps those to either a second, scrubbed data staging table or directly to the primary table.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (4/21/2011)


    You really don't want to do this in SSIS.

    ---

    Actually, I think there is quite a nice way for SSIS to handle this - using an asynchronous script component.

    The script would process the rows for a particular vendor ID, assigning the non-NULL columns as they are found to the relevant Output columns and then committing the record on change of vendor ID.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (4/22/2011)


    Craig Farrell (4/21/2011)


    You really don't want to do this in SSIS.

    ---

    Actually, I think there is quite a nice way for SSIS to handle this - using an asynchronous script component.

    The script would process the rows for a particular vendor ID, assigning the non-NULL columns as they are found to the relevant Output columns and then committing the record on change of vendor ID.

    Hm, you'd have to force a sort but that's not horrendous. I stand corrected. Apparently I need to look into the asynchronous script component. Not sure I've ever used that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Phil Parkin (4/22/2011)


    Craig Farrell (4/21/2011)


    You really don't want to do this in SSIS.

    ---

    Actually, I think there is quite a nice way for SSIS to handle this - using an asynchronous script component.

    The script would process the rows for a particular vendor ID, assigning the non-NULL columns as they are found to the relevant Output columns and then committing the record on change of vendor ID.

    Phil, I went looking for this, and I can't seem to find a way to get my script component to act asynchronously as a trap and release or anything built in that will do it. I'm assuming this is my lack of familiarity. Do you know of any articles that could walk me through what you're recommending here?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig.

    I was basing my recommendation on this article which, now that I look at it, is not quite what is required here.

    The sample code in the article creates 3 rows of data, with output schema different from input schema, for every input row.

    We want the converse - multiple input rows to a single output row.

    But it seems that it should be possible without too much difficulty, using a combination of appropriately scoped variables and conditional use of the OutputBuffer.AddRow() method on transition from one ID to another. What do you think?

    Phil

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (4/26/2011)


    But it seems that it should be possible without too much difficulty, using a combination of appropriately scoped variables and conditional use of the OutputBuffer.AddRow() method on transition from one ID to another. What do you think?

    I think what you're recommending might work, passing the package variable from row to row as it comes in through the stream. You'd definately need a pre-sort of some kind and you'd have to watch for releasing a blank row through the structure on first row, when the variable switches from 0 to the first ID. But in this case the stream component wouldn't be the roadblock, it would be the sort.

    One of my larger concerns here is the prioritization of the suborder within an ID. Unless that's pre-built into the file, this becomes a painful endeavour of conditional logic, and if they do build the logic into the package, you then have to deal with conflicts at the same priority level, kind of like arguing with merge replication... another thing I avoid like the plague unless absolutely necessary.

    All of this is feasible, but we're skipping from the realm of data cleansing into programming if you keep following that rabbit down the hole. I've seen similar output from some really nasty output software that did relatively the equivalent, basically XML with no tags, so it went ID name <name> (CR) ID address1 <address1> (CR), etc. In the end I went to T-SQL. Row construction was just painful and not as easy to roll through and test updates as it was from a staging table into a series of selects to confirm ordering and the like. It also allowed for unanticipated items being introduced not affecting the script because the rows were simply skipped, and duplication to be errored out and manually inspected.

    This seems like a similar issue, just the file's format is filled with nulls, which is almost worse. If three rows have data in the same column you can't be sure which takes priority unless it's explicitly declared by the originating file. The VendorID bit might work for right now, but what happens when priorities mix and you add more important vendors later? T-SQL would allow an easy join to a vendor's hierarchy table. Recoding the SSIS package would take a lot more upkeep.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • One of my larger concerns here is the prioritization of the suborder within an ID. Unless that's pre-built into the file, this becomes a painful endeavour of conditional logic, and if they do build the logic into the package, you then have to deal with conflicts at the same priority level, kind of like arguing with merge replication... another thing I avoid like the plague unless absolutely necessary.

    Thanks Craig, a considered response and an interesting problem.

    You are looking at the bigger picture whereas I was solving this particular problem.

    I made the assumption that the file was already sorted by VendorID within CustomerID - based on the sample data provided. If that assumption is correct, no sort is required and the row-by-row logic is not that tough, in dodgy pseudo-code:

    Foreach Field in Row

    If not Field.IsNull Then

    savedField(i) = Field.Value

    End If

    Next

    This works because the rows are in ascending priority order.

    But I agree that seemingly simple changes to requirements would blow this solution away.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (4/26/2011)


    You are looking at the bigger picture whereas I was solving this particular problem.

    I made the assumption that the file was already sorted by VendorID within CustomerID - based on the sample data provided. If that assumption is correct, no sort is required and the row-by-row logic is not that tough, in dodgy pseudo-code:

    Yeah, now that I've thought it through, you're right, it would be simple for that mechanic. I just don't trust the source file. Bad habit of mine. 🙂

    But I agree that seemingly simple changes to requirements would blow this solution away.

    That's where I don't feel this ends up as the best solution. Things like this end up so mobile in requirements, I prefer a more versatile testing environment that will allow for more user customization rather then schema updates and the like. I'll have to think more about applying this discussion to another issue I'm having though, it seems like an interesting way to clean up distinct issues without doing stream blocking if you set the datasource to ordered and use an orderby to pull the data from the source.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks a lot all,

    After going through your discussion I came to know that I have 2 options which are tsql and SSIS script component.I will explain both options to client and I feel they will with tsql 🙂 where I am very confident to get done easily.

    Thanks onace again.

    Nick

  • Thanks for the feedback, Nick. Appreciated and glad your found what you needed. Don't mind Phil and I if we abduct your thread for further testing, review, and general abuse of the concept. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply