Fill in the gaps - ideas needed, help appreciated.

  • First, to outline the issue:

    My ETL/SSIS package is provided a file (1 of 7) on a weekly basis. The file consists of 9 columns, 2 are used to identify a unique record, while the other 7 are information. The information is frequently sparse and incomplete. The requirement is to fill in the holes with a source table. A vendor manually keyed a table with much of the missing data that can be used to fill in the holes, provided the data is there. Another issue with the source data is that sometimes it's wrong. When I analyzed the data, I found that of the 7 data columns, Col1, Col2, Col3 were most often populated. 2-5% of the time, one of them is missing. Col1 Seems to be the most Important based on my knowledge of the data. Look at it like an address. Col1 could be Milkway, Col2 could be ClassMStar, Col3 = Earth, Col4 US, etc. etc. Each column is a narrow point on the data.

    My source table is as such:

    COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 |

    AB&M RR | B4&5 | 88 | 5 | | | 4798 |

    IGN RR CO | A1 | 3 | | 497 | | |

    It was a pain to make that look pretty-ish :cool:

    The keyed table will have all 7 values and I need to fill in the blanks. Also, for columns 4-7, there's no telling which ones will be missing. It could be just 1, it could be all. The really fun part comes when I was advised that those first 3 columns could change from vendor to vendor....

    Please ask questions, I'm open to suggestions. I have a working version based on a stored proc. I Join the 2 tables and use a Case Statement on the other side... Like thus:

    From Table1 L right Join Table2 C

    ON L.Col1 = C.Col1

    and L.Col2 = C.Col2

    and L.Col3 = C.Col3

    AND C.Col4 = CASE WHEN ISNULL(L.Col4,'') = ''

    THEN C.Col4

    ELSE L.Col4

    END

    AND C.Col5 = CASE WHEN ISNULL(L.Col5,'') = ''

    THEN C.Col5

    ELSE L.Col5

    END

    AND C.Col6 = CASE WHEN ISNULL(L.Col6,'') = ''

    THEN C.Col6

    ELSE L.Col6

    END

    AND C.Col7 = CASE WHEN ISNULL(L.Col7,'') = ''

    THEN C.Col7

    ELSE L.Col7

    END

    My goal is to find ideas for a better way to do this.

    Again, any suggestions and questions are welcome.

    Thanks in advance.

    Crusty.

  • If two columns are used to uniquely ID the record, why are you appearing to join on the 7 columns that are NOT used to do that? Or am I just mis-interpreting what you meant?

    Also, better than what, exactly? Is there a particular problem that exists with what you are doing now? How does it perform? How well does it work ?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • The 2 columns that uniquely ID a record are from a different table (the source of the trouble) but do not relate to the other information. The "other" information relates to it... I'm being cryptic on purpose because the nature of what I'm doing is a bit sensitive.

    Think of the first 2 columns as

    CAR Model

    Ford Mustang

    Their is so much more information. The 7 columns in question are the same columns in both tables, so I guess the other two should be ignored. The code sample I showed is an example of the joins. I would prefer to do this as a Table Valued function... Today was too crazy for me to work on it too much tho...LOL

  • Suggestion, post the DDL for the tables, sample data for the tables, expected results based on the sample data provided.

  • I think I might understand the kind of alternative you might be looking for, but it's very much a guess given the lack of details. How about something along the lines of up to 7 parallel LEFT JOINs to TABLE2 from TABLE1, with the unique identifiers plus just one column. That way, each join is designed to bring in just one column's "fill in the gaps" value. Not sure if that helps or hurts, but it IS an alternative.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply