February 28, 2008 at 8:42 am
I'm still working through what you are trying to accomplish, but to give you an idea of how I am approaching this I thought I should make a quick post.
I am starting for the (bad) assumption that the data is perfect. This is allowing me to map the flow of the data from the Contract table to the appropriate DW tables. I didn't have has much time last night to work on it, so didn't get too far yet.
Hopefully I can have something for you later today or tomorrow.
😎
February 28, 2008 at 8:46 am
I also have to agree with Matt. But back to my last post, getting an idea of the data flow with perfect data should also help identify how things need to flow. Then you can start looking at adding flows for when (and how) the data can go bad.
😎
February 29, 2008 at 4:53 am
Matt,
You actually have a separate DB to do the scrubbing?
Hmmm. I didn't think about that idea. Originally I was going to use staging tables in the DW, but then I had to redesign the whole DW and completely forgot about the scrubbing part.
So who on your side is responsible for resolving the data problems that require human intervention? IT or the business unit?
EDIT: And how did you ever manage to resolve the father/son team living at the same address? I'm really curious on that one.
February 29, 2008 at 6:21 am
Brandie,
Still working on the basic flow. Regarding the data scriub, once you have done everything possible programmatically, questionable and bad matches should go to an error report and back to the users for correction. They are the ones who know the data well, and all corrections should be made in the source system so that they flow properly after that.
😎
February 29, 2008 at 7:04 am
Brandie Tarvin (2/29/2008)
Matt,You actually have a separate DB to do the scrubbing?
Hmmm. I didn't think about that idea. Originally I was going to use staging tables in the DW, but then I had to redesign the whole DW and completely forgot about the scrubbing part.
So who on your side is responsible for resolving the data problems that require human intervention? IT or the business unit?
EDIT: And how did you ever manage to resolve the father/son team living at the same address? I'm really curious on that one.
It is a shared solution. The database keeps track of the decisions as to "who is who". We take in and record the "key elements" involved in the decision, a table essentially tracking the "individuals" as defined by our mainframe and external systems, a table of the de-duplicated "people" as best we had determined them, and a mapping table of the "old ID's to new ID's".
So - we never got rid of the old ID's from the external systems - we kept them to track what we had previously decided.
The effort was shared - a lot of the heavy lifting of doing the research was done at the business unit(s), since they have the access to the people being tracked and the paper records. There were a lot of issues (like - "splitting" people they had thought were one) they couldn't fix for themselves, so a fair amount of issues would come back to our shop for fixing.
The 2 doctors (who on top of everything else, had VERY similar SSN's...!) got picked up by our filter, and initially were "merged". After one of them brought up the issue - they were manually split back apart, using an override process we established at that very moment (a few columns essentially telling the processing system "don't ever touch these rows again"). We also ended up with a table called EvilTwins, which essentially tracked the "summary" of those bad matches (so that we could use it outside of this scrubbing process). Something to the effect of "this ID is not and never will be the same person as THIS ID...".
As of now - there's a new business unit more or less 100% dedicated to keeping this all straight (it's for a healthcare system, so it was really important. 38 Million distinct people (historically) with information on 40 systems (which don't talk to each other 100%) visiting 400 locations each doing their own data entry, well - you get a fair amount of duplicates/mismatches).
This may be sad to say - but this is actually one of the larger databases in that organization. Much larger in fact than a fair amount of the source systems it takes info from.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 29, 2008 at 5:01 pm
You may want to look at Jamie Thomson's blog entry on loading related tables at http://blogs.conchango.com/jamiethomson/archive/2006/02/17/SSIS_3A00_-How-to-load-related-tables.aspx [/b]"> http://blogs.conchango.com/jamiethomson/archive/2006/02/17/SSIS_3A00_-How-to-load-related-tables.aspx .
You may have to register to access this site. He offers a couple of alternatives to the "how do I know the foreign key" problem.
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply