January 19, 2005 at 9:36 am
My colleague and I have been working on a project to load the data from similar datasets in multiple systems into one table in a data warehouse. I'm going to detail the approach we're currently favouring and ask you lot for your take on it.
A bit of background:
We work for a hospital trust (which is a collection of 1 or more hospital sites, in our case there are 2 sites) and each site has its own Patient Administration System (PAS). As part of a national IT project, trust systems are being replaced with a new standard PAS. We have our own data warehouse that we keep well fed from both existing PAS systems. The challenge is that we will effectively have to load 3 sets of data (one set from each PAS) into 1 corresponding table in the data warehouse, e.g. 3 Hospital Referral tables will go into a Trust level Referral table.
We want the Trust table to have an int field that refers back to the PAS table record. However, the problem we have is that the primary keys on the old systems are almost all varchars. So, we've come up with a way of generating new unique IDs:
At the PAS level we have 2 unique IDs per table: one is the Primary Key of the table (which is a varchar in the old systems and an int in the new) and the other is a unique row ID (which is an int on all three). We have to create the unique row ID. To do this, we will have another table (PKGenerator) that has two fields: PASTableKeyID, UniqueRowID. UniqueRowID will simply be an identity field. So, for every record in the PAS table we insert a row into PKGenerator to get a unique int.
The Trust level table has two fields on it that are used to point back to the PAS level record that created the Trust record: SourceSystemID, SourceSystemKeyID. In this case, SourceSystemID relates to a PAS system, and SourceSystemKeyID is the UniqueRowID from the step above.
As far as we can see this gives us int PKs/FKs across the board. The alternative is to consider using the existing varchar keys (which can be anything from 8 to 26 characters long) and we think that is an evil thing to do and it makes us cry.
I'd really appreciate your thoughts/opinions on this approach. Anyone done anything similar?
Thanks 🙂
January 19, 2005 at 10:03 am
IMHO, I would just use a new int PK. As long as the original PAS keys are retained in the new table, or put into a cross reference or lookup table, then referential integrity is still maintained. All you really need to do is ensure that unique records are added to the warehouse and the ability to look up existing PAS records at the specific location is retained.
I am sure there is MUCH more to this than detailed here, but remember: KISS!
January 20, 2005 at 1:53 am
When you talk about verifiable and validated what do you mean in this context? Are you saying that ints cannot easily be used to identify a record just by looking at the number? e.g. some of the varchar keys we have are readable (being made up of date, time, name etc), whereas an int is totally unreadable.
I'm not worried about definitive. I am just interested in hearing opinions on the proposal. For example, what I just said about record identifiability was something I hadn't considered until I read your post 🙂
January 20, 2005 at 8:10 am
Right, I see what you mean.
In terms of validation all we need is a unique integer. There is no range that it has to fit into, except the limitations imposed by the datatype.
The verification is potentially more problematic, but I think that the example you give is similar to the new national approach to a patient ID. If so, that will be handled at national level and won't be something I need to worry about.
I may run into problems maintaining the new key(s) across datasets and your last point highlights this very well. That is something I need to take into account, because if the keys I create for a Referral and an Outpatient appointment don't match up - well, all hell will break loose.
Thanks for your input, it's been very useful 🙂
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply