December 9, 2008 at 7:58 am
My transactional database tables are designed with "GUID" as primary key. Now, I want to pull required tables into my reporting environment incrementally (datawarehouse or data mart). These tables do not have datetime stamp at all so what is the best method to load my reporting environment incrementally?
also, would it be necessary to create an "integer" type surrogate key in my reporting database?
so far we haven't decided whether the reporting environment would be in the same database as transactional database or would it be a different database in same server.
--Thanks.
December 9, 2008 at 8:48 am
It is going to depend on a lot of factors.
Since you cannot tell when a record has been inserted or updated based on the record itself, you have to look for other options. Is the GUID a NewID() or a NewSequentialID()? If it is a sequential GUID, you can tell new records based on the last record you transferred to the reporting system. So, if you are only looking for new records, you could keep track of the last one you transferred and just get anything with a "higher" GUID. If you need updated records, this will not help you.
For new and changed records, you can either use triggers or replication. Replication is nice because it reads from the log so nothing gets added to your original transaction. You have a great deal of control over the replication process, so using transactional replication may work pretty well for your situation.
If your tables are small enough, you could actually compare the records in the tables and insert new records and update existing records. For this, I would suggest you search google for "Slowly Changing Dimension" because it is exactly the same issue and there are a lot of solutions that have good and bad points.
December 9, 2008 at 9:03 am
It is newid() & its not sequential. also, we have an issue of slow performance & retrieval of record from reporting environment as the primary key are GUID. hence, would it be necessary to create an integer (surrogate key) at the reporting environment.
whether surrogate key should be created only in case of building a formal data warehouse. currently, we just have a plan to capture transaction tables that are necessary for reporting but not decided on any data mart (facts & dimensions)
also, what index would be ideal on a column with "GUID" using newid()?
--Thanks.
December 9, 2008 at 10:08 am
No index is ideal with a GUID using NewID(). You should change it to a sequential GUID - this may significantly help your performance issues as a NewID() column constantly fragments indexes.
The only way to fight this fragmentation is to switch to a newsequentialid() or to leave a fillfactor large enough to minimize fragmentation between reindexing operations and reindex pretty regularly.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply