February 25, 2010 at 1:59 am
Hi Magarity,
Absolutely the way to go once the process is stable and populated on an incremental basis.
Tell me - I gather from the discussions on the subject that there is only a 1 in 2^40 chance of a duplicate MD5 checksum (as opposed to the SQL Server CHECKSUM function). Have you ever experienced problems with checksums?
Thanks,
Adam
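(As an aside for readers: a minimal T-SQL sketch of the MD5 comparison being discussed, using the built-in HASHBYTES function. The table and column names are invented purely for illustration.)
-- Flag rows whose hashed column values differ between staging and the dimension.
-- ISNULL and a delimiter guard against NULLs and accidental concatenation matches.
SELECT  s.ClientCode
FROM    dbo.Staging_Client AS s
JOIN    dbo.Dim_Client     AS d
        ON d.ClientCode = s.ClientCode
WHERE   HASHBYTES('MD5', ISNULL(s.ClientName, N'') + N'|' + ISNULL(s.Country, N''))
     <> HASHBYTES('MD5', ISNULL(d.ClientName, N'') + N'|' + ISNULL(d.Country, N''));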
February 25, 2010 at 2:03 am
For Dbowlin:
There is a good MD5 checksum approach here:
February 25, 2010 at 2:25 am
Hi,
thanks for the article and example.
What I don't understand - and this may be because you don't have complete control of the ETL, or you're restricted in the design - is why you truncate the dimensions. Could you not just relate this to a business key and do the appropriate SCD 1 / SCD 2 transformations?
Even with the Fact table, you could just insert for new records and update for existing records.
Regards,
Amrit
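(A minimal sketch of the insert-new / update-existing pattern Amrit describes, keyed on a business key; MERGE is one way to express it, and the table and column names are invented for illustration.)
-- SCD Type 1 style: overwrite changed attributes, insert unknown business keys.
MERGE dbo.Dim_Client AS tgt
USING dbo.Staging_Client AS src
      ON tgt.ClientCode = src.ClientCode               -- business key
WHEN MATCHED AND tgt.ClientName <> src.ClientName THEN
    UPDATE SET tgt.ClientName = src.ClientName         -- SCD 1: update in place
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ClientCode, ClientName)
    VALUES (src.ClientCode, src.ClientName);            -- new records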
February 25, 2010 at 2:57 am
Hi Amrit,
This is explained in my reply to Magnus, above - the design is in a considerable state of flux, and we wanted complete teardown on every process run, to guarantee coherence and to avoid having "old" data cluttering up the DW.
Sorry that this wasn't clear in the article.
Otherwise yes - standard techniques could (and probably will) be used.
Regards,
Adam
February 25, 2010 at 2:58 am
Ok, great!
Sorry if I misunderstood.
February 25, 2010 at 12:16 pm
No, I've never had a problem with using md5. I'm inspired to write my own article on how to use md5 in an update/insert strategy for loading data warehouses if I can get Steve the admin to accept it.
February 25, 2010 at 12:48 pm
magarity kerns (2/25/2010)
No, I've never had a problem with using md5. I'm inspired to write my own article on how to use md5 in an update/insert strategy for loading data warehouses if I can get Steve the admin to accept it.
By all means, please write away. We currently truncate and replace, but would be extremely interested in a more incremental strategy. For example, my understanding is BIDS Slowly Changing Dimension transformation is a performance hound for large tables.
February 25, 2010 at 11:02 pm
We don't need no stinkin' changing dimension widgets... Took me a while with the article-submitting editor, but I got my treatise on using MD5 submitted. We'll see how fast it gets rejected.
February 26, 2010 at 1:51 am
Fabulous! I look forward to reading it!
Adam
December 17, 2010 at 12:08 am
Thanks for the article.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
December 17, 2010 at 1:46 am
It's a good article.
However, did you experience any problems with datatypes being incorrectly assigned by the SELECT ... INTO method? I notice you mention that nulls are handled in the consolidation layer, so that may mitigate the issue.
I'm also not sure how this can be adapted to an incremental DW unless you are not concerned about maintaining history: by dropping the fact table you lose the ability to maintain the historic fact records, which surely defeats one of the objectives of an incremental load - namely point-in-time analysis, which can be quite critical for DWs in the finance industry.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
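(On the datatype point: one way to keep SELECT ... INTO from inferring unwanted types or nullability is to CAST explicitly in the select list; ISNULL additionally makes the resulting column non-nullable. The names below are invented for illustration.)
-- SELECT ... INTO derives each column's type from its expression,
-- so explicit CASTs pin the types down at load time.
SELECT  CAST(src.ClientCode AS varchar(20))           AS ClientCode,
        CAST(ISNULL(src.Amount, 0) AS decimal(18, 2)) AS Amount
INTO    dbo.Stg_Sales
FROM    dbo.SourceExtract AS src;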
December 17, 2010 at 2:33 am
INSERT INTO <table> WITH (TABLOCK) will minimally log in the same way that SELECT ... INTO will by default.
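(For example - a sketch only, subject to the usual minimal-logging conditions such as the simple or bulk-logged recovery model and an empty heap target; the table names are invented.)
-- Bulk insert into an existing table with a table lock, so the load can be
-- minimally logged much like SELECT ... INTO.
INSERT INTO dbo.Fact_Sales WITH (TABLOCK)
        (DateKey, ClientKey, Amount)
SELECT  DateKey, ClientKey, Amount
FROM    dbo.Staging_Sales;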
December 17, 2010 at 3:25 am
For example, my understanding is BIDS Slowly Changing Dimension transformation is a performance hound for large tables.
FAO tskelly:
If you're finding the standard SCD component in SSIS slow, have a look at Todd McDermitt's Kimball SCD component.
I've switched to using his Kimball SCD for a number of reasons. First of all, it's faster. As mentioned in the details, it doesn't destroy the data flow when you need to make changes. It also provides a wealth of auditing information. Well worth a look!
You can find it here: http://kimballscd.codeplex.com/
_____________________________________________________________________________
MCITP: Business Intelligence Developer (2005)
December 17, 2010 at 3:28 am
Thank you Adam.
First of all, it is almost necessary to have control of the whole "ETL" process. Missing data or bad quality is a major problem in many cases. You can seldom avoid outer joins; I use ISNULL(expression,'MISSING') and have a default dimension member MISSING in most of the dimensions.
Your approach is also useful in "non-OLAP" situations, such as with the new PowerPivot feature: we provide dimension tables and one or more fact tables to import, and PowerPivot then creates the "cube" by itself.
Gosta M
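(A minimal sketch of the 'MISSING' default-member pattern Gosta describes - an outer join to the dimension with ISNULL substituting the default member; the names are invented for illustration.)
-- Fact rows with no matching dimension member fall back to 'MISSING'
-- instead of being lost by an inner join.
SELECT  f.Amount,
        ISNULL(d.ClientName, 'MISSING') AS ClientName
FROM    dbo.Staging_Sales AS f
LEFT OUTER JOIN dbo.Dim_Client AS d
        ON d.ClientCode = f.ClientCode;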
December 17, 2010 at 3:57 am
dave-dj (12/17/2010)
For example, my understanding is BIDS Slowly Changing Dimension transformation is a performance hound for large tables.
FAO tskelly:
If you're finding the standard SCD component in SSIS slow, have a look at Todd McDermitt's Kimball SCD component.
I've switched to using his Kimball SCD for a number of reasons. First of all, it's faster. As mentioned in the details, it doesn't destroy the data flow when you need to make changes. It also provides a wealth of auditing information. Well worth a look!
You can find it here: http://kimballscd.codeplex.com/
We use Todd's component as well; it has a few performance issues with large datasets. We did some alpha testing of v1.6 of the component, and that seems to have fixed some of the performance issues; unfortunately, we're still waiting for the final release.
_________________________________________________________________________
SSC Guide to Posting and Best Practices