May 2, 2010 at 12:56 am
Hi All,
I have a question about which would be the best method for loading into a DW.
Method 1: Truncate the entire target table and load it from scratch?
Method 2: Incremental loads.
What are the different factors to consider when choosing the best loading method?
Thanks,
May 3, 2010 at 7:19 am
The answer to this question, like the answer to so many questions, is: it depends. At least in part. Truncating and reloading can be used to get a prototype up and running very quickly. Not having to worry about handling edits, or about where to start the download, is a significant advantage. This only works if there aren't so many records that the full download becomes impractical. Sometimes you simply have to brute-force a table because there isn't enough information in the source database(s) to identify which records have been added or changed.
Ideally, however, it is better to add or alter only those records that have been added or changed. This allows the download to occur in the shortest time, and it can even make mid-day updates possible. This is especially true for tables that hold true transactions; once added, they are rarely if ever altered. The development time for this is much more extensive, however. Often the person who works on this is not the same person who works on the data warehouse itself.
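Incremental loading of the kind described above usually keys off a "watermark" such as a last-modified timestamp: pull only rows changed since the last load, upsert them, and advance the watermark. A minimal sketch of the idea, using in-memory SQLite as a stand-in for the source and warehouse (the table and column names here are hypothetical, not from this thread):

```python
import sqlite3

def incremental_load(src, dw, last_watermark):
    """Pull only rows modified since the last load and upsert them."""
    rows = src.execute(
        "SELECT id, amount, modified_at FROM sales WHERE modified_at > ?",
        (last_watermark,),
    ).fetchall()
    # Upsert: replace changed rows, insert new ones (id is the primary key).
    dw.executemany(
        "INSERT OR REPLACE INTO dw_sales (id, amount, modified_at) VALUES (?, ?, ?)",
        rows,
    )
    # Advance the watermark to the newest timestamp we saw.
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return new_watermark, len(rows)

# Demo: one row is older than the watermark, one is newer.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)")
src.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 10.0, "2010-05-01"), (2, 20.0, "2010-05-03")])
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_sales (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)")

wm, n = incremental_load(src, dw, "2010-05-02")
print(wm, n)  # only the row changed after the watermark is moved
```

The catch, as noted above, is that this only works when the source actually records which rows were added or changed; without a reliable modified-date or change log, truncate/reload may be the only option.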
At least, that's my two cents.
May 3, 2010 at 7:40 am
May 3, 2010 at 7:59 am
I think Method 1 only works if you can get all your history from the source. If not, this method is flawed, as you will lose some of your history each time.
If you can get the history each time, or it doesn't matter, then it depends on the number of rows, I guess.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
May 3, 2010 at 9:14 am
I think it depends on your requirements and the volume of data you have, but I would say you should try Method 1 first. You won't be able to do this for every table or every situation, but you may be surprised how fast SSIS and SQL Server 2008 are.
For example, I was recently working on a project populating a sales-related data warehouse, and I was surprised to find how fast I could move data through the process. At first, I was working with a source database that was not very large, and I was merely trying to prototype the data warehouse so that I could show it to some users. After some review and fixes, I decided to try populating the data warehouse from a much larger source database. I was incredibly surprised at how fast data moved from source to destination.
I work for a company that sells a retail management product to small to medium sized retail companies. So, I was able to easily switch out a smaller database for a much larger one.
Method 1 works well for this sales-related data, but not so well for inventory-related data. We will be performing incremental loads on that data.
- Andy
May 4, 2010 at 8:02 am
I assume your DW is all Type 1 changes if you are going to do a truncate/reload each night. Type 2 changes would require that the DW tables not be truncated, so the history of changes can be maintained.
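To illustrate the distinction, here is a minimal Python sketch of slowly changing dimension handling, with plain dicts and lists standing in for dimension tables (the field names are my own, not from the post):

```python
def apply_type1(dim, key, attrs):
    """Type 1: overwrite attributes in place. No history is kept,
    so a nightly truncate/reload loses nothing."""
    dim.setdefault(key, {}).update(attrs)

def apply_type2(rows, key, attrs, load_date):
    """Type 2: expire the current version and append a new one.
    Truncating this table would destroy the change history."""
    for row in rows:
        if row["key"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = load_date
    rows.append({"key": key, **attrs,
                 "start_date": load_date, "end_date": None, "is_current": True})

# Type 1: one row per key; the old value is gone.
dim = {}
apply_type1(dim, 42, {"city": "Oslo"})
apply_type1(dim, 42, {"city": "Bergen"})
print(dim[42]["city"])  # Bergen

# Type 2: both versions are retained, with validity dates.
rows = []
apply_type2(rows, 42, {"city": "Oslo"}, "2010-05-01")
apply_type2(rows, 42, {"city": "Bergen"}, "2010-05-04")
print(len(rows))  # 2
```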
May 5, 2010 at 1:02 pm
masheh
You don't tell us anything about the source. I have one case where the source delivers a bunch of text files to be loaded into the DW. One file contains invoice lines, many millions a year. It is not possible to generate and transfer all those lines every day, so the update is incremental (per day). This is fine as long as you can be sure there will be no overlap from the source and that every day's file will be delivered and loaded. Be careful. In my case the source also creates a "secure" file, which contains the number of lines to be transferred. After the DW update loads a stage table, the number of records in the stage table is compared to the number in the "secure" file. If they are not equal, an alarm e-mail is sent and the target table is not updated.
//Gosta
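The row-count check Gosta describes can be sketched as follows, with in-memory SQLite standing in for the real staging database (table and function names are assumptions for illustration):

```python
import sqlite3

def load_is_consistent(conn, control_count):
    """Compare the staged row count to the count declared
    in the source's control ("secure") file."""
    (staged,) = conn.execute("SELECT COUNT(*) FROM stage_invoice_lines").fetchone()
    return staged == control_count

# Demo: stage three invoice lines, then validate against two control values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_invoice_lines (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO stage_invoice_lines VALUES (?, ?)",
                 [(1, 9.5), (2, 12.0), (3, 3.25)])

print(load_is_consistent(conn, 3))  # counts match: safe to load the target
print(load_is_consistent(conn, 4))  # mismatch: send the alarm, skip the target load
```

In the real process the `control_count` would be parsed from the "secure" file the source delivers, and a `False` result would trigger the alarm e-mail rather than the target-table load.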