September 9, 2008 at 5:57 am
Hi,
I am trying to load the updates/inserts into my fact table. For that I am using the destination table as my lookup table. I am not able to cache the entire fact table (full cache mode) because my fact table contains 2 billion records.
Can anyone suggest how to overcome this situation?
Hareesh
September 9, 2008 at 6:24 am
First, a fact table should not be updated, it should only get inserts or you will not be able to incrementally process your measure group. Typically, a fact table would get reversing records for records that need an "update" and then new records.
As far as finding the records that need to be reversed, you should probably do this in T-SQL by putting the records you want to compare into a staging table and then joining to your fact table. Doing this operation in SSIS is going to be pretty slow. SSIS is all row-by-row processing. It is fast, and certainly better at RBAR than the SQL database engine, but it is still far less efficient than a set-based operation will be.
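As a rough sketch of the staging-table approach (table and column names here are illustrative, not from the thread):

```sql
-- Load the incoming batch into a staging table first, then find the
-- existing fact rows whose measures have changed. One set-based join
-- replaces millions of row-by-row lookups.
-- dbo.FactSales / dbo.StagingSales and their columns are hypothetical.
INSERT INTO dbo.FactSales (BusinessKey, Amount)
SELECT f.BusinessKey,
       -f.Amount              -- reversing record: negate the measure
FROM   dbo.FactSales   AS f
JOIN   dbo.StagingSales AS s
       ON s.BusinessKey = f.BusinessKey
WHERE  s.Amount <> f.Amount;

-- Then insert the new/corrected rows from staging as ordinary inserts,
-- so the measure group can still be processed incrementally.
INSERT INTO dbo.FactSales (BusinessKey, Amount)
SELECT s.BusinessKey, s.Amount
FROM   dbo.StagingSales AS s;
```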
September 9, 2008 at 6:41 am
Thank you for your valuable inputs.
Let's forget about whether it is a fact or a dimension. My question is how to handle the lookup transformation to cache 2 billion records. Do I need to change the memory settings? If so, can you please give me a detailed approach to changing the memory settings?
Hareesh
September 9, 2008 at 6:53 am
To cache that much data, you will need enough physical memory to hold all of it - or at least enough physical and virtual memory. It is just going to be horribly inefficient. If you enable memory restriction, you can get it to individually query for each row or do a partial cache, but in full cache mode the package is probably just going to sit in the pre-execute phase for a very long time.
You may find better performance if you use a data flow source and a merge join to do your lookup rather than a lookup component. You would need to have everything sorted, but it will probably perform better. Either way it is probably not going to perform well enough to actually use. You probably have to use a staging table and do this in T-SQL.
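A sketch of what the sorted inputs for that Merge Join might look like (table and column names are illustrative). The sort should be pushed down to the database with ORDER BY rather than done with an SSIS Sort component, which would itself have to buffer all 2 billion rows:

```sql
-- Input 1: the incoming rows, sorted on the join key.
SELECT BusinessKey, Amount
FROM   dbo.StagingSales
ORDER  BY BusinessKey;

-- Input 2: the existing fact rows, sorted the same way.
SELECT BusinessKey, Amount
FROM   dbo.FactSales
ORDER  BY BusinessKey;
```

Since the sort happens in the query, you also have to tell SSIS about it: in each source's Advanced Editor, set IsSorted = True on the output and SortKeyPosition = 1 on the key column, or the Merge Join will refuse the inputs.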
September 9, 2008 at 11:33 am
You cannot cache a 2 billion record table. If you try to do so, there will be buffer swapping and the package execution will slow down.
If you have no other option, use no-cache mode. A no-cache lookup does not buffer anything; it fires a query against the table for each incoming row and fetches the matching record. If you use no cache, check that there is an index on the fields you are looking up on.
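For example, an index covering the lookup key (names are illustrative) keeps each of those per-row queries from scanning the fact table:

```sql
-- Hypothetical index on the lookup key. INCLUDE the columns the
-- lookup returns so every probe is a single index seek with no
-- key lookup back into the base table.
CREATE NONCLUSTERED INDEX IX_FactSales_BusinessKey
ON dbo.FactSales (BusinessKey)
INCLUDE (Amount);
```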