February 15, 2019 at 2:41 am
Hello everyone,
Now imagine the following situation: you have a DWH environment, run the usual ETL process, and end up with a large fact table that currently holds roughly 100,000,000 rows. This particular table has some unused non-clustered indexes, which I would of course like to get rid of; however, management approval to remove 35 GB of unused indexes mostly fails because the decision makers are too impressed by that number, at least that is my feeling when I talk to them about it. An SSIS job truncates this table every night before new data is loaded into it.
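To make the "unused" claim concrete for management, something like the following sketch against `sys.dm_db_index_usage_stats` can show how rarely the nonclustered indexes are actually read (the table name `dbo.FactSales` is hypothetical; note the DMV resets on instance restart):

```sql
-- Sketch: read/write activity for each nonclustered index on a fact table.
-- dbo.FactSales is a placeholder name; adjust for your own schema.
SELECT  i.name                                              AS index_name,
        us.user_seeks + us.user_scans + us.user_lookups     AS total_reads,
        us.user_updates                                     AS total_writes,
        us.last_user_seek,
        us.last_user_scan
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON  us.object_id   = i.object_id
       AND us.index_id    = i.index_id
       AND us.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID(N'dbo.FactSales')
  AND   i.type_desc = N'NONCLUSTERED';
```

Indexes with zero reads but many writes since the last restart are pure maintenance overhead during the nightly load.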
What exactly happens to the data inside these non-clustered indexes? Is it removed right away by the truncate, or only once the insertion of new data starts?
I ask for the following reason: we made changes to one dimension table that used to hold a detail as the TEXT data type. The tables have now been changed everywhere to store this detail in a varchar(255) column. In the fact table we do not store the detail directly; we only store a foreign key referencing it. Changing the dimension meant we had to drop the fact table, because otherwise we could not drop and re-create the dimension table. Could this explain a much faster insert process into the fact table?
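If I follow the dependency correctly, the fact table's foreign key is what blocked dropping the dimension, so the order of operations would have looked roughly like this sketch (all object names hypothetical):

```sql
-- Hypothetical names. The FK from the fact table has to go (here, together
-- with the whole fact table) before the referenced dimension can be dropped.
DROP TABLE dbo.FactSales;                       -- removes the FK as well
DROP TABLE dbo.DimDetail;

CREATE TABLE dbo.DimDetail
(
    DetailId   int IDENTITY PRIMARY KEY,
    DetailText varchar(255) NOT NULL            -- was TEXT before the change
);

CREATE TABLE dbo.FactSales
(
    FactId     bigint IDENTITY PRIMARY KEY,
    DetailId   int NOT NULL
        REFERENCES dbo.DimDetail (DetailId)     -- only the FK lives here
    -- ... measures ...
);
```

An alternative that avoids dropping the fact table would be `ALTER TABLE dbo.FactSales DROP CONSTRAINT ...` for just the FK, altering the dimension, and then re-adding the constraint.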
Cheers
February 15, 2019 at 5:28 am
By the way, my working theory is this: last night's load was much faster because those unused indexes existed only as definitions during that first run, whereas during normal loads they are already populated. So could INSERT basically behave differently in those two situations?
February 15, 2019 at 5:16 pm
AFAIK, if you truncate the table, the pages allocated to the non-clustered indexes are deallocated along with the data pages. For large tables this cleanup happens asynchronously, via a deferred-drop background process.
If there is data in there and you load, the indexes are updated, which can lead to page splits and other operations. If you load into empty indexes, there may be just page allocations, though depending on what is being loaded there could still be other operations.
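One way to see the deallocation for yourself is to compare per-index page counts before and after the truncate; a sketch, again with a hypothetical table name:

```sql
-- Sketch: used page count per index, before and after TRUNCATE TABLE.
SELECT  i.name, ps.in_row_used_page_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
       ON  i.object_id = ps.object_id
       AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.FactSales');

TRUNCATE TABLE dbo.FactSales;   -- deallocates data pages and index pages

-- Re-running the SELECT should show the used page counts at or near zero
-- for every index, i.e. the next load inserts into empty index structures.
```

That matches the theory above: the first load after the rebuild and every post-truncate load both start from empty indexes, so neither should suffer split-heavy inserts.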
February 16, 2019 at 8:58 am
Hi Steve,
I think what you say makes sense, and that is why I concluded a fast load would make sense once. However, the last load came down by another 8 minutes, which again makes me suspicious of the TEXT column. That column is not written to the fact table directly; only the FK to the PK of the detail table is written there, so there was no change in the fact table's definition.
Of course, fragmentation could still be low enough not to cause any slowdowns yet, but the load is currently a factor of 10 faster. That would make more sense if the query engine, instead of having to allow for up to 11 MB of data per row, now only has to handle varchar(255). I just do not see how that applies here, since the column is not used during lookup operations either; it is simply copied into a dimension table as a detail, and that is it.