September 1, 2011 at 1:57 am
Hi, I have this problem: which of these two ETL approaches should I use?
I know that Truncate and Load erases all records from the table and loads all records from the source into it.
And Slowly Changing Dimension inserts records if they are new and updates records if they have changed.
But I don't know which is faster.
Thanks and regards
September 1, 2011 at 2:11 am
TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, so it is pretty fast. I am not sure what you mean by "Load".
Regards,
Iulian
September 1, 2011 at 3:16 am
Actually I am going to create an Integration Services package (an ETL), and I don't know which approach is better to use and which will finish faster.
Specifically, my source is a table on an ERP server (which contains millions of records) and my destination is a replica of that table on the data warehouse server.
September 1, 2011 at 3:49 am
How many tables, how many rows, and how many rows have been changed since the last load?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 1, 2011 at 4:34 am
In the slow method you don't mention deleting records that are no longer present in the source; presumably you have to do that - or is nothing ever deleted?
Whether truncate followed by load from source is quicker than insert new, update changed, and delete missing from source will depend on how many records are the same, how many are changed, how many are deleted, and how many are new; on what indexes exist on the target table; on whether it's a heap or clustered; and on what sort of page splits (if any) will be caused by deletes and updates.
Unless you give some sort of estimates for those numbers, the only answer anyone can give you is "it depends", and even with the numbers it may not be obvious without details of the schema. In fact your best bet is probably to run some tests: try doing truncate and load and see how long it takes; then try doing it with a MERGE statement and see how long that takes. But don't believe the results of a single comparison - do enough tests to give you an idea of speed over the range of mixes of create/update/delete/leave-alone that you are likely to encounter in real life.
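For reference, a MERGE test along those lines might look like the sketch below. All table and column names here (dbo.DimCustomer, erp.Customer, CustomerID, and so on) are placeholders - substitute your own target, source, and business key:

```sql
-- Sketch of the single-statement alternative to truncate-and-load:
-- update changed rows, insert new rows, delete rows missing from source.
MERGE dbo.DimCustomer AS tgt
USING erp.Customer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
    UPDATE SET tgt.Name = src.Name,
               tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, City)
    VALUES (src.CustomerID, src.Name, src.City)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Time this against the truncate-and-load package over several representative mixes of changed and unchanged rows before drawing any conclusion.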
Tom
September 1, 2011 at 3:06 pm
From a design point of view they really have different purposes.
But to answer your question, the built-in SSIS SCD transform is a dog and not generally recommended. There is a much better one on CodePlex from the Kimball Group. Pragmatic Works has also taken the Kimball SCD and tweaked it a bit to turn it into a commercial product.
So back to basics. Do you need or want to maintain any historical data in the table? I suspect the answer is no because you are considering truncate. If you need to maintain historical data that will exclude a truncate and force you into some sort of SCD transformation. Whether you use the existing one because you don't really have to worry much about scale in your environment, or get a 3rd party one, or even build your own it doesn't matter. If you need to maintain historical data you need to have something that will do that.
So that brings us to the third possibility: you don't need to maintain historical data and you are asking if a truncate and insert will be faster than an incremental load. I would generally say that an incremental load would be the way to go because it will absolutely scale better. An incremental load, however, would require some lookups or merge joins and some conditional splits, as well as two destinations (an update destination and an insert destination), so it is a bit more complicated than an Execute SQL Task with a truncate command followed by a data source and a data destination.
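In T-SQL terms, the lookup/conditional-split pattern with two destinations boils down to something like the following sketch. The names (stg.Customer as the staged source, dbo.DimCustomer as the target, CustomerID as the key) are invented for illustration:

```sql
-- "Update destination": rows that exist in both but have changed.
UPDATE tgt
SET    tgt.Name = src.Name,
       tgt.City = src.City
FROM   dbo.DimCustomer AS tgt
JOIN   stg.Customer    AS src ON src.CustomerID = tgt.CustomerID
WHERE  tgt.Name <> src.Name OR tgt.City <> src.City;

-- "Insert destination": rows present in the source but not the target.
INSERT INTO dbo.DimCustomer (CustomerID, Name, City)
SELECT src.CustomerID, src.Name, src.City
FROM   stg.Customer AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.DimCustomer AS tgt
                   WHERE  tgt.CustomerID = src.CustomerID);
```

In the SSIS data flow the WHERE clauses correspond to the conditional split, and the join/NOT EXISTS logic corresponds to the lookup against the target table.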
Without trying to be insulting, it sounds like you need to do a bit of reading on how to use the different SSIS transforms so you can figure out the best approach. Comparing a truncate to an SCD is comparing apples and oranges, because their fundamental purposes are different.
September 1, 2011 at 7:26 pm
Thanks A LOT!
I would go with the Slowly Changing Dimension. Some of our applications need to look up that table on the data warehouse, and we have had issues where we can't get data while the job is running.
September 2, 2011 at 4:57 am
DELETE is a logged operation on a per-row basis. This means that the deletion of each row gets logged and the row is physically deleted. You can DELETE any row that will not violate a constraint, while leaving the foreign key or any other constraint in place.
TRUNCATE is also a logged operation, but in a different way. TRUNCATE logs the deallocation of the data pages in which the data exists. The deallocation of data pages means that your data rows still actually exist in the data pages, but the extents have been marked as empty for reuse. This is what makes TRUNCATE a faster operation to perform than DELETE.
You cannot TRUNCATE a table that is referenced by a foreign key constraint. You will have to remove the constraint, TRUNCATE the table, and then reapply the constraint.
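That drop/truncate/re-add sequence looks roughly like the sketch below; the table and constraint names are invented for illustration:

```sql
-- Drop the foreign key that references the table to be truncated.
ALTER TABLE dbo.OrderLine DROP CONSTRAINT FK_OrderLine_Order;

-- Now the referenced table can be truncated.
TRUNCATE TABLE dbo.[Order];

-- Reapply the constraint afterwards.
ALTER TABLE dbo.OrderLine
    ADD CONSTRAINT FK_OrderLine_Order
    FOREIGN KEY (OrderID) REFERENCES dbo.[Order] (OrderID);
```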
TRUNCATE will reset any identity column to its default seed value. This means that if you have a table with an identity column seeded at 1 and you have inserted 264 rows, your last record will have the value 264 in its identity column. After TRUNCATEing the table, when you insert a new record into the empty table, the identity column will have a value of 1. DELETE will not do this: in the same scenario, if you DELETEd your rows, then when inserting a new row into the empty table the identity column will have a value of 265.
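You can see the difference with a small throwaway table (a minimal sketch; the table name is a placeholder):

```sql
CREATE TABLE dbo.IdentityDemo (ID INT IDENTITY(1,1), Val CHAR(1));

INSERT INTO dbo.IdentityDemo (Val) VALUES ('a'), ('b'), ('c');  -- IDs 1, 2, 3

DELETE FROM dbo.IdentityDemo;
INSERT INTO dbo.IdentityDemo (Val) VALUES ('d');
-- New row gets ID 4: DELETE does not reset the identity seed.

TRUNCATE TABLE dbo.IdentityDemo;
INSERT INTO dbo.IdentityDemo (Val) VALUES ('e');
-- New row gets ID 1: TRUNCATE resets the identity seed.
```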
Regards
Raju