When you start working in ETL (Extract, Transform, and Load) or SSIS projects, you’ll be faced with two basic terms: Full Load (or Destructive Load) and Incremental Load. Learn what these mean with a simple analogy.
Back when I was in school, me and a bunch of friends used to hangout at a diner. The diner was popular for its specials, which were displayed on a board outside the entrance.
The manager would roll in every day, wipe the board clean and list all the specials for that day. There will be a few new specials introduced daily, but most of them would be the same every day. Still the manager would erase everything completely (including those that weren’t changed) and rewrite. He was fine with having to spend a lot of time every day to do this, but he wanted to clearly display all the specials.
On the other hand, a manager at a different diner would erase only those specials that were discontinued and add the new ones. He didn’t mind that the specials were not clearly displayed, and only wanted to make the daily changes as quickly as possible.
Destructive Load
Also known as Full Load, is a process of completely destroying/deleting the existing data and reloading it from scratch. A lot of unchanged data is also deleted and reloaded in this process. But a destructive load ensures the highest data integrity easily.
“Delete destination data. Read data from source. Load into destination.”
Incremental Load
Incremental load is a process of loading data incrementally. Only new and changed data is loaded to the destination. Data that didn’t change will be left alone. Data integrity can be ensured in this process too, but ETL can get complicated.
“Read source data. Compare with destination. Filter for new and changed data. Write to destination.”. I stole this line from Anatomy of an Incremental Load by Andy Leonard (b|t).
Why Incremental?
A full load seems to be easiest the approach. Right?. Then why do we even care about incremental load?
- Speed. Opting to do a full load on larger datasets will take a great amount of time and other server resources. Ideally all the data loads are performed overnight with the expectation of completing them before users can see the data the next day. The overnight window may not be enough time for the full load to complete.
- Preserving history. When dealing with a OLTP source that is not designed to keep history, a full load will remove history from the destination as well, since full load will remove all the records first, remember! So a full load will not allow you to preserve history in the data warehouse.
Full Load vs. Incremental Load:
Continue Learning
You saw a simple introduction to full and incremental loads. To learn more using detailed examples and implementing with SSIS, click here for full and here for incremental. And click “X” to reclaim your life. Results may vary!