When I first started down the business intelligence and data warehousing road, I needed to learn the foundations of dimensional design. Ralph Kimball's methods seemed appropriate at the time, and they still are for me. The next step was learning how to make Microsoft's BI tool stack follow that methodology, and after that, making the technology work as easily, reliably, and as fast as possible.
There Are Knowledge Gaps
Over the last year, I've noticed that although there's a lot of interest in the last step in that process - making things go fast - there seems to still be a lot of confusion over the first steps. Quite a few interactions I've had with the community lead me to believe that there's a significant number of people skipping over understanding the dimensional model, and how ETL (of any brand or style) should be used to manage loading one.
I see questions in the MSDN forums asking why the OLE DB Command isn't inserting rows into their dimension table... which clearly demonstrates two problems the asker has. First, they don't understand that the OLE DB Command doesn't insert rows - that's not its purpose. But the second and more fundamental issue is that they don't understand why it doesn't. They aren't grasping the concept of new row versions and updating old rows. If they understood that, they wouldn't wonder why the OLE DB Command didn't insert rows, because they'd know adding rows to their dimension table shouldn't happen at that point.
I Sucked Too
This also came up quite clearly in my side-project of the Kimball Method SCD component. I assumed (and you know what that means) that people would know what to do with the "New" and "Expired" outputs I'd labeled on the component. I held on to that belief so fundamentally that I created a video intending to demonstrate the performance of the component, but completely ignoring the dimensional modeling accuracy of the demo. In fact, the video showed exactly what not to do when inserting/updating rows in the dimension table.
But Then, So Did AdventureWorks
I tried to find a fairly simple example of dimension loading that clearly showed "data in" and "data out" and how that mapped to the SSIS SCD Wizard or any other tool. I expected to find that in some Microsoft samples - like the AdventureWorks Integration Services samples. Unfortunately, the writers of this example also assumed too much about the readers' knowledge of dimensional modeling. The AdventureWorks samples contain an SSIS package specifically named AWDataWarehouseRefresh - implying that it can be used to "refresh" AdventureWorks dimension tables. But it loads dimension tables into SQL Server from a CSV file. A completely populated dimension table - a snapshot in time. No use of the SCD Wizard. No incremental loading. I really can't understand why they thought it would be an accurate demonstration of loading a table in a data warehouse specifically - fact tables OR dimensions. That example applies to any kind of table, data warehouse or not, and provides no help at all for the specific problem of loading dimension tables on an incremental basis.
A New Step-By-Step
So here's my second attempt at doing it better the first time. I've made a few videos and other resources to try to better explain how to incrementally load dimension tables with SSIS, and I'll be posting them in a series here on my blog. The videos were originally intended to better explain my Kimball Method SCD component and are already available there, but they seem to fit this bill as well. Please do comment on specifics that make them easy or hard to understand. I've made comments through my channels to Microsoft about improving the information available for this kind of thing - but your comments can help me confirm to Microsoft that this isn't all a figment of my imagination.
The next post in this series will provide and describe the sample data stored in our data warehouse, and the source system that updates will come from.
Book Review: Big Red - Voyage of a Trident Submarine
I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...
2009-03-10
1,439 reads