It’s the second Tuesday of the New Year, so here comes the first T-SQL Tuesday of 2014! SQLChow (blog | twitter) has the honour to host this jubilee edition of the world famous blog party. The theme of this month is the utterly DBA-related topic of Automation. However, it is pretty easy to give it a BI spin.
Even we measly BI developers can automate some parts of the development. I’m talking about BIML, a metadata driven markup language (XML for the people that like acronyms) that can generate SSIS packages like you’ve never seen before. BIML is developed by the nice people at Varigence and an open-source implementation is available in the widely popular Visual Studio add-in BIDS Helper, which you can download for free at Codeplex. This post will not explain what BIDS exactly is and how it works; I already did that in an article series at MSSQLTips.com. You can find those here, here and here. Make sure you also check the very excellent resources I link to at the bottom of the articles. The key point you need to remember is that you can use metadata (data about how and where your data is stored) and some scripting with .NET and XML to generate SSIS packages on the fly.
The only way a solution with BIDS can work is that you have a lot of the same (see how I referred to the title? Extra bonus points.) Developing a BIML script to generate a single package is just plain crazy. It is much easier to do this directly in BIDS (SSDT/SSDTBI) instead of meddling with those XML tags. However, if you can generate a whole bunch of packages in one single click – for example 20 packages importing flat files or 15 packages doing incremental loads for dimensions – you just saved yourself a whole bunch of time. And that’s where the strength of BIML lies. Take a look at your projects and your SSIS packages and see if you can recognize patterns between them. I bet you can find some. These are opportunities to save a bunch of tedious development effort.
To give an example: in my current project, I created a few packages that all do the same thing: read from the source, calculate a hash value, do a lookup on the destination table and determine if a row is an insert or an update (using the hash value). Inserts are directly sent to the destination table, updates are sent to a staging table. Using this staging table a set-based update is issued against the destination table. Finally a SQL statement checks if rows from the destination table are missing in the source (deletes) and a delete flag is marked for those rows. This pattern was exactly the same for each package. Due to time constraints I couldn’t develop a BIML script because it would take me the same amount of time to develop those packages by hand (about 2 days) and there was a hard deadline. However, I will see if I can find time in the following weeks to develop a BIML script anyway, because I know sources will be added in the future. And when they do, I just need to fill in some metadata in a table and the package will be generated with a simple click. This will potentially saves days of work, and I can always take the script to other projects where similar patterns are needed.
So BI developers, what are you waiting for? Automate!