Handling Mixed Format Data Files in SSIS

Tim Mitchell, 2015-04-22 (first published: 2015-04-13)

Some time back I wrote about how to use the script component to parse out ragged data files in SSIS. In this post, I’ll continue the discussion to describe how to handle mixed format data files – specifically, those with several different record types in each file.

In a perfect world, a flat file will contain a single record type. However, my experience in healthcare several years back, as well as subsequent projects as a consultant, have taught me that there are many antipatterns around mixing file formats in a single data file. Combining several different record types into one file eliminates a small bit of work by eliminating multiple data files, but also adds complexity to those systems consuming the resulting hybrid files. In this post I’ll show an example of this, and I’ll demonstrate a design pattern to handle this type of data formatting.

Consider the case of the following mixed format data file, which contains a combination of patient hospital visit records as well as procedures undertaken in each of those visits.

As shown, there are two different types of data represented here. The longer records, marked with a V in the first column, appear to have hospital visit information. Subsequent records, indicated by a P, show medical procedures related to those visits. Since the shape and purpose of these types of data are so different, it’s a near certainty that the contents of this file should be sent to two different outputs.

For older versions of SSIS – 2008 and 2008 – this would be problematic. The flat file source in those older versions was hard-wired to look for exactly the number of columns defined in the flat file connection manager, even if it meant reading data from the next line in the file to satisfy the number of expected columns. Fortunately, this behavior was fixed in SSIS 2012, in which any missing columns are simply filled in with NULLs. This change in behavior makes it possible to use native components (read: no scripting) to handle a file similar to the one above.

To process this file in SSIS, the flat file connection manager will be configured with enough column metadata to handle the file format with the greatest number of columns. In the example above, the V record (visit) has 11 columns, while the P record (procedure) has only five. Therefore, the flat file connection manager should be configured with 11 columns, with the data type of each column set up to handle the largest or broadest data type in any row type for that column. As shown below, I’ve configured a flat file connection manager with 11 generically named columns, which will be mapped to the appropriate output tables and columns further downstream.

Using the configuration above, all eleven columns will be populated by the visit records; the procedures record type will load only the first five columns, leaving NULL values in the rest. When connecting to this flat file connection manager in the SSIS data flow using a flat file source, both record types will be loaded into the data pipeline. Once that is in place, separating the records is as easy as using a conditional split based on the first column (V for visit, P for procedure).

As shown, Visit records (those matching [Column0] == “V”) are sent to the Visits output, while all others are sent to the default output named Procedures. Each output can then be mapped to its corresponding target table.

Finally, for each target table, the generic column names must be manually mapped to the corresponding output columns in the target table.

It bears mentioning again that this mixed format data file processing pattern works only in SSIS 2012 and 2014. To accomplish this in older versions, the scripting method I described in my earlier post would be used.

The post Handling Mixed Format Data Files in SSIS appeared first on Tim Mitchell.

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

Handling Mixed Format Data Files in SSIS

Rate

Share

Share

Rate

Handling Mixed Format Data Files in SSIS

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts