It's not everyday that one of your information vendors decides to change the format of its data feed. Evidently, today is not every day. It would be one thing to change the data type of an incoming field or even add/remove a field from the feed. Nope, they decided to change from a .csv file to a full blow .xls with a human readable header telling me what the file is for. Not sure why they would do this, since the file is over 150MB and over 27k rows long! Who is going to look at it? Now we have to read in an .xls file with no valid header information. Its very nice when the Excel Source component reads in the first row and populates the external columns for you. Its especially nice when the file contains 90 columns! Even worse, when you have empty fields mixed with alphanumeric and numeric the helpful source component gives a best guess at what the column data type should be. In our case it was terribly wrong and we needed a way to fix it. Here is what the new header looks like:
This is causes bunches of issues with the Excel Source. I can no longer say 'first row has column names' and this spreadsheet has 90 columns! Add to this the 'auto-magical' setting of column data types. To accomplish this, the source reads the first, I believe, 100 rows and makes a determination. Not helpful in this circumstance. Let's see if the standard editor provides us with any relief.
The columns are labeled F1-F90, because, as I stated before, I cannot use the first row as header information! What I can do here is change the output name from F1 to say 'Reporting Period' and select what columns are read in and pushed down the data pipe. I need to change data types. No joy.
Enter the Advanced Editor available on all Data Source components. This often overlooked editor is tucked away in the right-click menu of the Data Source. This editor has 4 tabs, we are interested in the last tab labeled Input and Output Properties.
Here is where you can wrestle the columns and data types from the component and do with them as you wish. The source thinks the highlighted column is a Unicode string of length 255 named F1. It's really called 'Reporting Period' and should be a string of length 5, since the value is always 'Daily'.
Realize we now have garbage in our stream. We want to throw away everything prior to the first good data record (row 6 in the example above). We simply added a Conditional Split component to weed out the bad header records.
Using the Advanced Editor, the vendor can change the header in any way, even modify the column names themselves and it won't affect the source component. We have drastically reduced the dependency of the component and our SSIS package on 'proper' feed formatting, allowing us to recover quickly from feed 'adjustments'.