SSIS/DTS Data loss due to implicit casting(?)

  • Since the days of SQL 7 DTS, there has been a problem with certain kinds of data imports where it seems like an implicit cast is made on a source column depending on the values towards the head of the file. To illustrate this, I created an Excel sheet with two columns, and populated the columns both with integers for the first 1000 rows. After that I changed the second column to strings for two more records:

    Then I setup an SSIS Data Flow to read that file. When I run the flow using a Data Viewer, you can see that the last two records fail to be read and are shown as NULL:

    We dealt with this in DTSby being explicit about the destination datatype and making sure the biggest numbers or the strings get moved towards the top of the file. Right now we are working with a DB2 source on a mainframe and we are seeing a similar problem where a sparsely populated column of dollar values ends up writing zero in the destination where there should be values. The work around here is to ORDER BY column1 DESC, column2 DESC, etc. so that the bigger numbers show up sooner. I canโ€™t help but think there must be a more elegant solution. I spent some time Googling around but I have not found anything that seems to address this.

  • Is this a problem for you because the files are passing in on the fly so you have no way of knowing in advance the data types required? I must admit I have found my own issues with regards to files being brought in. Too much guess work from ssis and (unless someone can help here) not enough flexibility within that guess work to allow the user to specify what to check for.

    I've always had to have a heads up and then go into the (name slips me by but may be "advanced") menu of the destination to prepare the columns manually.

  • This is a very, very, very common problem with SSIS and Excel. It shows up in hundreds of blog and forum posts.

    And apparently it existed already in DTS.

    The issue is that Excel guesses the datatype based on a fixed number of rows of that column. If it sees x times a number, the column will get a numeric datatype, thus rendering NULL for your string values later on.

    How to fix this? Set IMEX=1 in your connection string. This will tell the driver that if there are intermixed datatypes, to convert everything to text. Furthermore, you have to set a registry value to 0, telling the driver to scan the whole column, and not just the first x rows.

    This is a whole lot easier then preparing your files each time to make sure that the strings are on top.

    Reading material:

    http://support.microsoft.com/kb/189897 (registry key)

    http://swap.wordpress.com/2007/09/18/solution-for-excel-data-uploading-problem-using-ssis/ (IMEX=1).

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks Da-Zero - that will help me too!

  • I know it's common because I've been having to convince colleagues it's a real issue and I am not crazy for 10 years. I am grateful to now know the solution. Thank you!

  • Just one point of clarification: the issue isn't with DTS/SSIS. It's with the Jet driver, used to access Excel spreadsheets by those programs. The same problem will show up when using OpenRowset or OpenDatasource with those drivers.

    I wonder if the new Office drivers address this issue? Might be time to do some testing...

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (8/27/2010)


    Just one point of clarification: the issue isn't with DTS/SSIS. It's with the Jet driver, used to access Excel spreadsheets by those programs. The same problem will show up when using OpenRowset or OpenDatasource with those drivers.

    I wonder if the new Office drivers address this issue? Might be time to do some testing...

    You are absolutely right. (It is also the Jet driver who has issues with 64-bit).

    Actually, the comment that I posted is a template. Yeah, this issue is so common on this forum that I have a standard template to answer it ๐Ÿ˜€

    I'll adapt it so that it is more accurate.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I also want to add links to these 3 blog posts because there is much more detail. I still do not have the magic way to take 1000 general cells that are numbers and not lose the 2 strings at the end. If I explicitly make the first few cells text representations of numbers by typing the ' in front of the number value it works. But don't we need a reliable way to have a DTSX package running in SQL Server Agent on a schedule and accessing an Excel spreadsheet correctly without our intervention?

    DougBert 1

    DougBert 2

    DougBert 3

  • Andrew Notarian (8/27/2010)


    I also want to add links to these 3 blog posts because there is much more detail. I still do not have the magic way to take 1000 general cells that are numbers and not lose the 2 strings at the end. If I explicitly make the first few cells text representations of numbers by typing the ' in front of the number value it works. But don't we need a reliable way to have a DTSX package running in SQL Server Agent on a schedule and accessing an Excel spreadsheet correctly without our intervention?

    DougBert 1

    DougBert 2

    DougBert 3

    Well, there is one way to do this...

    Use a script task. Go into vb.net, and do excel automation to open up the spreadsheet, and get the data row/col by row/col. Then you can process things cell by cell.

    Note that Excel automation requires installing stuff on a server that is only supported on a workstation.

    Check out this article[/url] for how to do Excel Automation directly from within SQL - it can be easily adapted to run in a script task.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I was thinking along similar lines, but with a different slant: automate Excel to output the data to CSV and then use that CSV as the SSIS data source.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I tried explicit conversion inside the Jet SQL but I don't think this is helping much:

    SELECT ID, CSTR(Description), CSTR(TextDescription) FROM [Sheet1$]

    I think I need to get back to the root problem which is a DB2 source accessed with the Neon Shadow driver. We have columns there that are 0 for thousands of records and then multi-million values get lost because of truncation. This can probably be resolved in the Advanced Properties dialogs and being more specific about data types on the source end.

  • Andrew Notarian (8/27/2010)


    I still do not have the magic way to take 1000 general cells that are numbers and not lose the 2 strings at the end. If I explicitly make the first few cells text representations of numbers by typing the ' in front of the number value it works. But don't we need a reliable way to have a DTSX package running in SQL Server Agent on a schedule and accessing an Excel spreadsheet correctly without our intervention?

    If you use IMEX=1 and set the registry setting of TypeGuessRows to 0, doesn't that solve your problem?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply