Importing Excel file - Overriding the ACE driver's field type

  • Hi All,

    I know this is an age old problem, however, does anyone know how to get around the ACE driver and overide the field type it thinks a field is.

    In the past, I've ended up scripting in SSIS to get around it, however, I've been asked to import about 5 Workbooks, each with at least 3 worksheets, as an ETL process, and unfortunately a field common to all of them is being interpreted as a FLOAT by ACE, although it's actually a string.

    Non Numeric characters don't start appearing in the field until the 2015 YOA, and as the files go back to 2011, they're quite a way down 9the file i'm looking at right now doesn't start have non numerics in the field till row 570).

    Does anyone know how to get around ACE's stupidity? I know that there's the registry change, but that doesn't seem to help, and I know that changing the field type in the source is completely ignored, as ACE just changes it back afterwards or if not at run time when it scans the file. I'd rather not have to create a script task to manually process 15-20 worksheets (I'd like to go home for christmas!), so does anyone know a workarond?

    Many thanks all.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A (12/19/2016)


    Hi All,

    I know this is an age old problem, however, does anyone know how to get around the ACE driver and overide the field type it thinks a field is.

    In the past, I've ended up scripting in SSIS to get around it, however, I've been asked to import about 5 Workbooks, each with at least 3 worksheets, as an ETL process, and unfortunately a field common to all of them is being interpreted as a FLOAT by ACE, although it's actually a string.

    Non Numeric characters don't start appearing in the field until the 2015 YOA, and as the files go back to 2011, they're quite a way down 9the file i'm looking at right now doesn't start have non numerics in the field till row 570).

    Does anyone know how to get around ACE's stupidity? I know that there's the registry change, but that doesn't seem to help, and I know that changing the field type in the source is completely ignored, as ACE just changes it back afterwards or if not at run time when it scans the file. I'd rather not have to create a script task to manually process 15-20 worksheets (I'd like to go home for christmas!), so does anyone know a workarond?

    Many thanks all.

    Hi Thom. First of all, you are not alone – anyone here who has had to deal with Excel imports shares your pain and incredulity.

    Here is the best way I have found of fighting the ACE driver (such an ironic name, should be called the DUFF driver) and (usually) winning. It assumes that there is always a header row in the spreadsheets you are importing.

    1) Use a connection string that contains this:

    HDR=No;IMEX=1

    You are telling the driver that your data has no header and to import mixed data types as text.

    2) Ensure that the range you are importing includes the header row.

    Doing this means that all of the data is imported as text.

    3) Add a Conditional Split after your data source to direct the header row to an unused output.

    4) Add data conversions to transform the non-text data back to how it should be.

    Redirect your errors to a suitable logging table.

    Good luck.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks Phil,

    I got to step 4, and my error rows were about half of my data set. Exporting that data out showed nothing odd about it, but they were all date columns. My thought is that it's another ACE driver special, where it decides that 19/12/2016 is actually 42722, andas 42722 can't be converted to a Date SSIS pukes all over you.

    Thus, I ended up passing all the fields over to SQL as VARCHAR(255) (unfortunately with truncation, but considering the end product can only accept a max length of 2x76 I'll lose more there), and then converting in SQL. Fortunately SQL Server was more than happy to oblige with a couple of CASE statements.

    You are right, ACE is really not what they should be call it, perhaps: microsoft Access Redistributed databaSE engine (or ARSE Engine). :hehe:

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Yes, I've been through that pain too. I actually had it working for a while (pretty much using the steps Phil has mentioned) but at one point it all fell in a heap and wouldn't play again.

    Fortunately I managed to get the people supplying the excel files to send them as csv files instead and everything has been sweetness and light since.

    I suppose getting csv files sent isn't an option? It shouldn't be of course but I've already ranted about the delight of Excel files with SSIS in the past and can feel my blood pressure climbing now.

  • Thom A (12/19/2016)


    Thanks Phil,

    I got to step 4, and my error rows were about half of my data set. Exporting that data out showed nothing odd about it, but they were all date columns. My thought is that it's another ACE driver special, where it decides that 19/12/2016 is actually 42722, andas 42722 can't be converted to a Date SSIS pukes all over you.

    Thus, I ended up passing all the fields over to SQL as VARCHAR(255) (unfortunately with truncation, but considering the end product can only accept a max length of 2x76 I'll lose more there), and then converting in SQL. Fortunately SQL Server was more than happy to oblige with a couple of CASE statements.

    You are right, ACE is really not what they should be call it, perhaps: microsoft Access Redistributed databaSE engine (or ARSE Engine). :hehe:

    Sounds like you have a working solution: well done. But should you find yourself in a belligerent frame of mind, wanting to get round the date conversion issue in SSIS, please post back with some examples and I'll show you how to do it (probably :-))

    --Edit: fixed typo

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • BrainDonor (12/19/2016)


    I suppose getting csv files sent isn't an option? It shouldn't be of course but I've already ranted about the delight of Excel files with SSIS in the past and can feel my blood pressure climbing now.

    Highly unlikely. Considering that I have been hitting a brick wall for the last 2 years with almost every Insurer we deal with to supply their files by SFTP (which is in their interest, as if that email is intercepted it's their fault, not mine!), getting them to supply me with a separate raw file export is going to be impossible. Insurers love a good Excel export to share their financials.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A (12/19/2016)


    BrainDonor (12/19/2016)


    I suppose getting csv files sent isn't an option? It shouldn't be of course but I've already ranted about the delight of Excel files with SSIS in the past and can feel my blood pressure climbing now.

    Highly unlikely. Considering that I have been hitting a brick wall for the last 2 years with almost every Insurer we deal with to supply their files by SFTP (which is in their interest, as if that email is intercepted it's their fault, not mine!), getting them to supply me with a separate raw file export is going to be impossible. Insurers love a good Excel export to share their financials.

    Sounds like we're in the same market!

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

  • frederico_fonseca (12/19/2016)


    if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

    Using a script task was exactly what I want to avoid 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • frederico_fonseca (12/19/2016)


    if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

    Have you tried using this? How did it go?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (12/20/2016)


    frederico_fonseca (12/19/2016)


    if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

    Have you tried using this? How did it go?

    No, as it's a Script task, which is something I wanted to avoid. One of our other DBA's really doesn't do VB or C# (He can't write it, and really can struggle with reading it), so if I'm not about and he needs to trouble shoot it, it's not going to happen. As it's something I need to ensure can happen when I'm not about, not something I can afford to do as i suspect he'll simple "shut down" at the sight of a Script.

    Considering that the source is also external, I wouldn't be able to always guarantee that it's going to be Excel 2010+ (although i would be surprised if not, Office 2007 does still has a year(is) of Extended Support left).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Phil Parkin (12/20/2016)


    frederico_fonseca (12/19/2016)


    if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

    Have you tried using this? How did it go?

    I did use it on a small project - worked ok for our needs then.

    Also on Insurance business but on most cases I can force the file format on those that send them to our ETL teams so even using the ACE driver is not an issue.

    And on that I even forced one of the suppliers to have a dummy record as the first line on their spreadsheet to bypass the datatypes issue. I find that in most cases its easier to ask them to add the dummy record than to have them changing the format of the files.

    In most of my projects we are using C# instead of SSIS to do this type of processing - in some we load directly to the database, in others we crease .sql scripts with blocks of 1k records to insert.

  • frederico_fonseca (12/20/2016)


    Phil Parkin (12/20/2016)


    frederico_fonseca (12/19/2016)


    if excel file format is 2010 or higher then https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx may be an option.

    Have you tried using this? How did it go?

    I did use it on a small project - worked ok for our needs then.

    Also on Insurance business but on most cases I can force the file format on those that send them to our ETL teams so even using the ACE driver is not an issue.

    And on that I even forced one of the suppliers to have a dummy record as the first line on their spreadsheet to bypass the datatypes issue. I find that in most cases its easier to ask them to add the dummy record than to have them changing the format of the files.

    In most of my projects we are using C# instead of SSIS to do this type of processing - in some we load directly to the database, in others we crease .sql scripts with blocks of 1k records to insert.

    Interesting. I have bookmarked it for further research at some point.

    If I can use it to get Excel data to stream from a Script Component source (using the SAX method), while having full control over datatypes, this is a worthy test project.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply