Finding Bad Data in CSV Import

  • I'm importing 1,000,000+ records from a .CSV file into SQL Server using the Data Import Wizard, but I'm getting a fatal error. Other imports have gone fine, and an earlier error gave me the row number so I could fix it, but this one doesn't. 480,870 records were imported, but when I look around row 480,870 in the .CSV using UltraEdit, they look OK. Perhaps SQL Server is batching the records, so the bad ones are somewhere in the next batch?

    The data comes from a SQLite database that I exported to .CSV files. The fields are TEXT and INTEGER types, so that's how they've been set up in SQL Server 2005 as well. They will later be imported into a SQL 2005 third-party warehouse product, so the temporary SQL tables don't necessarily need to stay INTEGER and TEXT if changing them will help solve my problem.

    What tips or tricks do you know of that will allow me to find the bad data in the import file?

    Here's the error message from the SQL Wizard:

    Messages

    Error 0xc0202009: Data Flow Task: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

    An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Invalid character value for cast specification".

    An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Invalid character value for cast specification".

    An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Invalid character value for cast specification".

    (SQL Server Import and Export Wizard)

    Error 0xc020901c: Data Flow Task: There was an error with input column ""corp"" (92) on input "Destination Input" (71). The column status returned was: "The value could not be converted because of a potential loss of data.".

    (SQL Server Import and Export Wizard)

    Error 0xc0209029: Data Flow Task: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "input "Destination Input" (71)" failed because error code 0xC0209077 occurred, and the error row disposition on "input "Destination Input" (71)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.

    (SQL Server Import and Export Wizard)

    Error 0xc0047022: Data Flow Task: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Destination - boxlist" (58) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.

    (SQL Server Import and Export Wizard)

    Error 0xc02020c4: Data Flow Task: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

    (SQL Server Import and Export Wizard)

    Error 0xc0047021: Data Flow Task: SSIS Error Code DTS_E_THREADFAILED. Thread "WorkThread0" has exited with error code 0xC0209029. There may be error messages posted before this with more information on why the thread has exited.

    (SQL Server Import and Export Wizard)

    Error 0xc0047038: Data Flow Task: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Source - boxlist_csv" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

    (SQL Server Import and Export Wizard)

    Error 0xc0047021: Data Flow Task: SSIS Error Code DTS_E_THREADFAILED. Thread "SourceThread0" has exited with error code 0xC0047038. There may be error messages posted before this with more information on why the thread has exited.

    (SQL Server Import and Export Wizard)

  • Hi,

    I was just having the same problem. The weird thing is, I looked for the lines causing the problems and copied a couple of "bad" ones and a couple of "good" ones into a new CSV with only 15 lines. Now the import flags different lines in the new CSV as "bad".

    In the end I found out that Excel (which I had used to edit the CSV) seems to have dropped the delimiters at the end of some lines where the trailing fields were empty. I don't have a solution for that yet, but at least I know the cause...

    Maybe that helps you too...

    Max

  • I'm not sure this will help but this is why I like BCP... you can tell it to put the bad lines of data in an isolation file.
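    A sketch of the kind of BCP invocation meant here (all object and file names are placeholders): `-e` writes the rejected rows to an isolation file, and `-m` raises the error tolerance so one bad row doesn't abort the whole run.

```shell
rem Hypothetical names; -c = character mode, -t, = comma delimiter,
rem -F 2 = skip the header row, -e = write rejected rows to an error file,
rem -m 1000 = keep going until 1000 rows have failed.
bcp StagingDB.dbo.boxlist in boxlist.csv -c -t, -F 2 ^
    -e boxlist_errors.txt -m 1000 -S myserver -T
```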

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • max-1156434 (10/2/2009)


    ... In the end I found out that Excel (adapting the csv) seems to have forgotten the delimiters at the end of some lines, where the fields would have been empty. ...

    If you have blank columns at the end, add a dummy column to the right of the "last" column with something like the letter "X" in it to guarantee that all columns are exported, blank or not.

    --Jeff Moden



    "If you have blank columns at the end, add a dummy column to the right of the "last" column with something like the letter "X" in it to guarantee that all columns are exported, blank or not."

    thx, sounds like a good workaround!

  • Thanks for the feedback, Max.

    --Jeff Moden



  • Are you sure your destination column is wide enough to handle all the incoming text? I'm not sure, but it sounds like this...

    The column status returned was: "The value could not be converted because of a potential loss of data"

    ...means that one of the values you're trying to import is too long, i.e. the incoming chunk of text is too big to fit in the destination column, so the import is saying "whoa, hold on there, don't want to do that!"

  • Eh, never mind, it's "Invalid character value for cast specification". That's something different. Sorry.

  • You could save what the Import Wizard is doing as an SSIS package (if you haven't already), then open it in BIDS and modify the data flow to redirect error rows to a file. Good luck.

  • I have had trouble importing manually entered .xls files caused by embedded carriage returns. The first time through SQL Server the carriage returns were ignored, but after exporting the query results as .csv and re-importing, the carriage returns were treated as end-of-row delimiters. The files were small enough that I was able to fix the problem in Excel before the first import. Hope this helps.
