Null records being Inserted during Import of CSV file

  • I know very little about SSIS, so bear with me.

    I have a SSIS package that inserts records into a 'staging' table from a CSV file (loading one CSV file per package execution).

    The 1st production file loaded just fine - no issuses. The 2nd file loaded an additional 494,720 NULL rows (all columns for each row had NULLs) in addition to the 18,272 records I wanted to import.

    I have a flat file connection (CSV), Code Page: 20127 (US-ASCII), Format: Delimited, Text Qualifier: ", Header Row Delimiter: {CR}{LF}, Header Rows to Skip: 2, Column Delimter: Comma{,}, Data Type: unicode String [DT_WSTR] (for ALL Columns).

    I have an OLEDB Connection to my Staging table and I'm using 'fast load'.

    What could be different about the 2nd file to cause this? What 'file editor' would I need to use to see said differences? Is there a setting I could use(set) to Not load Null rows?

    Any ideas of what could be causing this?

    Thanks,

    Sqlraider

  • Sqlraider (9/6/2013)


    I know very little about SSIS, so bear with me.

    I have a SSIS package that inserts records into a 'staging' table from a CSV file (loading one CSV file per package execution).

    The 1st production file loaded just fine - no issuses. The 2nd file loaded an additional 494,720 NULL rows (all columns for each row had NULLs) in addition to the 18,272 records I wanted to import.

    I have a flat file connection (CSV), Code Page: 20127 (US-ASCII), Format: Delimited, Text Qualifier: ", Header Row Delimiter: {CR}{LF}, Header Rows to Skip: 2, Column Delimter: Comma{,}, Data Type: unicode String [DT_WSTR] (for ALL Columns).

    I have an OLEDB Connection to my Staging table and I'm using 'fast load'.

    What could be different about the 2nd file to cause this? What 'file editor' would I need to use to see said differences? Is there a setting I could use(set) to Not load Null rows?

    Any ideas of what could be causing this?

    Thanks,

    Sqlraider

    If you open the two files in Notepad++, you should get an idea of what the difference is.

    As for avoiding the import of this dodgy data, I would use a Conditional Split component in the data flow to redirect all of the rubbish rows to an unused output - that will filter it out.

    Or just filter it out when you process the data in staging.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Just a small hint - check if the columns in your destination table allows NULLs and change that.

  • Phil Parkin (9/7/2013)


    Sqlraider (9/6/2013)


    I know very little about SSIS, so bear with me.

    I have a SSIS package that inserts records into a 'staging' table from a CSV file (loading one CSV file per package execution).

    The 1st production file loaded just fine - no issuses. The 2nd file loaded an additional 494,720 NULL rows (all columns for each row had NULLs) in addition to the 18,272 records I wanted to import.

    I have a flat file connection (CSV), Code Page: 20127 (US-ASCII), Format: Delimited, Text Qualifier: ", Header Row Delimiter: {CR}{LF}, Header Rows to Skip: 2, Column Delimter: Comma{,}, Data Type: unicode String [DT_WSTR] (for ALL Columns).

    I have an OLEDB Connection to my Staging table and I'm using 'fast load'.

    What could be different about the 2nd file to cause this? What 'file editor' would I need to use to see said differences? Is there a setting I could use(set) to Not load Null rows?

    Any ideas of what could be causing this?

    Thanks,

    Sqlraider

    If you open the two files in Notepad++, you should get an idea of what the difference is.

    As for avoiding the import of this dodgy data, I would use a Conditional Split component in the data flow to redirect all of the rubbish rows to an unused output - that will filter it out.

    Or just filter it out when you process the data in staging.

    I was focused on the first row of data after the 2nd header record, thinking there was something not right, when in fact there are 494720 Null records (each row is all commas) After the last 'good' data record.

    I'm going to use your suggestion of a Conditional Split for those records I don't load into the staging table. That way if for some reason I don't load a record I'll at least still have it.

    Thanks,

    Sqlraider

  • I am going to hazard a guess that the CSV is generated from an Excel file and that there were 400K empty lines at the end of the CSV file. Any time Excel is involved any where near an SSIS package, SSIS seems to wander off in a huff.

    Excel and SSIS do not play nicely together in any combination I have found

  • aaron.reese (9/16/2013)


    I am going to hazard a guess that the CSV is generated from an Excel file and that there were 400K empty lines at the end of the CSV file. Any time Excel is involved any where near an SSIS package, SSIS seems to wander off in a huff.

    Excel and SSIS do not play nicely together in any combination I have found

    A third party creates the file (how I don't know) and sends it to us as CSV. I didn't see the 400k null lines until I opened the file using Notepad++.

  • So is the source file missing data (i.e. are the lines of empty fields supposed to have data) or just extra lines.

    If you are happy that the data is complete, I would

    1) raise a defect against the file source to get the null rows removed

    2) apply the conditional split as per Phils suggestion to ignore them - the exact rules for the CS will be down to you and the nature of the data (I would find a field or combination of fields that CANNOT be null and validate against them)

  • aaron.reese (9/17/2013)


    So is the source file missing data (i.e. are the lines of empty fields supposed to have data) or just extra lines.

    If you are happy that the data is complete, I would

    1) raise a defect against the file source to get the null rows removed

    2) apply the conditional split as per Phils suggestion to ignore them - the exact rules for the CS will be down to you and the nature of the data (I would find a field or combination of fields that CANNOT be null and validate against them)

    It's just extra lines (lines with just commas no data between ex: ,,,,,,). I did apply a conditional split off of the ONE field that cannot be null. This is a monthly file and only on occasion does it have extra lines.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply