March 16, 2018 at 10:28 am
Hi Jeff,
I was only joking about banning CSVs.
In regard to the header guff that appears, we are talking with the provider, who in turn is talking with the vendor to get it either removed at source or made optional. As it stands they have a compiled application (no access to source code) that provides this output with all that header guff, the linefeed, etc.
And if you knew how much they paid for it, it would make your eyes water.
March 16, 2018 at 11:58 am
OK, while I wasn't able to find a solution using BULK INSERT, OPENROWSET, or BCP since they all use the same format file, I was able to build a simple SSIS package to properly read this file.
Set up a Flat File Connection Manager with the following properties:
General:
- Format: Delimited
- Text qualifier: <none>
- Header row delimiter: {LF}
- Header rows to skip: 17
- Column names in the first data row (checked)
Columns:
- Row delimiter: {LF}
- Column delimiter: Semicolon {;}
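For anyone who can't stand up SSIS, the same read can be sketched in plain Python (a minimal sketch, not the SSIS package itself; the file path and column contents here are hypothetical, and it assumes exactly what the connection manager settings above say: 17 LF-terminated header lines to skip, column names on the first data row, semicolon delimiters):

```python
import csv

def read_vendor_file(path, header_rows_to_skip=17):
    """Read a semicolon-delimited, LF-terminated file, skipping the
    vendor's header guff and taking column names from the first data row."""
    with open(path, newline="") as f:
        text = f.read()
    # Drop the header guff: split on LF and discard the first N lines
    lines = text.split("\n")[header_rows_to_skip:]
    reader = csv.reader(lines, delimiter=";")
    rows = [row for row in reader if row]  # skip blank trailing lines
    columns = rows[0]   # column names are on the first data row
    data = rows[1:]
    return columns, data
```

Feeding the parsed rows to a parameterized `INSERT` (or writing them back out as a clean CSV for `BULK INSERT`) would then be straightforward.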
March 16, 2018 at 12:15 pm
alex.sqldba - Friday, March 16, 2018 10:28 AM
Hi Jeff,
I was only joking about banning CSVs.
In regard to the header guff that appears, we are talking with the provider, who in turn is talking with the vendor to get it either removed at source or made optional. As it stands they have a compiled application (no access to source code) that provides this output with all that header guff, the linefeed, etc.
And if you knew how much they paid for it, it would make your eyes water.
Thanks for the feedback, Alex. Sounds like it's easier to just do the good work-around you did with PowerShell. Less chance of them screwing something new up. 😉
And I can only imagine how much they paid for the data, as you say... I went through that when I was working for a company that consumed data from Double-Click.net. Totally insane prices for any change, no matter how logical or trivial. Their original data format is crap (a CSV file that looks like a spreadsheet export that goes horizontal instead of vertical... with pairs of columns that need to stay together and can have virtually any number of columns). The only good thing about the CSV data in those files was that the headers had precisely the same number and type of delimiters, so the FIRST ROW thing actually did work correctly.
In the long run, I wrote some T-SQL that would load the header rows (row 1 with "cell spans" over the second, just like a bloody spreadsheet), figure out what needed to go with what based on the "cell spans", and write the dynamic SQL to unpivot the data from the staging table. Folks had written a bunch o' gobbledygook in other languages and it took 45 minutes per file just to get the data ready for import. I got it down to importing through the staging table to a secondary table in the correct format, doing all the validations, and then loading the data into the final table at the rate of 8 files every two minutes. Some of the files were as much as 800 "columns" wide, and it figured all that out auto-magically, all in T-SQL.
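The two-row, spreadsheet-style header Jeff describes can be untangled along these lines (a minimal Python sketch of the idea only, not his T-SQL; all names here are made up for illustration: row 1 carries group names with blanks under each "cell span", row 2 carries the repeating sub-column names, and each group is unpivoted into its own output record):

```python
def unpivot_spanned(header1, header2, data_rows):
    """Unpivot a horizontal layout where header1 has group names with
    blanks under each 'cell span' and header2 has repeating sub-columns."""
    # Forward-fill the spanned group names,
    # e.g. ["Q1", "", "Q2", ""] -> ["Q1", "Q1", "Q2", "Q2"]
    groups, current = [], None
    for name in header1:
        if name:
            current = name
        groups.append(current)

    out = []
    for row in data_rows:
        # Gather each group's paired columns into one record
        record = {}
        for group, sub, value in zip(groups, header2, row):
            record.setdefault(group, {})[sub] = value
        # Emit one vertical row per (input row, group)
        for group, pairs in record.items():
            out.append({"group": group, **pairs})
    return out
```

In Jeff's version the equivalent of `groups` drives dynamically generated SQL against a staging table; the forward-fill-then-pair logic is the same either way.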
--Jeff Moden
Change is inevitable... Change for the better is not.
March 16, 2018 at 12:17 pm
Chris Harshman - Friday, March 16, 2018 11:58 AM
OK, while I wasn't able to find a solution using BULK INSERT, OPENROWSET, or BCP since they all use the same format file, I was able to build a simple SSIS package to properly read this file.
Set up a Flat File Connection Manager with the following properties:
General:
- Format: Delimited
- Text qualifier: <none>
- Header row delimiter: {LF}
- Header rows to skip: 17
- Column names in the first data row (checked)
Columns:
- Row delimiter: {LF}
- Column delimiter: Semicolon {;}
Heh... not so easy for me... I'd first have to stand up an instance of SSIS. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.