January 23, 2011 at 7:10 am
My situation:
1. I have a table in one database with a column which contains the contents of a CSV file
2. I have a table in another database into which I would like to insert this data
The CSV data, like most CSV data, can contain commas within fields; this is indicated with double quotes, but not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.
Aha - Bulk Inserts and format files, but no... because I am not reading from the file system.
This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.
As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.
Brian
January 23, 2011 at 10:40 am
Here is a tool written by Jeff Moden and posted to SSC. It might be just what you need.
CREATE FUNCTION [dbo].[DelimitedSplit8K]
/***************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list
of the split elements (items).
Usage Example:
SELECT *
FROM dbo.DelimitedSplit8K(@StringToSplit, @Delimiter)
Notes:
1. Optimized for VARCHAR(8000) or less.
2. Optimized for single character delimiter.
3. Does not "trim" elements just in case leading or trailing
blanks are intended.
4. cteTally concept originally by Itzik Ben-Gan and
"decimalized" by Lynn Pettis (and others) for a bit of
extra speed and finally redacted by Jeff Moden for a
different slant on readability and compactness.
5. If you don't know how a Tally table can be used to replace
loops, please see the following article...
http://www.sqlservercentral.com/articles/T-SQL/62867/
Revision History:
Rev 00 - 20 Jan 2010 - Jeff Moden
- Final redaction and test.
Rev 01 - 08 Mar 2010 - Jeff Moden
- Changed UNION ALL to UNPIVOT for bit of extra speed.
- See Gianluca Sartori's solution on the following URL
http://ask.sqlservercentral.com/questions/4241/whats-the-best-way-to-solve-the-fizzbuzz-question
***************************************************************************************************/
--===== Define I/O parameters
(
@pString VARCHAR(8000),
@pDelimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
--===== "Inline" CTE Driven "Tally Tableβ produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH
E1(N) AS ( --=== Create Ten 1's very quickly
SELECT N
FROM (SELECT 1 AS N0, 1 AS N1, 1 AS N2, 1 AS N3, 1 AS N4,
1 AS N5, 1 AS N6, 1 AS N7, 1 AS N8, 1 AS N9) AS E0
UNPIVOT (N FOR EN IN (N0, N1, N2, N3, N4, N5, N6, N7, N8, N9)) AS unpvt
),--10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N)
FROM E4)
--===== Do the split
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
N AS StartPosition,
SUBSTRING(@pDelimiter + @pString,
N+1,
CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
;
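For reference, a quick sanity check against Brian's sample string might look like this (a hypothetical run, assuming the function above has been created):
-- Hypothetical sanity check against the sample string from the first post
DECLARE @StringToSplit VARCHAR(8000) = '1,2,3,"4,5",6',
        @Delimiter     CHAR(1)       = ','
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K(@StringToSplit, @Delimiter)
-- Returns SIX items (1 / 2 / 3 / "4 / 5" / 6) because the splitter is
-- quote-unaware: it splits on every delimiter it finds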
January 23, 2011 at 12:53 pm
Ron,
Only one problem: that pivots a comma-delimited string into a multi-row table. He needs each delimited line to come out as one row, with the fields in separate columns.
A quick and dirty solution would be a WHILE loop of some kind. First, in the loop, determine the next position of the " or the ,. If it's a ", find your position from quote to quote and take what's between them into your column. If it's a comma, go from comma to comma, and drag that in instead.
Repeat until you run out of commas on the line. You'll probably want to append a final comma onto the line to make sure you've got an end-of-row indicator, which the system can see when dealing with flat files but you can't easily.
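A minimal sketch of that loop, assuming well-formed quoting with no escaped double quotes (the variable names are made up, and note the appended trailing comma doing end-of-row duty):
-- Hypothetical sketch of the quote-aware loop described above
DECLARE @line   varchar(8000) = '1,2,3,"4,5",6,',  -- note the appended trailing comma
        @pos    int = 1,
        @len    int,
        @close  int,
        @field  varchar(8000)
DECLARE @fields TABLE (FieldNumber int IDENTITY(1,1), Field varchar(8000))
SET @len = LEN(@line)
WHILE @pos <= @len
BEGIN
    IF SUBSTRING(@line, @pos, 1) = '"'       -- quoted field: go quote to quote
    BEGIN
        SET @close = CHARINDEX('"', @line, @pos + 1)
        SET @field = SUBSTRING(@line, @pos + 1, @close - @pos - 1)
        SET @pos   = @close + 2               -- skip the closing quote and the comma
    END
    ELSE                                      -- unquoted field: go comma to comma
    BEGIN
        SET @close = CHARINDEX(',', @line, @pos)
        SET @field = SUBSTRING(@line, @pos, @close - @pos)
        SET @pos   = @close + 1
    END
    INSERT @fields (Field) VALUES (@field)
END
SELECT FieldNumber, Field FROM @fields        -- 1 / 2 / 3 / 4,5 / 6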
If you've only got to do this once or twice, that loop is your best bet. It's easier to code, and since it's a one-shot you don't care as much about performance.
If you have to do this regularly and need it cleaned up for speed, post some DDL and sample data for the source and target tables here, and we can probably help you out further.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA
January 23, 2011 at 1:31 pm
Actually, we need more information from the OP. If we could get the DDL (CREATE TABLE statements) for the source and target tables, sample data for the source table, and expected results in the target table based on that sample data, we could provide much better answers than simply guessing at the true requirements.
January 23, 2011 at 10:03 pm
Brian McGee-355525 (1/23/2011)
My situation:
1. I have a table in one database with a column which contains the contents of a CSV file
2. I have a table in another database into which I would like to insert this data
The CSV data, like most CSV data, can contain commas within fields; this is indicated with double quotes, but not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.
Aha - Bulk Inserts and format files, but no... because I am not reading from the file system.
This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.
As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.
Brian
Hi Brian,
Is the element number of the quoted elements constant from row to row? I'm not talking about the data between the quotes... I'm talking about the "field position". In your example, the 4th field is where the quotes appear. Would the 4th field always have quotes around it?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2011 at 1:12 am
Since the table contains the contents of a CSV file, go back to the original CSV file and import it using SSIS. It doesn't mind if fields aren't quoted, as long as they don't contain quotes or commas (or whatever delimiting character you have).
If you don't have the CSV, do a SELECT * in Management Studio and save the result as a CSV.
January 24, 2011 at 3:36 am
Hi
Thanks for all the comments! Here's additional info for each responder:
@bitbucket-25253 : Thanks for the script, but as Craig describes, I need each line on its own row, to resemble the CSV file.
@Craig Farrell : Thanks for the suggestion, I will try this and post the code so that you can see if I've gone astray 🙂
@Jeff Moden : Probably, but not necessarily. Text data will have quotes and numeric data will not, but I cannot guarantee that a field that normally contains text data won't in some instances contain numeric data and therefore lack quotes.
@Old Hand : This isn't a one-off. Also, going back to the CSV file is not possible, because the data is posted to our servers via HTTP and we do not have the original file. We could write the file to disk and read it back again, but I want to avoid that because this will be happening at large scale.
Thanks everyone so far!
Brian
January 24, 2011 at 3:37 am
BTW, in terms of extra information or DDL, I'm not sure what people want.
CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂
Brian
January 24, 2011 at 4:52 am
Brian McGee-355525 (1/24/2011)
BTW, in terms of extra information or DDL, I'm not sure what people want.
CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂
Brian
DDL (Data Definition Language), the CREATE TABLE statement(s) you would use to create the source and destination tables from scratch.
Please read the first article I reference in my signature block regarding asking for help. If you follow the instructions in that article you will get much better answers and receive tested code in return.
January 24, 2011 at 6:08 am
Source = variable passed as SP parameter, datatype nvarchar(max).
Destination is this table
Create Table MyTempTable(
FIELD1 nvarchar(max),
FIELD2 nvarchar(max),
FIELD3 nvarchar(max),
FIELD4 nvarchar(max),
FIELD5 nvarchar(max),
FIELD6 nvarchar(max),
FIELD7 nvarchar(max),
FIELD8 nvarchar(max),
FIELD9 nvarchar(max),
FIELD10 nvarchar(max),
FIELD11 nvarchar(max),
FIELD12 nvarchar(max),
FIELD13 nvarchar(max))
January 24, 2011 at 6:55 am
The main problem as I see it is the possibility of commas between the quotes in a text column.
Other than that, what I did when I had a similar problem was to insert that column into a temp table with 13 additional columns, then perform 13 updates to parse the first column into each of the 13, one at a time, removing the parsed value from the first column at the same time.
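Something along these lines, as a hypothetical sketch (the table and column names are made up, and as I say above, quoted commas are not handled):
-- Hypothetical sketch of the peel-off approach; names are illustrative
DECLARE @t TABLE (Line varchar(8000), F1 varchar(8000), F2 varchar(8000))  -- ...and so on up to F13
INSERT @t (Line) VALUES ('1,2,3')
-- One update per target column: copy everything before the first comma,
-- then remove that field (and its comma) from the remaining line.
-- Both assignments see the pre-update value of Line.
UPDATE @t SET F1   = LEFT(Line, CHARINDEX(',', Line + ',') - 1),
              Line = SUBSTRING(Line, CHARINDEX(',', Line + ',') + 1, 8000)
UPDATE @t SET F2   = LEFT(Line, CHARINDEX(',', Line + ',') - 1),
              Line = SUBSTRING(Line, CHARINDEX(',', Line + ',') + 1, 8000)
-- ...repeat for F3 through F13
SELECT F1, F2, Line FROM @t  -- F1 = '1', F2 = '2', Line = '3'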
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 7:19 am
David Burrows (1/24/2011)
The main problem as I see it is the possibility of commas between the quotes in a text column.
This will replace commas between quotes with the pipe character
STUFF([column],CHARINDEX(',',[column],PATINDEX('%"%,%"%',[column])),1,'|')
This could be done repeatedly to replace all occurrences, and the pipes returned to commas during the split.
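As a hypothetical illustration of the repeated application (the variable name is made up, and this behaves only while the line holds a single quoted field - see the concern about multiple quoted entries later in the thread):
-- Hypothetical demo: repeat the STUFF until no comma is left between quotes
DECLARE @line varchar(8000) = '1,2,3,"4,5,6",7'
WHILE PATINDEX('%"%,%"%', @line) > 0
    SET @line = STUFF(@line,
                      CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)),
                      1, '|')
SELECT @line  -- 1,2,3,"4|5|6",7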
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 8:33 am
Ok - I have it 🙂 And again many thanks to everyone, especially Dave Burrows. Excerpt below. Names changed to protect the innocent, and the not so innocent. BTW 'ParseFile' is just a function that returns a table of lines parsed on ASCII 13/10.
One cursor instead of three, and on 8,000 records this runs in 52 seconds, which is fine for my requirements. It's an improvement on 14 minutes, anyway.
CREATE Proc MyProc
    @data varbinary(max)
As
Set NoCount On
Begin Try
    Declare @CHARDATA varchar(max) = convert(varchar(max), @DATA),
            @THE_LINE nvarchar(max),
            @SQL      nvarchar(max)

    If Exists(Select 1 From sysobjects Where NAME = '_BMUB') Drop Table _BMUB

    Create Table _BMUB(
        IMPORT_ID uniqueidentifier primary key nonclustered default newid(),
        FIELD1  nvarchar(max),
        FIELD2  nvarchar(max),
        FIELD3  nvarchar(max),
        FIELD4  nvarchar(max),
        FIELD5  nvarchar(max),
        FIELD6  nvarchar(max),
        FIELD7  nvarchar(max),
        FIELD8  nvarchar(max),
        FIELD9  nvarchar(max),
        FIELD10 nvarchar(max),
        FIELD11 nvarchar(max),
        FIELD12 nvarchar(max),
        FIELD13 nvarchar(max))

    Declare @LINES table(LINE_ID int identity(1,1), THE_LINE nvarchar(max))

    Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA, char(13)) -- put each line separately into @LINES
    Delete @LINES Where Len(Replace(THE_LINE, char(13), '')) = 0 -- drop dud lines
    Delete @LINES Where Len(Replace(THE_LINE, char(10), '')) = 0 -- drop dud lines
    Update @LINES Set THE_LINE = Replace(Replace(THE_LINE, char(13), ''), char(10), '') -- drop carriage returns and linefeeds
    Update @LINES Set THE_LINE = Stuff(THE_LINE, CharIndex(',', THE_LINE, PatIndex('%"%,%"%', THE_LINE)), 1, '|**|') -- replace commas in quotes with parsing placeholder
        Where THE_LINE Like '%"%'
    Update @LINES Set THE_LINE = Replace(THE_LINE, '''', '''''') -- escape apostrophes
    Update @LINES Set THE_LINE = Replace(THE_LINE, '"', '') -- strip out quotes
    Update @LINES Set THE_LINE = '''' + Replace(THE_LINE, ',', ''',''') + '''' -- single-quote each field
    Update @LINES Set THE_LINE = Replace(THE_LINE, '|**|', ',') -- return placeholder to comma
    Delete @LINES Where Len(THE_LINE) - Len(Replace(THE_LINE, ',', '')) != 12 -- only allow inserts with 13 fields

    Declare LinesLoop Cursor For
        Select THE_LINE From @LINES
    Open LinesLoop
    Fetch Next From LinesLoop Into @THE_LINE
    While @@FETCH_STATUS = 0
    Begin
        Select @SQL = 'Insert _BMUB(FIELD1,FIELD2,FIELD3,FIELD4,FIELD5,FIELD6,FIELD7,FIELD8,FIELD9,FIELD10,FIELD11,FIELD12,FIELD13) Select ' + @THE_LINE
        Exec sp_executesql @SQL
        Fetch Next From LinesLoop Into @THE_LINE
    End
    Close LinesLoop
    Deallocate LinesLoop

    select * from _BMUB
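For anyone trying this at home, a hypothetical invocation might look like this (the test data is made up; each line must carry exactly 13 fields to survive the field-count check):
-- Hypothetical test call; the proc expects varbinary(max)
DECLARE @csv varchar(max) =
    '1,2,3,4,5,6,7,8,9,10,11,12,13' + char(13) + char(10) +
    'a,b,c,d,e,f,g,h,i,j,k,l,m'     + char(13) + char(10)
DECLARE @bin varbinary(max) = convert(varbinary(max), @csv)
EXEC MyProc @data = @bin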
January 24, 2011 at 10:18 am
I've got a bit of a concern about this process, specifically in your placeholder mechanic.
My concern there is that you only want to replace the commas between the odd and the even quotes.
Does this code deal well when there are two quoted entries on the same line? i.e. 1,2,3,"abc,def",4,5,"ghi,jkl",6?
Update @LINES Set THE_LINE = Stuff(THE_LINE,CharIndex(',',THE_LINE,PatIndex('%"%,%"%',THE_LINE)),1,'|**|') -- replace commas in quotes with parsing placeholder
Where THE_LINE Like '%"%'
You'll probably want to make sure it works with ," through ", as the beginning/ending delimiters to a quoted entry.
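To make that concrete, here's a hypothetical repro: apply the placeholder STUFF twice to that two-quoted-entry line.
DECLARE @line nvarchar(max) = '1,2,3,"abc,def",4,5,"ghi,jkl",6'
-- First pass: replaces the comma inside the FIRST quoted entry. Fine so far.
SET @line = STUFF(@line, CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)), 1, '|**|')
SELECT @line  -- 1,2,3,"abc|**|def",4,5,"ghi,jkl",6
-- Second pass: PATINDEX still anchors on the first quote, so CHARINDEX now
-- lands on the FIELD DELIMITER after "abc|**|def" rather than the comma in
-- the second quoted entry.
SET @line = STUFF(@line, CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)), 1, '|**|')
SELECT @line  -- 1,2,3,"abc|**|def"|**|4,5,"ghi,jkl",6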
You can probably invert this code and save a few data passes:
Delete @LINES Where Len(Replace(THE_LINE,char(13),'')) = 0 -- drop dud lines
Delete @LINES Where Len(Replace(THE_LINE,char(10),'')) = 0 -- drop dud lines
Update @LINES Set THE_LINE = Replace(Replace(THE_LINE,char(13),''),char(10),'') -- drop carriage return and linefeed
Swap it around to:
Update @LINES Set THE_LINE = Replace(Replace(THE_LINE,char(13),''),char(10),'') -- drop carriage return and linefeed
Delete @LINES Where Len(The_Line) = 0 -- drop dud lines
I'd be very interested in seeing what this inline tvf looks like:
Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA,char(13)) -- put each line separately into @LINES
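(ParseFile isn't posted in this thread, so purely as a guess, an inline TVF of that shape, built on the same tally idea but stretched to varchar(max), might look like the sketch below. Every name and implementation detail here is an assumption.)
-- Purely a guess at ParseFile's shape: split a varchar(max) on a
-- one-character delimiter (the proc above passes char(13)).
-- Returns a single column so that "Select * From ParseFile(...)" lines up
-- with the one-column insert into @LINES.
CREATE FUNCTION dbo.ParseFile (@pText varchar(max), @pDelimiter char(1))
RETURNS TABLE
AS
RETURN
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(N)),
     E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d),   -- 10,000 rows
     E8(N) AS (SELECT 1 FROM E4 a, E4 b),               -- up to 10^8 rows, limited by TOP below
     cteTally(N) AS (SELECT TOP (DATALENGTH(ISNULL(@pText, '')))
                            ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                     FROM E8),
     cteStart(N1) AS (SELECT 1
                      UNION ALL
                      SELECT N + 1 FROM cteTally
                      WHERE SUBSTRING(@pText, N, 1) = @pDelimiter)
SELECT Line = SUBSTRING(@pText, N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pText, N1), 0),
                               DATALENGTH(@pText) + 1) - N1)
FROM cteStart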
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA