January 23, 2011 at 7:10 am
My situation:
1. I have a table in one database with a column which contains the contents of a CSV file
2. I have a table in another database into which I would like to insert this data
The CSV data, like most CSV data, can contain commas within fields; this is indicated with double quotes, but not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.
Aha - Bulk Inserts and format files, but no... because I am not reading from the file system.
This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.
As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.
Brian
January 23, 2011 at 10:40 am
Here is a tool written by Jeff Moden and posted to SSC. It might be just what you need.
CREATE FUNCTION [dbo].[DelimitedSplit8K]
/***************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list
of the split elements (items).
Usage Example:
SELECT *
FROM dbo.DelimitedSplit8K(@StringToSplit, @Delimiter)
Notes:
1. Optimized for VARCHAR(8000) or less.
2. Optimized for single character delimiter.
3. Does not "trim" elements just in case leading or trailing
blanks are intended.
4. cteTally concept originally by Itzik Ben-Gan and
"decimalized" by Lynn Pettis (and others) for a bit of
extra speed and finally redacted by Jeff Moden for a
different slant on readability and compactness.
5. If you don't know how a Tally table can be used to replace
loops, please see the following article...
http://www.sqlservercentral.com/articles/T-SQL/62867/
Revision History:
Rev 00 - 20 Jan 2010 - Jeff Moden
- Final redaction and test.
Rev 01 - 08 Mar 2010 - Jeff Moden
- Changed UNION ALL to UNPIVOT for bit of extra speed.
- See Gianluca Sartori's solution on the following URL
http://ask.sqlservercentral.com/questions/4241/whats-the-best-way-to-solve-the-fizzbuzz-question
***************************************************************************************************/
--===== Define I/O parameters
(
@pString VARCHAR(8000),
@pDelimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
--===== "Inline" CTE Driven "Tally Tableβ produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH
E1(N) AS ( --=== Create Ten 1's very quickly
SELECT N
FROM (SELECT 1 AS N0, 1 AS N1, 1 AS N2, 1 AS N3, 1 AS N4,
1 AS N5, 1 AS N6, 1 AS N7, 1 AS N8, 1 AS N9) AS E0
UNPIVOT (N FOR EN IN (N0, N1, N2, N3, N4, N5, N6, N7, N8, N9)) AS unpvt
),--10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N)
FROM E4)
--===== Do the split
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
N AS StartPosition,
SUBSTRING(@pDelimiter + @pString,
N+1,
CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
;
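For reference, a quick sanity check against Brian's sample string might look like this (a hypothetical run, assuming the function above has been created):
-- Hypothetical sanity check against the sample string from the first post
DECLARE @StringToSplit VARCHAR(8000) = '1,2,3,"4,5",6',
        @Delimiter     CHAR(1)       = ','
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K(@StringToSplit, @Delimiter)
-- Returns SIX items (1 / 2 / 3 / "4 / 5" / 6) because the splitter is
-- quote-unaware: it splits on every delimiter it finds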
January 23, 2011 at 12:53 pm
Ron,
Only one problem: that pivots a comma-delimited string into a multi-row table. He needs each delimited line to come out as one row, with the fields in separate columns.
A quick and dirty solution would be a WHILE loop of some kind. First, in the loop, determine the next position of the " or the ,. If it's a ", find your position from quote to quote and take what's between them into your column. If it's a comma, go from comma to comma, and drag that in instead.
Repeat until you run out of commas on the line. You'll probably want to append a final comma onto the line to make sure you've got an end-of-row indicator, which the system can see when dealing with flat files but you can't easily.
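A minimal sketch of that loop, assuming well-formed quoting with no escaped double quotes (the variable names are made up, and note the appended trailing comma doing end-of-row duty):
-- Hypothetical sketch of the quote-aware loop described above
DECLARE @line   varchar(8000) = '1,2,3,"4,5",6,',  -- note the appended trailing comma
        @pos    int = 1,
        @len    int,
        @close  int,
        @field  varchar(8000)
DECLARE @fields TABLE (FieldNumber int IDENTITY(1,1), Field varchar(8000))
SET @len = LEN(@line)
WHILE @pos <= @len
BEGIN
    IF SUBSTRING(@line, @pos, 1) = '"'       -- quoted field: go quote to quote
    BEGIN
        SET @close = CHARINDEX('"', @line, @pos + 1)
        SET @field = SUBSTRING(@line, @pos + 1, @close - @pos - 1)
        SET @pos   = @close + 2               -- skip the closing quote and the comma
    END
    ELSE                                      -- unquoted field: go comma to comma
    BEGIN
        SET @close = CHARINDEX(',', @line, @pos)
        SET @field = SUBSTRING(@line, @pos, @close - @pos)
        SET @pos   = @close + 1
    END
    INSERT @fields (Field) VALUES (@field)
END
SELECT FieldNumber, Field FROM @fields        -- 1 / 2 / 3 / 4,5 / 6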
If you've only got to do this once or twice, that loop is your best bet. It's easier to code, and since it's a one-shot you don't care as much about performance.
If you have to do this regularly and need it cleaned up for speed, post some DDL and sample data for the source and target tables here, and we can probably help you out further.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA
January 23, 2011 at 1:31 pm
Actually, we need more information from the OP. If we could get the DDL (CREATE TABLE statements) for the source and target tables, sample data for the source table, and expected results in the target table based on that sample data, we could provide much better answers than simply guessing at the true requirements.
January 23, 2011 at 10:03 pm
Brian McGee-355525 (1/23/2011)
My situation:
1. I have a table in one database with a column which contains the contents of a CSV file
2. I have a table in another database into which I would like to insert this data
The CSV data, like most CSV data, can contain commas within fields; this is indicated with double quotes, but not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.
Aha - Bulk Inserts and format files, but no... because I am not reading from the file system.
This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.
As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.
Brian
Hi Brian,
Is the element number of the quoted elements constant from row to row? I'm not talking about the data between the quotes... I'm talking about the "field position". In your example, the 4th field is where the quotes appear. Would the 4th field always have quotes around it?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2011 at 1:12 am
Since the table contains the contents of a CSV file, go back to the original CSV file and import it using SSIS. It doesn't mind if fields aren't quoted, as long as they don't contain quotes or commas (or whatever delimiting character you have).
If you don't have the CSV, do a SELECT * in Management Studio and save the result as a CSV.
January 24, 2011 at 3:36 am
Hi
Thanks for all the comments! Here's additional info for each responder:
@bitbucket-25253 : Thanks for the script, but as Craig describes, I need each line on its own row, to resemble the CSV file.
@Craig Farrell : Thanks for the suggestion, I will try this and post the code so that you can see if I've gone astray 🙂
@Jeff Moden : Probably, but not necessarily. Text data will have quotes and numeric data will not, but I cannot guarantee that a field that normally contains text data won't in some instances contain numeric data and therefore lack quotes.
@Old Hand : This isn't a one-off. Also, going back to the CSV file is not possible, because the data is posted to our servers via HTTP and we do not have the original file. We could write the file to disk and read it back again, but I want to avoid that because this will be happening at large scale.
Thanks everyone so far!
Brian
January 24, 2011 at 3:37 am
BTW, in terms of extra information or DDL, I'm not sure what people want.
CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂
Brian
January 24, 2011 at 4:52 am
Brian McGee-355525 (1/24/2011)
BTW, in terms of extra information or DDL, I'm not sure what people want.
CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂
Brian
DDL (Data Definition Language), the CREATE TABLE statement(s) you would use to create the source and destination tables from scratch.
Please read the first article I reference in my signature block regarding asking for help. If you follow the instructions in that article you will get much better answers and receive tested code in return.
January 24, 2011 at 6:08 am
Source = variable passed as SP parameter, datatype nvarchar(max).
Destination is this table
Create Table MyTempTable(
FIELD1 nvarchar(max),
FIELD2 nvarchar(max),
FIELD3 nvarchar(max),
FIELD4 nvarchar(max),
FIELD5 nvarchar(max),
FIELD6 nvarchar(max),
FIELD7 nvarchar(max),
FIELD8 nvarchar(max),
FIELD9 nvarchar(max),
FIELD10 nvarchar(max),
FIELD11 nvarchar(max),
FIELD12 nvarchar(max),
FIELD13 nvarchar(max))
January 24, 2011 at 6:55 am
The main problem as I see it is the possibility of commas between the quotes in a text column.
Other than that, what I did when I had a similar problem was to insert that column into a temp table with 13 additional columns, then perform 13 updates to parse the first column into each of the 13, one at a time, removing the parsed value from the first column at the same time.
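Something along these lines, as a hypothetical sketch (the table and column names are made up, and as I say above, quoted commas are not handled):
-- Hypothetical sketch of the peel-off approach; names are illustrative
DECLARE @t TABLE (Line varchar(8000), F1 varchar(8000), F2 varchar(8000))  -- ...and so on up to F13
INSERT @t (Line) VALUES ('1,2,3')
-- One update per target column: copy everything before the first comma,
-- then remove that field (and its comma) from the remaining line.
-- Both assignments see the pre-update value of Line.
UPDATE @t SET F1   = LEFT(Line, CHARINDEX(',', Line + ',') - 1),
              Line = SUBSTRING(Line, CHARINDEX(',', Line + ',') + 1, 8000)
UPDATE @t SET F2   = LEFT(Line, CHARINDEX(',', Line + ',') - 1),
              Line = SUBSTRING(Line, CHARINDEX(',', Line + ',') + 1, 8000)
-- ...repeat for F3 through F13
SELECT F1, F2, Line FROM @t  -- F1 = '1', F2 = '2', Line = '3'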
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 7:19 am
David Burrows (1/24/2011)
The main problem as I see it is the possibility of commas between the quotes in a text column.
This will replace commas between quotes with the pipe character
STUFF([column],CHARINDEX(',',[column],PATINDEX('%"%,%"%',[column])),1,'|')
This could be done repeatedly to replace all occurrences, and the pipes returned to commas during the split.
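As a hypothetical illustration of the repeated application (the variable name is made up, and this behaves only while the line holds a single quoted field - see the concern about multiple quoted entries later in the thread):
-- Hypothetical demo: repeat the STUFF until no comma is left between quotes
DECLARE @line varchar(8000) = '1,2,3,"4,5,6",7'
WHILE PATINDEX('%"%,%"%', @line) > 0
    SET @line = STUFF(@line,
                      CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)),
                      1, '|')
SELECT @line  -- 1,2,3,"4|5|6",7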
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 8:33 am
Ok - I have it 🙂 And again many thanks to everyone, especially Dave Burrows. Excerpt below. Names changed to protect the innocent, and the not so innocent. BTW 'ParseFile' is just a function that returns a table of lines parsed on ASCII 13/10.
One cursor instead of three, and on 8,000 records this runs in 52 seconds, which is fine for my requirements. It's an improvement on 14 minutes, anyway.
CREATE Proc MyProc
    @data varbinary(max)
As
Set NoCount On
Begin Try
    Declare @CHARDATA varchar(max) = convert(varchar(max), @DATA),
            @THE_LINE nvarchar(max),
            @SQL      nvarchar(max)

    If Exists(Select 1 From sysobjects Where NAME = '_BMUB') Drop Table _BMUB

    Create Table _BMUB(
        IMPORT_ID uniqueidentifier primary key nonclustered default newid(),
        FIELD1  nvarchar(max),
        FIELD2  nvarchar(max),
        FIELD3  nvarchar(max),
        FIELD4  nvarchar(max),
        FIELD5  nvarchar(max),
        FIELD6  nvarchar(max),
        FIELD7  nvarchar(max),
        FIELD8  nvarchar(max),
        FIELD9  nvarchar(max),
        FIELD10 nvarchar(max),
        FIELD11 nvarchar(max),
        FIELD12 nvarchar(max),
        FIELD13 nvarchar(max))

    Declare @LINES table(LINE_ID int identity(1,1), THE_LINE nvarchar(max))

    Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA, char(13)) -- put each line separately into @LINES
    Delete @LINES Where Len(Replace(THE_LINE, char(13), '')) = 0 -- drop dud lines
    Delete @LINES Where Len(Replace(THE_LINE, char(10), '')) = 0 -- drop dud lines
    Update @LINES Set THE_LINE = Replace(Replace(THE_LINE, char(13), ''), char(10), '') -- drop carriage returns and linefeeds
    Update @LINES Set THE_LINE = Stuff(THE_LINE, CharIndex(',', THE_LINE, PatIndex('%"%,%"%', THE_LINE)), 1, '|**|') -- replace commas in quotes with parsing placeholder
        Where THE_LINE Like '%"%'
    Update @LINES Set THE_LINE = Replace(THE_LINE, '''', '''''') -- escape apostrophes
    Update @LINES Set THE_LINE = Replace(THE_LINE, '"', '') -- strip out quotes
    Update @LINES Set THE_LINE = '''' + Replace(THE_LINE, ',', ''',''') + '''' -- single-quote each field
    Update @LINES Set THE_LINE = Replace(THE_LINE, '|**|', ',') -- return placeholder to comma
    Delete @LINES Where Len(THE_LINE) - Len(Replace(THE_LINE, ',', '')) != 12 -- only allow inserts with 13 fields

    Declare LinesLoop Cursor For
        Select THE_LINE From @LINES
    Open LinesLoop
    Fetch Next From LinesLoop Into @THE_LINE
    While @@FETCH_STATUS = 0
    Begin
        Select @SQL = 'Insert _BMUB(FIELD1,FIELD2,FIELD3,FIELD4,FIELD5,FIELD6,FIELD7,FIELD8,FIELD9,FIELD10,FIELD11,FIELD12,FIELD13) Select ' + @THE_LINE
        Exec sp_executesql @SQL
        Fetch Next From LinesLoop Into @THE_LINE
    End
    Close LinesLoop
    Deallocate LinesLoop

    select * from _BMUB
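For anyone trying this at home, a hypothetical invocation might look like this (the test data is made up; each line must carry exactly 13 fields to survive the field-count check):
-- Hypothetical test call; the proc expects varbinary(max)
DECLARE @csv varchar(max) =
    '1,2,3,4,5,6,7,8,9,10,11,12,13' + char(13) + char(10) +
    'a,b,c,d,e,f,g,h,i,j,k,l,m'     + char(13) + char(10)
DECLARE @bin varbinary(max) = convert(varbinary(max), @csv)
EXEC MyProc @data = @bin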
January 24, 2011 at 10:18 am
I've got a bit of a concern about this process, specifically in your placeholder mechanic.
My concern there is that you only want to replace the commas between the odd and the even quotes.
Does this code deal well when there are two quoted entries on the same line? i.e. 1,2,3,"abc,def",4,5,"ghi,jkl",6?
Update @LINES Set THE_LINE = Stuff(THE_LINE,CharIndex(',',THE_LINE,PatIndex('%"%,%"%',THE_LINE)),1,'|**|') -- replace commas in quotes with parsing placeholder
Where THE_LINE Like '%"%'
You'll probably want to make sure it works with ," through ", as the beginning/ending delimiters to a quoted entry.
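To make that concrete, here's a hypothetical repro: apply the placeholder STUFF twice to that two-quoted-entry line.
DECLARE @line nvarchar(max) = '1,2,3,"abc,def",4,5,"ghi,jkl",6'
-- First pass: replaces the comma inside the FIRST quoted entry. Fine so far.
SET @line = STUFF(@line, CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)), 1, '|**|')
SELECT @line  -- 1,2,3,"abc|**|def",4,5,"ghi,jkl",6
-- Second pass: PATINDEX still anchors on the first quote, so CHARINDEX now
-- lands on the FIELD DELIMITER after "abc|**|def" rather than the comma in
-- the second quoted entry.
SET @line = STUFF(@line, CHARINDEX(',', @line, PATINDEX('%"%,%"%', @line)), 1, '|**|')
SELECT @line  -- 1,2,3,"abc|**|def"|**|4,5,"ghi,jkl",6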
You can probably invert this code and save a few data passes:
Delete @LINES Where Len(Replace(THE_LINE,char(13),'')) = 0 -- drop dud lines
Delete @LINES Where Len(Replace(THE_LINE,char(10),'')) = 0 -- drop dud lines
Update @LINES Set THE_LINE = Replace(Replace(THE_LINE,char(13),''),char(10),'') -- drop carriage return and linefeed
Swap it around to:
Update @LINES Set THE_LINE = Replace(Replace(THE_LINE,char(13),''),char(10),'') -- drop carriage return and linefeed
Delete @LINES Where Len(The_Line) = 0 -- drop dud lines
I'd be very interested in seeing what this inline tvf looks like:
Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA,char(13)) -- put each line separately into @LINES
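(ParseFile isn't posted in this thread, so purely as a guess, an inline TVF of that shape, built on the same tally idea but stretched to varchar(max), might look like the sketch below. Every name and implementation detail here is an assumption.)
-- Purely a guess at ParseFile's shape: split a varchar(max) on a
-- one-character delimiter (the proc above passes char(13)).
-- Returns a single column so that "Select * From ParseFile(...)" lines up
-- with the one-column insert into @LINES.
CREATE FUNCTION dbo.ParseFile (@pText varchar(max), @pDelimiter char(1))
RETURNS TABLE
AS
RETURN
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(N)),
     E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d),   -- 10,000 rows
     E8(N) AS (SELECT 1 FROM E4 a, E4 b),               -- up to 10^8 rows, limited by TOP below
     cteTally(N) AS (SELECT TOP (DATALENGTH(ISNULL(@pText, '')))
                            ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                     FROM E8),
     cteStart(N1) AS (SELECT 1
                      UNION ALL
                      SELECT N + 1 FROM cteTally
                      WHERE SUBSTRING(@pText, N, 1) = @pDelimiter)
SELECT Line = SUBSTRING(@pText, N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pText, N1), 0),
                               DATALENGTH(@pText) + 1) - N1)
FROM cteStart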
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA