Surely there is a way to import CSV data in this situation?

  • My situation:

    1. I have a table in one database with a column which contains the contents of a CSV file

    2. I have a table in another database into which I would like to insert this data

The CSV data, like most CSV data, can contain commas within fields, indicated by surrounding the field with double quotes; however, not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.

Aha - Bulk Inserts and File Formats, but no... because I am not reading from the file system.

This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.

    As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.

    Brian

• Here is a tool written by Jeff Moden and posted to SSC. It might be just what you need.

    CREATE FUNCTION [dbo].[DelimitedSplit8K_old]
    /***************************************************************************************************
     Purpose:
     Split a given string at a given delimiter and return a list
     of the split elements (items).

     Usage Example:
     SELECT *
       FROM dbo.DelimitedSplit8K(@StringToSplit, @Delimiter)

     Notes:
     1. Optimized for VARCHAR(8000) or less.
     2. Optimized for single character delimiter.
     3. Does not "trim" elements just in case leading or trailing
        blanks are intended.
     4. cteTally concept originally by Itzik Ben-Gan and
        "decimalized" by Lynn Pettis (and others) for a bit of
        extra speed and finally redacted by Jeff Moden for a
        different slant on readability and compactness.
     5. If you don't know how a Tally table can be used to replace
        loops, please see the following article...
        http://www.sqlservercentral.com/articles/T-SQL/62867/

     Revision History:
     Rev 00 - 20 Jan 2010 - Jeff Moden
            - Final redaction and test.
     Rev 01 - 08 Mar 2010 - Jeff Moden
            - Changed UNION ALL to UNPIVOT for a bit of extra speed.
            - See Gianluca Sartori's solution on the following URL:
              http://ask.sqlservercentral.com/questions/4241/whats-the-best-way-to-solve-the-fizzbuzz-question
    ***************************************************************************************************/
    --===== Define I/O parameters
            (
            @pString    VARCHAR(8000),
            @pDelimiter CHAR(1)
            )
    RETURNS TABLE
    AS
    RETURN
    --===== "Inline" CTE Driven "Tally Table" produces values up to
         -- 10,000... enough to cover VARCHAR(8000)
    WITH
    E1(N) AS ( --=== Create Ten 1's very quickly
               SELECT N
                 FROM (SELECT 1 AS N0, 1 AS N1, 1 AS N2, 1 AS N3, 1 AS N4,
                              1 AS N5, 1 AS N6, 1 AS N7, 1 AS N8, 1 AS N9) AS E0
              UNPIVOT (N FOR EN IN (N0, N1, N2, N3, N4, N5, N6, N7, N8, N9)) AS unpvt
             ), --10
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N)
                      FROM E4)
    --===== Do the split
     SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
            N AS StartPosition,
            SUBSTRING(@pDelimiter + @pString,
                      N + 1,
                      CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
       FROM cteTally
      WHERE N < LEN(@pString) + 2
        AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
    ;
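
    For a quick sanity check, here is what the splitter returns for the example string from the original post - note that it knows nothing about quotes:

    SELECT ItemNumber, Item
      FROM dbo.DelimitedSplit8K_old('1,2,3,"4,5",6', ',');
    -- Returns six items: 1 / 2 / 3 / "4 / 5" / 6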

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you - before posting a question please read
    Before posting a performance problem please read

  • Ron,

Only one problem: that splits a comma-delimited string into a multi-row table. He needs one delimited line to come out as one row.

A quick and dirty solution would be a while loop of some kind. First, in the loop, determine the next position of the " or the ,. If it's a ", find your position from quote to quote, and take what's between them into your column. If it's a comma, go from comma to comma, and drag that in.

Repeat until you run out of commas on the line. You'll probably want to append a final 'comma' onto the list to make sure you've got an EndOfRow indicator, which the system can see when dealing with flat files but you can't easily. (A rough sketch follows below.)

    If you've only got to do this once or twice, this is your best bet. Easier to code and it's a one shot so you don't care as much about performance.

If you have to do this regularly and need it cleaned up for speed, post some DDL and sample data on the source and target tables here, and we can probably help you out further.
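
    For illustration, a rough (untested-at-scale) sketch of that loop for a single line - @line and @fields are just hypothetical names:

    Declare @line nvarchar(max) = N'1,2,3,"4,5",6' + N',' -- append the EndOfRow comma mentioned above
    Declare @fields table (FieldNumber int identity(1,1), FieldValue nvarchar(max))
    Declare @pos int = 1, @end int

    While @pos <= Len(@line)
    Begin
        If Substring(@line, @pos, 1) = N'"'
        Begin
            -- quoted field: take quote to quote, then step past the closing quote and its comma
            Set @end = CharIndex(N'"', @line, @pos + 1)
            Insert @fields (FieldValue) Select Substring(@line, @pos + 1, @end - @pos - 1)
            Set @pos = @end + 2
        End
        Else
        Begin
            -- unquoted field: take everything up to the next comma
            Set @end = CharIndex(N',', @line, @pos)
            Insert @fields (FieldValue) Select Substring(@line, @pos, @end - @pos)
            Set @pos = @end + 1
        End
    End

    Select FieldNumber, FieldValue From @fields -- 1 / 2 / 3 / 4,5 / 6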


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

• Actually, we need more information from the OP. If we could get the DDL (CREATE TABLE statements) for the source and target tables, sample data for the source table, and the expected results in the target table based on that sample data, we could provide much better answers than simply guessing at the true requirements.

  • Brian McGee-355525 (1/23/2011)


    My situation:

    1. I have a table in one database with a column which contains the contents of a CSV file

    2. I have a table in another database into which I would like to insert this data

The CSV data, like most CSV data, can contain commas within fields, indicated by surrounding the field with double quotes; however, not all fields are surrounded by double quotes. So of course 1,2,3,"4,5",6 is five fields, not six.

Aha - Bulk Inserts and File Formats, but no... because I am not reading from the file system.

This situation will occur frequently, and the CSV files will usually contain in excess of 10,000 rows per file, so code that splits lines/fields and loops through with cursors to parse manually takes much too long to process.

    As a recent mover from SQL 2000 to 2008, I am often discovering new commands and approaches, and I am hoping, REALLY hoping, that there is a way to do this.

    Brian

    Hi Brian,

    Is the element number of the quoted elements constant from row to row? I'm not talking about the data between the quotes... I'm talking about the "field position". In your example, the 4th field is where the quotes appear. Would the 4th field always have quotes around it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

• Since the table contains the contents of a CSV file, go back to the original CSV file and import it using SSIS. It doesn't mind if fields aren't quoted, as long as they don't contain quotes or commas (or whatever delimiting character you have).

If you don't have the CSV, do a SELECT * in Management Studio and save the result as a CSV.

  • Hi

    Thanks for all the comments! Here's additional info to each responder:

@bitbucket-25253: Thanks for the script, but as Craig describes, I need each line on its own row, to resemble the CSV file.

@Craig Farrell: Thanks for the suggestion, I will try this and post the code so that you can see if I've gone astray 🙂

@Jeff Moden: Probably, but not necessarily. Text data will have quotes and numeric data will not, but I cannot guarantee that a field that normally contains text data won't in some instances contain numeric data, and therefore no quotes.

@Old Hand: This isn't a one-off. Also, going back to the CSV file is not possible, because the data is posted to our servers via HTTP and we do not have the original file. We could write the file to disk and read it back again, but I want to avoid that because this will be happening at large scale.

    Thanks everyone so far!

    Brian

• BTW, in terms of extra information or DDL, I'm not sure what people want.

CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂

    Brian

  • Brian McGee-355525 (1/24/2011)


BTW, in terms of extra information or DDL, I'm not sure what people want.

CSV data with a fixed number of fields (in my case, 13), and an unlimited number of rows, to be imported into a SQL table with the same number of columns as there are fields. Direct 1-1 import. No fancy stuff 🙂

    Brian

    DDL (Data Definition Language), the CREATE TABLE statement(s) you would use to create the source and destination tables from scratch.

    Please read the first article I reference in my signature block regarding asking for help. If you follow the instructions in that article you will get much better answers and receive tested code in return.

  • Source = variable passed as SP parameter, datatype nvarchar(max).

    Destination is this table

    Create Table MyTempTable(
        FIELD1  nvarchar(max),
        FIELD2  nvarchar(max),
        FIELD3  nvarchar(max),
        FIELD4  nvarchar(max),
        FIELD5  nvarchar(max),
        FIELD6  nvarchar(max),
        FIELD7  nvarchar(max),
        FIELD8  nvarchar(max),
        FIELD9  nvarchar(max),
        FIELD10 nvarchar(max),
        FIELD11 nvarchar(max),
        FIELD12 nvarchar(max),
        FIELD13 nvarchar(max))

• The main problem as I see it is the possibility of commas between the quotes in a text column.

Other than that, what I did when I had a similar problem was to insert that column into a temp table with 13 additional columns, then perform 13 updates to parse the first column into each of the 13, one at a time, removing the value from the first column at the same time.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (1/24/2011)


The main problem as I see it is the possibility of commas between the quotes in a text column.

    This will replace commas between quotes with the pipe character

    STUFF([column],CHARINDEX(',',[column],PATINDEX('%"%,%"%',[column])),1,'|')

This could be done repeatedly to replace all occurrences, and the pipe returned to a comma during the split.
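
    For illustration, a rough sketch of that repeated application on a line with one quoted field ('|' assumed absent from the real data):

    Declare @line varchar(8000) = '1,2,3,"4,5,6",7'

    While PatIndex('%"%,%"%', @line) > 0
        Set @line = Stuff(@line, CharIndex(',', @line, PatIndex('%"%,%"%', @line)), 1, '|')

    Select @line -- 1,2,3,"4|5|6",7

    Note that with two or more quoted fields on the same line, the pattern can also latch onto the separator comma between them - a point Craig raises further down the thread.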

    Far away is close at hand in the images of elsewhere.
    Anon.

• @david-2 Burrows - THANK YOU 🙂

    That is fantastic - that's exactly what I need. In the middle of writing horrible nested loops here and that has cut a big chunk of it out!

    Thanks again!

    Brian

• Ok - I have it 🙂 And again many thanks to everyone, especially Dave Burrows. Excerpt below. Names changed to protect the innocent, and the not so. BTW, 'ParseFile' is just a function that returns a table of lines parsed on ASCII 13/10.

One cursor instead of three, and on 8,000 records this runs in 52 seconds, which is fine for my requirements. It's an improvement on 14 minutes, anyway.

    CREATE Proc MyProc
        @data varbinary(max)
    As
    Set NoCount On
    Begin Try

    Declare @CHARDATA varchar(max) = convert(varchar(max), @DATA),
            @THE_LINE nvarchar(max),
            @sql      nvarchar(max)

    If Exists(Select 1 From sysobjects Where NAME = '_BMUB') Drop Table _BMUB

    Create Table _BMUB(
        IMPORT_ID uniqueidentifier primary key nonclustered default newid(),
        FIELD1  nvarchar(max),
        FIELD2  nvarchar(max),
        FIELD3  nvarchar(max),
        FIELD4  nvarchar(max),
        FIELD5  nvarchar(max),
        FIELD6  nvarchar(max),
        FIELD7  nvarchar(max),
        FIELD8  nvarchar(max),
        FIELD9  nvarchar(max),
        FIELD10 nvarchar(max),
        FIELD11 nvarchar(max),
        FIELD12 nvarchar(max),
        FIELD13 nvarchar(max))

    Declare @LINES table(LINE_ID int identity(1,1), THE_LINE nvarchar(max))

    Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA, char(13)) -- put each line separately into @LINES

    Delete @LINES Where Len(Replace(THE_LINE, char(13), '')) = 0 -- drop dud lines
    Delete @LINES Where Len(Replace(THE_LINE, char(10), '')) = 0 -- drop dud lines

    Update @LINES Set THE_LINE = Replace(Replace(THE_LINE, char(13), ''), char(10), '') -- drop carriage return and linefeed

    Update @LINES Set THE_LINE = Stuff(THE_LINE, CharIndex(',', THE_LINE, PatIndex('%"%,%"%', THE_LINE)), 1, '|**|') -- replace commas in quotes with parsing placeholder
     Where THE_LINE Like '%"%'

    Update @LINES Set THE_LINE = Replace(THE_LINE, '''', '''''') -- escape apostrophes
    Update @LINES Set THE_LINE = Replace(THE_LINE, '"', '') -- strip out quotes
    Update @LINES Set THE_LINE = '''' + Replace(THE_LINE, ',', ''',''') + '''' -- single-quote each field
    Update @LINES Set THE_LINE = Replace(THE_LINE, '|**|', ',') -- return placeholder to comma

    Delete @LINES Where Len(THE_LINE) - Len(Replace(THE_LINE, ',', '')) != 12 -- only allow inserts with 13 fields

    Declare LinesLoop Cursor For
        Select THE_LINE From @LINES

    Open LinesLoop
    Fetch Next From LinesLoop Into @THE_LINE

    While @@FETCH_STATUS = 0
    Begin
        Select @sql = 'Insert _BMUB(FIELD1,FIELD2,FIELD3,FIELD4,FIELD5,FIELD6,FIELD7,FIELD8,FIELD9,FIELD10,FIELD11,FIELD12,FIELD13) Select ' + @THE_LINE
        Exec sp_executesql @sql
        Fetch Next From LinesLoop Into @THE_LINE
    End

    Close LinesLoop
    Deallocate LinesLoop

    select * from _BMUB

• I've got a bit of a concern about this process, specifically in your place-holder mechanic.

My concern there is that you only want to replace the commas between the odd and the even quotes.

Does this code deal well when there are two quoted entries on the same line? E.g. 1,2,3,"abc,def",4,5,"ghi,jkl",6?

    Update @LINES Set THE_LINE = Stuff(THE_LINE, CharIndex(',', THE_LINE, PatIndex('%"%,%"%', THE_LINE)), 1, '|**|') -- replace commas in quotes with parsing placeholder
     Where THE_LINE Like '%"%'

You'll probably want to make sure it works with ," and ", as the beginning/ending delimiters of a quoted entry.
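
    One rough sketch of that odd/even pairing - walk the quote pairs explicitly and replace only the commas inside each pair (hypothetical names; balanced quotes and no '|' in the data assumed):

    Declare @line nvarchar(max) = N'1,2,3,"abc,def",4,5,"ghi,jkl",6'
    Declare @q1 int = CharIndex(N'"', @line), @q2 int, @c int

    While @q1 > 0
    Begin
        Set @q2 = CharIndex(N'"', @line, @q1 + 1) -- matching (even) closing quote
        If @q2 = 0 Break -- unbalanced quotes: stop
        Set @c = CharIndex(N',', @line, @q1)
        While @c > 0 And @c < @q2 -- commas inside this pair only
        Begin
            Set @line = Stuff(@line, @c, 1, N'|')
            Set @c = CharIndex(N',', @line, @c + 1)
        End
        Set @q1 = CharIndex(N'"', @line, @q2 + 1) -- next (odd) opening quote
    End

    Select @line -- 1,2,3,"abc|def",4,5,"ghi|jkl",6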

    You can probably invert this code and save a few data passes:

    Delete @LINES Where Len(Replace(THE_LINE, char(13), '')) = 0 -- drop dud lines
    Delete @LINES Where Len(Replace(THE_LINE, char(10), '')) = 0 -- drop dud lines
    Update @LINES Set THE_LINE = Replace(Replace(THE_LINE, char(13), ''), char(10), '') -- drop carriage return and linefeed

    Swap it around to:

    Update @LINES Set THE_LINE = Replace(Replace(THE_LINE, char(13), ''), char(10), '') -- drop carriage return and linefeed
    Delete @LINES Where Len(THE_LINE) = 0 -- drop dud lines

    I'd be very interested in seeing what this inline tvf looks like:

    Insert @LINES(THE_LINE) Select * From ParseFile(@CHARDATA,char(13)) -- put each line separately into @LINES
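
    ParseFile isn't posted in this excerpt, but given Brian's description ("returns a table of lines parsed on ASCII 13/10"), one plausible shape would be a thin wrapper over a MAX-capable splitter. Purely hypothetical, with dbo.DelimitedSplitMax standing in for such a splitter (the 8K version above won't accept nvarchar(max)):

    Create Function dbo.ParseFile (@pFile nvarchar(max), @pDelimiter nchar(1))
    Returns Table
    As
    Return
        Select ItemNumber As LINE_ID, Item As THE_LINE
          From dbo.DelimitedSplitMax(@pFile, @pDelimiter) -- hypothetical max-type splitter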


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
