Surely there is a way to import CSV data in this situation?

  • Hi Craig

    Well spotted on the placeholder piece, yes I've already hit the problem you mentioned and have not found a solution yet.

    Here's ParseFile:

    CREATE Function ParseFile(

    @STRINGnvarchar(max),

    @SEPARATORnchar(1)

    )

    Returns @PARSEDSTRING table(STRING nvarchar(max))

    As

    Begin

    Declare@POSITION int = 1

    Select@STRING = @STRING + @SEPARATOR

    While charindex(@SEPARATOR,@STRING,@POSITION) != 0

    Begin

    Insert@PARSEDSTRING(STRING)

    SelectSubString(@STRING,@POSITION,CharIndex(@SEPARATOR,@STRING,@POSITION) - @POSITION)

    Select@POSITION = CharIndex(@SEPARATOR,@STRING,@POSITION) + 1

    End

    Return

    End

  • Try swapping:

    PatIndex('%"%,%"%',THE_LINE)),

    -- with this:

    PatIndex('%,"%,%",%',THE_LINE)),

    Note where the additional commas are to detect beginning and ending doublequotes.

    I may have an alternative for you but it's going to be a busy day. Will try to get back this afternoon.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (1/24/2011)


    Try swapping:

    PatIndex('%"%,%"%',THE_LINE)),

    -- with this:

    PatIndex('%,"%,%",%',THE_LINE)),

    Note where the additional commas are to detect beginning and ending doublequotes.

    I may have an alternative for you but it's going to be a busy day. Will try to get back this afternoon.

    Tried that, no joy :crying:

    Far away is close at hand in the images of elsewhere.
    Anon.

  • This solution uses a temp table to add a unique id to the data

    It needs to be repeated until no updates are left

    (sorry about the short names but it was getting late and I got fed up with long typing)

    CREATE TABLE #tmp (I int,X varchar(1000))

    INSERT #tmp VALUES (1,'1,2,3,"abc,def",4,5,"ghi,jkl",6,7,"8,"""",x,y"""",",9')

    INSERT #tmp VALUES (2,'1,2,3,"abcdef",4,5,"ghijkl",6')

    WITH b AS (

    SELECT I,

    ROW_NUMBER() OVER(PARTITION BY I ORDER BY I) AS [P],t.N

    FROM #tmp x

    JOIN Tally t ON t.N BETWEEN 1 AND LEN(X)

    WHERE ((t.N=1 AND SUBSTRING(X,t.N,1) = '"') OR (SUBSTRING(X,t.N-1,2) LIKE '[^"]"'))

    AND SUBSTRING(X,t.N-1,3) LIKE '[^"]"[^"]'

    )

    UPDATE a SET a.X=STUFF(a.X,b.N+CHARINDEX(',',SUBSTRING(a.X,b.N,c.N-b.N))-1,1,'|')

    FROM #tmp a

    JOIN b b ON b.I=a.I AND P%2=1

    JOIN b c ON c.I=b.I AND c.P=b.P+1

    WHERE SUBSTRING(a.X,b.N,c.N-b.N) LIKE '%,%'

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Anyone have some code to generate a million row table for this particular problem? My brain isn't quite functional at the moment, but I may have a solution that needs further testing.

  • Just in case someone wants to play with my code, here it is.

    create function dbo.ChangeDelimiters (

    @pLine varchar(max),

    @pSDim char(1),

    @pRDim char(1)

    ) returns table

    as return

    with

    a1 as (select 1 as N union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1),

    a2 as (select

    1 as N

    from

    a1 as a

    cross join a1 as b),

    a3 as (select

    1 as N

    from

    a2 as a

    cross join a2 as b),

    Tally as (select top (len(@pLine))

    row_number() over (order by a.N) as N

    from

    a3 as a

    cross join a2 as b),

    ItemSplit(

    ItemOrder,

    Item

    ) as (

    SELECT

    N,

    SUBSTRING(@pLine,N,1)

    FROM

    Tally

    WHERE

    N <= LEN(@pLine)

    ),

    QuoteSplit as (

    select

    row_number() over (order by ItemOrder) as ItemID,

    Item

    from

    ItemSplit),

    QuoteRep as (

    select

    ((ROW_NUMBER() over (order by ItemID)+1)/2) as QuotePos,

    ItemID,

    Item

    from

    QuoteSplit

    where

    Item = '"'

    ),

    QuoteChange as (

    select

    QuotePos,

    min(ItemID) as StartPos,

    max(ItemID) as EndPos

    from

    QuoteRep

    group by

    QuotePos

    ),

    ReJoin as (

    select

    case when qc.QuotePos is not null and its.Item = @pSDim then @pRDim else its.Item end as Item

    from

    ItemSplit its

    left outer join QuoteChange qc

    on (its.ItemOrder >= qc.StartPos

    and its.ItemOrder <= qc.EndPos)

    )

    select

    Rec = ((select '' + Item from ReJoin for xml path(''),TYPE).value('.','varchar(max)'))

    ;

    go

    declare @TestTab table (

    --RecID int identity(1,1),

    Rec varchar(max)

    );

    insert into @TestTab (Rec)

    select '"1,2,3",4,5,6,"7,8",9' union all

    select '123,456,"101,102,103",432,654,"909,908,907"';

    select

    *

    from

    @TestTab;

    select

    *

    from

    @TestTab

    cross apply dbo.ChangeDelimiters(Rec,',','~')

    ;

    update tt set

    tt.Rec = cd.Rec

    from

    @TestTab tt

    cross apply dbo.ChangeDelimiters(Rec,',','~') cd

    ;

    select

    *

    from

    @TestTab

    ;

  • See how this works for you:

    Sample Data:

    IF OBJECT_ID( 'DeFielder') IS NOT NULL

    DROP TABLE DeFielder

    CREATE TABLE DeFielder

    (RowID INT IDENTITY( 1, 1),

    FileTypeID INT,

    FileData VARCHAR(MAX)

    )

    ON [PRIMARY]

    GO

    INSERT INTO DeFielder( FileTypeID, FileData)

    SELECT1, '1,2,3,4,5,"abc,def",6,7,"yui,asd"' UNION ALL

    SELECT1, '1,"qrs",3,4,5,32,6,7,81' UNION ALL

    SELECT1, '1,"qrs",3,"xyz,abc,def,boo",5,32,6,7,81'

    GO

    Now, for the meat of the process. This is generic and I built it a bit ago, but you should be able to adapt it as needed. It transforms the data into a rowset (I know, not what you're looking for), and then all you need to do is Pivot it back to the way you want. See the adaptation on the end.

    DECLARE @FileData VARCHAR(MAX),

    @nextDelim INT,

    @nextQuote INT,

    @RowCounter INT,

    @FieldCounter INT,

    @Delim CHAR(1)

    SELECT @RowCounter = 0,

    @FieldCounter = 0,

    @nextDelim = 0,

    @nextQuote = 0,

    @Delim = ','-- Change this to pipe, comma, or what you need. Will accept TAB via CHAR()

    DECLARE curFileData Cursor FOR

    SELECT FileData FROM DeFielder WHERE FileTypeID = 1

    DECLARE @targetTable TABLE

    (RowNum INT,

    FieldPosition INT,

    colValue VarChar(50) -- Make sure this fits your largest expected field. We're shoving everything in it.

    )

    OPEN curFileData

    FETCH NEXT FROM curFileData INTO @FileData

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @FileData = @FileData + @Delim

    Set @RowCounter = @RowCounter + 1

    SET @FieldCounter = 1

    WHILE @FileData <> ''

    BEGIN

    SET @nextDelim = CHARINDEX( @delim, @FileData)

    SET @nextquote = CHARINDEX( '"', @FileData)

    -- If 0 on quote skip the quoted logic... no more quotes in string, make it think it's

    -- past next comma.

    IF @nextQuote = 0 SET @nextQuote = 999999

    IF @nextDelim = @nextQuote

    RAISERROR('Serious problem in defielder, please check it', 16, 1)

    PRINT @nextDelim

    PRINT @nextQuote

    -- Next Delimeter is hit before the next record is quoted.

    IF @nextDelim < @nextQuote

    BEGIN

    INSERT INTO @targetTable VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 1, @nextDelim - 1))

    SET @FileData = STUFF( @FileData, 1, @nextDelim, '')

    END

    IF @nextQuote = 1

    BEGIN

    --If it's not the first character for the first field, something's gone wrong

    -- Always working on the 'front' field, so skip the quote. Skip it on the end too.

    INSERT INTO @targetTable

    VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 2, CHARINDEX( '"', @fileData, @nextQuote + 1) -2 ))

    SET @FileData = STUFF( @FileData, 1, CHARINDEX( '"', @fileData, @nextQuote + 1) +1 /*add the delim*/, '')

    END

    PRINT @FileData

    SET @FieldCounter = @FieldCounter + 1

    END

    FETCH NEXT FROM curFileData INTO @FileData

    END

    CLOSE curFileData

    DEALLOCATE curFileData

    SELECT * FROM @targetTable

    And for the pivot:

    SELECT

    RowNum,

    [1], [2], [3], [4], [5], [6], [7], [8], [9]

    FROM

    @targetTable AS tt

    PIVOT

    ( MIN( ColValue )

    FOR FieldPosition IN ( [1], [2], [3], [4], [5], [6], [7], [8], [9])

    ) AS pvt

    This was a chunk of code I was working on to try out procedures passing table variables around, still haven't gotten around to it but it seemed like a worthy candidate. There may be a faster, more efficient way to do this, but I know this functions. It will get you out of the hole until someone comes up with better stuff. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Million Row Testbed:

    INSERT INTO DeFielder (FileTypeID, FileData)

    SELECT

    1,

    CASE WHEN t.N%2 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%3 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%4 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%5 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%6 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%7 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%8 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%9 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +

    CASE WHEN t.N%10 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END AS FileData

    FROM

    tempdb..Tally AS t,

    tempdb..Tally AS t2

    WHERE

    t.N <= 10000

    AND t2.N <= 100

    Modified code. Use this, not the above. VARCHAR(4000) makes sure you don't bang your head on the wall, and I left some print statements in the above that should have been removed after I tested on the new sample data to make sure I had grabbed the finished one.

    SET NOCOUNT ON

    DECLARE @FileData VARCHAR(MAX),

    @nextDelim INT,

    @nextQuote INT,

    @RowCounter INT,

    @FieldCounter INT,

    @Delim CHAR(1)

    SELECT @RowCounter = 0,

    @FieldCounter = 0,

    @nextDelim = 0,

    @nextQuote = 0,

    @Delim = ','-- Change this to pipe, comma, or what you need. Will accept TAB via CHAR()

    DECLARE curFileData Cursor FOR

    SELECT FileData FROM DeFielder WHERE FileTypeID = 1

    DECLARE @targetTable TABLE

    (RowNum INT,

    FieldPosition INT,

    colValue VarChar(4000) -- Make sure this fits your largest expected field. We're shoving everything in it.

    )

    OPEN curFileData

    FETCH NEXT FROM curFileData INTO @FileData

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @FileData = @FileData + @Delim

    Set @RowCounter = @RowCounter + 1

    SET @FieldCounter = 1

    WHILE @FileData <> ''

    BEGIN

    SET @nextDelim = CHARINDEX( @delim, @FileData)

    SET @nextquote = CHARINDEX( '"', @FileData)

    -- If 0 on quote skip the quoted logic... no more quotes in string, make it think it's

    -- past next comma.

    IF @nextQuote = 0 SET @nextQuote = 999999

    IF @nextDelim = @nextQuote

    RAISERROR('Serious problem in defielder, please check it', 16, 1)

    -- Next Delimeter is hit before the next record is quoted.

    IF @nextDelim < @nextQuote

    BEGIN

    INSERT INTO @targetTable VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 1, @nextDelim - 1))

    SET @FileData = STUFF( @FileData, 1, @nextDelim, '')

    END

    IF @nextQuote = 1

    BEGIN

    --If it's not the first character for the first field, something's gone wrong

    -- Always working on the 'front' field, so skip the quote. Skip it on the end too.

    INSERT INTO @targetTable

    VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 2, CHARINDEX( '"', @fileData, @nextQuote +1) -2 ))

    SET @FileData = STUFF( @FileData, 1, CHARINDEX( '"', @fileData, @nextQuote + 1) +1 /*add the delim*/, '')

    END

    SET @FieldCounter = @FieldCounter + 1

    END

    FETCH NEXT FROM curFileData INTO @FileData

    END

    CLOSE curFileData

    DEALLOCATE curFileData

    This dies in the million row test. Time to complete:

    6:30 without the pivot, 7:45 with the pivot, on my desktop.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi.

    Wow - the help on here is fantastic. I'm going to have to start offering my help to others more - I'm really starting to feel guilty 🙂

    Lynn I'm currently working through your solution, which I love due to the lack of loops. A big thanks to everyone who has offered solutions and help!

    Brian

  • Lynn, won't QuoteSplit and ItemSplit always contain the same data? Isn't QuoteSplit redundant here, and QuoteRep could really just pull from ItemSplit?

    Or have I missed something? 🙂

  • My query had a problem and for the record I post a revised version

    Based on a million rows of my test data

    '1,2,3,"abc,def",4,5,"ghi,jkl",6,7,"8,"""",x,y"""",",9'

    on my desktop, it took 7 iterations taking between 52 and 66 seconds per iteration

    WITH b AS (

    SELECT x.I,

    ROW_NUMBER() OVER(PARTITION BY x.I ORDER BY x.I,t.N) AS [P],t.N

    FROM TestTab x

    JOIN Tally t ON t.N BETWEEN 1 AND LEN(X)

    WHERE ((t.N=1 AND SUBSTRING(X,t.N,1) = '"') OR (SUBSTRING(X,t.N-1,2) LIKE '[^"]"'))

    AND (t.N>1 AND SUBSTRING(X+' ',t.N-1,3) LIKE '[^"]"[^"]')

    )

    UPDATE a SET a.X=STUFF(a.X,b.N+CHARINDEX(',',SUBSTRING(a.X,b.N,c.N-b.N))-1,1,'|')

    FROM TestTab a

    JOIN b b ON b.I=a.I AND P%2=1

    JOIN b c ON c.I=b.I AND c.P=b.P+1

    WHERE SUBSTRING(a.X,b.N,c.N-b.N) LIKE '%,%'

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Hi everyone

    Once again thanks to everyone for all the suggestions. I have implemented Lynn's suggestion with one or two minor tweaks, and it's working like a charm 🙂

    Thank you again to everyone who responded to my post, and got me out of this hole. Particular thanks to Lynn for the solution with no loops at all ! That is some VERY nice SQL! I'm 16 years working with T-SQL and I can honestly say I would never have been able to come up with that!

    Brian

  • Brian, would you please post your final solution? I would like to see what you tweaked.

    I usually make things more complicated than they need to be to start, so this would help me learn where I went off-track a bit.

  • Hi

    I removed QuoteSplit as it is redundant, and I added an A4, to give me more than 10000 rows available to the tally table. That's it really, apart from layout/naming conventions to suit our company etc!

    Before I paste the code, how do you create those code sample boxes that everyone uses here which retain the formatting and layout?

    Brian

  • Brian McGee-355525 (1/25/2011)


    Hi

    I removed QuoteSplit as it is redundant, and I added an A4, to give me more than 10000 rows available to the tally table. That's it really, apart from layout/naming conventions to suit our company etc!

    Before I paste the code, how do you create those code sample boxes that everyone uses here which retain the formatting and layout?

    Brian

    Over on the left, while posting, there's IfCode shortcuts. Highlight your code and then click the one that says code="sql".


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply