differiniate quotes as data vs quotes as column identifier

  • Ghanta...

    I believe I have your solution. I'll post it both here and on SQL Team (your other post, I believe) but I wonder if you'd do me a favor first... if the information is not proprietary, contains no sensitive information, and violates no privacy laws, would you 1) post the first 20 rows and 2) tell me how many rows there will likely be on each load?

    If the information is proprietary, contains sensitive information, or it would violate privacy laws to post it, would you make up the first 3 rows of the file and post it? I need a model to test against before I post my findings.

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff,

    This will help so many folks... My data is comma delimited with quotes around only on those field that has embedded comma (each file can have different fields and fields will have quotes only if there is comma in it). First row has the column name which I will use to create the staging table. Let me know if you need more info... Thank you so muchhhh Jeff...

    ID,ADD_1,ADD_CD,MEM,FullName

    6000014623,"1310 FEARRINGTON POST",00,this is a sample,"tiger, scott"

    6000014646,"532 BURNETT CIR",00,this is a sample,"tiger, scott"

    6000014649,"2106 NC HIGHWAY 87 N",03,this is a sample,"tiger, scott"

    6000014663,"1864 MANNS CHAPEL RD",00,this is a sample,"tiger, scott"

    6000014668,"20 SILVERSTONE DR",00,this is a sample,"tiger, scott"

    6000014669,"159 LAURA JOHNSON RD",00,this is a sample,"tiger, scott"

    6000014670,"RR 1",02,this is a sample,"tiger, scott"

    6000014686,"PO BOX 132",00,this is a sample,"tiger, scott"

    6000014695,"137 FEARRINGTON POST",00,this is a sample,"tiger, scott"

    6000014704,"339 TAYLOR RD",03,this is a sample,"tiger, scott"

    6000014707,"PO BOX 880",00,this is a sample,"tiger, scott"

    6000014714,"904 FEARRINGTON POST",00,this is a sample,"tiger, scott"

    6000014747,"|2622 US HIGHWAY 64 W",00,this is a sample,"tiger, scott"

    6000014748,"PO BOX 518",00,this is a sample,"tiger, scott"

    6000014756,"866 FEARRINGTON POST",00,this is a sample,"tiger, scott"

    6000015365,"1825 EAST ST",00,this is a sample,"tiger, scott"

    6000015493,"407 BRAMPTON CLOSE W",02,this is a sample,"tiger, scott"

    6000015499,"134 HUDSON HILLS RD",00,this is a sample,"tiger, scott"

    6000015500,"321 RECTORY ST",00,this is a sample,"tiger, scott"

    6000015501,"624 HAMLET CHAPEL RD",00,this is a sample,"tiger, scott"

    6050860067,"P,O, BOX 92, WEST END",05,this is a sample,"tiger, scott"

    6050864122,"CALLE RAMIRO 1, 20, 3B",02,this is a sample,"tiger, scott"

    6050864480,"EUROVILLA 1, 803A",02,this is a sample,"tiger, scott"

    6050864782,"WAH CHI COURT, FLAT D 30TH FLOOR",05,this is a sample,"tiger, scott"

    6053339447,"KINGOSGADE 13, 3TV",05,this is a sample,"tiger, scott"

    6053339541,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    6053339542,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    6053448211,"L'ARGENTIERE BAT A, 637 BD TAVERNIERE",02,this is a sample,"tiger, scott"

    6053450414,"FLAT 43, FREDERICK COURT",02,this is a sample,"tiger, scott"

    6053451508,"AVENIDA FRANCISCO BHERING, 17/201",02,this is a sample,"tiger, scott"

    6053451663,"FLAT 5A, CHENYU COURT",02,this is a sample,"tiger, scott"

    6053451664,"FLAT 5A, CHENYU COURT",02,this is a sample,"tiger, scott"

    6053522326,"CASA VERDE, MANO IZQUIERDA, 175 M NORTE",02,this is a sample,"tiger, scott"

    6053563849,"1 CASTLE PEEK RD BLOCK 6 10-F, A",05,this is a sample,"tiger, scott"

    6055150555,"BUJI TWON,SHENZHEN, GUANGDONG",02,this is a sample,"tiger, scott"

    6057020037,"LINCEY HOUSE, WHITTINGTON ROAD",02,this is a sample,"tiger, scott"

    6057020038,"LINCEY HOUSE, WHITTINGTON ROAD",02,this is a sample,"tiger, scott"

    6057020076,"3186 COUNTY RD |26, RR 2",02,this is a sample,"tiger, scott"

    6057020077,"3186 COUNTY RD |26, RR 2",02,this is a sample,"tiger, scott"

    6057020127,"AV, SANTO TORIBIO 298, APT, 401",02,this is a sample,"tiger, scott"

  • As far as number of rows are concerned, most of the files will be small around 100,000 to 200,000 rows with around 10 fields, but small number (1 a day) might be around few million records with 30 fields.

  • Ok.... a couple of questions...

    First, some data from your post...

    [font="Courier New"]6053339541,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    __________  ________________________________  __ ________________  ____________ 

    6053339542,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    __________  _____________________________________  __ ________________  ____________

    6053448211,"L'ARGENTIERE BAT A, 637 BD TAVERNIERE",02,this is a sample,"tiger, scott"[/font]

    1. Except for the first column (ID), the first 2 rows above appear to be duplicates. Should one of them be eliminated? If so, which one do you prefer to be eliminated?

    2. How do you want these split? I put underscores over the second 2 rows where a true CSV splitter would consider fields to be...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, I don't see a single example of conditional quoting. It would appear to be a stable format as follows...

    1st field - never quoted

    2nd field - always quoted

    3rd field - never quoted

    4th field - never quoted

    5th field - always quoted

    And, none of the quoted fields have embedded quotes.

    Do you have any rows that deviate from that other than the header itself?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • First, some data from your post...

    6053339541,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    __________ ________________________________ __ ________________ ____________

    6053339542,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"

    __________ _____________________________________ __ ________________ ____________

    6053448211,"L'ARGENTIERE BAT A, 637 BD TAVERNIERE",02,this is a sample,"tiger, scott"

    1. Except for the first column (ID), the first 2 rows above appear to be duplicates. Should one of them be eliminated? If so, which one do you prefer to be eliminated? --- Duplicate is fine, I need to stage entire data from file

    2. How do you want these split? I put underscores over the second 2 rows where a true CSV splitter would consider fields to be... --- Split looks good to me, each field is separated by comma and embedded ones have quotes... we don't need quotes in the table or we need to clean the data

    Jeff can't thank you enough for this help.

  • BobMcC (2/18/2008)


    I need to import (using DTS) some comma delimited text files into sql server 2000.

    I have discovered that the quotation mark is getting dual use. Not only is some of column data surrounded by quotes, but sometimes the quote marks are part of the data itself, and that's causing problems.

    i.e.

    1234, "some data here", "John "jack" Smith", 1234, 1234

    What's the conventional wisdom on how to handle that? It looks like something I could key on using regular expressions.

    Any suggestions?

    Thanks,

    E3

    Sorry, Bob... I almost forgot about you.

    The row you presented creates no problems for a BCP format file (except if the header row is different). It's not correctly formatted CSV where embedded quotes should actually be doubled up, but, like I said, a BCP format file will easily handle this to a tune of well over 5 million rows per minute (I did 5.1 million 20 column rows in 60 seconds flat on a "old" machine).

    Can you provide the first 20 rows of data from the file. Same rules apply that I told Ghanta... nothing private, nothing proprietary. Please be sure to include the actual header.

    Otherwise, the BCP format file I previously posted should work for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/25/2009)


    Also, I don't see a single example of conditional quoting. It would appear to be a stable format as follows...

    1st field - never quoted

    2nd field - always quoted

    3rd field - never quoted

    4th field - never quoted

    5th field - always quoted

    And, none of the quoted fields have embedded quotes.

    Do you have any rows that deviate from that other than the header itself?

    If any of the text in a column has embedded comma then the entire field will have Quotes

    2nd field was address and I put that example as most likely they will have embedded comma and I don't expect comma in ID so had that as example for field without quotes

    3rd can be anything and quotes will depend on data, same with 4th, 5th was just sample and I used full name with comma... but it can be lastname and we see many records have two lastnames with comma...

  • Jeff Moden (7/25/2009)


    Also, I don't see a single example of conditional quoting. It would appear to be a stable format as follows...

    1st field - never quoted

    2nd field - always quoted

    3rd field - never quoted

    4th field - never quoted

    5th field - always quoted

    And, none of the quoted fields have embedded quotes.

    Do you have any rows that deviate from that other than the header itself?

    Then, the only problem is that the header isn't formatted the same way as the rest of the rows and BCP counts delimiters when trying to skip rows (BIG complaint about BCP for me). We can fix that pretty easily... I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ghanta (7/25/2009)


    Jeff Moden (7/25/2009)


    Also, I don't see a single example of conditional quoting. It would appear to be a stable format as follows...

    1st field - never quoted

    2nd field - always quoted

    3rd field - never quoted

    4th field - never quoted

    5th field - always quoted

    And, none of the quoted fields have embedded quotes.

    Do you have any rows that deviate from that other than the header itself?

    If any of the text in a column has embedded comma then the entire field will have Quotes

    2nd field was address and I put that example as most likely they will have embedded comma and I don't expect comma in ID so had that as example for field without quotes

    3rd can be anything and quotes will depend on data, same with 4th, 5th was just sample and I used full name with comma... but it can be lastname and we see many records have two lastnames with comma...

    Ah... got it... basically a mixed bag and totally data dependent for everything except the ID field. Provided they don't embed a single quote anywhere (not talking about the apostrophe), I've got that solved... Lemme put together the answer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... sorry, Ghanta... I've got just one more question...

    Do you know how to BULK INSERT a file into a single column table or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I know that Jeff. Thanks for all ur help...hope I am not ruining your weekend.

  • Ummmm.... "Weekend".... new word for me... guess I'll have to look it up. 😛

    Since you know how to do the BULK INSERT, we'll start with a staging table... I'll be back soon.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here's a full up working example with the data you provided. As always, the detail is in the comments in the code...

    --===========================================================================================================
    --      We need a function that will convert the pain-in-the-hiney conditionally text qualified CSV 
    --      into something with consistent delimiters so we can split the column out of each row easily
    --      even when the presence of quotes may change from row to row.
    --===========================================================================================================
     CREATE FUNCTION dbo.TrueCsvSplit(@pString VARCHAR(7998),@pDelimiter VARCHAR(1))
    /**********************************************************************************************************************
     Purpose:
     This function prepares a "TRUE CSV" string for splitting by correctly finding and replacing all possible valid 
     delimiters with the desired character. It also adds the delimiter to the very beginning and end of the converted
     string.  Further, any spaces wrapped up between delimiters are eliminated.
    
     Revision History:
     Rev 00 - 17 Sep 2008 - Jeff Moden - Proof of concept for Sergiy on SQLServerCentral.com
     Rev 01 - 26 Jul 2009 - Jeff Moden - Converted delimiter to use for final split to an input and formalized code.
    **********************************************************************************************************************/
    RETURNS VARCHAR(8000)
         AS
      BEGIN
    
    --===== Define and preset the local variables.  Most are self explanatory.  The "QuoteFlop" variables keep track if a 
         -- given quote is "labeled" as a 1 or a 0. (1st quote is a 1, 2nd is a 0, 3rd is a 1, etc, etc)
    DECLARE @ReturnString  VARCHAR(8000),
            @PrevChar      VARCHAR(1), --MUST be VarChar or unwanted spaces appear
            @CurChar       VARCHAR(1), --MUST be VarChar or unwanted spaces appear
            @PrevQuoteFlop TINYINT,
            @CurQuoteFlop  TINYINT,
            @Keep          TINYINT,
            @KillSpace     TINYINT
     
     SELECT @PrevChar      = ' ',    
            @PrevQuoteFlop = 0,
            @ReturnString  = '',
            @KillSpace     = 1
     
    --===== Find and conditionally replace all valid "true" CSV delimiters with the selected character using a set based 
         -- loop formed by the Tally table.  This is to make the actual splitting of a "true" CSV easier and will beat
         -- the tar out of the equivelent Cursor or While Loop.
     SELECT @CurChar       = SUBSTRING(@pString,t.N,1),
            @CurQuoteFlop  = CASE WHEN @CurChar = '"' THEN ABS(@PrevQuoteFlop-1) ELSE @PrevQuoteFlop END,
            @KillSpace     = CASE
                             --== If current character is a trailing quote or delimiter outside of quotes, start ignoring spaces. 
                             WHEN @CurQuoteFlop = 0 AND @CurChar IN( ',','"') THEN 1
                             --== If current character is inside quotes, stop ignoring spaces.
                             WHEN @CurQuoteFlop = 1 THEN 0
                             --== If current character is outside quotes, spaces are being ignored, and it isn't a space,
                               -- stop ignoring spaces because it's the start of data outside quotes.
                             WHEN @CurQuoteFlop = 0 AND @KillSpace = 1 AND @CurChar  ' ' THEN 0
                             --== Otherwise, make no change as to whether to ignore spaces or not.
                             ELSE @KillSpace
                            END,
            @Keep          = CASE --Keep the current character if ...
                             --== It's the second quote of two quotes that are together it's inside quotes.
                             WHEN @CurQuoteFlop = 1 AND @PrevQuoteFlop = 0 AND @PrevChar = '"' THEN 1
                             --== It's not the first quote of two quotes that are together it's inside quotes.
                             WHEN @CurQuoteFlop = 1 AND @CurChar  '"' THEN 1
                             --== It's outside of quotes and either isn't a space or is an embedded/trailing space
                             WHEN @CurQuoteFlop = 0 AND @KillSpace = 0  THEN 1
                             --== It's outside of quotes and not a leading space keep it.
                               -- In other words, keep everything but leading space outside of quotes.
                             WHEN @CurQuoteFlop = 0 AND @CurChar  ' ' THEN 1
                             ELSE 0 
                             END,
                             --== Ignore first quote of quote pairs and last quote of quote runs
            @Keep          = CASE WHEN @CurQuoteFlop = 0 AND @CurChar = '"' THEN 0 ELSE @Keep END,
                             --== If it's a delimiter comma, change it to desire delimiter.
            @CurChar       = CASE WHEN @PrevQuoteFlop = 0 AND @CurChar = ',' THEN @pDelimiter ELSE @CurChar END,
            @PrevChar      = @CurChar,
            @PrevQuoteFlop = @CurQuoteFlop,
                             --== A bit slow to assemble the string, but it's faster than anything except XML
            @ReturnString  = @ReturnString + CASE WHEN @Keep = 1 THEN @CurChar ELSE '' END
       FROM dbo.Tally t
      WHERE t.N <= LEN(@pString)
     
    --===== Encapsulate the return string in delimiters and exit
     RETURN @pDelimiter + @ReturnString + @pDelimiter
        END
    GO
    --===========================================================================================================
    --      This would be the staging table after you loaded it from BULK INSERT as whole rows only.
    --      Notice that it contains a header of a differt format than the rest of the rows.  Also,
    --      you said the occurance of quotes around the "fields" would vary depending on whether or not the
    --      string data contained any commas.  I changed field 4 and 5 of the first data row so that it's 
    --      different just to prove this oddball method of mine works.
    --      As a side bar, this is probably one of the few places where a properly written CLR would do better.
    --===========================================================================================================
     CREATE TABLE #Staging 
            (WholeRow VARCHAR(8000))
    
     INSERT INTO #Staging 
            (WholeRow)
     SELECT 'ID,ADD_1,ADD_CD,MEM,FullName' UNION ALL
     SELECT '6000014623,"1310 FEARRINGTON POST",00,"this is a sample, first row changed",tiger' UNION ALL
     SELECT '6000014646,"532 BURNETT CIR",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014649,"2106 NC HIGHWAY 87 N",03,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014663,"1864 MANNS CHAPEL RD",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014668,"20 SILVERSTONE DR",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014669,"159 LAURA JOHNSON RD",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014670,"RR 1",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014686,"PO BOX 132",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014695,"137 FEARRINGTON POST",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014704,"339 TAYLOR RD",03,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014707,"PO BOX 880",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014714,"904 FEARRINGTON POST",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014747,"|2622 US HIGHWAY 64 W",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014748,"PO BOX 518",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000014756,"866 FEARRINGTON POST",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000015365,"1825 EAST ST",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000015493,"407 BRAMPTON CLOSE W",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000015499,"134 HUDSON HILLS RD",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000015500,"321 RECTORY ST",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6000015501,"624 HAMLET CHAPEL RD",00,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6050860067,"P,O, BOX 92, WEST END",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6050864122,"CALLE RAMIRO 1, 20, 3B",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6050864480,"EUROVILLA 1, 803A",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6050864782,"WAH CHI COURT, FLAT D 30TH FLOOR",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053339447,"KINGOSGADE 13, 3TV",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053339541,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053339542,"11 LONG CIRCULAR ROAD ,ST,JAMES,",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053448211,"L''ARGENTIERE BAT A, 637 BD TAVERNIERE",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053450414,"FLAT 43, FREDERICK COURT",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053451508,"AVENIDA FRANCISCO BHERING, 17/201",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053451663,"FLAT 5A, CHENYU COURT",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053451664,"FLAT 5A, CHENYU COURT",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053522326,"CASA VERDE, MANO IZQUIERDA, 175 M NORTE",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6053563849,"1 CASTLE PEEK RD BLOCK 6 10-F, A",05,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6055150555,"BUJI TWON,SHENZHEN, GUANGDONG",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6057020037,"LINCEY HOUSE, WHITTINGTON ROAD",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6057020038,"LINCEY HOUSE, WHITTINGTON ROAD",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6057020076,"3186 COUNTY RD |26, RR 2",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6057020077,"3186 COUNTY RD |26, RR 2",02,this is a sample,"tiger, scott"' UNION ALL
     SELECT '6057020127,"AV, SANTO TORIBIO 298, APT, 401",02,this is a sample,"tiger, scott"'
    GO
    --===========================================================================================================
    --      We need to reformat all that stuff so we can actually find all the delimiters before we do a split.
    --      You can either update the #Staging table directly or create another table on the fly.  I prefer to
    --      create another table on the fly.  Of course, we'll use our new function... and we'll use TABs for
    --      the delimiters 
    --===========================================================================================================
     
     SELECT RowNum   = IDENTITY(INT,1,1),
            WholeRow = dbo.TrueCsvSplit(s.WholeRow,CHAR(9))
       INTO #Staging2
       FROM #Staging s
     
    --===========================================================================================================
    --      Then, it's just a matter of splitting the data and reassembling it as columns in a row.  It takes
    --      a Tally table to do that the fastest.  I'll post the URL for the Tally table and what it does in the
    --      next post down.  CHAR(9) is the TAB character.  I'm assuming that you have SQL Server 2005 here.
    --      Let me know if you only have 2k so we can do the work around.
    --===========================================================================================================
     
    ;WITH
    cteSplit AS
    (
     SELECT s.RowNum,
            Element = ROW_NUMBER() OVER (PARTITION BY s.RowNum ORDER BY s.RowNum, t.N),
            RowText = SUBSTRING(s.WholeRow,t.N+1,CHARINDEX(CHAR(9),s.WholeRow,t.N+1)-t.N-1)
       FROM dbo.Tally t
      CROSS JOIN #Staging2 s
      WHERE t.N  1
    )
     SELECT MAX(CASE WHEN Element = 1 THEN RowText END) AS ID,
            MAX(CASE WHEN Element = 2 THEN RowText END) AS Add_1,
            MAX(CASE WHEN Element = 3 THEN RowText END) AS Add_CD,
            MAX(CASE WHEN Element = 4 THEN RowText END) AS Mem,
            MAX(CASE WHEN Element = 5 THEN RowText END) AS FullName
       FROM cteSplit
      GROUP BY RowNum
    

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    First of all thank you so much again for assisting me on this. Since we cannot use the UDF in dynamic sql what do you suggest.. CLR might be too much for me rite now.. Do you think creating temp staging table and then doing a select into (after parsing temp Staging2 and using CTE) into the table name from variable might work... thanks a lot.

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply