Split rows into column

  • I am trying to join two tables and looks like the data is messed up.

    I want to split the rows into columns as there is more than one value in the row.

    But somehow I don't see a pattern in here to split the rows.

    This how the data is

    Create Table #Sample (Numbers Varchar(MAX))

    Insert INTO #Sample Values('1000')

    Insert INTO #Sample Values ('1024 AND 1025')

    Insert INTO #Sample Values ('109 ,110,111')

    Insert INTO #Sample Values ('Old # 1033 replaced with new Invoice # 1544')

    Insert INTO #Sample Values ('1355 Cancelled and Invoice 1922 added')

    Select * from #Sample

    This is what is expected...

    Create Table #Result (Numbers Varchar(MAX))

    Insert INTO #Result Values('1000')

    Insert INTO #Result Values ('1024')

    Insert INTO #Result Values ('1025')

    Insert INTO #Result Values ('109')

    Insert INTO #Result Values ('110')

    Insert INTO #Result Values ('111)

    Insert INTO #Result Values ('1033')

    Insert INTO #Result Values ('1544')

    Insert INTO #Result Values ('1355')

    Insert INTO #Result Values ('1922')

    Select * from #Result

    Any idea how I can implement this ? I believe if there are any numbers I need to split into two columns . Any Ideas

  • One way to approach this involves 3 functions: one that removed duplicate characters, one that replaces characters based on a pattern and Jeff Moden's Splitter. There are links to two are the three functions referenced in my signature line (DelimitedSplit8K and PatReplace8K) if you want to learn more about them. The code for RemoveDupes8K will be included below. Here's the code for the three functions:

    USE tempdb

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l;

    GO

    CREATE FUNCTION dbo.PatReplace8K

    (

    @String VARCHAR(8000),

    @Pattern VARCHAR(50),

    @Replace VARCHAR(10)

    )

    /*******************************************************************************

    Purpose:

    Given a string (@String), a pattern (@Pattern), and a replacement character (@Replace)

    PatReplace8K will replace any character in @String that matches the @Pattern parameter

    with the character, @Replace.

    Usage:

    --===== Basic Syntax Example

    SELECT newstring

    FROM dbo.dbo.PatReplace8K(@String,@Pattern,@Replace);

    --===== Replace numeric characters with a "*"

    SELECT newstring

    FROM dbo.PatReplace8K('My phone number is 555-2211','[0-9]','*');

    --==== Using againsts a table

    DECLARE @table TABLE(oldstring varchar(40));

    INSERT @table VALUES

    ('Call me at 555-222-6666'),

    ('phone number: (312)555-2323'),

    ('He can be reached at 444.665.4466');

    SELECT *

    FROM @table t

    CROSS APPLY dbo.PatReplace8K(t.oldstring,'[0-9]','*');

    Programmer Notes:

    1. Function runs 2-4 times faster when using make_parallel() (provided that you have

    2 or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance.

    2. Requires SQL Server 2008+

    3. @Pattern IS case sensitive (the function can be easily modified to make it so)

    4. There is no need to include the "%" before and/or after your pattern since since we

    are evaluating each character individually

    5. Certain special characters, such as "$" and "%" need to be escaped with a "/"

    like so: [/$/%]

    6. To strip all non numeric characters use DigitsOnlyEE() for it is at least twice as

    fast. To strip all non-alphanumeric characters use AlphaNumericOnly() which is also

    twice as fast.

    7. For ***removing*** characters based on a pattern use PatExclude8K which is faster.

    Revision History:

    Rev 00 - 10/27/2014 Initial Development - Alan Burstein

    Rev 01 - 10/29/2014 Mar 2007 - Alan Burstein

    - Redesigned based on the dbo.STRIP_NUM_EE by Eirikur Eiriksson

    (see: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx)

    - change how the cte tally table is created

    - put the include/exclude logic in a CASE statement instead of a WHERE clause

    - Added Latin1_General_BIN Colation

    - Add code to use the pattern as a parameter.

    Rev 02- 11/6/2014

    - Added final performane enhancement (more cudo's to Eirikur Eiriksson)

    - Put 0 = PATINDEX filter logic into the WHERE clause

    Rev 03- 5/16/2015

    - Updated to deal with special XML characters

    Rev 04- 5/26/2015

    - Changed the replacement character from char(1) to varchar(10)

    - tested and determined that there is no performance hit for this change.

    *******************************************************************************/

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    WITH

    E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS E1(N)),

    iTally(N) AS (SELECT TOP (DATALENGTH(@String)) CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1 a,E1 b,E1 c,E1 d)

    SELECT NewString =

    ((

    SELECT

    CASE

    WHEN PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1)) = 0

    THEN SUBSTRING(@String,N,1)

    ELSE CASE @Replace WHEN '' THEN '' ELSE @Replace END-- this line seems silly but really is required (trust me!)

    END +''

    FROM iTally

    FOR XML PATH(''),TYPE

    ).value('.[1]','varchar(8000)'));

    GO

    CREATE FUNCTION dbo.RemoveDupes8K(@string varchar(8000), @preserved varchar(100))

    /*******************************************************************************

    Purpose:

    Given a string (@String) this function will remove all instances of

    repeated characters. The @preserved parameter can be used to ignore

    (and preserve) duplicate characters if they match the pattern.

    This function performs much better when using make_parallel().

    Usage Examples:

    DECLARE @string varchar(8000) = '!!!aa###bb!!!'; -- !a#b!

    -- Remove all duplicate characters

    SELECT * FROM dbo.RemoveDupes8k(@string,'');

    -- Remove all non-alpha duplicates

    SELECT * FROM dbo.RemoveDupes8k(@string,'[A-Za-z]');

    -- Remove only alphabetical duplicates

    SELECT * FROM dbo.RemoveDupes8k(@string,'[^A-Za-z]');

    Revision History:

    Rev 00 - 06/21/2015 Initial Development - Alan Burstein

    *******************************************************************************/

    RETURNS TABLE WITH SCHEMABINDING AS RETURN

    WITH

    E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) T(C)), -- 10

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 100

    iTally(N) AS

    (

    SELECT TOP(DATALENGTH(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E2 a, E2 b -- 10K

    )

    SELECT NewString =

    (

    SELECT SUBSTRING(@string,N,1)+''

    FROM iTally

    WHERE SUBSTRING(@string,N,1) <> SUBSTRING(@string,N+1,1)

    OR SUBSTRING(@string,N,1) LIKE @preserved

    FOR XML PATH(''), TYPE

    ).value('.[1]', 'varchar(8000)');

    GO

    Now that you have the functions here's the solution. Note my comments. Also note that you should probably include a key column so where know where the records came from. Nonetheless, this will produce exactly what you requested.

    SELECT D.ItemNumber, D.Item

    FROM #Sample S

    CROSS APPLY dbo.PatReplace8K(Numbers,'[^0-9]',',') P -- Replace all non-numeric characters with a comma

    CROSS APPLY dbo.RemoveDupes8K(P.NewString,'[0-9]') R -- Remove all duplicate commas

    CROSS APPLY dbo.delimitedsplit8K(R.NewString,',') D -- Now split the comma-delimited string

    WHERE D.item <> '' -- exclude records with a leading commma.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • ... Or you can use PatternSplitCM (Duh...) also referenced in my signature....

    SELECT Item

    FROM #Sample S

    CROSS APPLY dbo.PatternSplitCM(S.Numbers,'%[0-9]%')

    WHERE Matched = 1;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply