Split rows into column

Question

Split rows into column

sharonsql2013

SSCertifiable

Points: 5908
More actions
September 29, 2015 at 3:58 pm

#399112

I am trying to join two tables and looks like the data is messed up.
I want to split the rows into columns as there is more than one value in the row.
But somehow I don't see a pattern in here to split the rows.
This how the data is
Create Table #Sample (Numbers Varchar(MAX))
Insert INTO #Sample Values('1000')
Insert INTO #Sample Values ('1024 AND 1025')
Insert INTO #Sample Values ('109 ,110,111')
Insert INTO #Sample Values ('Old # 1033 replaced with new Invoice # 1544')
Insert INTO #Sample Values ('1355 Cancelled and Invoice 1922 added')
Select * from #Sample
This is what is expected...
Create Table #Result (Numbers Varchar(MAX))
Insert INTO #Result Values('1000')
Insert INTO #Result Values ('1024')
Insert INTO #Result Values ('1025')
Insert INTO #Result Values ('109')
Insert INTO #Result Values ('110')
Insert INTO #Result Values ('111)
Insert INTO #Result Values ('1033')
Insert INTO #Result Values ('1544')
Insert INTO #Result Values ('1355')
Insert INTO #Result Values ('1922')
Select * from #Result
Any idea how I can implement this ? I believe if there are any numbers I need to split into two columns . Any Ideas

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Alan Burstein SSC Guru Points: 61141 More actions · Answer 1

One way to approach this involves 3 functions: one that removed duplicate characters, one that replaces characters based on a pattern and Jeff Moden's Splitter. There are links to two are the three functions referenced in my signature line (DelimitedSplit8K and PatReplace8K) if you want to learn more about them. The code for RemoveDupes8K will be included below. Here's the code for the three functions:

USE tempdb

GO

CREATE FUNCTION [dbo].[DelimitedSplit8K]

--===== Define I/O parameters

(@pString VARCHAR(8000), @pDelimiter CHAR(1))

--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

-- enough to cover VARCHAR(8000)

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

SELECT 1 UNION ALL

SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

),

cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

SELECT s.N1,

ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

FROM cteStart s

)

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

Item = SUBSTRING(@pString, l.N1, l.L1)

FROM cteLen l;

GO

CREATE FUNCTION dbo.PatReplace8K

(

@String VARCHAR(8000),

@Pattern VARCHAR(50),

@Replace VARCHAR(10)

)

/*******************************************************************************

Purpose:

Given a string (@String), a pattern (@Pattern), and a replacement character (@Replace)

PatReplace8K will replace any character in @String that matches the @Pattern parameter

with the character, @Replace.

Usage:

--===== Basic Syntax Example

SELECT newstring

FROM dbo.dbo.PatReplace8K(@String,@Pattern,@Replace);

--===== Replace numeric characters with a "*"

SELECT newstring

FROM dbo.PatReplace8K('My phone number is 555-2211','[0-9]','*');

--==== Using againsts a table

DECLARE @table TABLE(oldstring varchar(40));

INSERT @table VALUES

('Call me at 555-222-6666'),

('phone number: (312)555-2323'),

('He can be reached at 444.665.4466');

SELECT *

FROM @table t

CROSS APPLY dbo.PatReplace8K(t.oldstring,'[0-9]','*');

Programmer Notes:

1. Function runs 2-4 times faster when using make_parallel() (provided that you have

2 or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance.

2. Requires SQL Server 2008+

3. @Pattern IS case sensitive (the function can be easily modified to make it so)

4. There is no need to include the "%" before and/or after your pattern since since we

are evaluating each character individually

5. Certain special characters, such as "$" and "%" need to be escaped with a "/"

like so: [/$/%]

6. To strip all non numeric characters use DigitsOnlyEE() for it is at least twice as

fast. To strip all non-alphanumeric characters use AlphaNumericOnly() which is also

twice as fast.

7. For ***removing*** characters based on a pattern use PatExclude8K which is faster.

Revision History:

Rev 00 - 10/27/2014 Initial Development - Alan Burstein

Rev 01 - 10/29/2014 Mar 2007 - Alan Burstein

- Redesigned based on the dbo.STRIP_NUM_EE by Eirikur Eiriksson

(see: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx)

- change how the cte tally table is created

- put the include/exclude logic in a CASE statement instead of a WHERE clause

- Added Latin1_General_BIN Colation

- Add code to use the pattern as a parameter.

Rev 02- 11/6/2014

- Added final performane enhancement (more cudo's to Eirikur Eiriksson)

- Put 0 = PATINDEX filter logic into the WHERE clause

Rev 03- 5/16/2015

- Updated to deal with special XML characters

Rev 04- 5/26/2015

- Changed the replacement character from char(1) to varchar(10)

- tested and determined that there is no performance hit for this change.

*******************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

WITH

E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS E1(N)),

iTally(N) AS (SELECT TOP (DATALENGTH(@String)) CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1 a,E1 b,E1 c,E1 d)

SELECT NewString =

((

SELECT

CASE

WHEN PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1)) = 0

THEN SUBSTRING(@String,N,1)

ELSE CASE @Replace WHEN '' THEN '' ELSE @Replace END-- this line seems silly but really is required (trust me!)

END +''

FROM iTally

FOR XML PATH(''),TYPE

).value('.[1]','varchar(8000)'));

GO

CREATE FUNCTION dbo.RemoveDupes8K(@string varchar(8000), @preserved varchar(100))

/*******************************************************************************

Purpose:

Given a string (@String) this function will remove all instances of

repeated characters. The @preserved parameter can be used to ignore

(and preserve) duplicate characters if they match the pattern.

This function performs much better when using make_parallel().

Usage Examples:

DECLARE @string varchar(8000) = '!!!aa###bb!!!'; -- !a#b!

-- Remove all duplicate characters

SELECT * FROM dbo.RemoveDupes8k(@string,'');

-- Remove all non-alpha duplicates

SELECT * FROM dbo.RemoveDupes8k(@string,'[A-Za-z]');

-- Remove only alphabetical duplicates

SELECT * FROM dbo.RemoveDupes8k(@string,'[^A-Za-z]');

Revision History:

Rev 00 - 06/21/2015 Initial Development - Alan Burstein

*******************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

WITH

E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) T(C)), -- 10

E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 100

iTally(N) AS

(

SELECT TOP(DATALENGTH(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM E2 a, E2 b -- 10K

)

SELECT NewString =

(

SELECT SUBSTRING(@string,N,1)+''

FROM iTally

WHERE SUBSTRING(@string,N,1) <> SUBSTRING(@string,N+1,1)

OR SUBSTRING(@string,N,1) LIKE @preserved

FOR XML PATH(''), TYPE

).value('.[1]', 'varchar(8000)');

GO

Now that you have the functions here's the solution. Note my comments. Also note that you should probably include a key column so where know where the records came from. Nonetheless, this will produce exactly what you requested.

SELECT D.ItemNumber, D.Item

FROM #Sample S

CROSS APPLY dbo.PatReplace8K(Numbers,'[^0-9]',',') P -- Replace all non-numeric characters with a comma

CROSS APPLY dbo.RemoveDupes8K(P.NewString,'[0-9]') R -- Remove all duplicate commas

CROSS APPLY dbo.delimitedsplit8K(R.NewString,',') D -- Now split the comma-delimited string

WHERE D.item <> '' -- exclude records with a leading commma.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 2

... Or you can use PatternSplitCM (Duh...) also referenced in my signature....

SELECT Item

FROM #Sample S

CROSS APPLY dbo.PatternSplitCM(S.Numbers,'%[0-9]%')

WHERE Matched = 1;

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001