September 29, 2015 at 3:58 pm
I am trying to join two tables and looks like the data is messed up.
I want to split the rows into columns as there is more than one value in the row.
But somehow I don't see a pattern in here to split the rows.
This how the data is
Create Table #Sample (Numbers Varchar(MAX))
Insert INTO #Sample Values('1000')
Insert INTO #Sample Values ('1024 AND 1025')
Insert INTO #Sample Values ('109 ,110,111')
Insert INTO #Sample Values ('Old # 1033 replaced with new Invoice # 1544')
Insert INTO #Sample Values ('1355 Cancelled and Invoice 1922 added')
Select * from #Sample
This is what is expected...
Create Table #Result (Numbers Varchar(MAX))
Insert INTO #Result Values('1000')
Insert INTO #Result Values ('1024')
Insert INTO #Result Values ('1025')
Insert INTO #Result Values ('109')
Insert INTO #Result Values ('110')
Insert INTO #Result Values ('111)
Insert INTO #Result Values ('1033')
Insert INTO #Result Values ('1544')
Insert INTO #Result Values ('1355')
Insert INTO #Result Values ('1922')
Select * from #Result
Any idea how I can implement this ? I believe if there are any numbers I need to split into two columns . Any Ideas
September 29, 2015 at 5:54 pm
One way to approach this involves 3 functions: one that removed duplicate characters, one that replaces characters based on a pattern and Jeff Moden's Splitter. There are links to two are the three functions referenced in my signature line (DelimitedSplit8K and PatReplace8K) if you want to learn more about them. The code for RemoveDupes8K will be included below. Here's the code for the three functions:
USE tempdb
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;
GO
CREATE FUNCTION dbo.PatReplace8K
(
@String VARCHAR(8000),
@Pattern VARCHAR(50),
@Replace VARCHAR(10)
)
/*******************************************************************************
Purpose:
Given a string (@String), a pattern (@Pattern), and a replacement character (@Replace)
PatReplace8K will replace any character in @String that matches the @Pattern parameter
with the character, @Replace.
Usage:
--===== Basic Syntax Example
SELECT newstring
FROM dbo.dbo.PatReplace8K(@String,@Pattern,@Replace);
--===== Replace numeric characters with a "*"
SELECT newstring
FROM dbo.PatReplace8K('My phone number is 555-2211','[0-9]','*');
--==== Using againsts a table
DECLARE @table TABLE(oldstring varchar(40));
INSERT @table VALUES
('Call me at 555-222-6666'),
('phone number: (312)555-2323'),
('He can be reached at 444.665.4466');
SELECT *
FROM @table t
CROSS APPLY dbo.PatReplace8K(t.oldstring,'[0-9]','*');
Programmer Notes:
1. Function runs 2-4 times faster when using make_parallel() (provided that you have
2 or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance.
2. Requires SQL Server 2008+
3. @Pattern IS case sensitive (the function can be easily modified to make it so)
4. There is no need to include the "%" before and/or after your pattern since since we
are evaluating each character individually
5. Certain special characters, such as "$" and "%" need to be escaped with a "/"
like so: [/$/%]
6. To strip all non numeric characters use DigitsOnlyEE() for it is at least twice as
fast. To strip all non-alphanumeric characters use AlphaNumericOnly() which is also
twice as fast.
7. For ***removing*** characters based on a pattern use PatExclude8K which is faster.
Revision History:
Rev 00 - 10/27/2014 Initial Development - Alan Burstein
Rev 01 - 10/29/2014 Mar 2007 - Alan Burstein
- Redesigned based on the dbo.STRIP_NUM_EE by Eirikur Eiriksson
(see: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx)
- change how the cte tally table is created
- put the include/exclude logic in a CASE statement instead of a WHERE clause
- Added Latin1_General_BIN Colation
- Add code to use the pattern as a parameter.
Rev 02- 11/6/2014
- Added final performane enhancement (more cudo's to Eirikur Eiriksson)
- Put 0 = PATINDEX filter logic into the WHERE clause
Rev 03- 5/16/2015
- Updated to deal with special XML characters
Rev 04- 5/26/2015
- Changed the replacement character from char(1) to varchar(10)
- tested and determined that there is no performance hit for this change.
*******************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS E1(N)),
iTally(N) AS (SELECT TOP (DATALENGTH(@String)) CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1 a,E1 b,E1 c,E1 d)
SELECT NewString =
((
SELECT
CASE
WHEN PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1)) = 0
THEN SUBSTRING(@String,N,1)
ELSE CASE @Replace WHEN '' THEN '' ELSE @Replace END-- this line seems silly but really is required (trust me!)
END +''
FROM iTally
FOR XML PATH(''),TYPE
).value('.[1]','varchar(8000)'));
GO
CREATE FUNCTION dbo.RemoveDupes8K(@string varchar(8000), @preserved varchar(100))
/*******************************************************************************
Purpose:
Given a string (@String) this function will remove all instances of
repeated characters. The @preserved parameter can be used to ignore
(and preserve) duplicate characters if they match the pattern.
This function performs much better when using make_parallel().
Usage Examples:
DECLARE @string varchar(8000) = '!!!aa###bb!!!'; -- !a#b!
-- Remove all duplicate characters
SELECT * FROM dbo.RemoveDupes8k(@string,'');
-- Remove all non-alpha duplicates
SELECT * FROM dbo.RemoveDupes8k(@string,'[A-Za-z]');
-- Remove only alphabetical duplicates
SELECT * FROM dbo.RemoveDupes8k(@string,'[^A-Za-z]');
Revision History:
Rev 00 - 06/21/2015 Initial Development - Alan Burstein
*******************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) T(C)), -- 10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 100
iTally(N) AS
(
SELECT TOP(DATALENGTH(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E2 a, E2 b -- 10K
)
SELECT NewString =
(
SELECT SUBSTRING(@string,N,1)+''
FROM iTally
WHERE SUBSTRING(@string,N,1) <> SUBSTRING(@string,N+1,1)
OR SUBSTRING(@string,N,1) LIKE @preserved
FOR XML PATH(''), TYPE
).value('.[1]', 'varchar(8000)');
GO
Now that you have the functions here's the solution. Note my comments. Also note that you should probably include a key column so where know where the records came from. Nonetheless, this will produce exactly what you requested.
SELECT D.ItemNumber, D.Item
FROM #Sample S
CROSS APPLY dbo.PatReplace8K(Numbers,'[^0-9]',',') P -- Replace all non-numeric characters with a comma
CROSS APPLY dbo.RemoveDupes8K(P.NewString,'[0-9]') R -- Remove all duplicate commas
CROSS APPLY dbo.delimitedsplit8K(R.NewString,',') D -- Now split the comma-delimited string
WHERE D.item <> '' -- exclude records with a leading commma.
-- Itzik Ben-Gan 2001
September 29, 2015 at 6:04 pm
... Or you can use PatternSplitCM (Duh...) also referenced in my signature....
SELECT Item
FROM #Sample S
CROSS APPLY dbo.PatternSplitCM(S.Numbers,'%[0-9]%')
WHERE Matched = 1;
-- Itzik Ben-Gan 2001
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply