October 10, 2012 at 6:41 am
Thank you for putting in time for this...
Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.
October 17, 2012 at 4:31 am
Although it works fine but why why do we need the UNION ALL part in the CTE?
mickyT (10/8/2012)
HiHere's another variation that should do the trick. No function this time, just a query
create table #t1(name varchar (100))
GO
INsert into #t1
select '1234ABC123456XYZ1234567890ADS'
GO
INsert into #t1
select 'cbv736456XYZ543534534545XLS'
GO
INsert into #t1
select 'cbv736456XYZ543534534545XLS2134488'
GO
;with cte AS (
SELECT
stuff(name ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
) res,
1 mycount,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
mycount + 1,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
),
maxcte as (select grouper, max(mycount) lastres from cte group by grouper)
SELECT res
FROM cte c
inner join maxcte m on mycount = lastres and c.grouper = m.grouper
Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.
October 17, 2012 at 12:35 pm
S_Kumar_S (10/17/2012)
Although it works fine but why why do we need the UNION ALL part in the CTE?
This is a recursive CTE, so it will step through the string and replace the first group of 6 numbers in each step. A recursive CTE requires a top-level UNION ALL. You will get an error if you try a UNION or if you remove from the UNION ALL on, it will only replace the first set of six.
To get an idea of what it is doing run the following
create table #t1(name varchar (100))
GO
insert into #t1
select '1234ABC123456XYZ1234567890ADS8785445'
GO
;with cte AS (
SELECT
stuff(name ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
) res,
1 mycount,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
mycount + 1,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
),
maxcte as (select grouper, max(mycount) lastres from cte group by grouper)
select * from cte
This will give you the following results
res mycountgrouper
--------------------------------------------------
1234ABCxxxxxxXYZ1234567890ADS878544511
1234ABCxxxxxxXYZxxxxxxxxxxADS878544521
1234ABCxxxxxxXYZxxxxxxxxxxADSxxxxxxx31
As you can see, the row has been passed through the recursive CTE three times.
Hope this helps
October 17, 2012 at 5:04 pm
mickyT (10/17/2012)
S_Kumar_S (10/17/2012)
Although it works fine but why why do we need the UNION ALL part in the CTE?This is a recursive CTE, so it will step through the string and replace the first group of 6 numbers in each step. A recursive CTE requires a top-level UNION ALL. You will get an error if you try a UNION or if you remove from the UNION ALL on, it will only replace the first set of six.
To get an idea of what it is doing run the following
create table #t1(name varchar (100))
GO
insert into #t1
select '1234ABC123456XYZ1234567890ADS8785445'
GO
;with cte AS (
SELECT
stuff(name ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
) res,
1 mycount,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
mycount + 1,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
),
maxcte as (select grouper, max(mycount) lastres from cte group by grouper)
select * from cte
This will give you the following results
res mycountgrouper
--------------------------------------------------
1234ABCxxxxxxXYZ1234567890ADS878544511
1234ABCxxxxxxXYZxxxxxxxxxxADS878544521
1234ABCxxxxxxXYZxxxxxxxxxxADSxxxxxxx31
As you can see, the row has been passed through the recursive CTE three times.
Hope this helps
Be careful, now. Recursive CTE's that count or do other RBAR ops are nearly as bad and sometimes worse than While loops. Please see the following article for alternatives and comparisons.
http://www.sqlservercentral.com/articles/T-SQL/74118/
--Jeff Moden
Change is inevitable... Change for the better is not.
October 17, 2012 at 7:08 pm
Be careful, now. Recursive CTE's that count or do other RBAR ops are nearly as bad and sometimes worse than While loops. Please see the following article for alternatives and comparisons.
Thanks Jeff
I revisited this to see if I could eliminate the count and improve the performance. Unfortunately it is still recursive, but it did improve the performance a bit
;with cte AS (
SELECT
stuff(name ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
) res,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
)
SELECT res
FROM cte c
where res not like '%[0-9][0-9][0-9][0-9][0-9][0-9]%'
The difference between the two queries over 10,000 rows is approximately 700ms (altered) and 1300ms(original).
(10000 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 116460, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t1_________________________________________________________________________________________________________________000000000A72'. Scan count 1, logical reads 57, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 702 ms, elapsed time = 715 ms.
(10000 row(s) affected)
Table 'Worktable'. Scan count 4, logical reads 233082, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t1_________________________________________________________________________________________________________________000000000A72'. Scan count 2, logical reads 114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1310 ms, elapsed time = 1297 ms.
I wonder how much impact the max with a group by had on the original?
October 17, 2012 at 9:23 pm
ChrisM@Work (10/5/2012)
Edit: found an error in an extended sample data set.
I tried it and it doesn't work for the first example of '1234ABC123456XYZ1234567890ADS'. It x's out the first 4 characters.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 17, 2012 at 9:24 pm
Of the code that actually returns the correct answers, here's the performance test of 10,000 rows.
--===== Conditionally drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#T1','U') IS NOT NULL
DROP TABLE #T1
;
--===== Create and popululate the test table on-the-fly.
SELECT TOP 10000
RowNum = IDENTITY(INT,1,1),
Name = CAST(d.Name AS VARCHAR(100))
INTO #T1
FROM (SELECT '1234ABC123456XYZ1234567890ADS'
UNION ALL
SELECT 'cbv736456XYZ543534534545XLS')d(Name)
CROSS JOIN sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
GO
--=============================================================================
-- Create any functions/procs in TempDB so we don't make a mess in a
-- real database.
--=============================================================================
USE tempdb;
SET NOCOUNT ON;
GO
IF OBJECT_ID('dbo.Replace6IntWithX') IS NOT NULL
DROP FUNCTION dbo.Replace6IntWithX;
GO
CREATE FUNCTION dbo.Replace6IntWithX
/******************************************************************************
Purpose:
Replace any consecutive sets of 6 or more digits with the same length of
replacement characters.
Revision History:
Rev 00 - 17 Oct 2012 - Jeff Moden
- Initial creation.
******************************************************************************/
--===== Declare the I/O for this function
(@pString VARCHAR(8000), @pReplacementCharacter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== Do the replacement
WITH
cteFindStart AS
( --=== Find where character type changes occur and
-- number the postion of the change.
SELECT Element# = ROW_NUMBER()OVER(ORDER BY t.N),
N = t.N
FROM dbo.Tally t
WHERE t.N <= LEN(@pString)+1
AND ( SUBSTRING(@pString,t.N-1,2) LIKE '[0-9][^0-9]' COLLATE Latin1_General_BIN
OR SUBSTRING(@pString,t.N-1,2) LIKE '[^0-9][0-9]' COLLATE Latin1_General_BIN
OR t.N IN (1,LEN(@pString)+1)
)
),
cteSplitAndSize AS
( --=== Split the string at the character type changes
-- and remember the length of the string for later.
SELECT SplitString = SUBSTRING(@pString,begpos.N,endpos.N-begpos.N),
[Length] = endpos.N - begpos.N
FROM cteFindStart begpos
JOIN cteFindStart endpos
ON begpos.Element# + 1 = endpos.Element#
)
--===== Put the string back together making sure that if
-- numeric strings with >= 6 characters are replaced with "x"s.
SELECT XString =
( --=== Test the first character of each split string.
-- If it's not a digit or the length < 6, then
-- use the split string. Otherwise, replace the
-- length of the split string with "x"s
SELECT CASE
WHEN LEFT(SplitString,1) LIKE '[^0-9]'
OR [Length] < 6
THEN SplitString
ELSE REPLICATE(@pReplacementCharacter,[Length])
END
FROM cteSplitAndSize
FOR XML PATH(''),TYPE
).value('.','varchar(8000)')
;
GO
--=============================================================================
-- Start the tests. I use pure duration as a measure of performance
-- because SET STATISTICS can throw a hammer into the gears.
-- See the following URL for what I mean.
-- http://www.sqlservercentral.com/articles/T-SQL/91724/
-- I also throw away the output to take display and disk time out of the
-- picture.
--=============================================================================
RAISERROR ('================================================',0,1) WITH NOWAIT;
RAISERROR ('========== Jeff''s function ==========',0,1) WITH NOWAIT;
DECLARE @Bitbucket VARCHAR(100),
@StartTime DATETIME;
SELECT @StartTime = GETDATE()
;
SELECT @Bitbucket = ca.XString
FROM #T1 t1
CROSS APPLY Replace6IntWithX(t1.Name,'x')ca
;
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114)
;
GO
--=============================================================================
RAISERROR ('================================================',0,1) WITH NOWAIT;
RAISERROR ('========== MickyT''s Latest ==========',0,1) WITH NOWAIT;
DECLARE @Bitbucket VARCHAR(100),
@StartTime DATETIME;
SELECT @StartTime = GETDATE()
;
;with cte AS (
SELECT
stuff(name ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
) res,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
)
SELECT @Bitbucket = res
FROM cte c
where res not like '%[0-9][0-9][0-9][0-9][0-9][0-9]%'
;
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114)
;
GO
--=============================================================================
RAISERROR ('================================================',0,1) WITH NOWAIT;
RAISERROR ('========== Dwain''s "Nastiest" ==========',0,1) WITH NOWAIT;
RAISERROR ('(Killed it after 2 minutes)',0,1) WITH NOWAIT;
--
--DECLARE @Bitbucket VARCHAR(100),
-- @StartTime DATETIME;
-- SELECT @StartTime = GETDATE()
--;
--;WITH ChunkIt (RowID, n, str1, str2, str3) AS (
-- SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), n=1
-- ,CASE WHEN dig < alp THEN SUBSTRING(name, 1, alp-1) ELSE SUBSTRING(name, 1, dig-1) END
-- ,CASE WHEN dig < alp THEN SUBSTRING(name, alp, LEN(name)) ELSE SUBSTRING(name, dig, LEN(name)) END
-- ,CAST('' AS VARCHAR(100))
-- FROM #t1
-- CROSS APPLY (SELECT PATINDEX('%[0-9]%', name), PATINDEX('%[A-Za-z]%', name)) a(dig, alp)
-- UNION ALL
-- SELECT RowID, n+1
-- ,b.str1
-- ,b.str2
-- ,CASE WHEN PATINDEX('%[0-9]%', b.str2) = 0 OR PATINDEX('%[A-Za-z]%', b.str2) = 0 THEN b.str2 ELSE '' END
-- FROM ChunkIt
-- CROSS APPLY (SELECT PATINDEX('%[0-9]%', str2), PATINDEX('%[A-Za-z]%', str2)) a(dig, alp)
-- CROSS APPLY (
-- SELECT CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, 1, a.alp-1) ELSE SUBSTRING(str2, 1, a.dig-1) END
-- ,CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, a.alp, LEN(str2)) ELSE SUBSTRING(str2, a.dig, LEN(str2)) END
-- ) b(str1, str2)
-- WHERE a.dig > 0 AND a.alp > 0
-- ),
-- ForGrouping AS (
-- SELECT RowID, n
-- ,str1=CASE WHEN LEN(str1) >= 6 AND PATINDEX('%[0-9]%', str1) > 0 THEN REPLICATE('x', LEN(str1)) ELSE str1 END
-- ,str2=CASE WHEN LEN(str3) >= 6 AND PATINDEX('%[0-9]%', str3) > 0 THEN REPLICATE('x', LEN(str3)) ELSE str3 END
-- FROM ChunkIt
-- )
--SELECT @Bitbucket =(
-- SELECT str1 + str2
-- FROM ForGrouping b
-- WHERE a.RowID = b.RowID
-- FOR XML PATH(''))
--FROM ForGrouping a
--GROUP BY RowID
-- PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114)
--;
GO
Here are the results...
================================================
========== Jeff's function ==========
Duration = 00:00:00:257
================================================
========== MickyT's Latest ==========
Duration = 00:00:01:380
================================================
========== Dwain's "Nastiest" ==========
(Killed it after 2 minutes)
--Jeff Moden
Change is inevitable... Change for the better is not.
October 17, 2012 at 10:01 pm
Uggh! :w00t:
I knew my "nastiest" would also be the "costliest."
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 18, 2012 at 12:25 pm
This a non- recursice solution.My Numbers table starts with 1.
WITH t as
(
select name = '1234ABC123456XYZ12345678901234567890ADS'
),
Symbols as
(
select Symbol = substring(name,number,1), Position = Number from t join numbers on number <= len(name)
),
NeighborComparison AS
(
select s.Symbol,s.Position, NextNeighbor = n.Symbol,PreviousNeighbor = p.Symbol,
IsConsecutiveNumber = CASE WHEN (ISNUMERIC(s.Symbol)=1 AND ISNUMERIC(n.Symbol)=1 AND
(ASCII(n.Symbol) - ASCII(s.Symbol) = 1 OR n.Symbol+s.Symbol='90')
)
OR
(ISNUMERIC(s.Symbol)=1 AND ISNUMERIC(p.Symbol)=1 AND
(ASCII(s.Symbol) - ASCII(p.Symbol) = 1 OR s.Symbol+p.Symbol='09')
)
THEN 1
ELSE 0
END
from Symbols s
left join Symbols n on n.Position = s.Position + 1
left join Symbols p on p.Position = s.Position - 1
)
select Position, NewSymbol= CASE WHEN [Group] > 0 AND COUNT(*) OVER (Partition by [Group]) > 5
THEN 'x'
ELSE Symbol
END
from
( select Symbol,Position, [Group] = (position - row_number () OVER (ORDER by isconsecutivenumber desc,Position)+1)* isconsecutivenumber
from NeighborComparison
) g
order by position
October 18, 2012 at 2:10 pm
This is other solution. My Numbers table starts with 1.
WITH t as
(
select name = '1234ABC123456XYZ67890123456789012ADS89012hhh'
),
Symbols as
(
select Symbol = substring(name,number,1), Position = Number from t join numbers on number <= len(name)
),
NumberGroups as
(
select Symbol = CONVERT(INT,Symbol),Position , NumberGroup = Position - symbol + 1 from symbols where isnumeric(symbol)=1
),
UnifiedCrossOverGroups as
(
select * ,
ConsecutiveSymbol = NumberGroupTeenths+Symbol,
UnifiedGroup = Position - (NumberGroupTeenths + symbol ) + 1
from NumberGroups
cross apply( select NumberGroupTeenths = floor(NumberGroup/10)*10 ) nt
),
NewSymbols as
(
select NewSymbol = CASE WHEN g.RangeStart IS NULL THEN s.Symbol ELSE 'x' END ,Position
from symbols s
left join
(
select RangeStart = MIN(Position) , RangeEnd = MAX(Position) from UnifiedCrossOverGroups
group by UnifiedGroup having count(*) > 5
) g ON s.Position between g.RangeStart AND g.RangeEnd
)
select NewText = (select NewSymbol as 'text()' from Newsymbols order by position for xml path(''), type).value('.','varchar(max)')
October 18, 2012 at 4:20 pm
This one is insensitive to number order. My computer says it ran in 00:00:00:030, maybe you can tell me how it ran in yours.
;WITH t as
(
select name = '1234ABC123456XYZ67890123456789012ADS890112hhh'
),
Symbols as
(
select Symbol = substring(name,number,1),
Position = Number
from t join numbers on number between 1 and len(name)
),
Groups as
(
select Symbol ,Position , NumberGroup = Position + 1 - row_number () OVER (ORDER BY POSITION)
from symbols where isnumeric(symbol)=1
)
select string = (select case when g.RangeStart is null then s.Symbol else 'x' end as 'text()' from symbols s
left join ( select RangeStart = min(Position),RangeEnd = max(position),NumberGroup from groups
group by NumberGroup having count(*) > 5
) g
on s.Position between g.RangeStart AND g.RangeEnd
for xml path(''),type
).value('.','varchar(8000)')
October 18, 2012 at 4:21 pm
forget what i just said, i tested with only one row hahahah
October 18, 2012 at 4:29 pm
If you need elegant and efficient solution for this problem then CLR using Static RegEx with compiled pattern is the way to go. I don't have VS with me now, but I will update on it tomorrow.
October 18, 2012 at 5:20 pm
Look, i came up with this, how do you like it?. I test the function in Winforms project but im pretty sure it will work nice in sql clr
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Partial Public Class UserDefinedFunctions
<Microsoft.SqlServer.Server.SqlFunction()> _
Public Shared Function ReplaceWithX(ByVal StringToReplace As SqlString) As SqlString
' Add your code here
Dim expression As New Regex("[0-9]{6,}", RegexOptions.Compiled)
'Dim evaluator As MatchEvaluator = AddressOf CharReplacement
Dim Result As String = expression.Replace(StringToReplace.ToString, Function(m As Match) New String("x", m.Length))
Return New SqlString(Result)
End Function
End Class
October 18, 2012 at 5:26 pm
it worked on sql server =D
Viewing 15 posts - 16 through 30 (of 44 total)
You must be logged in to reply to this topic. Login to reply