A quick query puzzle:

Question

Post reply

A quick query puzzle:

S_Kumar_S

SSCrazy Eights

Points: 9037
More actions
October 4, 2012 at 6:57 am

#262620

Hi
I have a table with below data. Requirement is to replace all integers with continous 6 or more occurances with 'x'. Less than 6 occurances should not be replaced.
create table t1(name varchar (100))
GO
INsert into t1
select '1234ABC123456XYZ1234567890ADS'
GO
INsert into t1
select 'cbv736456XYZ543534534545XLS'
GO
EXPECTED RESULT:
1234ABCxxxxxxXYZxxxxxxxxxxADS
cbvxxxxxxXYZxxxxxxxxxxxxXLS
drop table t1
Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply

mickyT SSChampion Points: 10360 More actions · Answer 1

Hi

You could try the following

CREATE FUNCTION replace6plusint(@s1 varchar(100)) RETURNS TABLE AS RETURN

WITH

firstRun AS (

SELECT replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(@s1,'0','^'),'1','^'),'2','^'),'3','^'),'4','^'),'5','^'),'6','^'),'7','^'),'8','^'),'9','^'),'^^^^^^','------') a

UNION ALL

SELECT replace(a,'-^','--')

FROM firstRun

WHERE a like '%-^%'

),

secondRun AS (

SELECT replace(a,'-','x') b, charindex('^', a) p

FROM firstRun

WHERE a not like '%-^%'

UNION ALL

SELECT stuff(b, p, 1, substring(@s1,p,1)), charindex('^', b)

FROM secondRun

WHERE p <> 0

)

SELECT b result FROM secondRun WHERE p = 0

S_Kumar_S SSCrazy Eights Points: 9037 More actions · Answer 2

This is really good but problem is that my table has multiple rows and this is a table function. So it has to be a scalar function for me to use....

Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

S_Kumar_S (10/5/2012)
This is really good but problem is that my table has multiple rows and this is a table function. So it has to be a scalar function for me to use....

Find some good reading material covering functions. You absolutely don't need a function for this - and most folks would recommend a table-valued function over the alternatives. It's far simpler than you think:

SELECT name,

REPLACE(

REPLACE(name,

'012345','xxxxxx'),

'123456','xxxxxx'),

'234567','xxxxxx'),

'345678','xxxxxx'),

'456789','xxxxxx')

FROM t1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

S_Kumar_S SSCrazy Eights Points: 9037 More actions · Answer 4

Hi Chris.

the numbers are not consecutive as you have assumed. And they may be of length greater than 6. I think it is not so straight as it looks.

Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

ben.brugman SSChampion Points: 13369 More actions · Answer 5

With a large amount of replaces this can be done.

Problem is that the number of replaces is very large.

So I'll show the concept but not the code, because the code would be large.

First of all I assum that there are 'strings' which do not occure in the supplied string.

Here I'll use the strings '^' and 'x', but if these occure replace then with strings which do not occure.

First replace alle the numbers in the strings with

1

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

S_Kumar_S (10/5/2012)
Hi Chris.
the numbers are not consecutive as you have assumed. And they may be of length greater than 6. I think it is not so straight as it looks.

Ah, ok.

SELECT

t.name, x.New_Name

FROM #t1 t

CROSS APPLY (

SELECT New_Name =

(SELECT Newletter + ''

FROM (

SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL

AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

FROM (

SELECT n, letter,

grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N

FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM sys.columns a, sys.columns b) tally

CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

) d

) stri

ORDER BY n

FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

) x

ORDER BY t.name

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ben.brugman SSChampion Points: 13369 More actions · Answer 7

With a large amount of replaces this can be done.

Problem is that the number of replaces is very large.

So I'll show the concept but not the code, because the code would be large.

First of all I assume that there are 'strings' which do not occure in the supplied string.

Here I'll use the strings '^' and 'x', but if these occure replace then with strings which do not occure.

First replace alle the numbers in the strings with

1 becomes ^1 (do this for all 10 numeric characters)

^1^ becomes ^^1 (do this for all 10 numeric characters and repeat this large number of times).

All number strings have now the shape ^^^^1234

X^^^^^^ becomes x^^^^^

EDIT: ABOVE LINE SHOULD BE (sorry)

^^^^^^ becomes x^^^^^

x^ becomes xx (repeat a number of times)

Now the number strings all have the shape ^^^^1234 or xxxxxxxx12345678

x1 becomes x (do this for all 10 numeric characters and repeat this a large number of) times).

^ becomes '' (remove all the ^)

The number of replaces is large, but maybe could be done in a loops.

Or if the database is very large this can be generated.

If the database is large and the numbers are very long you need an awfull lot of replaces.

For large numbers and large number of rows this might not be a handy method.

An alternate solution might be to copy the table change all numbers into '1' do the above with far less substitutes and than replace the 1111 string with the original numbers. With some masking this is possible.

thanks for sharing your problem,

Ben Brugman

S_Kumar_S SSCrazy Eights Points: 9037 More actions · Answer 8

This is quite impressive but it doesn't work for some scenarios. e.g. this one :

insert into t1

select 'AAA22222222222222BB5BB 33CCCCCCCCC342234234'

EDIT: I meant it for Chris response:

SELECT

t.name, x.New_Name

FROM #t1 t

CROSS APPLY (

SELECT New_Name =

(SELECT Newletter + ''

FROM (

SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL

AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

FROM (

SELECT n, letter,

grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N

FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM sys.columns a, sys.columns b) tally

CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

) d

) stri

ORDER BY n

FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

) x

ORDER BY t.name

Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

S_Kumar_S (10/5/2012)
This is quite impressive but it doesn't work for some scenarios. e.g. this one :
insert into t1
select 'AAA22222222222222BB5BB 33CCCCCCCCC342234234'
EDIT: I meant it for Chris response:
SELECT
t.name, x.New_Name
FROM #t1 t
CROSS APPLY (
SELECT New_Name =
(SELECT Newletter + ''
FROM (
SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL
AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END
FROM (
SELECT n, letter,
grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N
FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM sys.columns a, sys.columns b) tally
CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l
) d
) stri
ORDER BY n
FOR XML PATH('') , TYPE).value('.', 'varchar(max)')
) x
ORDER BY t.name

My apologies, it was a change during coding:

SELECT

t.name, x.New_Name

FROM #t1 t

CROSS APPLY (

SELECT New_Name =

(SELECT Newletter + ''

FROM (

SELECT

n, Newletter = CASE WHEN ISNUMERIC(letter) = 1 -- d.grouper IS NOT NULL

AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

FROM (

SELECT t.name, n, letter,

grouper = n - ROW_NUMBER() OVER(ORDER BY ISNUMERIC(letter) desc, n)

FROM

(SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM sys.columns a, sys.columns b) tally

CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

) d

) stri

ORDER BY n

FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

) x

ORDER BY t.name

Edit: found an error in an extended sample data set.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ben.brugman SSChampion Points: 13369 More actions · Answer 10

Hello all,

Because I goofed a bit in my previous solution here the worked out code.

It's a solution which only used simple replaces, not efficient (lots of updates).

ben brugman

-- Ben Brugman

-- 20121005

-- http://www.sqlservercentral.com/Forums/Topic1368367-391-1.aspx?Update=1

-- A quick query puzzle.

-- This solution is a rather cumbersome solution only using a large number of 'simple' replaces.

-- It is also limited to the length defined in @maxlength (this is the maximum length a number can be).

-- Create the table

create table #t1(name varchar (100))

GO

INsert into #t1

select '1234ABC123456XYZ1234567890ADS'

GO

INsert into #t1

select 'cbv736456XYZ543534534545XLS'

GO

-- Create a replace stored procedure.

Create Procedure st_replace

@string1 varchar(8000),

@string2 varchar(8000)

as

begin

Update #t1 set name = replace(name,@string1,@string2)

end

go

declare @maxlength int = 40

declare @tel int

-- Mark all the numeric characters.

exec st_replace '0', '^0'

exec st_replace '1', '^1'

exec st_replace '2', '^2'

exec st_replace '3', '^3'

exec st_replace '4', '^4'

exec st_replace '5', '^5'

exec st_replace '6', '^6'

exec st_replace '7', '^7'

exec st_replace '8', '^8'

exec st_replace '9', '^9'

-- Bring the marks in front of the numbers

set @tel = @maxlength

WHILE @tel > 0 BEGIN

exec st_replace '0^','^0'

exec st_replace '1^','^1'

exec st_replace '2^','^2'

exec st_replace '3^','^3'

exec st_replace '4^','^4'

exec st_replace '5^','^5'

exec st_replace '6^','^6'

exec st_replace '7^','^7'

exec st_replace '8^','^8'

exec st_replace '9^','^9'

set @tel = @tel - 1

END

-- Change the marks which are at least 6 long

exec st_replace '^^^^^^','X^^^^^'

set @tel = @maxlength

WHILE @tel > 0 BEGIN

exec st_replace 'x^','xx'

set @tel = @tel - 1

END

--

-- Numbers are now of the form ^^^^1234 or XXXXXXXX12345678

--

-- Remove the numbers behind the X mark.

set @tel = @maxlength

WHILE @tel > 0 BEGIN

exec st_replace 'X0','X'

exec st_replace 'X1','X'

exec st_replace 'X2','X'

exec st_replace 'X3','X'

exec st_replace 'X4','X'

exec st_replace 'X5','X'

exec st_replace 'X6','X'

exec st_replace 'X7','X'

exec st_replace 'X8','X'

exec st_replace 'X9','X'

set @tel = @tel - 1

END

-- Remove the 'temporary' marks

exec st_replace '^',''

SELECT * FROM #T1

DROP PROCEDURE ST_REPLACE

DROP TABLE #T1

mickyT SSChampion Points: 10360 More actions · Answer 11

Hi

Here's another variation that should do the trick. No function this time, just a query

create table #t1(name varchar (100))

GO

INsert into #t1

select '1234ABC123456XYZ1234567890ADS'

GO

INsert into #t1

select 'cbv736456XYZ543534534545XLS'

GO

INsert into #t1

select 'cbv736456XYZ543534534545XLS2134488'

GO

;with cte AS (

SELECT

stuff(name ,

patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition

patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)

replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's

) res,

1 mycount,

row_number() over (order by name) grouper

FROM #t1

UNION ALL

SELECT

stuff(res ,

patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition

patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length

replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's

) res,

mycount + 1,

grouper

FROM cte

WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0

),

maxcte as (select grouper, max(mycount) lastres from cte group by grouper)

SELECT res

FROM cte c

inner join maxcte m on mycount = lastres and c.grouper = m.grouper

ben.brugman SSChampion Points: 13369 More actions · Answer 12

A small change See in the code.

(My reason for this change change is: The essential part of the code is only written once, so it is smaller, clearer and with less riscs for errors).

Thanks for submitting this code, I am learning from this.

Ben Brugman

mickyT (10/8/2012)
Hi
Here's another variation that should do the trick. No function this time, just a query
create table #t1(name varchar (100))
GO
INsert into #t1
select '1234ABC123456XYZ1234567890ADS'
GO
INsert into #t1
select 'cbv736456XYZ543534534545XLS'
GO
INsert into #t1
select 'cbv736456XYZ543534534545XLS2134488'
GO
;with cte AS (
SELECT
--stuff(name ,
--patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition
--patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)
--replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's
--) res,
convert(varchar(max), res) as res,
1 mycount,
row_number() over (order by name) grouper
FROM #t1
UNION ALL
SELECT
stuff(res ,
patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition
patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length
replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's
) res,
mycount + 1,
grouper
FROM cte
WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0
),
maxcte as (select grouper, max(mycount) lastres from cte group by grouper)
SELECT res
FROM cte c
inner join maxcte m on mycount = lastres and c.grouper = m.grouper

mickyT SSChampion Points: 10360 More actions · Answer 13

ben.brugman (10/8/2012)
A small change See in the code.
(My reason for this change change is: The essential part of the code is only written once, so it is smaller, clearer and with less riscs for errors).
Thanks for submitting this code, I am learning from this.
Ben Brugman

I agree, that change makes it nicer. Cheers:-)

ditto on the learning ... and I have enjoyed thinking about this problem 😎

Normally I would have used a set of CLR functions that I have that replicate the oracle regular expression functions.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

Just for fun, here's a nasty piece of work:

create table #t1(name varchar (100))

INsert into #t1

select '1234ABC123456XYZ1234567890ADS'

INsert into #t1

select 'cbv736456XYZ543534534545XLS'

;WITH ChunkIt (RowID, n, str1, str2, str3) AS (

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), n=1

,CASE WHEN dig < alp THEN SUBSTRING(name, 1, alp-1) ELSE SUBSTRING(name, 1, dig-1) END

,CASE WHEN dig < alp THEN SUBSTRING(name, alp, LEN(name)) ELSE SUBSTRING(name, dig, LEN(name)) END

,CAST('' AS VARCHAR(100))

FROM #t1

CROSS APPLY (SELECT PATINDEX('%[0-9]%', name), PATINDEX('%[A-Za-z]%', name)) a(dig, alp)

UNION ALL

SELECT RowID, n+1

,b.str1

,b.str2

,CASE WHEN PATINDEX('%[0-9]%', b.str2) = 0 OR PATINDEX('%[A-Za-z]%', b.str2) = 0 THEN b.str2 ELSE '' END

FROM ChunkIt

CROSS APPLY (SELECT PATINDEX('%[0-9]%', str2), PATINDEX('%[A-Za-z]%', str2)) a(dig, alp)

CROSS APPLY (

SELECT CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, 1, a.alp-1) ELSE SUBSTRING(str2, 1, a.dig-1) END

,CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, a.alp, LEN(str2)) ELSE SUBSTRING(str2, a.dig, LEN(str2)) END

) b(str1, str2)

WHERE a.dig > 0 AND a.alp > 0

),

ForGrouping AS (

SELECT RowID, n

,str1=CASE WHEN LEN(str1) >= 6 AND PATINDEX('%[0-9]%', str1) > 0 THEN REPLICATE('x', LEN(str1)) ELSE str1 END

,str2=CASE WHEN LEN(str3) >= 6 AND PATINDEX('%[0-9]%', str3) > 0 THEN REPLICATE('x', LEN(str3)) ELSE str3 END

FROM ChunkIt

)

SELECT name=(

SELECT str1 + str2

FROM ForGrouping b

WHERE a.RowID = b.RowID

FOR XML PATH(''))

FROM ForGrouping a

GROUP BY RowID

ORDER BY RowID

DROP TABLE #t1

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St