Removing duplicate character from string

Question

Removing duplicate character from string

nadersam

SSCertifiable

Points: 6542
More actions
October 18, 2015 at 4:12 am

#305525

Hi All,
I need a neat way please guys to write a function to remove duplicated characters only if they come in sequence.
Examples
darrk should return dark
eeagle should return eagle
redd should return red
corner should corner as it is as the r's are not in sequence.
Thanks in advance.
Nader

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

nadersam SSCertifiable Points: 6542 More actions · Answer 1

I have written the following code but it has a syntax error that i am not able to fix, not sure if case clause is supported inside CTE or what.

declare @Name as varchar(50) ='parrdetam';

declare @result as varchar(50) ='';

WITH

N(N) AS (

SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1)) N (N)

) -- 8 rows

, TALLY (N) AS (

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N N1, N N2

)

,

NewWord(L) AS (

SELECT @result=case when SUBSTRING(@Name, T.N, 1) = SUBSTRING(@Name, T.N-1, 1)

then

'rr'

else

@Name

end

FROM TALLY T

WHERE SUBSTRING(@Name, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

)

select * from NewWord

Eirikur Eiriksson SSC Guru Points: 182918 More actions · Answer 2

Quick solution

😎

USE tempdb;

GO

SET NOCOUNT ON;

DECLARE @INPUT_STR VARCHAR(500) = 'WWWWWWHY DDDDDDDDDDDOOOOOOOOO YYYYYYYYOOOOOOOOOOOOOOOOOUUUUUUUUUU WWWWWWWWWWANNNNNNNNNNNNNTTTTTTT TOOOOOOOOO DOOOOOOOO TTTTTTTTHIIIIIIIIIIISSSSSSS??????????';

;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

, NUMS(N) AS (SELECT TOP(LEN(@INPUT_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

,CHAR_LIST AS

(

SELECT

NM.N

,SUBSTRING(@INPUT_STR,NM.N,1) AS TCHAR

FROM NUMS NM

)

,COMPARE_CHARS AS

(

SELECT

CL.TCHAR

,LAG(CL.TCHAR,1,CHAR(0)) OVER

(

ORDER BY CL.N

) AS PREV_CHAR

FROM CHAR_LIST CL

)

SELECT

(SELECT

CC.TCHAR

FROM COMPARE_CHARS CC

WHERE CC.TCHAR <> CC.PREV_CHAR

FOR XML PATH(''), TYPE).value('.[1]','VARCHAR(500)') AS OUT_STR

;

Results

OUT_STR

----------------------------

WHY DO YOU WANT TO DO THIS?

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 3

nadersam (10/18/2015)
Hi All,
I need a neat way please guys to write a function to remove duplicated characters only if they come in sequence.
Examples
darrk should return dark
eeagle should return eagle
redd should return red
corner should corner as it is as the r's are not in sequence.
Thanks in advance.
Nader

What will you want to do about legal words such as "BOOK", "SEEN", or "ABBOTT"?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

nadersam SSCertifiable Points: 6542 More actions · Answer 4

WHY DO YOU WANT TO DO THIS?

[/code][/quote]

First i would like to thank you for your reply, first time i hear of this LAG function 🙂

This function is needed to cross check a new brand name against existing ones.

There are some rules for this new name to pass , one of them is the number of different characters in it must be more than a certain number.

Some companies add repetitive characters in sequence to pas this check :-), if repetitive characters come in sequence they will not differ significantly in pronunciation which is the main point so consumer is not confused between them.

nadersam SSCertifiable Points: 6542 More actions · Answer 5

What will you want to do about legal words such as "BOOK", "SEEN", or "ABBOTT"?[/quote]

Even in those legal names repetition will be removed because what matters more is how is they are pronounced.

There are some rules for this new name to pass , one of them is the number of different characters in it must be more than a certain number.

Some companies add repetitive characters in sequence to pas this check :-), if repetitive characters come in sequence they will not differ significantly in pronunciation which is the main point so consumer is not confused between them.

Bob JH Cullen SSCrazy Points: 2082 More actions · Answer 6

Maybe soundex() would be a valid approach if it is the pronunciation that matters.

'bok', 'book' and 'booookkk'' all produces a value of B200 in a quick test.

nadersam SSCertifiable Points: 6542 More actions · Answer 7

Bob JH Cullen (10/19/2015)
Maybe soundex() would be a valid approach if it is the pronunciation that matters.
'bok', 'book' and 'booookkk'' all produces a value of B200 in a quick test.

I have added soundex as well to my list of conditions but sometimes it gets totally irrelevant words with regards to pronunciation , that's why i had to add other conditions as well.

nadersam SSCertifiable Points: 6542 More actions · Answer 8

Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

Any other option please?.

Eirikur Eiriksson SSC Guru Points: 182918 More actions · Answer 9

nadersam (10/19/2015)
Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.
Any other option please?.

Self-join the character list cte is another option

😎

nadersam SSCertifiable Points: 6542 More actions · Answer 10

Eirikur Eiriksson (10/19/2015)
nadersam (10/19/2015)
Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.
Any other option please?.
Self-join the character list cte is another option
😎

Thank you very much Eirikur Eiriksson.

I really appreciate if you could write the syntax, i am trying to grab the tally tables and cte and that will take some time and i need to quickly finish that.

Once again thanks for your help.

nadersam SSCertifiable Points: 6542 More actions · Answer 11

Eirikur Eiriksson (10/19/2015)
nadersam (10/19/2015)
Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.
Any other option please?.
Self-join the character list cte is another option
😎

I have written it like what follows but not getting correct result, not sure exactly what's wrong.

Declare @InputValue as varchar(50)='ttttest';

WITH T(N) AS

(SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

, NUMS(N) AS (SELECT TOP(LEN(@InputValue)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

,CHAR_LIST AS

(

SELECT

NM.N

,SUBSTRING(@InputValue,NM.N,1) AS TCHAR

FROM NUMS NM

)

,COMPARE_CHARS AS

(

select distinct t1.PREV_CHAR, t2.tchar

from

(

select min(n) mnval, tchar as PREV_CHAR

from CHAR_LIST

group by tchar

) t1

inner join CHAR_LIST t2

on t1.PREV_CHAR = t2.tchar

)

SELECT

(SELECT

CC.TCHAR

FROM COMPARE_CHARS CC

WHERE CC.TCHAR <> CC.PREV_CHAR

FOR XML PATH(''), TYPE).value('.[1]','VARCHAR(500)')

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 12

nadersam (10/19/2015)
Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.
Any other option please?.

In future posts, I recommend you post to the correct forum. This forum is for 2012. Not 2008.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Would something like this work?

declare @InputValue as varchar(50)='ttttest',

@Word varchar(50);

with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

eTally(n) as (select row_number() over (order by (select null)) from e1 a cross join e1 b), -- don't need more than a 100 (actually 50)

pivotletters as (

select

n,

ascii(substring(@InputValue,n,1)) LetterValue,

substring(@InputValue,n,1) letters,

rn = n - row_number() over (partition by substring(@InputValue,n,1) order by n)

from eTally where n <= len(@InputValue)

)

select @Word = stuff(( select -- min(n), letters from pivotletters group by letters, rn order by min(n)

'' + letters

from

pivotletters

group by

letters, rn

order by

min(n)

for xml path(''),TYPE).value('.','varchar(50)'),1,0,'');

select @InputValue, @Word;

nadersam SSCertifiable Points: 6542 More actions · Answer 14

Jeff Moden (10/19/2015)
nadersam (10/19/2015)
Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.
Any other option please?.
In future posts, I recommend you post to the correct forum. This forum is for 2012. Not 2008.

The post was not about 2012 or 2008 syntax at the beginning , then i got help using lag function which worked perfectly but unfortunately it didn't work on testing server as it was 2008, sorry about that.