String search and replace it with single letter

  • Hello All,

    Is it possible to find out rows with repeating letters in a word using TSQL, if not what is the best approach ?

    for example naame and name .. so it should give me row with naame as "aa" is repeated in it.

    create table testing (id int identity(1,1), name varchar(20))

    insert into testing values ( 'name')

    insert into testing values ( 'nname')

    insert into testing values ( 'naame')

    insert into testing values ( 'naamme')

    insert into testing values ( 'nnamee')

    insert into testing values ( 'sqlserver')

    insert into testing values ( 'sqqlservver')

    I need to find that row and replace the repeting letter with single letter.

    Thanks

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)

    If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/9/2011)


    And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)

    If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?

    I want it full dedupe. For legitimate repeate, is there a way to check ? If no, its fine .. as I am putting all values in a different temp table as opposed to the actual live table.

    Do you have a scrip that does it or can u guide me in right direction ?

    Thanks for u r help !

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • GSquared (3/9/2011)


    And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)

    If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?

    I want it full dedupe. For legitimate repeate, is there a way to check ? If no, its fine .. as I am putting all values in a different temp table as opposed to the actual live table.

    Do you have a scrip that does it or can u guide me in right direction ?

    Thanks for u r help !

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2 ;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    StringCol VARCHAR(100)) ;

    INSERT INTO #T

    (StringCol)

    VALUES ('aaaaaaaabc'),

    ('abbc'),

    ('aaabbbccc') ;

    SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop

    INTO #T2 ;

    WHILE @@rowcount > 0

    WITH Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    SELECT *

    FROM #T ;

    I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).

    If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/9/2011)


    Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2 ;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    StringCol VARCHAR(100)) ;

    INSERT INTO #T

    (StringCol)

    VALUES ('aaaaaaaabc'),

    ('abbc'),

    ('aaabbbccc') ;

    SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop

    INTO #T2 ;

    WHILE @@rowcount > 0

    WITH Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    SELECT *

    FROM #T ;

    I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).

    If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.

    Thanks for u r help ..

    Here is what I have done..

    I have build

    create table numbers (number int identity(1,1), letter char(1))

    insert into numbers values ('a')

    .

    .

    insert into numbers values ('z')

    when I run your query, it gives me

    Msg 319, Level 15, State 1, Line 26

    Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.

    As I am bit new to programming.. can u explain what I need to do for replacing the column name that I want to replace the recurring letters ?

    Thanks

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • GSquared (3/9/2011)


    Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2 ;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    StringCol VARCHAR(100)) ;

    INSERT INTO #T

    (StringCol)

    VALUES ('aaaaaaaabc'),

    ('abbc'),

    ('aaabbbccc') ;

    SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop

    INTO #T2 ;

    WHILE @@rowcount > 0

    WITH Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    SELECT *

    FROM #T ;

    I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).

    If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.

    Thanks for u r help ..

    Here is what I have done..

    I have build

    create table numbers (number int identity(1,1), letter char(1))

    insert into numbers values ('a')

    .

    .

    insert into numbers values ('z')

    when I run your query, it gives me

    Msg 319, Level 15, State 1, Line 26

    Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.

    As I am bit new to programming.. can u explain what I need to do for replacing the column name that I want to replace the recurring letters ?

    Thanks

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • I would need to see the actual code you're using before I could tell you what to do to fix a syntax error.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I am using the one u provided:

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2 ;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    StringCol VARCHAR(100)) ;

    INSERT INTO #T (StringCol) VALUES ('aaaaaaaabc');

    INSERT INTO #T (StringCol) VALUES ('abbc');

    INSERT INTO #T (StringCol) VALUES ('aaabbbccc');

    SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop

    INTO #T2 ;

    WHILE @@rowcount > 0

    with Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    SELECT *

    FROM #T

    create table numbers (number int identity(1,1), letter char(1))

    insert into numbers values ('a')

    insert into numbers values ('b')

    insert into numbers values ('c')

    insert into numbers values ('d')

    insert into numbers values ('e')

    insert into numbers values ('f')

    insert into numbers values ('g')

    insert into numbers values ('h')

    insert into numbers values ('i')

    insert into numbers values ('j')

    insert into numbers values ('k')

    insert into numbers values ('l')

    insert into numbers values ('m')

    insert into numbers values ('n')

    insert into numbers values ('o')

    insert into numbers values ('p')

    insert into numbers values ('q')

    insert into numbers values ('r')

    insert into numbers values ('s')

    insert into numbers values ('t')

    insert into numbers values ('u')

    insert into numbers values ('v')

    insert into numbers values ('w')

    insert into numbers values ('x')

    insert into numbers values ('y')

    insert into numbers values ('z')

    Thanks

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • I am using the one u provided:

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2 ;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    StringCol VARCHAR(100)) ;

    INSERT INTO #T (StringCol) VALUES ('aaaaaaaabc');

    INSERT INTO #T (StringCol) VALUES ('abbc');

    INSERT INTO #T (StringCol) VALUES ('aaabbbccc');

    SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop

    INTO #T2 ;

    WHILE @@rowcount > 0

    with Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    SELECT *

    FROM #T

    create table numbers (number int identity(1,1), letter char(1))

    insert into numbers values ('a')

    insert into numbers values ('b')

    insert into numbers values ('c')

    insert into numbers values ('d')

    insert into numbers values ('e')

    insert into numbers values ('f')

    insert into numbers values ('g')

    insert into numbers values ('h')

    insert into numbers values ('i')

    insert into numbers values ('j')

    insert into numbers values ('k')

    insert into numbers values ('l')

    insert into numbers values ('m')

    insert into numbers values ('n')

    insert into numbers values ('o')

    insert into numbers values ('p')

    insert into numbers values ('q')

    insert into numbers values ('r')

    insert into numbers values ('s')

    insert into numbers values ('t')

    insert into numbers values ('u')

    insert into numbers values ('v')

    insert into numbers values ('w')

    insert into numbers values ('x')

    insert into numbers values ('y')

    insert into numbers values ('z')

    Thanks

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • Since you already have the letters (in your Numbers table), you don't need the CTE at all.

    Change:

    WHILE @@rowcount > 0

    with Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    To:

    WHILE @@rowcount > 0

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN dbo.Numbers

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/9/2011)


    Since you already have the letters (in your Numbers table), you don't need the CTE at all.

    Change:

    WHILE @@rowcount > 0

    with Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    To:

    WHILE @@rowcount > 0

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN dbo.Numbers

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    Thanks a lot for your help.. that worked and did the job .. quick and clever way of doing 🙂

    Just curious, can I extend this to work for below:

    i.e replace aa with a

    and aaa with aa

    and aaaa with aaa

    Cheers !

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • GSquared (3/9/2011)


    Since you already have the letters (in your Numbers table), you don't need the CTE at all.

    Change:

    WHILE @@rowcount > 0

    with Letters(Letter)

    AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)

    FROM dbo.Numbers

    WHERE Number BETWEEN 1 AND 26)

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN Letters

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    To:

    WHILE @@rowcount > 0

    UPDATE T

    SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)

    FROM #T AS T

    INNER JOIN dbo.Numbers

    ON T.StringCol LIKE '%' + Letter + Letter + '%' ;

    Thanks a lot for your help.. that worked and did the job .. quick and clever way of doing 🙂

    Just curious, can I extend this to work for below:

    i.e replace aa with a

    and aaa with aa

    and aaaa with aaa

    Cheers !

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

  • Yes. You just need to change the Like statement to include the Letter column as many times as you're looking for, and then change the Replace statement the same way.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Is there any way to dedupe this posting in the same way? 😀

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply