March 9, 2011 at 12:21 pm
Hello All,
Is it possible to find out rows with repeating letters in a word using TSQL, if not what is the best approach ?
for example naame and name .. so it should give me row with naame as "aa" is repeated in it.
create table testing (id int identity(1,1), name varchar(20))
insert into testing values ( 'name')
insert into testing values ( 'nname')
insert into testing values ( 'naame')
insert into testing values ( 'naamme')
insert into testing values ( 'nnamee')
insert into testing values ( 'sqlserver')
insert into testing values ( 'sqqlservver')
I need to find that row and replace the repeting letter with single letter.
Thanks
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 12:28 pm
And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)
If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 9, 2011 at 12:57 pm
GSquared (3/9/2011)
And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?
I want it full dedupe. For legitimate repeate, is there a way to check ? If no, its fine .. as I am putting all values in a different temp table as opposed to the actual live table.
Do you have a scrip that does it or can u guide me in right direction ?
Thanks for u r help !
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 12:57 pm
GSquared (3/9/2011)
And what do you want to do if the word has a legitimate repeat in it? Aardvark, apple, peer, assimilate, better, coordinate, dunnage, tonne, etc., all have repeating letters, but are supposed to. (Hey, even "supposed" does. Isn't that aggravating! Sorry, couldn't resist.)If you really need to get rid of all repeats, what do you want to do with tripples and more? Should "aaa" be reduced to "aa" (single-pass dedupe) or to "a" (full dedupe)?
I want it full dedupe. For legitimate repeate, is there a way to check ? If no, its fine .. as I am putting all values in a different temp table as opposed to the actual live table.
Do you have a scrip that does it or can u guide me in right direction ?
Thanks for u r help !
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 1:23 pm
Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
StringCol VARCHAR(100)) ;
INSERT INTO #T
(StringCol)
VALUES ('aaaaaaaabc'),
('abbc'),
('aaabbbccc') ;
SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop
INTO #T2 ;
WHILE @@rowcount > 0
WITH Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
SELECT *
FROM #T ;
I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).
If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 9, 2011 at 1:42 pm
GSquared (3/9/2011)
Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
StringCol VARCHAR(100)) ;
INSERT INTO #T
(StringCol)
VALUES ('aaaaaaaabc'),
('abbc'),
('aaabbbccc') ;
SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop
INTO #T2 ;
WHILE @@rowcount > 0
WITH Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
SELECT *
FROM #T ;
I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).
If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.
Thanks for u r help ..
Here is what I have done..
I have build
create table numbers (number int identity(1,1), letter char(1))
insert into numbers values ('a')
.
.
insert into numbers values ('z')
when I run your query, it gives me
Msg 319, Level 15, State 1, Line 26
Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.
As I am bit new to programming.. can u explain what I need to do for replacing the column name that I want to replace the recurring letters ?
Thanks
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 1:42 pm
GSquared (3/9/2011)
Something like this will do a full dedupe. It won't leave deliberate dupes intact. The only way I can think of to do that would be a full set of exceptions, which would be difficult to get and would kill the speed on the thing. Not sure what you can do about that. Might be better to use some sort of spellcheck API if you need to do that.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
StringCol VARCHAR(100)) ;
INSERT INTO #T
(StringCol)
VALUES ('aaaaaaaabc'),
('abbc'),
('aaabbbccc') ;
SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop
INTO #T2 ;
WHILE @@rowcount > 0
WITH Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
SELECT *
FROM #T ;
I use a Numbers table to get a table of the letters of the alphabet. If you don't have a Numbers table, either build one, or just build a table of letters some other way (doesn't really matter how).
If your database or column collation is case-sensitive, you'll need to expand the Letters dataset to include capital letters as well as lower-case ones.
Thanks for u r help ..
Here is what I have done..
I have build
create table numbers (number int identity(1,1), letter char(1))
insert into numbers values ('a')
.
.
insert into numbers values ('z')
when I run your query, it gives me
Msg 319, Level 15, State 1, Line 26
Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.
As I am bit new to programming.. can u explain what I need to do for replacing the column name that I want to replace the recurring letters ?
Thanks
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 1:44 pm
I would need to see the actual code you're using before I could tell you what to do to fix a syntax error.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 9, 2011 at 1:48 pm
I am using the one u provided:
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
StringCol VARCHAR(100)) ;
INSERT INTO #T (StringCol) VALUES ('aaaaaaaabc');
INSERT INTO #T (StringCol) VALUES ('abbc');
INSERT INTO #T (StringCol) VALUES ('aaabbbccc');
SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop
INTO #T2 ;
WHILE @@rowcount > 0
with Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
SELECT *
FROM #T
create table numbers (number int identity(1,1), letter char(1))
insert into numbers values ('a')
insert into numbers values ('b')
insert into numbers values ('c')
insert into numbers values ('d')
insert into numbers values ('e')
insert into numbers values ('f')
insert into numbers values ('g')
insert into numbers values ('h')
insert into numbers values ('i')
insert into numbers values ('j')
insert into numbers values ('k')
insert into numbers values ('l')
insert into numbers values ('m')
insert into numbers values ('n')
insert into numbers values ('o')
insert into numbers values ('p')
insert into numbers values ('q')
insert into numbers values ('r')
insert into numbers values ('s')
insert into numbers values ('t')
insert into numbers values ('u')
insert into numbers values ('v')
insert into numbers values ('w')
insert into numbers values ('x')
insert into numbers values ('y')
insert into numbers values ('z')
Thanks
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 1:48 pm
I am using the one u provided:
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
StringCol VARCHAR(100)) ;
INSERT INTO #T (StringCol) VALUES ('aaaaaaaabc');
INSERT INTO #T (StringCol) VALUES ('abbc');
INSERT INTO #T (StringCol) VALUES ('aaabbbccc');
SELECT 1 AS Filler -- just used to set @@rowcount to 1 for While loop
INTO #T2 ;
WHILE @@rowcount > 0
with Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
SELECT *
FROM #T
create table numbers (number int identity(1,1), letter char(1))
insert into numbers values ('a')
insert into numbers values ('b')
insert into numbers values ('c')
insert into numbers values ('d')
insert into numbers values ('e')
insert into numbers values ('f')
insert into numbers values ('g')
insert into numbers values ('h')
insert into numbers values ('i')
insert into numbers values ('j')
insert into numbers values ('k')
insert into numbers values ('l')
insert into numbers values ('m')
insert into numbers values ('n')
insert into numbers values ('o')
insert into numbers values ('p')
insert into numbers values ('q')
insert into numbers values ('r')
insert into numbers values ('s')
insert into numbers values ('t')
insert into numbers values ('u')
insert into numbers values ('v')
insert into numbers values ('w')
insert into numbers values ('x')
insert into numbers values ('y')
insert into numbers values ('z')
Thanks
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 2:32 pm
Since you already have the letters (in your Numbers table), you don't need the CTE at all.
Change:
WHILE @@rowcount > 0
with Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
To:
WHILE @@rowcount > 0
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN dbo.Numbers
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 9, 2011 at 2:52 pm
GSquared (3/9/2011)
Since you already have the letters (in your Numbers table), you don't need the CTE at all.Change:
WHILE @@rowcount > 0
with Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
To:
WHILE @@rowcount > 0
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN dbo.Numbers
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
Thanks a lot for your help.. that worked and did the job .. quick and clever way of doing 🙂
Just curious, can I extend this to work for below:
i.e replace aa with a
and aaa with aa
and aaaa with aaa
Cheers !
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 9, 2011 at 2:53 pm
GSquared (3/9/2011)
Since you already have the letters (in your Numbers table), you don't need the CTE at all.Change:
WHILE @@rowcount > 0
with Letters(Letter)
AS (SELECT SUBSTRING('abcdefghijklmnopqrstuvwxyz', Number, 1)
FROM dbo.Numbers
WHERE Number BETWEEN 1 AND 26)
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN Letters
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
To:
WHILE @@rowcount > 0
UPDATE T
SET StringCol = REPLACE(StringCol, Letter + Letter, Letter)
FROM #T AS T
INNER JOIN dbo.Numbers
ON T.StringCol LIKE '%' + Letter + Letter + '%' ;
Thanks a lot for your help.. that worked and did the job .. quick and clever way of doing 🙂
Just curious, can I extend this to work for below:
i.e replace aa with a
and aaa with aa
and aaaa with aaa
Cheers !
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
March 10, 2011 at 6:55 am
Yes. You just need to change the Like statement to include the Letter column as many times as you're looking for, and then change the Replace statement the same way.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 11, 2011 at 9:00 am
Is there any way to dedupe this posting in the same way? 😀
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply