I need to add a suffix to duplicate values

Question

I need to add a suffix to duplicate values

josetur12

Default port

Points: 1424
More actions
September 9, 2008 at 6:13 pm

#103577

Bellow is some sample data of the table. Somehow I need to loop through and identify each row containing the duplicate value and add a letter to it.
Note: column 5 in rows 3 and 4 is duplicate
The next six rows are also duplicates, and the following two.
I cannot just delete the duplicates because they are not really duplicates.
So, what I need to do is add a suffix to each duplicate value.
For example, rows 3 and 4 should be ZP19188AVHA and ZP19188AVHB and so on with all duplicates
Any idea how I can accomplish this task???
Thank you.
Col1 Col2 Col3 Col4 Col5 Col6
P19188AMR20685210BZP19188AMRHJ
P19188AVE20685210BZP19188AVEHJ
P19188AVH20685210BZP19188AVH *HJ
P19188AVH21091210BZP19188AVH *HJ
P19188COC206143151ZP19188COC *HJ
P19188COC206343151ZP19188COC *HJ
P19188COC206853151ZP19188COC *HJ
P19188COC206863151ZP19188COC *HJ
P19188COC210913151ZP19188COC *HJ
P19188COC211013151ZP19188COC *HJ
P19188PHM20685210BZP19188PHM *HJ
P19188PHM206856101ZP19188PHM *HJ
P19188RIM20685210BZP19188RIMHJ
P19188SAB20685210BZP19188SABHJ
P19188WEH20685210BZP19188WEHHJ

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Anam Verma SSCertifiable Points: 5365 More actions · Answer 1

Hi josetur12

I cannot just delete the duplicates because they are not really duplicates.
So, what I need to do is add a suffix to each duplicate value.

you said that these records are not really duplicates, then on what factors you are identifying the records to be duplicates. From data, i can say that all columns except col3. Please correct me if i am wrong?

Another question why do you want to append alphabets A-Z at the end of the duplicates. What if there are more than 26 duplicate rows, then you'll run out of alphabets A-Z. What will you append after that.

Ron McCullough SSC Guru Points: 63877 More actions · Answer 2

To assist others who desire to help you please read the article in my signature block and post your question accordingly, that is the table schema, test/sample data as an insert into with a union statement etc.

.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

josetur12 Default port Points: 1424 More actions · Answer 3

Thanks for looking into this issue..

Note: I added more info on table schema and sample data in a different format.

USE [TestDB]

GO

/****** Object: Table [dbo].[tblZGAE3HJ] Script Date: 09/10/2008 08:13:16 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[tblZGAE3HJ](

[OE] [nvarchar](12) NULL,

[VendorCode] [nvarchar](3) NULL,

[UC] [nvarchar](5) NULL,

[LastOfDlvArea] [nvarchar](10) NULL,

[DeptOBJ] [nvarchar](4) NULL,

[DocID] [nvarchar](12) NULL,

[FirstOfDeptCode] [nvarchar](6) NULL,

[FirstOfCaps] [nvarchar](10) NULL

) ON [PRIMARY]

SELECT 'SELECT '

+ QUOTENAME([OE],'''')+','

+ QUOTENAME([VendorCode],'''')+','

+ QUOTENAME([UC],'''')+','

+ QUOTENAME([LastOfDlvArea],'''')+','

+ QUOTENAME([DeptOBJ],'''')

+ QUOTENAME([DocID],'''')+','

+ QUOTENAME([FirstOfDeptCode],'''')+','

+ ' UNION ALL' FROM tblZGAE3HJ

SELECT 'P19188','AMR','20685','CPSCR','210B''ZP19188AMR','HJ' UNION ALL

SELECT 'P19188','AVE','20685','CPMNZ','210B''ZP19188AVE','HJ' UNION ALL

SELECT 'P19188','AVH','20685','CPMNZ','210B''ZP19188AVH','HJ' UNION ALL

SELECT 'P19188','AVH','21091','CHG1','210B''ZP19188AVH','HJ', UNION ALL

SELECT 'P19188','COC','20614','D14','3151''ZP19188COC','HJ', UNION ALL

SELECT 'P19188','COC','20634','PROBM','3151''ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','20685','CMGON','3151''ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','20686','MIRLO','3151''ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','21091','CHG1','3151''ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','21101','SFN1','3151''ZP19188COC','HJ, UNION ALL

SELECT 'P19188','PHM','20685','CPMNZ','210B''ZP19188PHM','HJ' UNION ALL

SELECT 'P19188','PHM','20685','CPROC','6101''ZP19188PHM','HJ' UNION ALL

SELECT 'P19188','RIM','20685','CPMNZ','210B''ZP19188RIM','HJ' UNION ALL

SELECT 'P19188','SAB','20685','CPROC','210B''ZP19188SAB','HJ' UNION ALL

SELECT 'P19188','WEH','20685','CPML','210B''ZP19188WEH','HJ' UNION ALL

What I mean is that in one column I have some duplicate values to that column only, the complete row is not a duplicate.

your second question is what if there are more than 26 duplicates, I was told there will never be more than 26 but, ideally, I would like to do something so if later there are more than 26 duplicates there will not be a probelm. I was thinking on adding another table that like this..

and some how use it .

Line#Suffix

1A

2B

3C

4D

5E

6F

7G

8H

9I

10J

11K

12L

13M

14N

15O

16P

17Q

18R

19S

20T

21U

22V

23W

24X

25Y

26Z

27AA

28AB

Ron McCullough SSC Guru Points: 63877 More actions · Answer 4

For my own self editication I created a table and loaded your data and then developed a CTE to identify all the duplicate entries in what was your Column 5. By the way I changed what was in Column 6 to assist me in debugging my CTE, basically I numbered then to correspond with the rows you mentioned in your original post.

Now so far here is what I have done.

CREATE Table #TableX

(Col1 VARCHAR(15),

Col2 VARCHAR(15),

Col3 VARCHAR(15),

Col4 VARCHAR(15),

Col5 VARCHAR(15),

Col6 VARCHAR(2)

)

INSERT INTO #Tablex(Col1,Col2,Col3,Col4,Col5,Col6)

SELECT 'P19188','AMR','20685','210B','ZP19188AMR','aa' UNION ALL

SELECT 'P19188','AVE','20685','210B','ZP19188AVE','02' UNION ALL

SELECT 'P19188','AVH','20685','210B','ZP19188AVH','03' UNION ALL

SELECT 'P19188','AVH','21091','210B','ZP19188AVH','04' UNION ALL

SELECT 'P19188','COC','20614','3151','ZP19188COC','05' UNION ALL

SELECT 'P19188','COC','20634','3151','ZP19188COC','06' UNION ALL

SELECT 'P19188','COC','20685','3151','ZP19188COC','07' UNION ALL

SELECT 'P19188','COC','20686','3151','ZP19188COC','08' UNION ALL

SELECT 'P19188','COC','21091','3151','ZP19188COC','09' UNION ALL

SELECT 'P19188','COC','21101','3151','ZP19188COC','10' UNION ALL

SELECT 'P19188','PHM','20685','210B','ZP19188PHM','11' UNION ALL

SELECT 'P19188','PHM','20685','6101','ZP19188PHM','12' UNION ALL

SELECT 'P19188','RIM','20685','210B','ZP19188RIM','13' UNION ALL

SELECT 'P19188','SAB','20685','210B','ZP19188SAB','14' UNION ALL

SELECT 'P19188','WEH','20685','210B','ZP19188WEH','15'

That takes care of the test data. Here is the CTE

;with numbered as(SELECT rowno=row_number() over

(Partition by COL5, COL2 order by COL5),COL1,COL2, COL3,COL4,COL5, COL6 FROM #TableX)

select * from numbered

Results of the select * from the CTE

Note that the first row of a entry has rowno value = 1,

For duplicates each duplicate entry has a rowno value greater than 1. (Have highlighted those as bold)

rownoCol5 Col6

1ZP19188AMRaa

1ZP19188AVE02

1ZP19188AVH03

2 ZP19188AVH04

1ZP19188COC05

2ZP19188COC06

3ZP19188COC07

4ZP19188COC08

5ZP19188COC09

6ZP19188COC10

1ZP19188PHM11

2ZP19188PHM12

1ZP19188RIM13

1ZP19188SAB14

1ZP19188WEH15

Now having identified the duplicates do you think you can compose the code to append a character to the value in column 6 for all rows with a rowno value greater than 1.

What I do not see in either your table's schema as posted or your original data a unique value (say an identity field) that will allow for the identification of a specific row in the table, thus allowing for the creation of an update statement. Would it be possible to add an identity field to your table named tblZGAE3HJ ?

If not repost and ask for further assistance.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

GSquared SSC Guru Points: 260824 More actions · Answer 5

declare @Alpha char(26)

select @alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

select 1

while @@rowcount > 1

;with Dupes (Row, Col5) as

(select row_number() over (partition by col5 order by col5),

col5

from dbo.MyTable)

update Dupes

set Col5 = Col5 + substring(@alpha, row-1, 1)

where row between 2 and 27

That should do what you need.

If you have more than 26 copies of anything, you'll need to create a table of the append values and join to that. You can find out the largest number of duplicates like this:

select top 1 count(*)

from dbo.MyTable

group by Col5

order by count(*) desc

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Monica Chaturvedi SSC Journeyman Points: 80 More actions · Answer 6

You'll have to go match by match or else use cursors

First: select all rows with same column values

Hold a suffix variable

update each row and increment suffix when updating

Scott D-810830 SSC-Addicted Points: 494 More actions · Answer 7

I Believe this will do what you are wanting to do. It makes use of the Running Total (or whatever it is called) pattern to identify the proper suffix to append. (BTW, I added an identity column to the Main Table. I hope your real table has one.)

--This is your existing table

CREATE TABLE #MainTbl(

tblKey int identity(1,1) NOT NULL,

[OE] [nvarchar](12) NULL,

[VendorCode] [nvarchar](3) NULL,

[UC] [nvarchar](5) NULL,

[LastOfDlvArea] [nvarchar](10) NULL,

[DeptOBJ] [nvarchar](4) NULL,

[DocID] [nvarchar](12) NULL,

[FirstOfDeptCode] [nvarchar](6) NULL,

[FirstOfCaps] [nvarchar](10) NULL) ON [PRIMARY]

INSERT INTO #MainTbl (OE, VendorCode, UC, LastOfDlvArea, DeptOBJ, DocID, FirstOfDeptCode)

SELECT 'P19188','AMR','20685','CPSCR','210B','ZP19188AMR','HJ' UNION ALL

SELECT 'P19188','AVE','20685','CPMNZ','210B','ZP19188AVE','HJ' UNION ALL

SELECT 'P19188','AVH','20685','CPMNZ','210B','ZP19188AVH','HJ' UNION ALL

SELECT 'P19188','AVH','21091','CHG1' ,'210B','ZP19188AVH','HJ' UNION ALL

SELECT 'P19188','COC','20614','D14' ,'3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','20634','PROBM','3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','20685','CMGON','3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','20686','MIRLO','3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','21091','CHG1' ,'3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','COC','21101','SFN1' ,'3151','ZP19188COC','HJ' UNION ALL

SELECT 'P19188','PHM','20685','CPMNZ','210B','ZP19188PHM','HJ' UNION ALL

SELECT 'P19188','PHM','20685','CPROC','6101','ZP19188PHM','HJ' UNION ALL

SELECT 'P19188','RIM','20685','CPMNZ','210B','ZP19188RIM','HJ' UNION ALL

SELECT 'P19188','SAB','20685','CPROC','210B','ZP19188SAB','HJ' UNION ALL

SELECT 'P19188','WEH','20685','CPML' ,'210B','ZP19188WEH','HJ'

--This is a suffix table that you can create

CREATE TABLE #Suffix(Cnt int NOT NULL,

Suffix nvarchar(2) NOT NULL,

PRIMARY KEY (Cnt) )

INSERT INTO #Suffix(Cnt, Suffix)

SELECT 1,'A' UNION

SELECT 2,'B' UNION

SELECT 3,'C' UNION

SELECT 4,'D' UNION

SELECT 5,'E' UNION

SELECT 6,'F' UNION

SELECT 7,'G' UNION

SELECT 8,'H' UNION

SELECT 9,'I' UNION

SELECT 10,'J' UNION

SELECT 11,'K' UNION

SELECT 12,'L' UNION

SELECT 13,'M' UNION

SELECT 14,'N' UNION

SELECT 15,'O' UNION

SELECT 16,'P' UNION

SELECT 17,'Q' UNION

SELECT 18,'R' UNION

SELECT 19,'S' UNION

SELECT 20,'T' UNION

SELECT 21,'U' UNION

SELECT 22,'V' UNION

SELECT 23,'W' UNION

SELECT 24,'X' UNION

SELECT 25,'Y' UNION

SELECT 26,'Z' UNION

SELECT 27,'AA' UNION

SELECT 28,'AB' UNION

SELECT 29,'AC' UNION

SELECT 30,'AD'

--Here is where we will find duplicates

CREATE TABLE #temp(

tblKey int NOT NULL,

DocID nvarchar(12) NOT NULL,

Cnt int NOT NULL)

INSERT INTO #temp( tblKey, DocID, Cnt)

SELECT tblKey, DocID, 0

FROM #MainTbl

ORDER BY DocID, tblKey

DECLARE @DocID nvarchar(12)

DECLARE @NewDocID int

DECLARE @Cnt int

SET @DocID = ''

SET @NewDocID = 0

SET @Cnt = 0

UPDATE #temp

SET @NewDocID = CASE WHEN @DocID = DocID THEN 0 ELSE 1 END,

@Cnt = Cnt = CASE WHEN @NewDocID = 1 THEN 1 ELSE @Cnt + 1 END,

@DocID = DocID

UPDATE t

SET Cnt = 0

FROM #temp t

INNER JOIN (

--Anything with a count of 1 is not duplicated

SELECT DocID, MAX(Cnt) MaxCnt

FROM #temp

GROUP BY DocID

HAVING MAX(Cnt) = 1) x ON t.DocID = x.DocID

--Update our DocID with the correct suffix

UPDATE m

SET DocID = m.DocID + s.Suffix

FROM #MainTbl m

INNER JOIN #temp t ON m.tblKey = t.tblKey

INNER JOIN #Suffix s ON t.Cnt = s.Cnt

SELECT * FROM #MainTbl

DROP TABLE #MainTbl

DROP TABLE #temp

DROP TABLE #Suffix

Scott

josetur12 Default port Points: 1424 More actions · Answer 8

Scott, this worked perfectly. Thank you and thanks to all...

Jose