I need to add a suffix to duplicate values

  • Bellow is some sample data of the table. Somehow I need to loop through and identify each row containing the duplicate value and add a letter to it.

    Note: column 5 in rows 3 and 4 is duplicate

    The next six rows are also duplicates, and the following two.

    I cannot just delete the duplicates because they are not really duplicates.

    So, what I need to do is add a suffix to each duplicate value.

    For example, rows 3 and 4 should be ZP19188AVHA and ZP19188AVHB and so on with all duplicates

    Any idea how I can accomplish this task???

    Thank you.

    Col1 Col2 Col3 Col4 Col5 Col6

    P19188AMR20685210BZP19188AMRHJ

    P19188AVE20685210BZP19188AVEHJ

    P19188AVH20685210BZP19188AVH *HJ

    P19188AVH21091210BZP19188AVH *HJ

    P19188COC206143151ZP19188COC *HJ

    P19188COC206343151ZP19188COC *HJ

    P19188COC206853151ZP19188COC *HJ

    P19188COC206863151ZP19188COC *HJ

    P19188COC210913151ZP19188COC *HJ

    P19188COC211013151ZP19188COC *HJ

    P19188PHM20685210BZP19188PHM *HJ

    P19188PHM206856101ZP19188PHM *HJ

    P19188RIM20685210BZP19188RIMHJ

    P19188SAB20685210BZP19188SABHJ

    P19188WEH20685210BZP19188WEHHJ

  • Hi josetur12

    I cannot just delete the duplicates because they are not really duplicates.

    So, what I need to do is add a suffix to each duplicate value.

    you said that these records are not really duplicates, then on what factors you are identifying the records to be duplicates. From data, i can say that all columns except col3. Please correct me if i am wrong?

    Another question why do you want to append alphabets A-Z at the end of the duplicates. What if there are more than 26 duplicate rows, then you'll run out of alphabets A-Z. What will you append after that.

  • To assist others who desire to help you please read the article in my signature block and post your question accordingly, that is the table schema, test/sample data as an insert into with a union statement etc.

    .

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Thanks for looking into this issue..

    Note: I added more info on table schema and sample data in a different format.

    USE [TestDB]

    GO

    /****** Object: Table [dbo].[tblZGAE3HJ] Script Date: 09/10/2008 08:13:16 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[tblZGAE3HJ](

    [OE] [nvarchar](12) NULL,

    [VendorCode] [nvarchar](3) NULL,

    [UC] [nvarchar](5) NULL,

    [LastOfDlvArea] [nvarchar](10) NULL,

    [DeptOBJ] [nvarchar](4) NULL,

    [DocID] [nvarchar](12) NULL,

    [FirstOfDeptCode] [nvarchar](6) NULL,

    [FirstOfCaps] [nvarchar](10) NULL

    ) ON [PRIMARY]

    SELECT 'SELECT '

    + QUOTENAME([OE],'''')+','

    + QUOTENAME([VendorCode],'''')+','

    + QUOTENAME([UC],'''')+','

    + QUOTENAME([LastOfDlvArea],'''')+','

    + QUOTENAME([DeptOBJ],'''')

    + QUOTENAME([DocID],'''')+','

    + QUOTENAME([FirstOfDeptCode],'''')+','

    + ' UNION ALL' FROM tblZGAE3HJ

    SELECT 'P19188','AMR','20685','CPSCR','210B''ZP19188AMR','HJ' UNION ALL

    SELECT 'P19188','AVE','20685','CPMNZ','210B''ZP19188AVE','HJ' UNION ALL

    SELECT 'P19188','AVH','20685','CPMNZ','210B''ZP19188AVH','HJ' UNION ALL

    SELECT 'P19188','AVH','21091','CHG1','210B''ZP19188AVH','HJ', UNION ALL

    SELECT 'P19188','COC','20614','D14','3151''ZP19188COC','HJ', UNION ALL

    SELECT 'P19188','COC','20634','PROBM','3151''ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','20685','CMGON','3151''ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','20686','MIRLO','3151''ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','21091','CHG1','3151''ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','21101','SFN1','3151''ZP19188COC','HJ, UNION ALL

    SELECT 'P19188','PHM','20685','CPMNZ','210B''ZP19188PHM','HJ' UNION ALL

    SELECT 'P19188','PHM','20685','CPROC','6101''ZP19188PHM','HJ' UNION ALL

    SELECT 'P19188','RIM','20685','CPMNZ','210B''ZP19188RIM','HJ' UNION ALL

    SELECT 'P19188','SAB','20685','CPROC','210B''ZP19188SAB','HJ' UNION ALL

    SELECT 'P19188','WEH','20685','CPML','210B''ZP19188WEH','HJ' UNION ALL

    What I mean is that in one column I have some duplicate values to that column only, the complete row is not a duplicate.

    your second question is what if there are more than 26 duplicates, I was told there will never be more than 26 but, ideally, I would like to do something so if later there are more than 26 duplicates there will not be a probelm. I was thinking on adding another table that like this..

    and some how use it .

    Line#Suffix

    1A

    2B

    3C

    4D

    5E

    6F

    7G

    8H

    9I

    10J

    11K

    12L

    13M

    14N

    15O

    16P

    17Q

    18R

    19S

    20T

    21U

    22V

    23W

    24X

    25Y

    26Z

    27AA

    28AB

  • For my own self editication I created a table and loaded your data and then developed a CTE to identify all the duplicate entries in what was your Column 5. By the way I changed what was in Column 6 to assist me in debugging my CTE, basically I numbered then to correspond with the rows you mentioned in your original post.

    Now so far here is what I have done.

    CREATE Table #TableX

    (Col1 VARCHAR(15),

    Col2 VARCHAR(15),

    Col3 VARCHAR(15),

    Col4 VARCHAR(15),

    Col5 VARCHAR(15),

    Col6 VARCHAR(2)

    )

    INSERT INTO #Tablex(Col1,Col2,Col3,Col4,Col5,Col6)

    SELECT 'P19188','AMR','20685','210B','ZP19188AMR','aa' UNION ALL

    SELECT 'P19188','AVE','20685','210B','ZP19188AVE','02' UNION ALL

    SELECT 'P19188','AVH','20685','210B','ZP19188AVH','03' UNION ALL

    SELECT 'P19188','AVH','21091','210B','ZP19188AVH','04' UNION ALL

    SELECT 'P19188','COC','20614','3151','ZP19188COC','05' UNION ALL

    SELECT 'P19188','COC','20634','3151','ZP19188COC','06' UNION ALL

    SELECT 'P19188','COC','20685','3151','ZP19188COC','07' UNION ALL

    SELECT 'P19188','COC','20686','3151','ZP19188COC','08' UNION ALL

    SELECT 'P19188','COC','21091','3151','ZP19188COC','09' UNION ALL

    SELECT 'P19188','COC','21101','3151','ZP19188COC','10' UNION ALL

    SELECT 'P19188','PHM','20685','210B','ZP19188PHM','11' UNION ALL

    SELECT 'P19188','PHM','20685','6101','ZP19188PHM','12' UNION ALL

    SELECT 'P19188','RIM','20685','210B','ZP19188RIM','13' UNION ALL

    SELECT 'P19188','SAB','20685','210B','ZP19188SAB','14' UNION ALL

    SELECT 'P19188','WEH','20685','210B','ZP19188WEH','15'

    That takes care of the test data. Here is the CTE

    ;with numbered as(SELECT rowno=row_number() over

    (Partition by COL5, COL2 order by COL5),COL1,COL2, COL3,COL4,COL5, COL6 FROM #TableX)

    select * from numbered

    Results of the select * from the CTE

    Note that the first row of a entry has rowno value = 1,

    For duplicates each duplicate entry has a rowno value greater than 1. (Have highlighted those as bold)

    rownoCol5 Col6

    1ZP19188AMRaa

    1ZP19188AVE02

    1ZP19188AVH03

    2 ZP19188AVH04

    1ZP19188COC05

    2ZP19188COC06

    3ZP19188COC07

    4ZP19188COC08

    5ZP19188COC09

    6ZP19188COC10

    1ZP19188PHM11

    2ZP19188PHM12

    1ZP19188RIM13

    1ZP19188SAB14

    1ZP19188WEH15

    Now having identified the duplicates do you think you can compose the code to append a character to the value in column 6 for all rows with a rowno value greater than 1.

    What I do not see in either your table's schema as posted or your original data a unique value (say an identity field) that will allow for the identification of a specific row in the table, thus allowing for the creation of an update statement. Would it be possible to add an identity field to your table named tblZGAE3HJ ?

    If not repost and ask for further assistance.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • declare @Alpha char(26)

    select @alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    select 1

    while @@rowcount > 1

    ;with Dupes (Row, Col5) as

    (select row_number() over (partition by col5 order by col5),

    col5

    from dbo.MyTable)

    update Dupes

    set Col5 = Col5 + substring(@alpha, row-1, 1)

    where row between 2 and 27

    That should do what you need.

    If you have more than 26 copies of anything, you'll need to create a table of the append values and join to that. You can find out the largest number of duplicates like this:

    select top 1 count(*)

    from dbo.MyTable

    group by Col5

    order by count(*) desc

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • You'll have to go match by match or else use cursors

    First: select all rows with same column values

    Hold a suffix variable

    update each row and increment suffix when updating

  • I Believe this will do what you are wanting to do. It makes use of the Running Total (or whatever it is called) pattern to identify the proper suffix to append. (BTW, I added an identity column to the Main Table. I hope your real table has one.)

    --This is your existing table

    CREATE TABLE #MainTbl(

    tblKey int identity(1,1) NOT NULL,

    [OE] [nvarchar](12) NULL,

    [VendorCode] [nvarchar](3) NULL,

    [UC] [nvarchar](5) NULL,

    [LastOfDlvArea] [nvarchar](10) NULL,

    [DeptOBJ] [nvarchar](4) NULL,

    [DocID] [nvarchar](12) NULL,

    [FirstOfDeptCode] [nvarchar](6) NULL,

    [FirstOfCaps] [nvarchar](10) NULL) ON [PRIMARY]

    INSERT INTO #MainTbl (OE, VendorCode, UC, LastOfDlvArea, DeptOBJ, DocID, FirstOfDeptCode)

    SELECT 'P19188','AMR','20685','CPSCR','210B','ZP19188AMR','HJ' UNION ALL

    SELECT 'P19188','AVE','20685','CPMNZ','210B','ZP19188AVE','HJ' UNION ALL

    SELECT 'P19188','AVH','20685','CPMNZ','210B','ZP19188AVH','HJ' UNION ALL

    SELECT 'P19188','AVH','21091','CHG1' ,'210B','ZP19188AVH','HJ' UNION ALL

    SELECT 'P19188','COC','20614','D14' ,'3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','20634','PROBM','3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','20685','CMGON','3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','20686','MIRLO','3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','21091','CHG1' ,'3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','COC','21101','SFN1' ,'3151','ZP19188COC','HJ' UNION ALL

    SELECT 'P19188','PHM','20685','CPMNZ','210B','ZP19188PHM','HJ' UNION ALL

    SELECT 'P19188','PHM','20685','CPROC','6101','ZP19188PHM','HJ' UNION ALL

    SELECT 'P19188','RIM','20685','CPMNZ','210B','ZP19188RIM','HJ' UNION ALL

    SELECT 'P19188','SAB','20685','CPROC','210B','ZP19188SAB','HJ' UNION ALL

    SELECT 'P19188','WEH','20685','CPML' ,'210B','ZP19188WEH','HJ'

    --This is a suffix table that you can create

    CREATE TABLE #Suffix(Cnt int NOT NULL,

    Suffix nvarchar(2) NOT NULL,

    PRIMARY KEY (Cnt) )

    INSERT INTO #Suffix(Cnt, Suffix)

    SELECT 1,'A' UNION

    SELECT 2,'B' UNION

    SELECT 3,'C' UNION

    SELECT 4,'D' UNION

    SELECT 5,'E' UNION

    SELECT 6,'F' UNION

    SELECT 7,'G' UNION

    SELECT 8,'H' UNION

    SELECT 9,'I' UNION

    SELECT 10,'J' UNION

    SELECT 11,'K' UNION

    SELECT 12,'L' UNION

    SELECT 13,'M' UNION

    SELECT 14,'N' UNION

    SELECT 15,'O' UNION

    SELECT 16,'P' UNION

    SELECT 17,'Q' UNION

    SELECT 18,'R' UNION

    SELECT 19,'S' UNION

    SELECT 20,'T' UNION

    SELECT 21,'U' UNION

    SELECT 22,'V' UNION

    SELECT 23,'W' UNION

    SELECT 24,'X' UNION

    SELECT 25,'Y' UNION

    SELECT 26,'Z' UNION

    SELECT 27,'AA' UNION

    SELECT 28,'AB' UNION

    SELECT 29,'AC' UNION

    SELECT 30,'AD'

    --Here is where we will find duplicates

    CREATE TABLE #temp(

    tblKey int NOT NULL,

    DocID nvarchar(12) NOT NULL,

    Cnt int NOT NULL)

    INSERT INTO #temp( tblKey, DocID, Cnt)

    SELECT tblKey, DocID, 0

    FROM #MainTbl

    ORDER BY DocID, tblKey

    DECLARE @DocID nvarchar(12)

    DECLARE @NewDocID int

    DECLARE @Cnt int

    SET @DocID = ''

    SET @NewDocID = 0

    SET @Cnt = 0

    UPDATE #temp

    SET @NewDocID = CASE WHEN @DocID = DocID THEN 0 ELSE 1 END,

    @Cnt = Cnt = CASE WHEN @NewDocID = 1 THEN 1 ELSE @Cnt + 1 END,

    @DocID = DocID

    UPDATE t

    SET Cnt = 0

    FROM #temp t

    INNER JOIN (

    --Anything with a count of 1 is not duplicated

    SELECT DocID, MAX(Cnt) MaxCnt

    FROM #temp

    GROUP BY DocID

    HAVING MAX(Cnt) = 1) x ON t.DocID = x.DocID

    --Update our DocID with the correct suffix

    UPDATE m

    SET DocID = m.DocID + s.Suffix

    FROM #MainTbl m

    INNER JOIN #temp t ON m.tblKey = t.tblKey

    INNER JOIN #Suffix s ON t.Cnt = s.Cnt

    SELECT * FROM #MainTbl

    DROP TABLE #MainTbl

    DROP TABLE #temp

    DROP TABLE #Suffix

    Scott

  • Scott, this worked perfectly. Thank you and thanks to all...

    Jose

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply