September 9, 2008 at 6:13 pm
Bellow is some sample data of the table. Somehow I need to loop through and identify each row containing the duplicate value and add a letter to it.
Note: column 5 in rows 3 and 4 is duplicate
The next six rows are also duplicates, and the following two.
I cannot just delete the duplicates because they are not really duplicates.
So, what I need to do is add a suffix to each duplicate value.
For example, rows 3 and 4 should be ZP19188AVHA and ZP19188AVHB and so on with all duplicates
Any idea how I can accomplish this task???
Thank you.
Col1 Col2 Col3 Col4 Col5 Col6
P19188AMR20685210BZP19188AMRHJ
P19188AVE20685210BZP19188AVEHJ
P19188AVH20685210BZP19188AVH *HJ
P19188AVH21091210BZP19188AVH *HJ
P19188COC206143151ZP19188COC *HJ
P19188COC206343151ZP19188COC *HJ
P19188COC206853151ZP19188COC *HJ
P19188COC206863151ZP19188COC *HJ
P19188COC210913151ZP19188COC *HJ
P19188COC211013151ZP19188COC *HJ
P19188PHM20685210BZP19188PHM *HJ
P19188PHM206856101ZP19188PHM *HJ
P19188RIM20685210BZP19188RIMHJ
P19188SAB20685210BZP19188SABHJ
P19188WEH20685210BZP19188WEHHJ
September 9, 2008 at 7:31 pm
Hi josetur12
I cannot just delete the duplicates because they are not really duplicates.
So, what I need to do is add a suffix to each duplicate value.
you said that these records are not really duplicates, then on what factors you are identifying the records to be duplicates. From data, i can say that all columns except col3. Please correct me if i am wrong?
Another question why do you want to append alphabets A-Z at the end of the duplicates. What if there are more than 26 duplicate rows, then you'll run out of alphabets A-Z. What will you append after that.
September 9, 2008 at 7:41 pm
To assist others who desire to help you please read the article in my signature block and post your question accordingly, that is the table schema, test/sample data as an insert into with a union statement etc.
.
September 10, 2008 at 10:02 am
Thanks for looking into this issue..
Note: I added more info on table schema and sample data in a different format.
USE [TestDB]
GO
/****** Object: Table [dbo].[tblZGAE3HJ] Script Date: 09/10/2008 08:13:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tblZGAE3HJ](
[OE] [nvarchar](12) NULL,
[VendorCode] [nvarchar](3) NULL,
[UC] [nvarchar](5) NULL,
[LastOfDlvArea] [nvarchar](10) NULL,
[DeptOBJ] [nvarchar](4) NULL,
[DocID] [nvarchar](12) NULL,
[FirstOfDeptCode] [nvarchar](6) NULL,
[FirstOfCaps] [nvarchar](10) NULL
) ON [PRIMARY]
SELECT 'SELECT '
+ QUOTENAME([OE],'''')+','
+ QUOTENAME([VendorCode],'''')+','
+ QUOTENAME([UC],'''')+','
+ QUOTENAME([LastOfDlvArea],'''')+','
+ QUOTENAME([DeptOBJ],'''')
+ QUOTENAME([DocID],'''')+','
+ QUOTENAME([FirstOfDeptCode],'''')+','
+ ' UNION ALL' FROM tblZGAE3HJ
SELECT 'P19188','AMR','20685','CPSCR','210B''ZP19188AMR','HJ' UNION ALL
SELECT 'P19188','AVE','20685','CPMNZ','210B''ZP19188AVE','HJ' UNION ALL
SELECT 'P19188','AVH','20685','CPMNZ','210B''ZP19188AVH','HJ' UNION ALL
SELECT 'P19188','AVH','21091','CHG1','210B''ZP19188AVH','HJ', UNION ALL
SELECT 'P19188','COC','20614','D14','3151''ZP19188COC','HJ', UNION ALL
SELECT 'P19188','COC','20634','PROBM','3151''ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','20685','CMGON','3151''ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','20686','MIRLO','3151''ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','21091','CHG1','3151''ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','21101','SFN1','3151''ZP19188COC','HJ, UNION ALL
SELECT 'P19188','PHM','20685','CPMNZ','210B''ZP19188PHM','HJ' UNION ALL
SELECT 'P19188','PHM','20685','CPROC','6101''ZP19188PHM','HJ' UNION ALL
SELECT 'P19188','RIM','20685','CPMNZ','210B''ZP19188RIM','HJ' UNION ALL
SELECT 'P19188','SAB','20685','CPROC','210B''ZP19188SAB','HJ' UNION ALL
SELECT 'P19188','WEH','20685','CPML','210B''ZP19188WEH','HJ' UNION ALL
What I mean is that in one column I have some duplicate values to that column only, the complete row is not a duplicate.
your second question is what if there are more than 26 duplicates, I was told there will never be more than 26 but, ideally, I would like to do something so if later there are more than 26 duplicates there will not be a probelm. I was thinking on adding another table that like this..
and some how use it .
Line#Suffix
1A
2B
3C
4D
5E
6F
7G
8H
9I
10J
11K
12L
13M
14N
15O
16P
17Q
18R
19S
20T
21U
22V
23W
24X
25Y
26Z
27AA
28AB
September 10, 2008 at 3:37 pm
For my own self editication I created a table and loaded your data and then developed a CTE to identify all the duplicate entries in what was your Column 5. By the way I changed what was in Column 6 to assist me in debugging my CTE, basically I numbered then to correspond with the rows you mentioned in your original post.
Now so far here is what I have done.
CREATE Table #TableX
(Col1 VARCHAR(15),
Col2 VARCHAR(15),
Col3 VARCHAR(15),
Col4 VARCHAR(15),
Col5 VARCHAR(15),
Col6 VARCHAR(2)
)
INSERT INTO #Tablex(Col1,Col2,Col3,Col4,Col5,Col6)
SELECT 'P19188','AMR','20685','210B','ZP19188AMR','aa' UNION ALL
SELECT 'P19188','AVE','20685','210B','ZP19188AVE','02' UNION ALL
SELECT 'P19188','AVH','20685','210B','ZP19188AVH','03' UNION ALL
SELECT 'P19188','AVH','21091','210B','ZP19188AVH','04' UNION ALL
SELECT 'P19188','COC','20614','3151','ZP19188COC','05' UNION ALL
SELECT 'P19188','COC','20634','3151','ZP19188COC','06' UNION ALL
SELECT 'P19188','COC','20685','3151','ZP19188COC','07' UNION ALL
SELECT 'P19188','COC','20686','3151','ZP19188COC','08' UNION ALL
SELECT 'P19188','COC','21091','3151','ZP19188COC','09' UNION ALL
SELECT 'P19188','COC','21101','3151','ZP19188COC','10' UNION ALL
SELECT 'P19188','PHM','20685','210B','ZP19188PHM','11' UNION ALL
SELECT 'P19188','PHM','20685','6101','ZP19188PHM','12' UNION ALL
SELECT 'P19188','RIM','20685','210B','ZP19188RIM','13' UNION ALL
SELECT 'P19188','SAB','20685','210B','ZP19188SAB','14' UNION ALL
SELECT 'P19188','WEH','20685','210B','ZP19188WEH','15'
That takes care of the test data. Here is the CTE
;with numbered as(SELECT rowno=row_number() over
(Partition by COL5, COL2 order by COL5),COL1,COL2, COL3,COL4,COL5, COL6 FROM #TableX)
select * from numbered
Results of the select * from the CTE
Note that the first row of a entry has rowno value = 1,
For duplicates each duplicate entry has a rowno value greater than 1. (Have highlighted those as bold)
rownoCol5 Col6
1ZP19188AMRaa
1ZP19188AVE02
1ZP19188AVH03
2 ZP19188AVH04
1ZP19188COC05
2ZP19188COC06
3ZP19188COC07
4ZP19188COC08
5ZP19188COC09
6ZP19188COC10
1ZP19188PHM11
2ZP19188PHM12
1ZP19188RIM13
1ZP19188SAB14
1ZP19188WEH15
Now having identified the duplicates do you think you can compose the code to append a character to the value in column 6 for all rows with a rowno value greater than 1.
What I do not see in either your table's schema as posted or your original data a unique value (say an identity field) that will allow for the identification of a specific row in the table, thus allowing for the creation of an update statement. Would it be possible to add an identity field to your table named tblZGAE3HJ ?
If not repost and ask for further assistance.
September 11, 2008 at 8:35 am
declare @Alpha char(26)
select @alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
select 1
while @@rowcount > 1
;with Dupes (Row, Col5) as
(select row_number() over (partition by col5 order by col5),
col5
from dbo.MyTable)
update Dupes
set Col5 = Col5 + substring(@alpha, row-1, 1)
where row between 2 and 27
That should do what you need.
If you have more than 26 copies of anything, you'll need to create a table of the append values and join to that. You can find out the largest number of duplicates like this:
select top 1 count(*)
from dbo.MyTable
group by Col5
order by count(*) desc
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 14, 2008 at 11:22 pm
You'll have to go match by match or else use cursors
First: select all rows with same column values
Hold a suffix variable
update each row and increment suffix when updating
September 16, 2008 at 6:29 am
I Believe this will do what you are wanting to do. It makes use of the Running Total (or whatever it is called) pattern to identify the proper suffix to append. (BTW, I added an identity column to the Main Table. I hope your real table has one.)
--This is your existing table
CREATE TABLE #MainTbl(
tblKey int identity(1,1) NOT NULL,
[OE] [nvarchar](12) NULL,
[VendorCode] [nvarchar](3) NULL,
[UC] [nvarchar](5) NULL,
[LastOfDlvArea] [nvarchar](10) NULL,
[DeptOBJ] [nvarchar](4) NULL,
[DocID] [nvarchar](12) NULL,
[FirstOfDeptCode] [nvarchar](6) NULL,
[FirstOfCaps] [nvarchar](10) NULL) ON [PRIMARY]
INSERT INTO #MainTbl (OE, VendorCode, UC, LastOfDlvArea, DeptOBJ, DocID, FirstOfDeptCode)
SELECT 'P19188','AMR','20685','CPSCR','210B','ZP19188AMR','HJ' UNION ALL
SELECT 'P19188','AVE','20685','CPMNZ','210B','ZP19188AVE','HJ' UNION ALL
SELECT 'P19188','AVH','20685','CPMNZ','210B','ZP19188AVH','HJ' UNION ALL
SELECT 'P19188','AVH','21091','CHG1' ,'210B','ZP19188AVH','HJ' UNION ALL
SELECT 'P19188','COC','20614','D14' ,'3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','20634','PROBM','3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','20685','CMGON','3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','20686','MIRLO','3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','21091','CHG1' ,'3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','COC','21101','SFN1' ,'3151','ZP19188COC','HJ' UNION ALL
SELECT 'P19188','PHM','20685','CPMNZ','210B','ZP19188PHM','HJ' UNION ALL
SELECT 'P19188','PHM','20685','CPROC','6101','ZP19188PHM','HJ' UNION ALL
SELECT 'P19188','RIM','20685','CPMNZ','210B','ZP19188RIM','HJ' UNION ALL
SELECT 'P19188','SAB','20685','CPROC','210B','ZP19188SAB','HJ' UNION ALL
SELECT 'P19188','WEH','20685','CPML' ,'210B','ZP19188WEH','HJ'
--This is a suffix table that you can create
CREATE TABLE #Suffix(Cnt int NOT NULL,
Suffix nvarchar(2) NOT NULL,
PRIMARY KEY (Cnt) )
INSERT INTO #Suffix(Cnt, Suffix)
SELECT 1,'A' UNION
SELECT 2,'B' UNION
SELECT 3,'C' UNION
SELECT 4,'D' UNION
SELECT 5,'E' UNION
SELECT 6,'F' UNION
SELECT 7,'G' UNION
SELECT 8,'H' UNION
SELECT 9,'I' UNION
SELECT 10,'J' UNION
SELECT 11,'K' UNION
SELECT 12,'L' UNION
SELECT 13,'M' UNION
SELECT 14,'N' UNION
SELECT 15,'O' UNION
SELECT 16,'P' UNION
SELECT 17,'Q' UNION
SELECT 18,'R' UNION
SELECT 19,'S' UNION
SELECT 20,'T' UNION
SELECT 21,'U' UNION
SELECT 22,'V' UNION
SELECT 23,'W' UNION
SELECT 24,'X' UNION
SELECT 25,'Y' UNION
SELECT 26,'Z' UNION
SELECT 27,'AA' UNION
SELECT 28,'AB' UNION
SELECT 29,'AC' UNION
SELECT 30,'AD'
--Here is where we will find duplicates
CREATE TABLE #temp(
tblKey int NOT NULL,
DocID nvarchar(12) NOT NULL,
Cnt int NOT NULL)
INSERT INTO #temp( tblKey, DocID, Cnt)
SELECT tblKey, DocID, 0
FROM #MainTbl
ORDER BY DocID, tblKey
DECLARE @DocID nvarchar(12)
DECLARE @NewDocID int
DECLARE @Cnt int
SET @DocID = ''
SET @NewDocID = 0
SET @Cnt = 0
UPDATE #temp
SET @NewDocID = CASE WHEN @DocID = DocID THEN 0 ELSE 1 END,
@Cnt = Cnt = CASE WHEN @NewDocID = 1 THEN 1 ELSE @Cnt + 1 END,
@DocID = DocID
UPDATE t
SET Cnt = 0
FROM #temp t
INNER JOIN (
--Anything with a count of 1 is not duplicated
SELECT DocID, MAX(Cnt) MaxCnt
FROM #temp
GROUP BY DocID
HAVING MAX(Cnt) = 1) x ON t.DocID = x.DocID
--Update our DocID with the correct suffix
UPDATE m
SET DocID = m.DocID + s.Suffix
FROM #MainTbl m
INNER JOIN #temp t ON m.tblKey = t.tblKey
INNER JOIN #Suffix s ON t.Cnt = s.Cnt
SELECT * FROM #MainTbl
DROP TABLE #MainTbl
DROP TABLE #temp
DROP TABLE #Suffix
Scott
September 16, 2008 at 12:37 pm
Scott, this worked perfectly. Thank you and thanks to all...
Jose
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply