October 19, 2004 at 10:07 am
Hello,
I have an app where I import txt (tab delimited) file(s) into tables. To avoid duplicates I have a constraint of 5 key fields. This works perfectly. But I need to identify duplicates and store them in a seperate table for auditing. To identify duplicates I do 2 things:
1. I check the file (stored in a temporary table _A) against the existing table A using the 5 key fields. this works to identify if the data is already in ths system. This is the first thing that is done.
2. I check _A to see if it has any duplicates contained in it. I use the following:
INSERT INTO _tblDupes ( ....)
SELECT * FROM _A a
WHERE EXISTS (
SELECT NULL FROM _A b
WHERE
b.[fld_1] = a.[fld_1]
AND b.[fld_2] = a.[fld_2]
AND b.[fld_3] = a.[fld_3]
AND b.[fld_4] = a.[fld_4]
GROUP BY
b.[fld_1], b.[fld_2],
b.[fld_3], b.[fld_4]
HAVING
COUNT(b.[ID]) > 1
)
This also works perfectly. BUT the problem is, when I do an insert into A from _A it will take the 1st row and disregard the 2nd row as a duplicate because of the constraint. Where as the above query inserts both rows in _A in the _tbleDupes table. So when I go to sum up all the data I recieved from the file then the dupes are getting counted 2 or even 3 times. So what I want is:
_A A _tblDupes
ab ab
ac ac
ac ac
ad ad
ad ad
right now I am getting:
_A A _tblDupes
ab ab
ac ac ac
ac ac
ad ad ad
ad ad
any help will be greatly appreciated.
Thank you
October 20, 2004 at 1:59 am
You can use the DISTINCT keywords like
INSERT INTO _tblDupes ( ....)
SELECT DISTINCT * FROM _A a
WHERE EXISTS (
SELECT NULL FROM _A b
WHERE
b.[fld_1] = a.[fld_1]
AND b.[fld_2] = a.[fld_2]
AND b.[fld_3] = a.[fld_3]
AND b.[fld_4] = a.[fld_4]
GROUP BY
b.[fld_1], b.[fld_2],
b.[fld_3], b.[fld_4]
HAVING
COUNT(b.[ID]) > 1
)
Patrick Duflot
October 20, 2004 at 10:56 am
I have tried the DISTINCT select but to no avail. I thought that maybe because the duplicate rows dont have anything else common outside of the fields in the constraint but taht is not the case. I have 1 instance where every single field is identical and it still selected both. I have no idea how to fix this!!!
October 20, 2004 at 4:34 pm
"Where exists (Select NULL From....)" should probably be changed to "Where Exists (Select 1 From...)", but I'm not sure that will change anything.
I think your problem has to do with getting the unique row number for duplicates. This can be done this way:
Insert _tblDupes
select a.UniqueID
From _A a
JOIN _A b on
b.[fld_1] = a.[fld_1]
AND b.[fld_2] = a.[fld_2]
AND b.[fld_3] = a.[fld_3]
AND b.[fld_4] = a.[fld_4]
where a.UniqueID < b.UniqueID
Update d
set Field0 = a.Field0,
Field1 = a.Field1
From _tblDupes d
JOIN _A a on d.UniqueID = a.UniqueID
If you don't have "UniqueID" then you can add it to the table as a identity column.
Signature is NULL
October 21, 2004 at 11:27 am
I have an idea on this. seems to work in my 3 three test cases. 1st I create a view X which selects the constraint fields and group them along wi COUNT(ID) > 1. This gives me the records that will show up as dupes. then I do a view Y where i join _A and X on the mobileID and select top 1 w/ id in desc order. this will give me the 2nd record. this assumes that the 1st one got in. But if there is more then 2 duplicate records then this will not work. any thoughts on how i can fix that?
Let me know if this sounds like a good work around (sort of)?
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply