Finding duplicate entries

Question

Finding duplicate entries

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply

Jeswanth Jaishanker SSCommitted Points: 1869 More actions · Answer 1

I am not sure when this was posted or wether this got solved. happend to see one thread while browsing. But I havent found any one suggesting having clause. I always use this to find the duplcate records. if the problem is still existing use thsis.

select

fld1,

fld2,

fld3

from

tbl

group by

fld1,

fld2,

fld3

having

Count(*) > 1

Jeswanth

--------------------------------

homebrew01 SSC Guru Points: 55548 More actions · Answer 2

And if you modify it slightly, adding a qty field, then you'll get a count of how many duplicate records there are for each group.

select

fld1,

fld2,

fld3, count(*) as qty

from

tbl

group by

fld1,

fld2,

fld3

having

Count(*) > 1

Ivelisse Maldonado Old Hand Points: 372 More actions · Answer 3

Ivelisse Maldonado

Old Hand

Points: 372

July 27, 2005 at 6:30 pm

#577752

Thank you, I will give it a try.

liebesiech SSCrazy Points: 2610 More actions · Answer 4

Hi

There is a nice wizard in MS Access allowing you to find duplicates in a table. After a little modification it works on SQL Server as well.

Here is what I have made:

SELECT Test.aTime AS [aTime Field], Test.avalue AS [avalue Field], Test.company AS [company Field], Count(Test.aTime) AS NumberOfDups

FROM Test

GROUP BY Test.aTime, Test.avalue, Test.company

HAVING (((Count(Test.aTime))>1) AND ((Count(Test.company))>1));

Result:

aTime Field avalue Field company Field NumberOfDups

2005-04-02 00:00:00.000bu002 ABC Company2

Give it a try and let me know if this is what you are looking for.

Rene

By the way: Have used the create and insert statement from Mike to have a table with some data in it.

dbuchan SSCrazy Points: 2328 More actions · Answer 5

Or plain vanilla:

select *, count(company) "kaching!" from test

group by aTime, aValue, Company

Having count(company) > 1 kaching

order by count(company) desc

RESULT:

aTime avalue company kaching!

------------------------ --------- -------------------- -------

2005-04-02 00:00:00.000 bu002 ABC Company 2

Cheers,

DB

The systems fine with no users loggged in. Can we keep it that way ?br>

TSQLExplorer Ten Centuries Points: 1279 More actions · Answer 6

Very interesting solution here... elegantly simple, but perfect.

If I may, I am wondering how your query could be modified to enable me to see each of the duplicate entries in order to determine visually which one to delete?

Also, I need to determine that each student in a table has only one student ID number .. .by searching the lastname, firstname and birthdate fieldls for duplicates where the student ID number is not duplicated... thus identifying a student who had managed to get two or more numbers, could I do that?

Finally, I need to find any students where more than one has the same student ID number.. for example, they have different firstname, lastname and birthdates and are obviously different kids, but the same id number.

A brain buster for me... I'm hoping you have a thought!

Thanks in advance for your help!

sushila SSC-Dedicated Points: 35293 More actions · Answer 7

Tom - this is an old post now - you'd be much better off starting a new thread with your questions - you'll get much wider readership that way since this post only comes to the notice of those who'd already participated in it and thus continue getting emails (if they haven't "unsubscribed" from it, i.e.)..

**ASCII stupid question, get a stupid ANSI !!!**

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 8

Tom, ya got lucky... as Sushila stated, most don't read down a thread this deep especially when Sushila and Mike are having a cyber reunion

Anyway, here's a self contained example of how to do just what you ask... it does require some unique row identifier, though. Obviously, the columns you're looking at for dupes don't qualify as a unique identifier or a primary key because, well, they've been duplicated

--===== This temp table will simulate your real table

IF OBJECT_ID('TempDB..#yourtable') IS NOT NULL

DROP TABLE #yourtable

CREATE TABLE #yourtable

(

ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

Date DATETIME,

Unit VARCHAR(10),

Company_Name VARCHAR(20)

)

--===== Loading the temp table with your example data

INSERT INTO #yourtable

(Date,Unit,Company_Name)

SELECT '04/02/05','bu002','ABC Company' UNION ALL

SELECT '11/30/04','bu002','ABC Company' UNION ALL

SELECT '04/02/05','bu002','ABC Company' UNION ALL

SELECT '05/12/05','bu002','ABC Company' UNION ALL

SELECT '11/30/04','bu002','ABC Company'

--===== List whole rows of all duplicates

SELECT t3.*

FROM #yourtable t3,

(--Derived "d" table finds ID's of all duplicated records

SELECT DISTINCT t1.ID

FROM #yourtable t1,

#yourtable t2

WHERE t1.Date = t2.Date

AND t1.Unit = t2.Unit

AND t1.Company_Name = t2.Company_Name

AND t1.ID <> t2.ID

)d --End derived table "d"

WHERE t3.ID = d.ID

ORDER BY t3.Company_Name,

t3.Unit,

t3.Date

I have to tell you that most would look at this and think that it's incredibly slow... not so. It'll find 400 dupes in a million row table in about a minute on a lowly 1.8Ghz single CPU box.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 9

An alternative, which may or may not be faster, I don't know. But it doesn't require a unique id.

SELECT distinct t.date, t.Unit, t.Company_Name /* or simply select t.* */

FROM #yourtable t inner join

(--Derived "d" table finds all duplicated records

SELECT Date, Unit, Company_Name

from #yourtable

group by Date, Unit, Company_Name

having count(*) > 1

) d

on t.Date = d.Date

AND t.Unit = d.Unit

AND t.Company_Name = d.Company_Name

ORDER BY t.Company_Name,

t.Unit,

t.Date

sushila SSC-Dedicated Points: 35293 More actions · Answer 10

Jeff - don't forget to tell Tom that he also managed to benefit from your expertise at a time when you were more relaxed than usual and not biting peoples' heads off...don't know about vladan but you sure scared me....

Just kidding Jeff...couldn't resist it!

**ASCII stupid question, get a stupid ANSI !!!**

TSQLExplorer Ten Centuries Points: 1279 More actions · Answer 11

No Problem ... my head is very large... so beware the choking hazard!!!!

LOL

TSQLExplorer Ten Centuries Points: 1279 More actions · Answer 12

This seems to work exactly right when I ran your test, but when I tried plugging my table/field names in ... I ran into trouble...

My field names are Permnum,Lastname,Firstname,Birthdate

Table name is "ASTUALL"

Here is my modification of your code..

SELECT *

FROM #astuall t3,

(--Derived "d" table finds ID's of all duplicated records

SELECT DISTINCT t1.ID

FROM #astuall t1,

#astuall t2

WHERE t1.Firstname = t2.Firstname

AND t1.Lastname = t2.Lastname

AND t1.Birthdate = t2.Birthdate

AND t1.Permnum <> t2.Permnum

)d --End derived table "d"

WHERE t3.ID = d.ID

ORDER BY t3.Permnum,

t3.Lastname,

t3.Firstname

And the error message I got..

(5 row(s) affected)

Server: Msg 208, Level 16, State 1, Line 23

Invalid object name '#astuall'.

Server: Msg 208, Level 16, State 1, Line 23

Invalid object name '#astuall'.

Server: Msg 208, Level 16, State 1, Line 23

Invalid object name '#astuall'.

thanks!

PW-201837 SSC-Insane Points: 20805 More actions · Answer 13

Isn't this the exact same question as on your other thread ? Can we let this old thread rest in peace now, and continue the discusssion on your new thread ?