Am I crazy?

Question

Post reply

Am I crazy?

hitek2000

SSC Veteran

Points: 244
More actions
May 15, 2003 at 9:34 pm

#101481

I am somewhat new to the SQL game.... Here is my problem. I have a table (hardware) that has 4 columns: site_id,site_name,hardware_type,serial_number
The serial_number column has the may contain the same serial numbers for different records/sites.
I need to get a list of records that have a duplicate serial number. I have tried multiple ways of doing this but the results are always screwed.... Any ideas?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

5409045121009-7368 SSCoach Points: 15740 More actions · Answer 1

Try something similar:

Select *

From SomeTable as t

Inner Join (Select Serial_Number

From SomeTable

Group by serial_number

Having Count(*)>1) as s

On s.Serial_Number=t.Serial_Number

Did not actually create the table!

David Burrows SSC Guru Points: 65143 More actions · Answer 2

or

select site_id,site_name,hardware_type,serial_number

from hardware

group by site_id,site_name,hardware_type,serial_number

having count(*) > 1

Far away is close at hand in the images of elsewhere.
Anon.

jpipes SSChampion Points: 12230 More actions · Answer 3

quote:
or
select site_id,site_name,hardware_type,serial_number
from hardware
group by site_id,site_name,hardware_type,serial_number
having count(*) > 1

This will just identify duplicate records. This query shouldn't actually return any records if there is a primary key defined.

David Burrows SSC Guru Points: 65143 More actions · Answer 4

Quite right jpipes, I must learn to read the question properly before engaging brain

Far away is close at hand in the images of elsewhere.
Anon.

RD Francis SSCarpal Tunnel Points: 4867 More actions · Answer 5

jpipes is correct - this would be better:

SELECT * from hardware

where serial_number in

(select serial_number from hardware

group by serial_number

having count(serial_number) > 1)

That should show every record with a given serial number. You'd probably want to sort this to show where problems are; it's not clear to me how you determine when you have a duplicate serial number.

RD Francis

PS: I think that, once optimized, this is may have the same execution plan as the first query you were given.

Edited to remove HTML tag and add PS

Edited by - rdfozz on 05/16/2003 07:55:33 AM

R David Francis

Jeremy Kemp SSCertifiable Points: 6754 More actions · Answer 6

Is the query in the previous actually that efficient?

IMHO, for each row in the outer select statement it will evaluate the subselect statement - as there in no where clause or join operator the subquery will return all rows where there is a duplicate.

My way of coding the query is:

select *

from SomeTable a

where 2 >= ( select count(*) from SomeTable b

where a.serial_number = b.serial_number)

Jeremy

bp SSCarpal Tunnel Points: 4656 More actions · Answer 7

bp

SSCarpal Tunnel

Points: 4656

May 16, 2003 at 1:35 pm

#457516

do you mean "where 2 <=" ....

NPeeters SSChampion Points: 12554 More actions · Answer 8

Jeremy, I certainly am no expert in performance tuning queries, but keep in mind that SQL Server will try to optimise stuff. Probably, the query of rdfozz will get optimised to get the same performance as using a join, which is probably the most efficient way of coding this problem if you want to get all values from a record that have a duplicate serial number.

lawrencek SSC-Addicted Points: 444 More actions · Answer 9

Production system Table With ~1000000 rows ~43000 rows returned, StopTime is part of clusterd index

Case 1

Select M1.StopHistID, M1.StopTime

From MachineHist M1

Inner Join MachineHist M2 On M1.StopTime = M2.StopTime

Group By M1.StopHistID, M1.StopTime

Having (Count(M1.StopTime) > 1)

11 - 12 seconds

Case 2

select M1.StopHistID, M1.StopTime

from MachineHist M1

where 2 <= (Select count(*) from MachineHist M2

where M1.StopTime = M2.StopTime

)

9 - 11 Seconds

Case 3

Select M1.StopHistID, M1.StopTime

From MachineHist M1

Inner Join (Select M2.StopTime

From MachineHist M2

Group by M2.StopTime

Having Count(*)>1

) M3

On M1.StopTime = M3.StopTime

6-7 Seconds

Case 4

select M1.StopHistID, M1.StopTime

from MachineHist M1

where StopTime in( Select M2.StopTime

From MachineHist M2

Group By M2.StopTime

Having Count(*)>1

)

6-7 Seconds

Cases 3 and 4 Optimized to the same plan

Intersting to me is that StopTime is the first column of a clusted index and it was not chosen as the index to use and index tuner said to add an index to StopTime field only 54% improvement.

Any Ideas on why Clustered index was not chosen?

Jeremy Kemp SSCertifiable Points: 6754 More actions · Answer 10

Interesting timings - you learn something new everyday.

The index was probably not chosen because indexes are only useful when retrieving less than ~5% of the rows. The query is a full traversal of the table and not using the index is faster than reading the index (read index/read row, read index/read row etc vs read row, read row etc).

Jeremy