May 15, 2003 at 9:34 pm
I am somewhat new to the SQL game.... Here is my problem. I have a table (hardware) that has 4 columns: site_id,site_name,hardware_type,serial_number
The serial_number column has the may contain the same serial numbers for different records/sites.
I need to get a list of records that have a duplicate serial number. I have tried multiple ways of doing this but the results are always screwed.... Any ideas?
May 15, 2003 at 10:01 pm
Try something similar:
Select *
From SomeTable as t
Inner Join (Select Serial_Number
From SomeTable
Group by serial_number
Having Count(*)>1) as s
On s.Serial_Number=t.Serial_Number
Did not actually create the table!
May 16, 2003 at 2:07 am
or
select site_id,site_name,hardware_type,serial_number
from hardware
group by site_id,site_name,hardware_type,serial_number
having count(*) > 1
Far away is close at hand in the images of elsewhere.
Anon.
May 16, 2003 at 7:21 am
quote:
orselect site_id,site_name,hardware_type,serial_number
from hardware
group by site_id,site_name,hardware_type,serial_number
having count(*) > 1
This will just identify duplicate records. This query shouldn't actually return any records if there is a primary key defined.
May 16, 2003 at 7:27 am
Quite right jpipes, I must learn to read the question properly before engaging brain
Far away is close at hand in the images of elsewhere.
Anon.
May 16, 2003 at 7:52 am
jpipes is correct - this would be better:
SELECT * from hardware
where serial_number in
(select serial_number from hardware
group by serial_number
having count(serial_number) > 1)
That should show every record with a given serial number. You'd probably want to sort this to show where problems are; it's not clear to me how you determine when you have a duplicate serial number.
RD Francis
PS: I think that, once optimized, this is may have the same execution plan as the first query you were given.
Edited to remove HTML tag and add PS
Edited by - rdfozz on 05/16/2003 07:55:33 AM
R David Francis
May 16, 2003 at 8:56 am
Is the query in the previous actually that efficient?
IMHO, for each row in the outer select statement it will evaluate the subselect statement - as there in no where clause or join operator the subquery will return all rows where there is a duplicate.
My way of coding the query is:
select *
from SomeTable a
where 2 >= ( select count(*) from SomeTable b
where a.serial_number = b.serial_number)
Jeremy
May 16, 2003 at 1:35 pm
do you mean "where 2 <=" ....
May 17, 2003 at 9:38 am
Jeremy, I certainly am no expert in performance tuning queries, but keep in mind that SQL Server will try to optimise stuff. Probably, the query of rdfozz will get optimised to get the same performance as using a join, which is probably the most efficient way of coding this problem if you want to get all values from a record that have a duplicate serial number.
May 19, 2003 at 1:51 pm
Production system Table With ~1000000 rows ~43000 rows returned, StopTime is part of clusterd index
Case 1
Select M1.StopHistID, M1.StopTime
From MachineHist M1
Inner Join MachineHist M2 On M1.StopTime = M2.StopTime
Group By M1.StopHistID, M1.StopTime
Having (Count(M1.StopTime) > 1)
11 - 12 seconds
Case 2
select M1.StopHistID, M1.StopTime
from MachineHist M1
where 2 <= (Select count(*) from MachineHist M2
where M1.StopTime = M2.StopTime
)
9 - 11 Seconds
Case 3
Select M1.StopHistID, M1.StopTime
From MachineHist M1
Inner Join (Select M2.StopTime
From MachineHist M2
Group by M2.StopTime
Having Count(*)>1
) M3
On M1.StopTime = M3.StopTime
6-7 Seconds
Case 4
select M1.StopHistID, M1.StopTime
from MachineHist M1
where StopTime in( Select M2.StopTime
From MachineHist M2
Group By M2.StopTime
Having Count(*)>1
)
6-7 Seconds
Cases 3 and 4 Optimized to the same plan
Intersting to me is that StopTime is the first column of a clusted index and it was not chosen as the index to use and index tuner said to add an index to StopTime field only 54% improvement.
Any Ideas on why Clustered index was not chosen?
May 20, 2003 at 1:12 am
Interesting timings - you learn something new everyday.
The index was probably not chosen because indexes are only useful when retrieving less than ~5% of the rows. The query is a full traversal of the table and not using the index is faster than reading the index (read index/read row, read index/read row etc vs read row, read row etc).
Jeremy
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply