Am I crazy?

  • I am somewhat new to the SQL game.... Here is my problem. I have a table (hardware) that has 4 columns: site_id,site_name,hardware_type,serial_number

    The serial_number column has the may contain the same serial numbers for different records/sites.

    I need to get a list of records that have a duplicate serial number. I have tried multiple ways of doing this but the results are always screwed.... Any ideas?

  • Try something similar:

    Select *

    From SomeTable as t

    Inner Join (Select Serial_Number

    From SomeTable

    Group by serial_number

    Having Count(*)>1) as s

    On s.Serial_Number=t.Serial_Number

    Did not actually create the table!

  • or

    select site_id,site_name,hardware_type,serial_number

    from hardware

    group by site_id,site_name,hardware_type,serial_number

    having count(*) > 1

    Far away is close at hand in the images of elsewhere.
    Anon.

  • quote:


    or

    select site_id,site_name,hardware_type,serial_number

    from hardware

    group by site_id,site_name,hardware_type,serial_number

    having count(*) > 1


    This will just identify duplicate records. This query shouldn't actually return any records if there is a primary key defined.

  • Quite right jpipes, I must learn to read the question properly before engaging brain

    Far away is close at hand in the images of elsewhere.
    Anon.

  • jpipes is correct - this would be better:

    SELECT * from hardware

    where serial_number in

    (select serial_number from hardware

    group by serial_number

    having count(serial_number) > 1)

    That should show every record with a given serial number. You'd probably want to sort this to show where problems are; it's not clear to me how you determine when you have a duplicate serial number.

    RD Francis

    PS: I think that, once optimized, this is may have the same execution plan as the first query you were given.

    Edited to remove HTML tag and add PS

    Edited by - rdfozz on 05/16/2003 07:55:33 AM


    R David Francis

  • Is the query in the previous actually that efficient?

    IMHO, for each row in the outer select statement it will evaluate the subselect statement - as there in no where clause or join operator the subquery will return all rows where there is a duplicate.

    My way of coding the query is:

    select *

    from SomeTable a

    where 2 >= ( select count(*) from SomeTable b

    where a.serial_number = b.serial_number)

    Jeremy

  • do you mean "where 2 <=" ....

  • Jeremy, I certainly am no expert in performance tuning queries, but keep in mind that SQL Server will try to optimise stuff. Probably, the query of rdfozz will get optimised to get the same performance as using a join, which is probably the most efficient way of coding this problem if you want to get all values from a record that have a duplicate serial number.

  • Production system Table With ~1000000 rows ~43000 rows returned, StopTime is part of clusterd index

    Case 1

    Select M1.StopHistID, M1.StopTime

    From MachineHist M1

    Inner Join MachineHist M2 On M1.StopTime = M2.StopTime

    Group By M1.StopHistID, M1.StopTime

    Having (Count(M1.StopTime) > 1)

    11 - 12 seconds

    Case 2

    select M1.StopHistID, M1.StopTime

    from MachineHist M1

    where 2 <= (Select count(*) from MachineHist M2

    where M1.StopTime = M2.StopTime

    )

    9 - 11 Seconds

    Case 3

    Select M1.StopHistID, M1.StopTime

    From MachineHist M1

    Inner Join (Select M2.StopTime

    From MachineHist M2

    Group by M2.StopTime

    Having Count(*)>1

    ) M3

    On M1.StopTime = M3.StopTime

    6-7 Seconds

    Case 4

    select M1.StopHistID, M1.StopTime

    from MachineHist M1

    where StopTime in( Select M2.StopTime

    From MachineHist M2

    Group By M2.StopTime

    Having Count(*)>1

    )

    6-7 Seconds

    Cases 3 and 4 Optimized to the same plan

    Intersting to me is that StopTime is the first column of a clusted index and it was not chosen as the index to use and index tuner said to add an index to StopTime field only 54% improvement.

    Any Ideas on why Clustered index was not chosen?

  • Interesting timings - you learn something new everyday.

    The index was probably not chosen because indexes are only useful when retrieving less than ~5% of the rows. The query is a full traversal of the table and not using the index is faster than reading the index (read index/read row, read index/read row etc vs read row, read row etc).

    Jeremy

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply