Distinct

  • Hi, I've recently moved a site to MSSQL server from Access and I've hit a few problems as it implements SQL differently. I've had a lot of duplicate data on the site which I have partly remedied by using SELECT DISTINCT - though I don't know how to use this in conjunction with SELECT * to cure other parts of my site that return duplicate data. Does anyone know how to do this? Thanks!

  • Firstly, don't use select *. It's lazy coding. Get into the habit of explicitly naming the columns in your query, and only returning the columns that you need.

    Second, don't use distinct unless you absolutely need it. It's slow and expensive performance wise

    Third, if you have duplicate data in your database, the best thing to do is remove the duplicates, not ignore them and work around the problem.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Getting "Dupe" was a bad design to begin with.

    Removing Dupes(or De-Duping as I call it) would require logic on which one survives. The most current(Using datetime field, or Max of Identity Column) could be the survivor. If you don't have those inster them into a temporary table with identity and pick the First or last one and load the "De-Duped" data into the main table. Then Donot forget to put an Unique Index.

    Thanks

    Sreejith

  • Hi

    Gila is right - get rid of the duplicate data, then modify your table design to prevent it happening again.

    One of the easier ways of determining duplicated records, as it sounds as though you are familiar with Access, if to use the duplicates query wizard.  Use this to build yourself a table on which you can base you deletes.  If you don't already have them, you will need to an ODBC connection and link the tables concerned.

    Cheers - all the best.

    Rowan

  • All of the above assumes that you have no related tables with foreign keys referencing one or more of the duplicates.

    If everything in a row is duplicated in one or more other rows, then you likely do not have this problem.  If, instead, you have implemented some sort of "designed" primary key and it is the only thing that is not duplicated (i.e., your meaning of the term "duplicate" is "logicl/business duplicate"), then you have more work ahead of you.

    For example, if you have three contact records:

    id ContactName Phone gender other_data_also_duplicated

    1 John Doe 123-456-7890 M moredupdupdup ...

    2 John Doe 123-456-7890 M moredupdupdup ...

    3 John Doe 123-456-7890 M moredupdupdup ...

    And there are other rows in other tables referencing these, you would need to:

    (1) pick which row above is the "winner"

    (2) update any foreign keys which reference the "lsers" so that they reference the "winner"

    (3) then, delete the "losers"

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply