Select DISTINCT on multiple columns is not returning distinct rows for very large table??

  • Hi, I have the following script segment which is failing:

    CREATE

    TABLE #LatLong (Latitude DECIMAL, Longitude DECIMAL, PRIMARY KEY (Latitude, Longitude))

    INSERT

    INTO #LatLong SELECT DISTINCT Latitude, Longitude FROM RGCcache

     

    When I run it I get the following error: "Violation of PRIMARY KEY constraint 'PK__#LatLong__________7CE3D9D4'. Cannot insert duplicate key in object 'dbo.#LatLong'."

     

    Im not sure how this is failing as when I try creating another table with 2 decimal columns and repeated values, select distinct only returns distinct pairs of values.

    The failure may be related to the fact that RGCcache has about 10 million rows, but I can't see why.

    Any ideas?

  • Oh I forgot to mention... RGCcache has about 10 rows of varying types, but I can't see this as being relevent.

  • DISTINCT works as expected only with a single column for multiple columns it checks if the row is distinct instead of checking the first column alone that as you expect its the row should be distinct not the first column alone.

    Cheers,
    Sugeshkumar Rajendran
    SQL Server MVP
    http://sugeshkr.blogspot.com

  • Hi, could you explain again please? I can't quite understand what you are saying.

     

    I expect DISTINCT to return rows that are unique... my question is why is this not happening?

     

    You seem to be agreeing that DISTINCT should return unique rows, so why am I getting the error?

  • That is indeed a bit strange.  Only thing I can think is that you have a higher precision on the underlying table and it is truncating some of those values on the insert, therefore making the select by itself unique, but not when they are inserted.

    You could try to do a select .. into instead of an insert, then try to put a PK on the resulting table instead.

     

  • Hi Dan

     

    Try running the following:

    SELECT Latitude, Longitude, count(*) as nRows 

    FROM RGCcache

    Group by Latitude, Longitude

    HAVING count(*) > 1

     

    Check that the datatype of lat and long is exactly the same between your temp table and RGCcache: it's possible that your problem is caused by rounding.

    You could create the temp table like this and add the pk afterwards:

    SELECT Latitude, Longitude  

    INTO #LatLong

    FROM RGCcache

    WHERE 0 = 1

     

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • CREATE TABLE #LatLong (Latitude DECIMAL, Longitude DECIMAL, PRIMARY KEY (Latitude, Longitude))

     

    Where are the decimals? Does the statement above not convert into INTs? And you are DISTINCTing by decimals from source.

     

    Something like this

    CREATE TABLE #LatLong (Latitude DECIMAL(10, 7), Longitude DECIMAL(10, 7), PRIMARY KEY (Latitude, Longitude))

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Peter is right on the spot.

    The DEFAULT values for precision and scale of a DECIMAL datatype are: (p,s) = (18,0)

    therefore if you don't specify precision and scale, at insert time your values will be rounded and you can end up with duplicates.

    Cheers,


    * Noel

  • It would be nice to know if the information was helpful or not...

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Hi I'm really sorry about not replying sooner - I forgot this thread!

     

    It was indeed a precision issue - the SELECT was returning a greater precision than the table I was inserting to, which of course meant truncation and therefore duplicates.

     

    Thanks to everyone who contributed and my apologies again.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply