SQL to return only rows that aren't distinct

Question

SQL to return only rows that aren't distinct

Sean Grebey-262535

SSCommitted

Points: 1724
More actions
September 1, 2011 at 2:15 pm

#247368

Anybody know an easy way to do this. I have some rows that aren't distinct by the key I'm using, and I want to return just those so I can figure out how I can differentiate between them.
Thanks.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

It'll depend on the structure of the table(s) involved.

Usually, you can identify the rows by some key column(s), by grouping by those and selecting the ones that have a Count(*) that's higher than 1. Then you can join to that to get the full data of each row.

If you post table definitions, I can get more specific. If you need that.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Lowell SSC Guru Points: 323518 More actions · Answer 2

by are not distinct you mean duplicates?

you can use row_number() function or group by...having count(*) > 1

SELECT * FROM

(select row_number() over (partition by ColumnList Order By ColumnList) AS RW,

ColumnList FROM YourTable

)

WHERE RW > 1

--or

SELECT ColumnList

From YourTable

Group By ColumnList

HAVING count(*) > 1

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Sean Grebey-262535 SSCommitted Points: 1724 More actions · Answer 3

Sean Grebey-262535

SSCommitted

Points: 1724

September 1, 2011 at 2:22 pm

#1376179

Thank you guys

R.P.Rozema SSChampion Points: 12317 More actions · Answer 4

Instead of doing a where count(*) > 1 it is often faster to check for duplicates by looking for any rows for which you can find at least one other row with the same characteristics. For this to work you first need something to identify each row uniquely by. If your table does not have a unique key (yet), you can add an artificial one by using the row_number() ranking function.

In practice the trick is to create a common table expression (cte) that assigns each row it's unique number (by use of row_number()), then do a self-join on that cte to find any rows that have the same values but do not have the same row number.

with cte as (

select row_number() over (order by (select 1)) as nr,

t.col1,

t.col2,

from dbo.yourtable t

)

select t1.*

from cte t1

where exists (

select top (1) *

from cte t2

where t2.col1 = t1.col1

and t2.col2 = t1.col2

and not t2.nr = t1.nr

)

Note: the top (1) is not required for proper functionality. I've just found that this sometimes speeds up the use of exists().

Note: if your intend is to remove all but one of the duplicate rows, then row_number() is your friend too: Have row_number() generate a new range from 1 up to the number of duplicates by using the "partition by"-clause on it and then simply delete from the cte all rows that have a row_number() higher than 1.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?