September 1, 2011 at 2:15 pm
Anybody know an easy way to do this. I have some rows that aren't distinct by the key I'm using, and I want to return just those so I can figure out how I can differentiate between them.
Thanks.
September 1, 2011 at 2:17 pm
It'll depend on the structure of the table(s) involved.
Usually, you can identify the rows by some key column(s), by grouping by those and selecting the ones that have a Count(*) that's higher than 1. Then you can join to that to get the full data of each row.
If you post table definitions, I can get more specific. If you need that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 1, 2011 at 2:18 pm
by are not distinct you mean duplicates?
you can use row_number() function or group by...having count(*) > 1
SELECT * FROM
(select row_number() over (partition by ColumnList Order By ColumnList) AS RW,
ColumnList FROM YourTable
)
WHERE RW > 1
--or
SELECT ColumnList
From YourTable
Group By ColumnList
HAVING count(*) > 1
Lowell
September 1, 2011 at 2:22 pm
Thank you guys
September 2, 2011 at 11:56 am
Instead of doing a where count(*) > 1 it is often faster to check for duplicates by looking for any rows for which you can find at least one other row with the same characteristics. For this to work you first need something to identify each row uniquely by. If your table does not have a unique key (yet), you can add an artificial one by using the row_number() ranking function.
In practice the trick is to create a common table expression (cte) that assigns each row it's unique number (by use of row_number()), then do a self-join on that cte to find any rows that have the same values but do not have the same row number.
with cte as (
select row_number() over (order by (select 1)) as nr,
t.col1,
t.col2,
from dbo.yourtable t
)
select t1.*
from cte t1
where exists (
select top (1) *
from cte t2
where t2.col1 = t1.col1
and t2.col2 = t1.col2
and not t2.nr = t1.nr
)
Note: the top (1) is not required for proper functionality. I've just found that this sometimes speeds up the use of exists().
Note: if your intend is to remove all but one of the duplicate rows, then row_number() is your friend too: Have row_number() generate a new range from 1 up to the number of duplicates by using the "partition by"-clause on it and then simply delete from the cte all rows that have a row_number() higher than 1.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply