Eliminating Duplicate Rows using The PARTITION BY clause

Question

Eliminating Duplicate Rows using The PARTITION BY clause

Viewing 8 posts - 46 through 52 (of 52 total)

You must be logged in to reply to this topic. Login to reply

suslikovich Old Hand Points: 300 More actions · Answer 1

I was able to use this approach to identify and delete duplicates very easily. I just had my FROM clause in a form of another query that just gets all duplicate rows - very limited output.

I think with careful consideration, the approach is very helpful.

Stan

doodlingdba Hall of Fame Points: 3164 More actions · Answer 2

Wow, what a great article - it was really clear to understand & helped introduce some new ideas to a TSQL newbie like myself. The comments on the article are also really useful, explaining a few different ways of doing the same thing but also highlighting how creative you can be with your code and also how different people approach the same problem in different ways.

I've learned something new & useful today - yay!

Doodles

Scott Abrants SSCommitted Points: 1503 More actions · Answer 3

Scott Abrants

SSCommitted

Points: 1503

June 13, 2011 at 2:47 pm

#1338058

Nicely written. Thank you for your contribution!

okbangas SSChampion Points: 11773 More actions · Answer 4

I agree with those suggesting using a CTE, as I did in my old blog post: How do I remove duplicate tuples from a relation without any candidate keys?

Now, it is seldom a good idea not to have a table without any candidate keys

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

patrickmcginnis59 SSCertifiable Points: 6483 More actions · Answer 5

David Lean (9/21/2010)
Nice article & Nice solution. But I would caution anyone about to blindly using this in production. It is only the 1st part of the process.
Problem 1: need to Fix Declarative Referential Integrity (DRI) for the rows deleted.
Invariably the duplicate rows you are about to remove will be referenced by the foreign keys in another table. When you delete these rows you must keep a copy of the Keys of the rows you deleted & map them to the key of the “duplicate” row you kept. That way you have some lookup table you can use to correct all the foreign key references.

If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?

jswong05 Hall of Fame Points: 3553 More actions · Answer 6

If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?

He is saying if the table has a primary key and many columns but the data is already screwed up. Now they identify four other columns combination as unique but data shows duplicates by these four. (The four columns should have had a design of unique key to begin with)

Jason
http://dbace.us

prakashp 84206 Old Hand Points: 303 More actions · Answer 7

Hi,

The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks.

My query's Present Output

column1 column2 column3

abc xyz pqr

abc - klm

ijk - uvw

plo - ujn

yhk ttg wea

yhk - erf

Expected Output

column1 column2 column3

abc xyz pqr

ijk - uvw

plo - ujn

yhk ttg wea

Can someone help me. Thanks in advance.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

prakashp 84206 (6/1/2012)
Hi,
The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks.
My query's Present Output
column1 column2 column3
abc xyz pqr
abc - klm
ijk - uvw
plo - ujn
yhk ttg wea
yhk - erf
Expected Output
column1 column2 column3
abc xyz pqr
ijk - uvw
plo - ujn
yhk ttg wea
Can someone help me. Thanks in advance.

Welcome aboard.

Read the link in my sig, this will assist you in describing your problem and encouraging others to help. Then start a new thread. It's likely that your problem is sufficiently different to the original on this thread that it's better separated from it.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Eliminating Duplicate Rows using The PARTITION BY clause

Cookies on SQLServerCentral