Search for duplicate entries

Question

Search for duplicate entries

Greg Albrecht-252263

Mr or Mrs. 500

Points: 568
More actions
May 23, 2010 at 7:33 pm

#222849

Hi
I want to extract information where names are duplicated. The issue I have is that I need to do this for only those that have the next sequential id. eg:
id name surname
45623 greg albrecht
45624 greg albrecht
45698 steve hanson
45688 amanda ball
45689 amanda ball
Could somebody help me with this
Thanks and regards
Greg

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 1

Greg... for future posts, please see the article at the first link in my signature line below. It'll help you and it'll help us.

Here's one possible solution for your question...

--===== Create a test table. This is NOT a part of the solution

DECLARE @YourTable TABLE

(ID INT, Name VARCHAR(15), SurName VARCHAR(15))

;

--===== Populate the test table. This is NOT a part of the solution

INSERT INTO @YourTable

(ID, Name, Surname)

SELECT 45623,'greg','albrecht' UNION ALL

SELECT 45624,'greg','albrecht' UNION ALL

SELECT 45698,'steve','hanson' UNION ALL

SELECT 45688,'amanda','ball' UNION ALL

SELECT 45689,'amanda','ball'

;

--===== This is one possible solution. It uses an "offset" self-join.

SELECT lo.ID AS Lo_ID, hi.ID AS Hi_ID, hi.Name, hi.Surname

FROM @YourTable lo

INNER JOIN @YourTable hi

ON lo.ID + 1 = hi.ID

AND lo.Name = hi.Name

AND lo.SurName = hi.SurName

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 2

Assuming that the ID column will be an INTEGER column, then this one will help you!

Thanks to Jeff for setting up the test environment!

Here is another version of the code:

SELECT

T1.ID FirstID, T2.ID SecondID, T1.Name, T1.SurName

FROM

@YourTable T1

INNER JOIN

@YourTable T2

ON T1.ID - T2.ID = 1 AND

T1.Name = T2.Name AND

T1.SurName = T2.SurName

C'est Pras!

florim SSC Veteran Points: 293 More actions · Answer 3

As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.

Hear is my solution:

SELECT [t0].*

FROM [TBLYourTable] AS [t0], [TBLYourTable] AS [t1]

WHERE ([t0].[Name] = [t1].[Name]) AND ([t0].[SurName] = [t1].[SurName]) AND ([t0].[ID] = ((

SELECT [t3].[ID]

FROM (

SELECT TOP (1) [t2].[ID]

FROM [TBLYourTable] AS [t2]

WHERE [t2].[ID] > [t1].[ID]

) AS [t3]

)))

Yes maybe is litelbit ugly but this will do the job.

I'm not T-SQL fan but I use Lint To SQL to find thease solutions

Sorry for my bad english.

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 4

florim As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.

Hi florim, the following is the OP's requirement:

The issue I have is that I need to do this for only those that have the next sequential id

So, thats why we stopped with sequential IDs 🙂

~Edit: Tags Fixed

florim SSC Veteran Points: 293 More actions · Answer 5

O sorry i miss that part sorry.

Sory for my bad english

luckysql.kinda SSCertifiable Points: 7816 More actions · Answer 6

select distinct * from dupnames dup1

inner join dupnames dup2

on dup1.name = dup2.name

where dup1.id <> dup2. id

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 7

luckysql.kinda (5/24/2010)
select distinct * from dupnames dup1
inner join dupnames dup2
on dup1.name = dup2.name
where dup1.id <> dup2. id

That's nice but it doesn't meet the OPs request. You need to check for both names and you have to use dup.ID+1 = dup2.ID because the OP specifically identified that things will be consider to be dupes only if the are apart by 1 in ID.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

brosspaxedi SSCommitted Points: 1591 More actions · Answer 8

I've bee perusing the eBook "Best of SqlServerCentral Volume 7" and there is an article on deleting dups. It is written in the book using SqlCmd script. I tried to modify it to a "regular" script for use within SSMS but it seems to confuse the variable type.

Here are the error messages:

/*****************************************/

Msg 1087, Level 15, State 2, Line 28

Must declare the table variable "@TableName".

Msg 1087, Level 15, State 2, Line 39

Must declare the table variable "@TableName".

Msg 1087, Level 15, State 2, Line 47

Must declare the table variable "@TableName".

/*****************************************/

Here's the code:

/*************************************************/

/*

Generic De-Duper

From: The Best of Sql Server Central volume 7

Modified to use Declare/Set instead of :SETVAR

*/

DECLARE @TableName VARCHAR(max) -- Name of table to deduplicate

DECLARE @UniqueColumnList VARCHAR(max) -- Comma separated list of column names

DECLARE @JoinCondition VARCHAR(max) -- Use in WHERE clause to show sample

DECLARE @SampleSize INT

DECLARE @MaxRowsPerGroup INT

SET @TableName = 'eligibility_exception';

SET @UniqueColumnList = 'member_first_name'

SET @JoinCondition = 'T1.member_first_name=T2.member_first_name';

SET @SampleSize = 20;

SET @MaxRowsPerGroup = 2;

SET NOCOUNT ON;

PRINT 'Count / show / delete duplicates records from

(@TableName) based on ((@UniqueColumnList))';

-- 1. Count the duplicated records

-- This is the number of records that will be deleted

-- For example if there are five records in a group with

-- the same values for (@UniqueColumnList) this counts four

WITH DupCounts AS

( SELECT _RowNum_ = row_number()

OVER (

PARTITION BY (@UniqueColumnList)

ORDER BY (@UniqueColumnList) ),*

FROM (@TableName)

)

SELECT CountOfDuplicateRows = count(*) FROM DupCounts WHERE

DupCounts._RowNum_ > 1;

-- 3. Show a sample of the duplicated records

WITH DupCounts AS

( SELECT _RowNum_ = row_number()

OVER (

PARTITION BY (@UniqueColumnList)

ORDER BY (@UniqueColumnList) ),*

FROM (@TableName)

)

SELECT TOP ((@SampleSize)) T1.*

FROM DupCounts T1

WHERE

T1._RowNum_ <= (@MaxRowsPerGroup)

AND EXISTS

(SELECT *

FROM (@TableName) T2

WHERE (@JoinCondition)

GROUP BY (@UniqueColumnList)

HAVING COUNT(*) >1)

ORDER BY (@UniqueColumnList), T1._RowNum_;

-- Delete duplicates if you need to:

-- I leave this commented to avoid disasters

/*

WITH DupCounts AS

( SELECT _RowNum_ = row_number()

OVER (

PARTITION BY (@UniqueColumnList)

ORDER BY (@UniqueColumnList) )

FROM (@TableName)

)

DELETE FROM DupCounts WHERE DupCounts._RowNum_ > 1;

SELECT RowsDeleted = @rowcount;

*/

/***************************************************************/

The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?

I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?

*******************
What I lack in youth, I make up for in immaturity!

rVadim Hall of Fame Points: 3969 More actions · Answer 9

bross 52202 (5/24/2010)
The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?
I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?

Simple replacement of :SETVAR with Declare/Set won't work in this case. You will need to create dynamic SQL to do that.

You can run SQLCMD from SSMS. Go to Query menu and check SQLCMD Mode option.

--Vadim.

--Vadim R.

brosspaxedi SSCommitted Points: 1591 More actions · Answer 10

Works like a charm. Many thanks. I think I finally have an elegant solution for duplicates.

*******************
What I lack in youth, I make up for in immaturity!