May 23, 2010 at 7:33 pm
Hi
I want to extract information where names are duplicated. The issue I have is that I need to do this for only those that have the next sequential id. eg:
id name surname
45623 greg albrecht
45624 greg albrecht
45698 steve hanson
45688 amanda ball
45689 amanda ball
Could somebody help me with this
Thanks and regards
Greg
May 23, 2010 at 7:51 pm
Greg... for future posts, please see the article at the first link in my signature line below. It'll help you and it'll help us.
Here's one possible solution for your question...
--===== Create a test table. This is NOT a part of the solution
DECLARE @YourTable TABLE
(ID INT, Name VARCHAR(15), SurName VARCHAR(15))
;
--===== Populate the test table. This is NOT a part of the solution
INSERT INTO @YourTable
(ID, Name, Surname)
SELECT 45623,'greg','albrecht' UNION ALL
SELECT 45624,'greg','albrecht' UNION ALL
SELECT 45698,'steve','hanson' UNION ALL
SELECT 45688,'amanda','ball' UNION ALL
SELECT 45689,'amanda','ball'
;
--===== This is one possible solution. It uses an "offset" self-join.
SELECT lo.ID AS Lo_ID, hi.ID AS Hi_ID, hi.Name, hi.Surname
FROM @YourTable lo
INNER JOIN @YourTable hi
ON lo.ID + 1 = hi.ID
AND lo.Name = hi.Name
AND lo.SurName = hi.SurName
--Jeff Moden
Change is inevitable... Change for the better is not.
May 23, 2010 at 10:40 pm
Assuming that the ID column will be an INTEGER column, then this one will help you!
Thanks to Jeff for setting up the test environment!
Here is another version of the code:
SELECT
T1.ID FirstID, T2.ID SecondID, T1.Name, T1.SurName
FROM
@YourTable T1
INNER JOIN
@YourTable T2
ON T1.ID - T2.ID = 1 AND
T1.Name = T2.Name AND
T1.SurName = T2.SurName
C'est Pras!
May 24, 2010 at 2:12 am
As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.
Hear is my solution:
SELECT [t0].*
FROM [TBLYourTable] AS [t0], [TBLYourTable] AS [t1]
WHERE ([t0].[Name] = [t1].[Name]) AND ([t0].[SurName] = [t1].[SurName]) AND ([t0].[ID] = ((
SELECT [t3].[ID]
FROM (
SELECT TOP (1) [t2].[ID]
FROM [TBLYourTable] AS [t2]
WHERE [t2].[ID] > [t1].[ID]
) AS [t3]
)))
Yes maybe is litelbit ugly but this will do the job.
I'm not T-SQL fan but I use Lint To SQL to find thease solutions
Sorry for my bad english.
May 24, 2010 at 2:20 am
florim As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.
Hi florim, the following is the OP's requirement:
The issue I have is that I need to do this for only those that have the next sequential id
So, thats why we stopped with sequential IDs 🙂
~Edit: Tags Fixed
May 24, 2010 at 2:37 am
O sorry i miss that part sorry.
Sory for my bad english
May 24, 2010 at 2:43 am
select distinct * from dupnames dup1
inner join dupnames dup2
on dup1.name = dup2.name
where dup1.id <> dup2. id
May 24, 2010 at 6:14 am
luckysql.kinda (5/24/2010)
select distinct * from dupnames dup1inner join dupnames dup2
on dup1.name = dup2.name
where dup1.id <> dup2. id
That's nice but it doesn't meet the OPs request. You need to check for both names and you have to use dup.ID+1 = dup2.ID because the OP specifically identified that things will be consider to be dupes only if the are apart by 1 in ID.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2010 at 12:37 pm
I've bee perusing the eBook "Best of SqlServerCentral Volume 7" and there is an article on deleting dups. It is written in the book using SqlCmd script. I tried to modify it to a "regular" script for use within SSMS but it seems to confuse the variable type.
Here are the error messages:
/*****************************************/
Msg 1087, Level 15, State 2, Line 28
Must declare the table variable "@TableName".
Msg 1087, Level 15, State 2, Line 39
Must declare the table variable "@TableName".
Msg 1087, Level 15, State 2, Line 47
Must declare the table variable "@TableName".
/*****************************************/
Here's the code:
/*************************************************/
/*
Generic De-Duper
From: The Best of Sql Server Central volume 7
Modified to use Declare/Set instead of :SETVAR
*/
DECLARE @TableName VARCHAR(max) -- Name of table to deduplicate
DECLARE @UniqueColumnList VARCHAR(max) -- Comma separated list of column names
DECLARE @JoinCondition VARCHAR(max) -- Use in WHERE clause to show sample
DECLARE @SampleSize INT
DECLARE @MaxRowsPerGroup INT
SET @TableName = 'eligibility_exception';
SET @UniqueColumnList = 'member_first_name'
SET @JoinCondition = 'T1.member_first_name=T2.member_first_name';
SET @SampleSize = 20;
SET @MaxRowsPerGroup = 2;
SET NOCOUNT ON;
PRINT 'Count / show / delete duplicates records from
(@TableName) based on ((@UniqueColumnList))';
-- 1. Count the duplicated records
-- This is the number of records that will be deleted
-- For example if there are five records in a group with
-- the same values for (@UniqueColumnList) this counts four
WITH DupCounts AS
( SELECT _RowNum_ = row_number()
OVER (
PARTITION BY (@UniqueColumnList)
ORDER BY (@UniqueColumnList) ),*
FROM (@TableName)
)
SELECT CountOfDuplicateRows = count(*) FROM DupCounts WHERE
DupCounts._RowNum_ > 1;
-- 3. Show a sample of the duplicated records
WITH DupCounts AS
( SELECT _RowNum_ = row_number()
OVER (
PARTITION BY (@UniqueColumnList)
ORDER BY (@UniqueColumnList) ),*
FROM (@TableName)
)
SELECT TOP ((@SampleSize)) T1.*
FROM DupCounts T1
WHERE
T1._RowNum_ <= (@MaxRowsPerGroup)
AND EXISTS
(SELECT *
FROM (@TableName) T2
WHERE (@JoinCondition)
GROUP BY (@UniqueColumnList)
HAVING COUNT(*) >1)
ORDER BY (@UniqueColumnList), T1._RowNum_;
-- Delete duplicates if you need to:
-- I leave this commented to avoid disasters
/*
WITH DupCounts AS
( SELECT _RowNum_ = row_number()
OVER (
PARTITION BY (@UniqueColumnList)
ORDER BY (@UniqueColumnList) )
FROM (@TableName)
)
DELETE FROM DupCounts WHERE DupCounts._RowNum_ > 1;
SELECT RowsDeleted = @rowcount;
*/
/***************************************************************/
The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?
I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?
*******************
What I lack in youth, I make up for in immaturity!
May 24, 2010 at 3:11 pm
bross 52202 (5/24/2010)
The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?
Simple replacement of :SETVAR with Declare/Set won't work in this case. You will need to create dynamic SQL to do that.
You can run SQLCMD from SSMS. Go to Query menu and check SQLCMD Mode option.
--Vadim.
--Vadim R.
May 25, 2010 at 8:05 am
Works like a charm. Many thanks. I think I finally have an elegant solution for duplicates.
*******************
What I lack in youth, I make up for in immaturity!
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply