need to find duplicate rows like same entries more than one time

Question

Post reply

need to find duplicate rows like same entries more than one time

neethu payal

SSC Eights!

Points: 867
More actions
September 11, 2014 at 1:33 am

#311488

i have a table like below
create table staff_attendance
(
attendance_id int,
attendace_date datetime,
staff_id int,
working_year int,
hours int
)
values like
1 2014-06-30 00:00:00.0ST10121
2 2014-06-30 00:00:00.0ST10122
3 2014-06-30 00:00:00.0ST10122 ----same entry like previous one
4 2014-07-01 00:00:00.0ST10121
5 2014-07-01 00:00:00.0ST10122
6 2014-07-02 00:00:00.0ST10121
7 2014-07-02 00:00:00.0ST10122
8 2014-06-30 00:00:00.0ST10221
9 2014-06-30 00:00:00.0ST10222
10 2014-07-01 00:00:00.0ST1022 1
11 2014-07-01 00:00:00.0ST102 22
12 2014-07-02 00:00:00.0ST102 21
13 2014-07-02 00:00:00.0ST102 22
I Need to find the duplicate rows like same entries which is having more than 1 rows.... how do i find???

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182912 More actions · Answer 1

Quick code to get you passed the hurdle

😎

USE tempdb;

GO

IF OBJECT_ID('dbo.staff_attendance') IS NOT NULL

DROP TABLE dbo.staff_attendance;

create table dbo.staff_attendance

(

attendance_id int

,attendace_date datetime

,staff_id VARCHAR(6)

,working_year int

,[hours] int

);

--attendance_id,attendace_date,staff_id,working_year,[hours]

INSERT INTO dbo.staff_attendance (attendance_id,attendace_date,staff_id,working_year,[hours])

VALUES

(1 ,'2014-06-30 00:00:00.0','ST101',2,1)

,(2 ,'2014-06-30 00:00:00.0','ST101',2,2)

,(3 ,'2014-06-30 00:00:00.0','ST101',2,2) ----same entry like previous one

,(4 ,'2014-07-01 00:00:00.0','ST101',2,1)

,(5 ,'2014-07-01 00:00:00.0','ST101',2,2)

,(6 ,'2014-07-02 00:00:00.0','ST101',2,1)

,(7 ,'2014-07-02 00:00:00.0','ST101',2,2)

,(8 ,'2014-06-30 00:00:00.0','ST102',2,1)

,(9 ,'2014-06-30 00:00:00.0','ST102',2,2)

,(10 ,'2014-07-01 00:00:00.0','ST102',2,1)

,(11 ,'2014-07-01 00:00:00.0','ST102',2,2)

,(12 ,'2014-07-02 00:00:00.0','ST102',2,1)

,(13 ,'2014-07-02 00:00:00.0','ST102',2,2);

SELECT

SA.attendance_id

,SA.attendace_date

,SA.staff_id

,SA.working_year

,SA.[hours]

,COUNT(SA.attendance_id) OVER

(

PARTITION BY SA.attendace_date

,SA.staff_id

,SA.working_year

,SA.[hours]

) AS INSTANCE_COUNT

,ROW_NUMBER() OVER

(

PARTITION BY SA.attendace_date

,SA.staff_id

,SA.working_year

,SA.[hours]

ORDER BY (SA.attendance_id)

) AS INSTANCE_RID

FROM dbo.staff_attendance SA

Results

attendance_id attendace_date staff_id working_year hours INSTANCE_COUNT INSTANCE_RID

------------- ----------------------- -------- ------------ ----------- -------------- --------------

1 2014-06-30 00:00:00.000 ST101 2 1 1 1

2 2014-06-30 00:00:00.000 ST101 2 2 2 1

3 2014-06-30 00:00:00.000 ST101 2 2 2 2

8 2014-06-30 00:00:00.000 ST102 2 1 1 1

9 2014-06-30 00:00:00.000 ST102 2 2 1 1

4 2014-07-01 00:00:00.000 ST101 2 1 1 1

5 2014-07-01 00:00:00.000 ST101 2 2 1 1

10 2014-07-01 00:00:00.000 ST102 2 1 1 1

11 2014-07-01 00:00:00.000 ST102 2 2 1 1

6 2014-07-02 00:00:00.000 ST101 2 1 1 1

7 2014-07-02 00:00:00.000 ST101 2 2 1 1

12 2014-07-02 00:00:00.000 ST102 2 1 1 1

13 2014-07-02 00:00:00.000 ST102 2 2 1 1

Ozzmodiar Ten Centuries Points: 1345 More actions · Answer 2

I think you're looking for something like this:

SELECT

attendace_date

, staff_id

, working_year

, hours

, COUNT(1)

FROM

staff_attendance

GROUP BY

attendace_date

, staff_id

, working_year

, hours

HAVING

COUNT(1) > 1

;

This is the standard query for this type of thing.

Cheers!

milos.radivojevic Default port Points: 1499 More actions · Answer 3

If we define duplicate rows as rows where the following attributes are equal: attendace_date, staff_id, working_year, [hours] and that the original row within them is the row with the smallest attendance_id, the following statement identifies duplicate rows (you see the attendance_id, so you can easily remove them if needed):

;WITH cte AS

(SELECT attendance_id, attendace_date, staff_id, working_year, [hours],

ROW_NUMBER() OVER(PARTITION BY attendace_date, staff_id, working_year, [hours]

ORDER BY attendance_id ASC) rn

FROM dbo.staff_attendance

)

SELECT * FROM cte WHERE rn > 1;

Note: the statement uses the table defined by Eirikur Eiriksson. Thanks to Erikur and next time please use the same apporach to define and populate tables by yourself.

Another note: When you want to remove duplicates you need to define them first. This sounds maybe trivial, but sometimes your perception of duplicates does not correspond to the one of developers or business guys. Simple define when two rows are duplicates (i.e. which columns must be equal) and also what is the original row within them (the smallest Id, or highest Id, timestamp whatever). This will take not more than a minute but can avoid potential misunderstandings and misinterpretations.

___________________________
Do Not Optimize for Exceptions!