September 11, 2014 at 1:33 am
i have a table like below
create table staff_attendance
(
attendance_id int,
attendace_date datetime,
staff_id int,
working_year int,
hours int
)
values like
1 2014-06-30 00:00:00.0ST10121
2 2014-06-30 00:00:00.0ST10122
3 2014-06-30 00:00:00.0ST10122 ----same entry like previous one
4 2014-07-01 00:00:00.0ST10121
5 2014-07-01 00:00:00.0ST10122
6 2014-07-02 00:00:00.0ST10121
7 2014-07-02 00:00:00.0ST10122
8 2014-06-30 00:00:00.0ST10221
9 2014-06-30 00:00:00.0ST10222
10 2014-07-01 00:00:00.0ST1022 1
11 2014-07-01 00:00:00.0ST102 22
12 2014-07-02 00:00:00.0ST102 21
13 2014-07-02 00:00:00.0ST102 22
I Need to find the duplicate rows like same entries which is having more than 1 rows.... how do i find???
September 11, 2014 at 1:46 am
Quick code to get you passed the hurdle
😎
USE tempdb;
GO
IF OBJECT_ID('dbo.staff_attendance') IS NOT NULL
DROP TABLE dbo.staff_attendance;
create table dbo.staff_attendance
(
attendance_id int
,attendace_date datetime
,staff_id VARCHAR(6)
,working_year int
,[hours] int
);
--attendance_id,attendace_date,staff_id,working_year,[hours]
INSERT INTO dbo.staff_attendance (attendance_id,attendace_date,staff_id,working_year,[hours])
VALUES
(1 ,'2014-06-30 00:00:00.0','ST101',2,1)
,(2 ,'2014-06-30 00:00:00.0','ST101',2,2)
,(3 ,'2014-06-30 00:00:00.0','ST101',2,2) ----same entry like previous one
,(4 ,'2014-07-01 00:00:00.0','ST101',2,1)
,(5 ,'2014-07-01 00:00:00.0','ST101',2,2)
,(6 ,'2014-07-02 00:00:00.0','ST101',2,1)
,(7 ,'2014-07-02 00:00:00.0','ST101',2,2)
,(8 ,'2014-06-30 00:00:00.0','ST102',2,1)
,(9 ,'2014-06-30 00:00:00.0','ST102',2,2)
,(10 ,'2014-07-01 00:00:00.0','ST102',2,1)
,(11 ,'2014-07-01 00:00:00.0','ST102',2,2)
,(12 ,'2014-07-02 00:00:00.0','ST102',2,1)
,(13 ,'2014-07-02 00:00:00.0','ST102',2,2);
SELECT
SA.attendance_id
,SA.attendace_date
,SA.staff_id
,SA.working_year
,SA.[hours]
,COUNT(SA.attendance_id) OVER
(
PARTITION BY SA.attendace_date
,SA.staff_id
,SA.working_year
,SA.[hours]
) AS INSTANCE_COUNT
,ROW_NUMBER() OVER
(
PARTITION BY SA.attendace_date
,SA.staff_id
,SA.working_year
,SA.[hours]
ORDER BY (SA.attendance_id)
) AS INSTANCE_RID
FROM dbo.staff_attendance SA
Results
attendance_id attendace_date staff_id working_year hours INSTANCE_COUNT INSTANCE_RID
------------- ----------------------- -------- ------------ ----------- -------------- --------------
1 2014-06-30 00:00:00.000 ST101 2 1 1 1
2 2014-06-30 00:00:00.000 ST101 2 2 2 1
3 2014-06-30 00:00:00.000 ST101 2 2 2 2
8 2014-06-30 00:00:00.000 ST102 2 1 1 1
9 2014-06-30 00:00:00.000 ST102 2 2 1 1
4 2014-07-01 00:00:00.000 ST101 2 1 1 1
5 2014-07-01 00:00:00.000 ST101 2 2 1 1
10 2014-07-01 00:00:00.000 ST102 2 1 1 1
11 2014-07-01 00:00:00.000 ST102 2 2 1 1
6 2014-07-02 00:00:00.000 ST101 2 1 1 1
7 2014-07-02 00:00:00.000 ST101 2 2 1 1
12 2014-07-02 00:00:00.000 ST102 2 1 1 1
13 2014-07-02 00:00:00.000 ST102 2 2 1 1
September 11, 2014 at 10:44 am
I think you're looking for something like this:
SELECT
attendace_date
, staff_id
, working_year
, hours
, COUNT(1)
FROM
staff_attendance
GROUP BY
attendace_date
, staff_id
, working_year
, hours
HAVING
COUNT(1) > 1
;
This is the standard query for this type of thing.
Cheers!
September 17, 2014 at 1:37 am
If we define duplicate rows as rows where the following attributes are equal: attendace_date, staff_id, working_year, [hours] and that the original row within them is the row with the smallest attendance_id, the following statement identifies duplicate rows (you see the attendance_id, so you can easily remove them if needed):
;WITH cte AS
(SELECT attendance_id, attendace_date, staff_id, working_year, [hours],
ROW_NUMBER() OVER(PARTITION BY attendace_date, staff_id, working_year, [hours]
ORDER BY attendance_id ASC) rn
FROM dbo.staff_attendance
)
SELECT * FROM cte WHERE rn > 1;
Note: the statement uses the table defined by Eirikur Eiriksson. Thanks to Erikur and next time please use the same apporach to define and populate tables by yourself.
Another note: When you want to remove duplicates you need to define them first. This sounds maybe trivial, but sometimes your perception of duplicates does not correspond to the one of developers or business guys. Simple define when two rows are duplicates (i.e. which columns must be equal) and also what is the original row within them (the smallest Id, or highest Id, timestamp whatever). This will take not more than a minute but can avoid potential misunderstandings and misinterpretations.
___________________________
Do Not Optimize for Exceptions!
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply