March 11, 2015 at 8:43 am
Hi there,
I have found a bunch of duplicate records in our housing database that ideally I need to delete.
There are two tables that I need to remove data from -
ih_cml_log_entry and ih_cml_log_notes. There is no unique identifier between the tables for a log entry. So I have had to join on the person_ref, log_seq and the date/time of entry.
How do I go about deleting the data - I've used the script below to identify what I need to delete -
SELECT *
FROM
(
select cml.person_ref, cml.open_date + open_time as 'datetime',cml.open_user,cml.log_type
,ROW_NUMBER() OVER (PARTITION BY cml.person_ref, cml.open_date + cml.open_time,cml.open_user,cml.log_type ORDER BY (SELECT 0)) AS RowNo
,n.note
FROM ih_cml_log_entry cml
left outer join
ih_cml_log_notes n
on cml.person_ref = n.person_ref
and cml.open_date + open_time = n.created_date + n.created_time
andcml.log_seq = n.log_seq
) as sub
where [RowNo] <> 1
order by [datetime]
March 11, 2015 at 10:57 am
With no unique identifier you may struggle to delete some rows and keep others. You could run a query to all the unique rows into another table, then drop/rename or delete every from the existing table and replace it the de-duplicated data.
March 11, 2015 at 11:05 am
The relatively simple way is to put your check for duplicate data into a CTE then just delete from that CTE where ROW_NUMBER != 1.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply