Remove duplicates based on occurrences with join on other table

  • petervdkerk (8/10/2011)


    Ok, I started that. Instead of doing it all on companyname I've now done it on companyphone, because you guys then have the same testdata I already posted earlier and technically it shouldnt matter if I apply the statement on companyname column or companyphone column.

    I think I found where it goes wrong, but dont know how to fix it. Please see my comments in code below.

    Sorry for not getting back to your previous post sooner, I've been busy.

    I don't think the error is where you think it is. The second delete statement depends on the first delete statement having been done - otherwise it may select extra rows. If you change the first delete back from a select to a delete, you will see that the select (that was originally the second delete) doesn't deliver a wrong row to delete.

    Tom

  • That's quite alright, you've been a great help so far. 🙂

    I will look into this after my holiday, so dont think I'm ignoring you ^^

  • Are the rows that get deleted but shouldn't in prospects or in prospectincampaigns or are there wrongly deleted rows in both? Does the wrong delete happen when @campid is 1 (as in your post)?

    I can't see anything being deleted that shouldn't, but maybe I'm misunderstanding what should be deleted; so it would help if you could tell me the propspectid of any wrongly deleted prospects row or the prospectid, campaignid and result of a wrongly deleted prospectsincampaigns row.

    Tom

  • Ok, Im back from my holiday and started testing it right away. This is what happens:

    Ok, so I have this as my data:

    DELETE FROM prospects

    DELETE FROM prospectsincampaigns

    declare @newid int

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Smeets M.B. Zuid-Limburg BV','Handelsstraat','2','','6433 KB','Heerlen (MAASTRICHT)','0464209900','1','','','Laumans','n.laumans@smeets-mb.nl','','15','T-Mobile','2/1/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','B','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','mnijkamp@tenbergerijssen.nl','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','Rozsemarylane','1 B','','AG','','','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','','','','AG','','0123456789','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','95 B','','PG','','0306304444','-1','Uzehir??','','u.duran@duranbv.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','','95 B','B','3402','IJsselstein','','1','','','','','15','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','','','3402PG','','','-1','','','','u.duran@duranbv.nl','','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for same campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,14 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'positief gesprek!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for different campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,15 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'AFSPRAAK GESCOORD!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,8 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert record that is not assigned to any campaign

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Company in No campaign','Mijnlaan','3','','45435 KB','Ablasserdam','0464209900','1','','','Peters','peters@nocamp.nl','','11','T-Mobile','2/1/2011')

    Then I execute the following statement (1 of the companies named 'F.J. ten Berge BV' is deleted correctly, but the record that SHOULDNT be deleted, but IS deleted from [prospects] is the last inserted record with companytitle 'Company in No campaign'):

    declare @campid int

    set @campid=1

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyphone = p2.companyphone and p.companyphone<>'' and c2.campaignid = c.campaignid and c.campaignid=@campid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c

    where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=@campid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=@campid

    and p2.companyphone = p.companyphone and p.companyphone<>'' and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p

    where exists (select * from #deletions d where d.prospectID = p.id)

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> @campid)

    and exists (select * from ProspectsInCampaigns c inner join prospects p2

    on p2.id = c.prospectid where p2.id <> p.id and p2.companyphone = p.companyphone and p.companyphone<>'' and c.campaignid = @campid)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = @campid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

  • petervdkerk (8/31/2011)


    Ok, Im back from my holiday and started testing it right away. This is what happens:

    I'm in the nmiddle os shifting base at the moment, will try to get back to you on this before Monday.

    (Unless someone else fixes it first)

    Tom

  • Ok, great! 🙂

    And good luck with with moving.

  • That "company in no campaign" (company id 9) is being deleted because it has the same phone number as "Smeets M.B. Zuid-Limburg BV" (company id 1, which doesn't get deleted) and it isn't in any campaign; since the code checks for same company by looking at phone number, and company id 9 is in no campaign, the code treats company id 9 is a redundant copy of company id 1.

    Are they meant to have the same phone number? If so the phone number can't be used for checking whether two companies are the same as each other.

    Tom

  • you sir...are correct! Thanks so much, my client will be happy now, thanks to you 🙂

Viewing 8 posts - 31 through 37 (of 37 total)

You must be logged in to reply to this topic. Login to reply