Remove duplicates based on occurrences with join on other table

  • I have 2 tables:

    prospects and prospectsincampaigns

    CREATE TABLE [dbo].[prospects](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [companyname] [nvarchar](255) NULL,

    CONSTRAINT [PK_intercity] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[ProspectsInCampaigns](

    [prospectid] [int] NOT NULL,

    [campaignid] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    in [prospects] I have all the details of a prospect.

    in [prospectsincampaigns] I have appointment details on prospects

    Now, based on companyname I want to remove duplicates from [prospects] and [prospectsincampaign] where the prospect shares the same campaignid in [prospectsincampaigns]

    The prospect with the lowest [prospect].id may remain, the rest can be removed.

    e.g. this is the CURRENT situation:

    [prospects]

    idcompanyname

    3IBM

    6IBM

    7IBM

    10Microsoft

    [prospectsincampaigns]

    prospectidcampaignid

    31

    61

    72

    102

    this should be the AFTER situation

    [prospects]

    idcompanyname

    3IBM

    7IBM

    10Microsoft

    [prospectsincampaigns]

    prospectidcampaignid

    31

    72

    102

    Notice how in the AFTER situation the IBM companyname may exist multiple times in [prospects] but not for the same campaign anymore.

    I already had some good feedback on restructuring my tables, but for now I NEED it to work like this!

    I already had a stament where I tried to remove all duplicates in [prospects] regardless of campaigns, couldnt get that to work either :), but perhaps you can use this as a starting point for the above requirement.

    Also any SQL optimization is more than welcome 🙂

    ;with dupes as

    (select id,companyname,COUNT(*) over(partition by companyname) AS counter,ROW_NUMBER()

    OVER(partition by companyname order by id) AS rownum,

    pic AS pic,SUM(pic)

    OVER(partition by companyname) AS totalpic,MIN(pic)

    OVER(partition by companyname) AS minpic from prospects

    outer apply

    (select COUNT(*) from ProspectsInCampaigns where prospectid = id) a(pic))

    DELETE p from dupes

    cross apply

    (SELECT COALESCE(case when totalpic =0 and rownum>1 then 'DELETE ME' else null end, case when minpic>0 then NULL when totalpic>0 and pic=0 then 'DELETE ME' else null end )

    AS deleteFlag ) x

    join prospects p on p.id = dupes.id

    where counter>1 AND deleteFlag='DELETE ME'

  • This was removed by the editor as SPAM

  • It does! 🙂 Thanks!

  • Hi,

    One final question with which I hope you are able to help me 🙂

    Basically the statement you gave me consists of 3 separate statements. I've placed them below and placed some comments with them.

    --1. select DUPLICATES (based on companyname) from prospectsincampaign

    --? my question on statement below: how can i select only the duplicates from the same campaign based on prospect.companyname?

    WITH MinProspectPerCampaign AS

    (SELECT MIN(p.id) id, p.companyname, c.campaignid FROM [dbo].[prospects] p JOIN [dbo].[ProspectsInCampaigns] c ON c.prospectid = p.id

    GROUP BY c.campaignid, p.companyname)

    --2. from these duplicates delete duplicate records that are in same campaign from prospects (delete the ones where prospectsincampaigns.result=0 first), after deletion at least ONE unqiue record should remain in prospects (it can be that for that record prospectsincampaigns.resul=0)

    --? my question on statement below: this statement deletes all records in prospects that do not occur in prospectsincampaigns, and does not take into account the campaign in which the records are

    DELETE [dbo].[prospects] WHERE NOT EXISTS (

    SELECT prospectid FROM [dbo].[ProspectsInCampaigns]

    WHERE [dbo].[ProspectsInCampaigns].prospectid = [dbo].[prospects].id

    --3. from these duplicates delete duplicate records that are in same campaign from prospectsincampaigns (delete the ones where prospectsincampaigns.result=0 first), after deletion at least ONE unqiue record should remain in prospectsincampaign (it can be that for that record prospectsincampaigns.resul=0)

    --? my question on statement below: this statement does not take the prospectsincampaigns.result value into account where result=0 should be deleted first

    DELETE [dbo].[ProspectsInCampaigns]

    WHERE NOT EXISTS (

    SELECT id, campaignid FROM MinProspectPerCampaign m

    WHERE m.id = [dbo].[ProspectsInCampaigns].prospectid AND m.campaignid = [dbo].[ProspectsInCampaigns].campaignid

    The above total statement does something that had some unexpected effect for me

    - prospects that are in NO campaign are now also deleted! This is not desirable

    - IF the same prospectid occurs MULTIPLE times in the same campaign in prospectsincampaigns, only the records that have prospectsincampaigns.result=0 should be deleted. Currently the same records within same campaign are all preserved regardless of the resultcode.

    Can you help with that?

    edit: added more text

  • update (am working on it), the first statement would be:

    SELECT MIN(p.id) id, p.companyname,c.campaignid FROM [dbo].[prospects] p

    JOIN [dbo].[ProspectsInCampaigns] c ON c.prospectid = p.id

    GROUP BY c.campaignid, p.companyname

    HAVING COUNT(p.companyname)>1

    This gives me all records that have the same prospectsincampaigns.campaignid and share the same prospects.companyname and have MORE than 1 occurencens (so are duplicate)

  • I'm having real difficulties with the rule: all duplicate records where result=0 should be deleted, but at least 1 record should remain even if the result of that record is 0

    This is what I need:

    BEFORE

    prospectidcampaignid result

    135504440

    135504448

    1355044414

    135505040

    135505050

    AFTER

    prospectidcampaignid result

    135504448

    1355044414

    135505040

    135505050

    And sorry for spamming :s

  • petervdkerk (7/30/2011)


    The above total statement does something that had some unexpected effect for me

    - prospects that are in NO campaign are now also deleted! This is not desirable

    So presumably the only prospacts deleted are those for which the prospectsincampaigns row (if any) has been deleted.

    -IF the same prospectid occurs MULTIPLE times in the same campaign in prospectsincampaigns, only the records that have prospectsincampaigns.result=0 should be deleted. Currently the same records within same campaign are all preserved regardless of the resultcode.

    Can you help with that?

    The DDL you provided doesn't include that column. Will there be any more surprises? It's best if you tell us everything that matters.

    Assuming that result in an int (of some sort) or bit column in propspectsincampaigns, the code you need may be something like

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid=c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant (result 0) min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c where result = 0

    and exists (select * from ProspectsInCampaigns c2

    where c2.prospectID > c.prospectID and c2.campaignid = c.campaignID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    -- throw away temporary data

    drop table #deletions

    This isn't elegant, maybe isn't efficient, but it works if I have understood the requirement. It uses a temporary table instead of a CTE because it needs to use the temp table in two delete statements, to avoid deleting prospects that have no campaigns

    Tom

  • Believe me, Im trying to describe my requirement as best as possible, my mistake if I missed something :$ For me: am not a big fan of surprises unless its Christmas either ^^

    To prevent any further surprises hereby below the 2 relevant tables and ALL their data scripted.

    I've also created the before and after situation where I indented rows which should be deleted and behind it the rule WHY they should be deleted.

    I hope this clarifies what I need 🙂

    Thanks ahead for the help!

    BEFORE SITUATION

    [prospects]

    idcompanyname

    1355005Smeets M.B. Zuid-Limburg BV

    1355006F.J. ten Berge BV

    1355007F.J. ten Berge BV

    1355008Duran BV

    1355009Duran BV

    1355010Duran BV

    1355011Company in No campaign

    1355012MedicalInc

    [prospectsincampaigns]

    prospectidcampaignidresult

    135500510

    135500610

    135500710

    135500810

    135500910

    135501040

    135501048

    1355010414

    1355010615

    135501270

    135501270

    DESIRED AFTER SITUATION

    [prospects]

    idcompanyname

    1355005Smeets M.B. Zuid-Limburg BV

    1355006F.J. ten Berge BV

    1355007F.J. ten Berge BV //deleted because this id was deleted from prospectsincampaigns AND because 1355007 does NOT occur for any other campaign in prospectsincampaigns

    1355008Duran BV

    1355009Duran BV //deleted because this id was deleted from prospectsincampaigns AND because 1355009 does NOT occur for any other campaign in prospectsincampaigns

    1355010Duran BV

    1355011Company in No campaign

    1355012MedicalInc

    [prospectsincampaigns]

    prospectidcampaignidresult

    135500510

    135500610

    135500710//deleted because id with exact same companyname (F.J. ten Berge BV) already occurs in this campaignid (1)

    135500810

    135500910// deleted because same id (1355009) already occurs for same campaignid (1)

    135501040 // deleted because same id (1355010) already occurs for same campaignid (4) and has result>0

    135501048

    1355010414

    1355010615

    135501270 //this one is not deleted because id 1355012 with campaignid 7 is a unique combination

    135501270// deleted because same id (1355012) already occurs for same campaignid (7)

    GO

    /****** Object: Table [dbo].[ProspectsInCampaigns] Script Date: 07/30/2011 16:56:14 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[ProspectsInCampaigns](

    [prospectid] [int] NOT NULL,

    [campaignid] [int] NOT NULL,

    [result] [int] NOT NULL,

    [personalcallbackid] [nvarchar](50) NULL,

    [callbackdate] [datetime] NOT NULL,

    [appointmentdate] [datetime] NOT NULL,

    [notes] [nvarchar](4000) NULL,

    [updatedate] [datetime] NOT NULL,

    [updatedby] [nvarchar](50) NULL,

    [recordlastrequestedby] [nvarchar](50) NULL,

    [recordlastrequestdate] [datetime] NOT NULL,

    [exportdate] [datetime] NOT NULL,

    [createdate] [datetime] NOT NULL

    ) ON [PRIMARY]

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'1=mobiel,2=vast,3=managed service,4=0800,5=0900,6=digitale pen' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'ProspectsInCampaigns', @level2type=N'COLUMN',@level2name=N'notes'

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'user who last updated this record' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'ProspectsInCampaigns', @level2type=N'COLUMN',@level2name=N'updatedby'

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'who is the last user who requested this record' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'ProspectsInCampaigns', @level2type=N'COLUMN',@level2name=N'recordlastrequestedby'

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'who is the last user who requested this record' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'ProspectsInCampaigns', @level2type=N'COLUMN',@level2name=N'recordlastrequestdate'

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'when was this record exported to excel' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'ProspectsInCampaigns', @level2type=N'COLUMN',@level2name=N'exportdate'

    GO

    INSERT [dbo].[ProspectsInCampaigns] ([prospectid], [campaignid], [result], [personalcallbackid], [callbackdate], [appointmentdate], [notes], [updatedate], [updatedby], [recordlastrequestedby], [recordlastrequestdate], [exportdate], [createdate]) VALUES (1355001, 1, 0, N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), N'', CAST(0x0000000000000000 AS DateTime), N'', N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), CAST(0x00009F300112FFE2 AS DateTime))

    INSERT [dbo].[ProspectsInCampaigns] ([prospectid], [campaignid], [result], [personalcallbackid], [callbackdate], [appointmentdate], [notes], [updatedate], [updatedby], [recordlastrequestedby], [recordlastrequestdate], [exportdate], [createdate]) VALUES (1355003, 4, 0, N'', CAST(0x0000000000000000 AS DateTime), CAST(0x00009E0E009450C0 AS DateTime), N'', CAST(0x0000000000000000 AS DateTime), N'', N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), CAST(0x00009F300112FFE3 AS DateTime))

    INSERT [dbo].[ProspectsInCampaigns] ([prospectid], [campaignid], [result], [personalcallbackid], [callbackdate], [appointmentdate], [notes], [updatedate], [updatedby], [recordlastrequestedby], [recordlastrequestdate], [exportdate], [createdate]) VALUES (1355003, 4, 8, N'', CAST(0x0000000000000000 AS DateTime), CAST(0x00009E0E009450C0 AS DateTime), N'', CAST(0x0000000000000000 AS DateTime), N'', N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), CAST(0x00009F300112FFE3 AS DateTime))

    INSERT [dbo].[ProspectsInCampaigns] ([prospectid], [campaignid], [result], [personalcallbackid], [callbackdate], [appointmentdate], [notes], [updatedate], [updatedby], [recordlastrequestedby], [recordlastrequestdate], [exportdate], [createdate]) VALUES (1355003, 4, 14, N'', CAST(0x0000000000000000 AS DateTime), CAST(0x00009E0E009450C0 AS DateTime), N'positief gesprek!', CAST(0x0000000000000000 AS DateTime), N'', N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), CAST(0x00009F300112FFE3 AS DateTime))

    INSERT [dbo].[ProspectsInCampaigns] ([prospectid], [campaignid], [result], [personalcallbackid], [callbackdate], [appointmentdate], [notes], [updatedate], [updatedby], [recordlastrequestedby], [recordlastrequestdate], [exportdate], [createdate]) VALUES (1355003, 6, 15, N'', CAST(0x0000000000000000 AS DateTime), CAST(0x00009E0E009450C0 AS DateTime), N'AFSPRAAK GESCOORD!', CAST(0x0000000000000000 AS DateTime), N'', N'', CAST(0x0000000000000000 AS DateTime), CAST(0x0000000000000000 AS DateTime), CAST(0x00009F300112FFE3 AS DateTime))

    /****** Object: Table [dbo].[prospects] Script Date: 07/30/2011 16:56:14 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[prospects](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [uuid] [uniqueidentifier] NOT NULL,

    [companyname] [nvarchar](255) NULL,

    [companytype] [int] NOT NULL,

    [companyemployees] [int] NOT NULL,

    [companyaddress] [nvarchar](255) NULL,

    [companyhousenr] [nvarchar](255) NULL,

    [companyhousenradd] [nvarchar](255) NULL,

    [companyzipcode] [nvarchar](255) NULL,

    [companycity] [nvarchar](255) NULL,

    [companyfax] [nvarchar](255) NULL,

    [companybranche] [tinyint] NOT NULL,

    [companyphone] [nvarchar](255) NULL,

    [companywebsite] [nvarchar](255) NULL,

    [companyvehicles] [int] NOT NULL,

    [cp_sex] [int] NOT NULL,

    [cp_initials] [nvarchar](255) NULL,

    [cp_firstname] [nvarchar](255) NULL,

    [cp_middlename] [nvarchar](255) NULL,

    [cp_lastname] [nvarchar](255) NULL,

    [cp_jobtitle] [nvarchar](255) NULL,

    [cp_email] [nvarchar](255) NULL,

    [cp_phone] [nvarchar](255) NULL,

    [cp_phone2] [nvarchar](255) NULL,

    [cp_phonemobile] [nvarchar](255) NULL,

    [telco_mob] [nvarchar](255) NULL,

    [telco_mob_connections] [int] NULL,

    [telco_mob_provider] [nvarchar](255) NULL,

    [telco_mob_dealer] [nvarchar](255) NULL,

    [telco_mob_enddate] [datetime] NULL,

    [telco_mob_costs] [int] NULL,

    [telco_vast] [nvarchar](255) NULL,

    [telco_vast_connections] [int] NULL,

    [telco_vast_provider] [nvarchar](255) NULL,

    [telco_vast_dealer] [nvarchar](255) NULL,

    [telco_vast_enddate] [datetime] NULL,

    [telco_vast_costs] [int] NULL,

    [telco_data] [nvarchar](255) NULL,

    [telco_data_connections] [int] NULL,

    [telco_data_provider] [nvarchar](255) NULL,

    [telco_data_dealer] [nvarchar](255) NULL,

    [telco_data_enddate] [datetime] NULL,

    [telco_data_costs] [int] NULL,

    [mraqua_interest] [nvarchar](50) NULL,

    [mraqua_coolers] [int] NOT NULL,

    [contactmoments] [nvarchar](4000) NULL,

    [importdate] [datetime] NULL,

    [filename] [nvarchar](255) NULL,

    [presenter_interest] [nvarchar](500) NULL,

    [presenter_budget] [nvarchar](50) NULL,

    [campaignid_delme] [int] NOT NULL,

    [importcode] [nvarchar](50) NULL,

    CONSTRAINT [PK_intercity] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'this is a unique code that can be given to the import, so that its easy to find the imported records and also assign them to campaigns based on this code' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'prospects', @level2type=N'COLUMN',@level2name=N'importcode'

    GO

    SET IDENTITY_INSERT [dbo].[prospects] ON

    INSERT [dbo].[prospects] ([id], [uuid], [companyname], [companytype], [companyemployees], [companyaddress], [companyhousenr], [companyhousenradd], [companyzipcode], [companycity], [companyfax], [companybranche], [companyphone], [companywebsite], [companyvehicles], [cp_sex], [cp_initials], [cp_firstname], [cp_middlename], [cp_lastname], [cp_jobtitle], [cp_email], [cp_phone], [cp_phone2], [cp_phonemobile], [telco_mob], [telco_mob_connections], [telco_mob_provider], [telco_mob_dealer], [telco_mob_enddate], [telco_mob_costs], [telco_vast], [telco_vast_connections], [telco_vast_provider], [telco_vast_dealer], [telco_vast_enddate], [telco_vast_costs], [telco_data], [telco_data_connections], [telco_data_provider], [telco_data_dealer], [telco_data_enddate], [telco_data_costs], [mraqua_interest], [mraqua_coolers], [contactmoments], [importdate], [filename], [presenter_interest], [presenter_budget], [campaignid_delme], [importcode]) VALUES (1354998, N'b9128aae-ba76-4779-b632-6e3a8bdbb467', N'Smeets M.B. Zuid-Limburg BV', 0, 0, N'Handelsstraat', N'2', N'', N'6433 KB', N'Heerlen (MAASTRICHT)', NULL, 0, N'0464209900', NULL, 0, 1, NULL, N'', N'', N'Laumans', NULL, N'n.laumans@smeets-mb.nl', N'', NULL, NULL, NULL, 15, N'T-Mobile', NULL, CAST(0x00009E7D00000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, CAST(0x00009F300112FFE2 AS DateTime), NULL, NULL, NULL, 0, NULL)

    INSERT [dbo].[prospects] ([id], [uuid], [companyname], [companytype], [companyemployees], [companyaddress], [companyhousenr], [companyhousenradd], [companyzipcode], [companycity], [companyfax], [companybranche], [companyphone], [companywebsite], [companyvehicles], [cp_sex], [cp_initials], [cp_firstname], [cp_middlename], [cp_lastname], [cp_jobtitle], [cp_email], [cp_phone], [cp_phone2], [cp_phonemobile], [telco_mob], [telco_mob_connections], [telco_mob_provider], [telco_mob_dealer], [telco_mob_enddate], [telco_mob_costs], [telco_vast], [telco_vast_connections], [telco_vast_provider], [telco_vast_dealer], [telco_vast_enddate], [telco_vast_costs], [telco_data], [telco_data_connections], [telco_data_provider], [telco_data_dealer], [telco_data_enddate], [telco_data_costs], [mraqua_interest], [mraqua_coolers], [contactmoments], [importdate], [filename], [presenter_interest], [presenter_budget], [campaignid_delme], [importcode]) VALUES (1354999, N'cae25fb3-5f0e-4b78-b881-9fe540b9a633', N'F.J. ten Berge BV', 0, 0, N'Wattstraat', N'5', N'', N'7461AB', N'RIJSSEN', NULL, 0, N'0548512525', NULL, 0, 1, NULL, N'', N'', N'Nijkamp', NULL, N'', N'', NULL, NULL, NULL, 18, N'Telfort', NULL, CAST(0x00009EC100000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, CAST(0x00009F300112FFE2 AS DateTime), NULL, NULL, NULL, 0, NULL)

    INSERT [dbo].[prospects] ([id], [uuid], [companyname], [companytype], [companyemployees], [companyaddress], [companyhousenr], [companyhousenradd], [companyzipcode], [companycity], [companyfax], [companybranche], [companyphone], [companywebsite], [companyvehicles], [cp_sex], [cp_initials], [cp_firstname], [cp_middlename], [cp_lastname], [cp_jobtitle], [cp_email], [cp_phone], [cp_phone2], [cp_phonemobile], [telco_mob], [telco_mob_connections], [telco_mob_provider], [telco_mob_dealer], [telco_mob_enddate], [telco_mob_costs], [telco_vast], [telco_vast_connections], [telco_vast_provider], [telco_vast_dealer], [telco_vast_enddate], [telco_vast_costs], [telco_data], [telco_data_connections], [telco_data_provider], [telco_data_dealer], [telco_data_enddate], [telco_data_costs], [mraqua_interest], [mraqua_coolers], [contactmoments], [importdate], [filename], [presenter_interest], [presenter_budget], [campaignid_delme], [importcode]) VALUES (1355001, N'5bd21906-28f8-4b8a-afa0-592f3c1e64a4', N'Duran BV', 0, 0, N'Noord IJsseldijk', N'95 B', N'', N'PG', N'', NULL, 0, N'0306304444', NULL, 0, -1, NULL, N'Uzehir??', N'', N'u.duran@duranbv.nl', NULL, N'', N'', NULL, NULL, NULL, 0, N'T-Mobile', NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, CAST(0x00009F300112FFE2 AS DateTime), NULL, NULL, NULL, 0, NULL)

    INSERT [dbo].[prospects] ([id], [uuid], [companyname], [companytype], [companyemployees], [companyaddress], [companyhousenr], [companyhousenradd], [companyzipcode], [companycity], [companyfax], [companybranche], [companyphone], [companywebsite], [companyvehicles], [cp_sex], [cp_initials], [cp_firstname], [cp_middlename], [cp_lastname], [cp_jobtitle], [cp_email], [cp_phone], [cp_phone2], [cp_phonemobile], [telco_mob], [telco_mob_connections], [telco_mob_provider], [telco_mob_dealer], [telco_mob_enddate], [telco_mob_costs], [telco_vast], [telco_vast_connections], [telco_vast_provider], [telco_vast_dealer], [telco_vast_enddate], [telco_vast_costs], [telco_data], [telco_data_connections], [telco_data_provider], [telco_data_dealer], [telco_data_enddate], [telco_data_costs], [mraqua_interest], [mraqua_coolers], [contactmoments], [importdate], [filename], [presenter_interest], [presenter_budget], [campaignid_delme], [importcode]) VALUES (1355003, N'1bde540d-ab65-4eee-9ea1-9b35ce9f349b', N'Duran BV', 0, 0, N'Noord IJsseldijk', N'', N'', N'3402PG', N'', NULL, 0, N'', NULL, 0, -1, NULL, N'', N'', N'', NULL, N'u.duran@duranbv.nl', N'', NULL, NULL, NULL, 0, N'T-Mobile', NULL, CAST(0x00009E6600000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, CAST(0x00009F300112FFE3 AS DateTime), NULL, NULL, NULL, 0, NULL)

    INSERT [dbo].[prospects] ([id], [uuid], [companyname], [companytype], [companyemployees], [companyaddress], [companyhousenr], [companyhousenradd], [companyzipcode], [companycity], [companyfax], [companybranche], [companyphone], [companywebsite], [companyvehicles], [cp_sex], [cp_initials], [cp_firstname], [cp_middlename], [cp_lastname], [cp_jobtitle], [cp_email], [cp_phone], [cp_phone2], [cp_phonemobile], [telco_mob], [telco_mob_connections], [telco_mob_provider], [telco_mob_dealer], [telco_mob_enddate], [telco_mob_costs], [telco_vast], [telco_vast_connections], [telco_vast_provider], [telco_vast_dealer], [telco_vast_enddate], [telco_vast_costs], [telco_data], [telco_data_connections], [telco_data_provider], [telco_data_dealer], [telco_data_enddate], [telco_data_costs], [mraqua_interest], [mraqua_coolers], [contactmoments], [importdate], [filename], [presenter_interest], [presenter_budget], [campaignid_delme], [importcode]) VALUES (1355004, N'731ac42b-0b2e-40e0-b2d4-a7f1d3a526b5', N'Company in No campaign', 0, 0, N'Mijnlaan', N'3', N'', N'45435 KB', N'Ablasserdam', NULL, 0, N'0464209900', NULL, 0, 1, NULL, N'', N'', N'Peters', NULL, N'peters@nocamp.nl', N'', NULL, NULL, NULL, 11, N'T-Mobile', NULL, CAST(0x00009E7D00000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, NULL, CAST(0x0000000000000000 AS DateTime), 0, NULL, 0, NULL, CAST(0x00009F300112FFE3 AS DateTime), NULL, NULL, NULL, 0, NULL)

    SET IDENTITY_INSERT [dbo].[prospects] OFF

    /****** Object: Default [DF_prospects_uuid_1] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_uuid_1] DEFAULT (newid()) FOR [uuid]

    GO

    /****** Object: Default [DF_prospects_companytype_1] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_companytype_1] DEFAULT ((0)) FOR [companytype]

    GO

    /****** Object: Default [DF_prospects_companyemployees_1] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_companyemployees_1] DEFAULT ((0)) FOR [companyemployees]

    GO

    /****** Object: Default [DF_prospects_companybranche] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_companybranche] DEFAULT ((0)) FOR [companybranche]

    GO

    /****** Object: Default [DF_prospects_companyvehicles] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_companyvehicles] DEFAULT ((0)) FOR [companyvehicles]

    GO

    /****** Object: Default [DF_prospects_cp_sex_1] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_cp_sex_1] DEFAULT ((-1)) FOR [cp_sex]

    GO

    /****** Object: Default [DF_prospects_telco_mob_connections] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_mob_connections] DEFAULT ((0)) FOR [telco_mob_connections]

    GO

    /****** Object: Default [DF_prospects_telco_mob_enddate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_mob_enddate] DEFAULT (((1)/(1))/(1900)) FOR [telco_mob_enddate]

    GO

    /****** Object: Default [DF_prospects_telco_mob_costs] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_mob_costs] DEFAULT ((0)) FOR [telco_mob_costs]

    GO

    /****** Object: Default [DF_prospects_telco_vast_connections] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_vast_connections] DEFAULT ((0)) FOR [telco_vast_connections]

    GO

    /****** Object: Default [DF_prospects_telco_vast_enddate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_vast_enddate] DEFAULT (((1)/(1))/(1900)) FOR [telco_vast_enddate]

    GO

    /****** Object: Default [DF_prospects_telco_vast_costs] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_vast_costs] DEFAULT ((0)) FOR [telco_vast_costs]

    GO

    /****** Object: Default [DF_prospects_telco_data_connections] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_data_connections] DEFAULT ((0)) FOR [telco_data_connections]

    GO

    /****** Object: Default [DF_prospects_telco_data_enddate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_data_enddate] DEFAULT (((1)/(1))/(1900)) FOR [telco_data_enddate]

    GO

    /****** Object: Default [DF_prospects_telco_data_costs] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_telco_data_costs] DEFAULT ((0)) FOR [telco_data_costs]

    GO

    /****** Object: Default [DF_prospects_mraqua_coolers_1] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_mraqua_coolers_1] DEFAULT ((0)) FOR [mraqua_coolers]

    GO

    /****** Object: Default [DF_prospects_importdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_importdate] DEFAULT (getdate()) FOR [importdate]

    GO

    /****** Object: Default [DF_prospects_campaignid] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[prospects] ADD CONSTRAINT [DF_prospects_campaignid] DEFAULT ((0)) FOR [campaignid_delme]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_result] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_result] DEFAULT ((0)) FOR [result]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_callbackdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_callbackdate] DEFAULT (((1)/(1))/(1900)) FOR [callbackdate]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_appointmentdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_appointmentdate] DEFAULT (((1)/(1))/(1900)) FOR [appointmentdate]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_updatedate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_updatedate] DEFAULT (((1)/(1))/(1900)) FOR [updatedate]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_recordlastrequestdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_recordlastrequestdate] DEFAULT (((1)/(1))/(1900)) FOR [recordlastrequestdate]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_exportdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_exportdate] DEFAULT (((1)/(1))/(1900)) FOR [exportdate]

    GO

    /****** Object: Default [DF_ProspectsInCampaigns_createdate] Script Date: 07/30/2011 16:56:14 ******/

    ALTER TABLE [dbo].[ProspectsInCampaigns] ADD CONSTRAINT [DF_ProspectsInCampaigns_createdate] DEFAULT (getdate()) FOR [createdate]

    GO

  • petervdkerk (7/30/2011)


    To prevent any further surprises hereby below the 2 relevant tables and ALL their data scripted.

    I've also created the before and after situation where I indented rows which should be deleted and behind it the rule WHY they should be deleted.

    I hope this clarifies what I need 🙂

    yes, it seems pretty clear. I was surprised to discover a need to delete duplicates, though!

    Here's some code that does it. As before, it's not in the least elegant, but it appears to work on what I believe are your requirements. Test results match those you provided.

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid

    where f.N < f2.N

    -- throw away temporary data

    drop table #deletions

    For future reference, please try to conform to the best practises listed at teh two links following when posting questions. It makes it easier for people to help you, so you'll get the right answer sooner that way.

    For better, quicker answers on T-SQL questions click here[/url]

    For better answers on performance questions click here[/url]

    Tom

  • Thanks again for the quick response 🙂

    It almost does what I need, except for ONE thing: currently your code also deletes this row from prospectsincampaigns:

    135501048

    But that shouldnt be deleted since the result is already>0 (namely 8)

    And yes, some good advice was already given by you and some other posters here. I have already talked to my client about a restructuring of the DB...but its a time and money issue 🙂

  • petervdkerk (7/30/2011)


    Thanks again for the quick response 🙂

    It almost does what I need, except for ONE thing: currently your code also deletes this row from prospectsincampaigns:

    135501048

    But that shouldnt be deleted since the result is already>0 (namely 8)

    Careless of me. I fixed that bug in my test setup but not in the version I posted. It's just a missing "and result = 0" in the dedup code. Here's the correct version (I hope).

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    Tom

  • Cool!! 😀

    One more thing: Im now also trying to narrow it all down to a single campaign (4 in this case), so I added " and c.campaignid=4" on several lines, but that doesnt seem to work...what should I add to achieve that?

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid and c.campaignid=4

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=4)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=4

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

  • You just need to add a condition to the WHERE Clause on the DELETE from the ProspectsInCampaigns.

    After you do that and remove the duplicates you should create a Primary Key on the ProspectsInCampaigns Table and add Foreign Key Constraints

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Which one? :$

    I see 2 of them and I think I added the condition to both of them already...

  • petervdkerk (7/30/2011)


    Which one? :$

    I see 2 of them and I think I added the condition to both of them already...

    I guess you need "and f.campaignid = 4" in the dup delete (the very last delete statement).

    Tom

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply