Ranking duplicate contacts using multiple columns for ranking. Phone and emails that are blank can match to phones and emails that are not blank.

  • I'm in the process of trying to identify duplicate contacts. I doing this for millions of contacts and have gotten stuck and could use some elegant solutions!

    The business rule is this:

    Any contact that has the same name, phone and email address are the same contact

    Any contact that has the same name, and email address are the same contact

    Any contact that has the same name, email address, but different phone are a different contact.

    Any contact that has the same name, email address, and a blank phone can be the same contact as one that has the same name, email address, and has an email address

    Rank by the DataSource_fk. 1 being the highest

    Put another way:

    If 3 contacts have the same name, 2 have phone '1112223344' and all three have the email address 'johndoe@gmail.com' they are the same contact and the lowest DataSource_fk should be ranked the highest.

    I've used the Row_number over (Partition by) in the past, but am unsure how to deal with the blanks in email and phone.

    DROP TABLE [dbo].[TestBusinessContact];

    GO

    CREATE TABLE [dbo].[TestBusinessContact]

    (

    [TestBusinessContact_pk] INT IDENTITY(1,1)NOT NULL,

    [Business_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_Business_fk DEFAULT(0),

    [DataSource_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_DataSourceEditionXRef_fk DEFAULT(0),

    [FirstName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_FirstName DEFAULT(''),

    [LastName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_LastName DEFAULT(''),

    [FullName] AS [FirstName] + ' ' + [LastName],

    [NPA]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NPA DEFAULT(''),

    [NXX]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NXX DEFAULT(''),

    [NXXLine]VARCHAR(4) NOT NUll CONSTRAINT DF_TestBusinessContact_NXXLine DEFAULT(''),

    [Phone] AS (NPA + NXX + NXXLine),

    [FormattedPhone]AS ('(' + NPA + ') ' + NXX + '-' + NXXLine),

    [Email]VARCHAR(128)NOT NULL CONSTRAINT DF_TestBusinessContact_Email DEFAULT(''),

    CONSTRAINT PK_TestBusinessContact PRIMARY KEY CLUSTERED

    (

    [TestBusinessContact_pk] ASC

    )

    )

    GO

    TRUNCATE TABLE [dbo].[TestBusinessContact];

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,4 ,'', '', '818', '591', '0776', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,5 ,'', '', '818', '910', '0776', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Alan', 'Leong', '310', '641', '7700', 'alan.leong@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Allison', 'Kelley', '714', '879', '6110', 'allison.kelley@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Arturo', 'Villa', '818', '591', '0776', 'artvilla@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Brian', 'Fox', '818', '591', '0776', 'brianfox@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Chris', 'Coomey', '818', '591', '0776', 'ccoomey@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Clayton', 'Forbes', '818', '591', '0776', 'clayton.forbes@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Cynthia', 'Maher', '714', '998', '4355', 'cynthia.maher@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dave', 'Mann', '818', '591', '0776', 'dmann@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'David', 'Schuster', '714', '879', '6110', 'daves@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Doug M', 'Briskie', '818', '591', '0776', 'dougb@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Douglas', 'Briskie', '818', '591', '0776', 'dougbriskie@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dwight', 'Moore', '818', '591', '0776', 'dwightmoore@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dwight', 'Moore', '978', '266', '1001', 'dwightm@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Gary', 'Freed', '818', '591', '0776', 'gary.freed@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Jack', 'Lin', '818', '591', '0776', 'jack.lin@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Larry', 'Sanks', '818', '591', '0776', 'larry.sanks@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Lori', 'Gaynor', '732', '936', '0800', 'lori.gaynor@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Louis', 'Winoski', '818', '591', '0776', 'louis.winoski@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Manny', 'Schare', '818', '591', '0776', 'manny.schare@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Mark', 'Betts', '818', '591', '0776', 'mbetts@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Martha', 'Varney', '818', '591', '0776', 'marthavarney@nts-group.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marty', 'Lonky', '818', '591', '0776', 'marty.lonky@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marvin', 'Hoffman', '812', '428', '9100', 'bonnie.dell@redspot.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marvin', 'Hoffman', '818', '591', '0776', 'marvin.hoffman@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Michael', 'Rowe', '818', '591', '0776', 'michael.rowe@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Mike', 'Shook', '818', '591', '0776', 'mshook@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Osman', 'Sakr', '818', '591', '0776', 'osman.sakr@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Paul', 'Sturges', '818', '591', '0776', 'paul.sturges@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Rachel', 'Barry', '818', '591', '0776', 'rbarry@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Rachel', 'Joshi', '818', '591', '0776', 'rachel.joshi@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Raffy', 'Lorentzian', '818', '591', '0776', 'raffy.lorentzian@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Raffy', 'Lorentzian', '818', '591', '0776', 'raffylorentzian@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Richard', 'Dunne', '978', '263', '2933', 'richd@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Richard', 'Gaynor', '818', '591', '0776', 'rgaynor@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Robert', 'Lin', '818', '591', '0776', 'robert.lin@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Robert', 'Rogers', '313', '835', '0044', 'rrogers@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Siou', 'Leong', '818', '591', '0776', 'sleong@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Siou', 'Leong', '818', '591', '0776', 'sleong@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Spiro', 'Sacre', '310', '641', '7700', 'spiro.sacre@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tammy', 'Buckley', '818', '591', '0776', 'tammy_buckley@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tammy', 'Buckly', '818', '591', '0776', 'tammyb@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tim', 'Samaras', '818', '591', '0776', 'tim.samaras@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tim', 'Sturkie', '714', '879', '6110', 'tsturkie@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Todd', 'Hosier', '818', '591', '0776', 'todd.hosier@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tom', 'Tidwell', '818', '591', '0776', 'tom.tidwell@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Victor', 'Alfano', '818', '591', '0776', 'victor.alfano@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,3 ,'WILLIAM C', 'MC GINNIS', '818', '591', '0776', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,3 ,'WILLIAM C', 'MC GINNIS', '818', '591', '0776', 'Mw@Werner.Com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'William C', 'McGinnis', '818', '591', '0776', 'bill.mcginnis@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,2 ,'WILLIAM', 'MCGINNIS', '818', '591', '0776', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,1 ,'', '', '', '', '', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,5 ,'', '', '234', '665', '5221', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Elizabeth', 'Weinhold', '330', '666', '1183', 'eweinhold@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Heidi', 'Busse', '330', '666', '1183', 'hbusse@stedwardschurch.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,2 ,'JOHN', 'HENNELLY', '', '', '', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'John', 'Hennelly', '330', '666', '1183', 'jhennelly@stedwardschurch.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', 'Johnh@Vased.Org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Judy', 'Foster', '330', '666', '1183', 'jfoster@stedwardschurch.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Kathy', 'Franey', '330', '666', '1183', 'kathyf@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Mike', 'Valentine', '330', '666', '1183', 'mike.valentine@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Robert', 'Norman', '330', '666', '1183', 'rnorman@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Stephanie', 'Dusz', '330', '666', '1183', 'sdusz@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Thomas', 'Brown', '330', '666', '1183', 'tbrown@vased.org');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,1 ,'', '', '', '', '', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,5 ,'', '', '661', '598', '8184', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Jason', 'Youmazzo', '818', '591', '0776', 'jason.youmazzo@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,3 ,'JOHN', 'CZAJKOWSKI', '661', '259', '8185', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Kathy', 'Kogen', '818', '591', '0776', 'kathy.kogen@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Pat', 'Leblanc', '661', '259', '8184', 'patl@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,2 ,'RICK', 'REYES', '661', '259', '8184', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Willie', 'Seebert', '818', '591', '0776', 'willie.seebert@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,5 ,'', '', '714', '796', '6110', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Allan', 'Lario', '714', '879', '6110', 'allan.lario@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Betty', 'Matteson', '714', '879', '6110', 'bettym@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', 'Kpatel@Wika.Com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', 'Kpatel@Wika.Com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Hector', 'Paez', '714', '879', '6110', 'hpaez@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,2 ,'HECTOR', 'PAEZ', '714', '998', '4351', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'John', 'Smith', '714', '879', '6110', 'john.smith@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Timothy', 'Sturkie', '818', '591', '0776', 'timothy.sturkie@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,5 ,'', '', '310', '417', '7700', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,3 ,'ANDREA', 'BRANCH', '310', '641', '7700', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,7 ,'Arsineh', 'Davidian', '818', '591', '0776', 'arsineh.davidian@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,2 ,'OSMAN', 'SAKR', '310', '641', '7700', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,5 ,'', '', '310', '417', '7700', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Alan', 'Leong', '818', '591', '0776', 'aleong@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Art', 'Villa', '818', '591', '0776', 'art@nts.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Brian', 'Adam', '818', '591', '0776', 'brian.adam@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Marty', 'Lonky', '818', '591', '0776', 'mlonky@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Richard', 'Short', '310', '348', '0900', 'rshort@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,2 ,'TOM', 'RUSSELL', '310', '348', '0900', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,3 ,'TOM', 'RUSSELL', '310', '348', '0900', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Tom', 'Russell', '818', '591', '0776', 'trussell@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (239366326,7 ,'Louis', 'Winoski', '818', '591', '0776', 'lou.winoski@ntscorp.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (264974906,5 ,'', '', '818', '910', '0776', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (264974906,2 ,'JACK', 'LIN', '818', '591', '0776', '');

    GO

  • It seems the second rule listed serves no purpose, if I understand this correctly.

    The case with the same name and email, but different phone is handled by the third.

    The case with the same name and email, and the same phone is handled by the first.

    The case with the same name and email, but a blank phone is handled by the fourth.

    I'm not sure what work would be left for the second rule to do (and it seems actually contradicted by the third).

    I might be misunderstanding something here, but I figured I'd lay that out to see what that thing I'm missing is 🙂

    The other thing that was unclear to me is the handling of rows with the same name and email, but a blank phone. You mentioned that a contact with a blank phone can be a match to another that has the same name, email address, and has a phone (at least, I'm assuming that was supposed to be "and has a phone"). What happens if the name/email combination that has a row with a blank phone number also has rows with a couple different phone numbers? Is it arbitrary which one the blank phone number will get marked as the same contact with?

    Concretely, if you have 3 contacts, all with the same name and email, but with the phone numbers '1111111111','2222222222', and blank, how should those be treated?

    All of the rules provided also apply only when the name and email are the same. How are other combinations handled (say, same name and same phone, but one has a blank email)? Are any rows that don't share name and email automatically separate contacts?

    Also, since you provided some sample data, what would the desired result be with that data?

    I know that's a lot of questions, but before I try to propose a solution, it would be good for me to have a crystal-clear idea of what's desired.

    Cheers!

  • Jacob Wilkins (6/8/2015)


    It seems the second rule listed serves no purpose, if I understand this correctly.

    I might be misunderstanding something here, but I figured I'd lay that out to see what that thing I'm missing is 🙂

    I'm having a hard time writing up the business rules to it myself. I'm hoping the example below help.

    The other thing that was unclear to me is the handling of rows with the same name and email, but a blank phone. You mentioned that a contact with a blank phone can be a match to another that has the same name, email address, and has a phone (at least, I'm assuming that was supposed to be "and has a phone"). What happens if the name/email combination that has a row with a blank phone number also has rows with a couple different phone numbers? Is it arbitrary which one the blank phone number will get marked as the same contact with?

    Yes it would be arbitrary.

    key, Name, Phone, email, datasource

    1, John Snow, 1111111111, '', 1

    2, John Snow, 1111111111, 'johnsnow@got.com', 2

    3, John Snow, 1111111111, 'johnsnow@dayne.com', 3

    Contacts 2 and 3 would be retained as not duplicate, 1 should point to 2 or 3, it doesn't matter.

    Concretely, if you have 3 contacts, all with the same name and email, but with the phone numbers '1111111111','2222222222', and blank, how should those be treated?

    Should result in 2 contacts, it doesn't matter which contact the one that has only name and email matches to, but in the end I should know the key of the one chosen.

    All of the rules provided also apply only when the name and email are the same. How are other combinations handled (say, same name and same phone, but one has a blank email)? Are any rows that don't share name and email automatically separate contacts?

    This is how I'd like it matched for this contacts listed here.

    key, Name, Phone, email, datasource

    1, John Snow, 1111111111, '', 1

    2, John Snow, 1111111111, 'johnsnow@got.com', 2

    3, John Snow, 2222222222, 'johnsnow@got.com', 3

    In the above example 2 and 3 are retained as non duplicate contacts, 1 is part of 2.

    key, Name, Phone, email, datasource

    1, John Snow, 1111111111, '', 1

    2, John Snow, 1111111111, 'johnsnow@got.com', 2

    3, John Snow, , 'johnsnow@got.com', 3

    In this example 2 is the main contact and 1 and 3 are removed, I should be able to tell before removing that 1 and 3 point to 2 however.

    key, Name, Phone, email, datasource

    1, John Snow, 1111111111, '', 1

    2, John Snow, 1111111111, '', 2

    3, John Snow, 2222222222, '', 3

    2 and 3 are retained, 1 points to 2

    key, Name, Phone, email, datasource

    1, John Snow, 1111111111, '', 1

    2, John Snow, 1111111111, 'johnsnow@got.com', 2

    3, John Snow, 2222222222, 'johnsnow@got.com', 3

    4, John Dayne, 2222222222, 'johnsnow@got.com', 4

    2, 3 and 4 are retained, 1 points to 2

    key, Name, Phone, email, datasource

    4, John Dayne, 2222222222, 'johnsnow@got.com', 4

    5, John Dayne, 2222222222, 'johnsnow@got.com', 5

    4 would be retained because the datasource key is lower.

    After writing up my question last night I wrote up my approach which should help:

    Pass 1 ignore everything with a blank phone or email, start a temp table with all the best contact that has name, phone and email. mark them in the work table.

    Pass 2 mark ones with a blank phone or email that have a name already marked and either phone or email match, these are duplicates. Nothing added to main temp table. In a work table, mark the contacts just found as duplicates.

    Pass 3 add names not included that have an email address to the temp table, mark them in the work table

    Pass 4 add names not included that have a phone number to the temp table, mark them in the work table.

    Pass 5 add names not included that do not have a phone or email address.

  • @jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().

    /*

    TRUNCATE TABLE #TestBusinessContact;

    INSERT INTO #TestBusinessContact

    ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email])

    VALUES

    (136,4 ,'', '', '818', '591', '0776', ''),

    (136,5 ,'', '', '818', '910', '0776', ''),

    .

    .

    .

    (264974906,5 ,'', '', '818', '910', '0776', ''),

    (264974906,2 ,'JACK', 'LIN', '818', '591', '0776', '')

    */

    SELECT x.*, a.*, '#' '#', b.*

    FROM #TestBusinessContact a

    inner JOIN #TestBusinessContact b

    ON b.TestBusinessContact_pk > a.TestBusinessContact_pk

    AND b.FullName = a.FullName

    AND b.Email = a.Email

    CROSS APPLY (

    SELECT MatchType = CASE

    WHEN b.Phone = a.Phone THEN 1

    -- WHEN ...

    -- WHEN ...

    ELSE 0 END

    ) x

    WHERE a.Email <> ''

    ORDER BY a.TestBusinessContact_pk --a.FullName

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (6/9/2015)


    @Jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().

    I've attached some more samples and have one other rule that I'm to familiar with the data and forgot to enumerate. That rule is you can have duplicate contact in a different business. So John Snow can be in business 1, 2, 3, etc. I'm looking for unique contacts by business. I took the cross apply and muddled with it a bit. I'm not sure if it will work for me, but it is fast, so if it can be updated to suite my needs....

    Here is what I have so far:

    SELECT x.*, a.*, '#' AS '#', b.*

    FROM [dbo].[TestBusinessContact] a

    INNER JOIN [dbo].[TestBusinessContact] b

    ON b.TestBusinessContact_pk > a.TestBusinessContact_pk

    AND b.FullName = a.FullName

    AND b.[Business_fk] = a.[Business_fk]

    CROSS APPLY (

    SELECT MatchType = CASE

    WHEN a.[Phone] = b.[Phone] AND a.[Email] = b.[Email] THEN 1

    WHEN a.[Email] = b.[Email] THEN 2

    WHEN a.[Phone] = b.[Phone] THEN 3

    ELSE 0 END

    ) x

    ORDER BY a.TestBusinessContact_pk

    For Business 1 I'd like to see just TestBusinessContact_pk of 2 and 3 listed in the a.* results, never showing 1 since it would be dropped as a duplicate.

    For Business 2 TestBusinessContact_pk of 6 is not listed. The results should show TestBusinessContact_pk as being unique.

    Here are the updated test cases:

    TRUNCATE TABLE [dbo].[TestBusinessContact];

    GO

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,2 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,3 ,'John', 'Snow', '222', '222', '2222', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    GO

  • jeff.born (6/9/2015)


    ChrisM@Work (6/9/2015)


    @Jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().

    I've attached some more samples and have one other rule that I'm to familiar with the data and forgot to enumerate. That rule is you can have duplicate contact in a different business. So John Snow can be in business 1, 2, 3, etc. I'm looking for unique contacts by business. I took the cross apply and muddled with it a bit. I'm not sure if it will work for me, but it is fast, so if it can be updated to suite my needs....

    Here is what I have so far:

    SELECT x.*, a.*, '#' AS '#', b.*

    FROM [dbo].[TestBusinessContact] a

    INNER JOIN [dbo].[TestBusinessContact] b

    ON b.TestBusinessContact_pk > a.TestBusinessContact_pk

    AND b.FullName = a.FullName

    AND b.[Business_fk] = a.[Business_fk]

    CROSS APPLY (

    SELECT MatchType = CASE

    WHEN a.[Phone] = b.[Phone] AND a.[Email] = b.[Email] THEN 1

    WHEN a.[Email] = b.[Email] THEN 2

    WHEN a.[Phone] = b.[Phone] THEN 3

    ELSE 0 END

    ) x

    ORDER BY a.TestBusinessContact_pk

    For Business 1 I'd like to see just TestBusinessContact_pk of 2 and 3 listed in the a.* results, never showing 1 since it would be dropped as a duplicate.

    For Business 2 TestBusinessContact_pk of 6 is not listed. The results should show TestBusinessContact_pk as being unique.

    Here are the updated test cases:

    TRUNCATE TABLE [dbo].[TestBusinessContact];

    GO

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,2 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,3 ,'John', 'Snow', '222', '222', '2222', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,1 ,'John', 'Snow', '111', '111', '1111', '');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    GO

    With this new requirement you might be better off changing tack and trying a different query altogether:

    IF OBJECT_ID('Tempdb..#TestBusinessContact') IS NOT NULL DROP TABLE #TestBusinessContact;

    CREATE TABLE #TestBusinessContact

    (

    [TestBusinessContact_pk] INT IDENTITY(1,1)NOT NULL,

    [Business_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_Business_fk DEFAULT(0),

    [DataSource_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_DataSourceEditionXRef_fk DEFAULT(0),

    [FirstName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_FirstName DEFAULT(''),

    [LastName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_LastName DEFAULT(''),

    [FullName] AS [FirstName] + ' ' + [LastName],

    [NPA]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NPA DEFAULT(''),

    [NXX]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NXX DEFAULT(''),

    [NXXLine]VARCHAR(4) NOT NUll CONSTRAINT DF_TestBusinessContact_NXXLine DEFAULT(''),

    [Phone] AS (NPA + NXX + NXXLine),

    [FormattedPhone]AS ('(' + NPA + ')' + ' ' + NXX + '-' + NXXLine),

    [Email]VARCHAR(128)NOT NULL CONSTRAINT DF_TestBusinessContact_Email DEFAULT(''),

    CONSTRAINT PK_TestBusinessContact PRIMARY KEY CLUSTERED

    ([TestBusinessContact_pk] ASC)

    );

    INSERT INTO #TestBusinessContact

    ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email])

    VALUES

    (1,1 ,'John', 'Snow', '111', '111', '1111', ''),

    (1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),

    (1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com'),

    (2,1 ,'John', 'Snow', '111', '111', '1111', ''),

    (2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),

    (2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com'),

    (3,1 ,'John', 'Snow', '111', '111', '1111', ''),

    (3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),

    (3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com'),

    (4,1 ,'John', 'Snow', '111', '111', '1111', ''),

    (4,2 ,'John', 'Snow', '111', '111', '1111', ''),

    (4,3 ,'John', 'Snow', '222', '222', '2222', ''),

    (5,1 ,'John', 'Snow', '111', '111', '1111', ''),

    (5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),

    (5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com'),

    (5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com'),

    (6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com'),

    (6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');

    UPDATE STATISTICS #TestBusinessContact WITH FULLSCAN

    SELECT *

    FROM #TestBusinessContact a

    CROSS APPLY ( -- Match on phone and email address

    SELECT MatchParent1 = MIN(b1.TestBusinessContact_pk)

    FROM #TestBusinessContact b1

    WHERE b1.FullName = a.FullName

    AND b1.Business_fk = a.Business_fk

    AND b1.Phone = a.Phone

    AND b1.TestBusinessContact_pk < a.TestBusinessContact_pk

    AND b1.Email = a.Email AND b1.Email <> ''

    ) m1

    CROSS APPLY ( -- Otherwise, try to match on phone alone

    SELECT MatchParent2 = MIN(b2.TestBusinessContact_pk)

    FROM #TestBusinessContact b2

    WHERE b2.FullName = a.FullName

    AND b2.Business_fk = a.Business_fk

    AND b2.Phone = a.Phone

    AND b2.TestBusinessContact_pk <> a.TestBusinessContact_pk

    AND m1.MatchParent1 IS NULL

    AND NOT (b2.Email = '' AND a.Email > '')

    AND NOT (b2.Email > '' AND a.Email > '' AND b2.TestBusinessContact_pk > a.TestBusinessContact_pk)

    AND NOT (b2.Email = '' AND a.Email = '' AND b2.TestBusinessContact_pk > a.TestBusinessContact_pk)

    ) m2

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • SSCertifiable,

    I've been playing with the cross apply and trying to get it to work with a few other requirements. Each business group I should easily be able to tell that a contact is the anchor contact and should not be deleted. In business group 1 the query you gave me has TestBusinessContact_pk of 1 and 3 matching to 2. I'd like to see that 1's parent is 2 because they share the same name and phone and 2 has the lower DataSource_fk. contacts 2 and 3 should be listed as keepers since they have different emails.

  • jeff.born (6/11/2015)


    SSCertifiable,

    I've been playing with the cross apply and trying to get it to work with a few other requirements. Each business group I should easily be able to tell that a contact is the anchor contact and should not be deleted. In business group 1 the query you gave me has TestBusinessContact_pk of 1 and 3 matching to 2. I'd like to see that 1's parent is 2 because they share the same name and phone and 2 has the lower DataSource_fk. contacts 2 and 3 should be listed as keepers since they have different emails.

    Have a play with this operator:

    AND b1.TestBusinessContact_pk < a.TestBusinessContact_pk

    Try > and <> instead of <.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply