June 8, 2015 at 3:06 pm
I'm in the process of trying to identify duplicate contacts. I doing this for millions of contacts and have gotten stuck and could use some elegant solutions!
The business rule is this:
Any contact that has the same name, phone and email address are the same contact
Any contact that has the same name, and email address are the same contact
Any contact that has the same name, email address, but different phone are a different contact.
Any contact that has the same name, email address, and a blank phone can be the same contact as one that has the same name, email address, and has an email address
Rank by the DataSource_fk. 1 being the highest
Put another way:
If 3 contacts have the same name, 2 have phone '1112223344' and all three have the email address 'johndoe@gmail.com' they are the same contact and the lowest DataSource_fk should be ranked the highest.
I've used the Row_number over (Partition by) in the past, but am unsure how to deal with the blanks in email and phone.
DROP TABLE [dbo].[TestBusinessContact];
GO
CREATE TABLE [dbo].[TestBusinessContact]
(
[TestBusinessContact_pk] INT IDENTITY(1,1)NOT NULL,
[Business_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_Business_fk DEFAULT(0),
[DataSource_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_DataSourceEditionXRef_fk DEFAULT(0),
[FirstName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_FirstName DEFAULT(''),
[LastName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_LastName DEFAULT(''),
[FullName] AS [FirstName] + ' ' + [LastName],
[NPA]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NPA DEFAULT(''),
[NXX]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NXX DEFAULT(''),
[NXXLine]VARCHAR(4) NOT NUll CONSTRAINT DF_TestBusinessContact_NXXLine DEFAULT(''),
[Phone] AS (NPA + NXX + NXXLine),
[FormattedPhone]AS ('(' + NPA + ') ' + NXX + '-' + NXXLine),
[Email]VARCHAR(128)NOT NULL CONSTRAINT DF_TestBusinessContact_Email DEFAULT(''),
CONSTRAINT PK_TestBusinessContact PRIMARY KEY CLUSTERED
(
[TestBusinessContact_pk] ASC
)
)
GO
TRUNCATE TABLE [dbo].[TestBusinessContact];
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,4 ,'', '', '818', '591', '0776', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,5 ,'', '', '818', '910', '0776', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Alan', 'Leong', '310', '641', '7700', 'alan.leong@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Allison', 'Kelley', '714', '879', '6110', 'allison.kelley@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Arturo', 'Villa', '818', '591', '0776', 'artvilla@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Brian', 'Fox', '818', '591', '0776', 'brianfox@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Chris', 'Coomey', '818', '591', '0776', 'ccoomey@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Clayton', 'Forbes', '818', '591', '0776', 'clayton.forbes@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Cynthia', 'Maher', '714', '998', '4355', 'cynthia.maher@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dave', 'Mann', '818', '591', '0776', 'dmann@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'David', 'Schuster', '714', '879', '6110', 'daves@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Doug M', 'Briskie', '818', '591', '0776', 'dougb@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Douglas', 'Briskie', '818', '591', '0776', 'dougbriskie@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dwight', 'Moore', '818', '591', '0776', 'dwightmoore@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Dwight', 'Moore', '978', '266', '1001', 'dwightm@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Gary', 'Freed', '818', '591', '0776', 'gary.freed@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Jack', 'Lin', '818', '591', '0776', 'jack.lin@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Larry', 'Sanks', '818', '591', '0776', 'larry.sanks@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Lori', 'Gaynor', '732', '936', '0800', 'lori.gaynor@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Louis', 'Winoski', '818', '591', '0776', 'louis.winoski@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Manny', 'Schare', '818', '591', '0776', 'manny.schare@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Mark', 'Betts', '818', '591', '0776', 'mbetts@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Martha', 'Varney', '818', '591', '0776', 'marthavarney@nts-group.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marty', 'Lonky', '818', '591', '0776', 'marty.lonky@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marvin', 'Hoffman', '812', '428', '9100', 'bonnie.dell@redspot.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Marvin', 'Hoffman', '818', '591', '0776', 'marvin.hoffman@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Michael', 'Rowe', '818', '591', '0776', 'michael.rowe@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Mike', 'Shook', '818', '591', '0776', 'mshook@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Osman', 'Sakr', '818', '591', '0776', 'osman.sakr@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Paul', 'Sturges', '818', '591', '0776', 'paul.sturges@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Rachel', 'Barry', '818', '591', '0776', 'rbarry@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Rachel', 'Joshi', '818', '591', '0776', 'rachel.joshi@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Raffy', 'Lorentzian', '818', '591', '0776', 'raffy.lorentzian@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Raffy', 'Lorentzian', '818', '591', '0776', 'raffylorentzian@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Richard', 'Dunne', '978', '263', '2933', 'richd@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Richard', 'Gaynor', '818', '591', '0776', 'rgaynor@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Robert', 'Lin', '818', '591', '0776', 'robert.lin@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Robert', 'Rogers', '313', '835', '0044', 'rrogers@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Siou', 'Leong', '818', '591', '0776', 'sleong@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Siou', 'Leong', '818', '591', '0776', 'sleong@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Spiro', 'Sacre', '310', '641', '7700', 'spiro.sacre@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tammy', 'Buckley', '818', '591', '0776', 'tammy_buckley@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tammy', 'Buckly', '818', '591', '0776', 'tammyb@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tim', 'Samaras', '818', '591', '0776', 'tim.samaras@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tim', 'Sturkie', '714', '879', '6110', 'tsturkie@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Todd', 'Hosier', '818', '591', '0776', 'todd.hosier@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Tom', 'Tidwell', '818', '591', '0776', 'tom.tidwell@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'Victor', 'Alfano', '818', '591', '0776', 'victor.alfano@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,3 ,'WILLIAM C', 'MC GINNIS', '818', '591', '0776', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,3 ,'WILLIAM C', 'MC GINNIS', '818', '591', '0776', 'Mw@Werner.Com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,7 ,'William C', 'McGinnis', '818', '591', '0776', 'bill.mcginnis@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (136,2 ,'WILLIAM', 'MCGINNIS', '818', '591', '0776', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,1 ,'', '', '', '', '', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,5 ,'', '', '234', '665', '5221', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Elizabeth', 'Weinhold', '330', '666', '1183', 'eweinhold@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Heidi', 'Busse', '330', '666', '1183', 'hbusse@stedwardschurch.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,2 ,'JOHN', 'HENNELLY', '', '', '', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'John', 'Hennelly', '330', '666', '1183', 'jhennelly@stedwardschurch.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,3 ,'JOHN', 'HENNELLY', '330', '668', '2828', 'Johnh@Vased.Org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Judy', 'Foster', '330', '666', '1183', 'jfoster@stedwardschurch.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Kathy', 'Franey', '330', '666', '1183', 'kathyf@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Mike', 'Valentine', '330', '666', '1183', 'mike.valentine@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Robert', 'Norman', '330', '666', '1183', 'rnorman@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Stephanie', 'Dusz', '330', '666', '1183', 'sdusz@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (70214,7 ,'Thomas', 'Brown', '330', '666', '1183', 'tbrown@vased.org');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,1 ,'', '', '', '', '', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,5 ,'', '', '661', '598', '8184', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Jason', 'Youmazzo', '818', '591', '0776', 'jason.youmazzo@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,3 ,'JOHN', 'CZAJKOWSKI', '661', '259', '8185', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Kathy', 'Kogen', '818', '591', '0776', 'kathy.kogen@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Pat', 'Leblanc', '661', '259', '8184', 'patl@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,2 ,'RICK', 'REYES', '661', '259', '8184', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (510118,7 ,'Willie', 'Seebert', '818', '591', '0776', 'willie.seebert@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,5 ,'', '', '714', '796', '6110', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Allan', 'Lario', '714', '879', '6110', 'allan.lario@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Betty', 'Matteson', '714', '879', '6110', 'bettym@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', 'Kpatel@Wika.Com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,3 ,'HECTOR', 'PAEZ', '323', '583', '1227', 'Kpatel@Wika.Com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Hector', 'Paez', '714', '879', '6110', 'hpaez@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,2 ,'HECTOR', 'PAEZ', '714', '998', '4351', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'John', 'Smith', '714', '879', '6110', 'john.smith@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6089626,7 ,'Timothy', 'Sturkie', '818', '591', '0776', 'timothy.sturkie@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,5 ,'', '', '310', '417', '7700', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,3 ,'ANDREA', 'BRANCH', '310', '641', '7700', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,7 ,'Arsineh', 'Davidian', '818', '591', '0776', 'arsineh.davidian@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6780648,2 ,'OSMAN', 'SAKR', '310', '641', '7700', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,5 ,'', '', '310', '417', '7700', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Alan', 'Leong', '818', '591', '0776', 'aleong@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Art', 'Villa', '818', '591', '0776', 'art@nts.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Brian', 'Adam', '818', '591', '0776', 'brian.adam@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Marty', 'Lonky', '818', '591', '0776', 'mlonky@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Richard', 'Short', '310', '348', '0900', 'rshort@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,2 ,'TOM', 'RUSSELL', '310', '348', '0900', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,3 ,'TOM', 'RUSSELL', '310', '348', '0900', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (10782322,7 ,'Tom', 'Russell', '818', '591', '0776', 'trussell@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (239366326,7 ,'Louis', 'Winoski', '818', '591', '0776', 'lou.winoski@ntscorp.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (264974906,5 ,'', '', '818', '910', '0776', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (264974906,2 ,'JACK', 'LIN', '818', '591', '0776', '');
GO
June 8, 2015 at 3:54 pm
It seems the second rule listed serves no purpose, if I understand this correctly.
The case with the same name and email, but different phone is handled by the third.
The case with the same name and email, and the same phone is handled by the first.
The case with the same name and email, but a blank phone is handled by the fourth.
I'm not sure what work would be left for the second rule to do (and it seems actually contradicted by the third).
I might be misunderstanding something here, but I figured I'd lay that out to see what that thing I'm missing is 🙂
The other thing that was unclear to me is the handling of rows with the same name and email, but a blank phone. You mentioned that a contact with a blank phone can be a match to another that has the same name, email address, and has a phone (at least, I'm assuming that was supposed to be "and has a phone"). What happens if the name/email combination that has a row with a blank phone number also has rows with a couple different phone numbers? Is it arbitrary which one the blank phone number will get marked as the same contact with?
Concretely, if you have 3 contacts, all with the same name and email, but with the phone numbers '1111111111','2222222222', and blank, how should those be treated?
All of the rules provided also apply only when the name and email are the same. How are other combinations handled (say, same name and same phone, but one has a blank email)? Are any rows that don't share name and email automatically separate contacts?
Also, since you provided some sample data, what would the desired result be with that data?
I know that's a lot of questions, but before I try to propose a solution, it would be good for me to have a crystal-clear idea of what's desired.
Cheers!
June 9, 2015 at 6:27 am
Jacob Wilkins (6/8/2015)
It seems the second rule listed serves no purpose, if I understand this correctly.I might be misunderstanding something here, but I figured I'd lay that out to see what that thing I'm missing is 🙂
I'm having a hard time writing up the business rules to it myself. I'm hoping the example below help.
The other thing that was unclear to me is the handling of rows with the same name and email, but a blank phone. You mentioned that a contact with a blank phone can be a match to another that has the same name, email address, and has a phone (at least, I'm assuming that was supposed to be "and has a phone"). What happens if the name/email combination that has a row with a blank phone number also has rows with a couple different phone numbers? Is it arbitrary which one the blank phone number will get marked as the same contact with?
Yes it would be arbitrary.
key, Name, Phone, email, datasource
1, John Snow, 1111111111, '', 1
2, John Snow, 1111111111, 'johnsnow@got.com', 2
3, John Snow, 1111111111, 'johnsnow@dayne.com', 3
Contacts 2 and 3 would be retained as not duplicate, 1 should point to 2 or 3, it doesn't matter.
Concretely, if you have 3 contacts, all with the same name and email, but with the phone numbers '1111111111','2222222222', and blank, how should those be treated?
Should result in 2 contacts, it doesn't matter which contact the one that has only name and email matches to, but in the end I should know the key of the one chosen.
All of the rules provided also apply only when the name and email are the same. How are other combinations handled (say, same name and same phone, but one has a blank email)? Are any rows that don't share name and email automatically separate contacts?
This is how I'd like it matched for this contacts listed here.
key, Name, Phone, email, datasource
1, John Snow, 1111111111, '', 1
2, John Snow, 1111111111, 'johnsnow@got.com', 2
3, John Snow, 2222222222, 'johnsnow@got.com', 3
In the above example 2 and 3 are retained as non duplicate contacts, 1 is part of 2.
key, Name, Phone, email, datasource
1, John Snow, 1111111111, '', 1
2, John Snow, 1111111111, 'johnsnow@got.com', 2
3, John Snow, , 'johnsnow@got.com', 3
In this example 2 is the main contact and 1 and 3 are removed, I should be able to tell before removing that 1 and 3 point to 2 however.
key, Name, Phone, email, datasource
1, John Snow, 1111111111, '', 1
2, John Snow, 1111111111, '', 2
3, John Snow, 2222222222, '', 3
2 and 3 are retained, 1 points to 2
key, Name, Phone, email, datasource
1, John Snow, 1111111111, '', 1
2, John Snow, 1111111111, 'johnsnow@got.com', 2
3, John Snow, 2222222222, 'johnsnow@got.com', 3
4, John Dayne, 2222222222, 'johnsnow@got.com', 4
2, 3 and 4 are retained, 1 points to 2
key, Name, Phone, email, datasource
4, John Dayne, 2222222222, 'johnsnow@got.com', 4
5, John Dayne, 2222222222, 'johnsnow@got.com', 5
4 would be retained because the datasource key is lower.
After writing up my question last night I wrote up my approach which should help:
Pass 1 ignore everything with a blank phone or email, start a temp table with all the best contact that has name, phone and email. mark them in the work table.
Pass 2 mark ones with a blank phone or email that have a name already marked and either phone or email match, these are duplicates. Nothing added to main temp table. In a work table, mark the contacts just found as duplicates.
Pass 3 add names not included that have an email address to the temp table, mark them in the work table
Pass 4 add names not included that have a phone number to the temp table, mark them in the work table.
Pass 5 add names not included that do not have a phone or email address.
June 9, 2015 at 7:04 am
@jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().
/*
TRUNCATE TABLE #TestBusinessContact;
INSERT INTO #TestBusinessContact
([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email])
VALUES
(136,4 ,'', '', '818', '591', '0776', ''),
(136,5 ,'', '', '818', '910', '0776', ''),
.
.
.
(264974906,5 ,'', '', '818', '910', '0776', ''),
(264974906,2 ,'JACK', 'LIN', '818', '591', '0776', '')
*/
SELECT x.*, a.*, '#' '#', b.*
FROM #TestBusinessContact a
inner JOIN #TestBusinessContact b
ON b.TestBusinessContact_pk > a.TestBusinessContact_pk
AND b.FullName = a.FullName
AND b.Email = a.Email
CROSS APPLY (
SELECT MatchType = CASE
WHEN b.Phone = a.Phone THEN 1
-- WHEN ...
-- WHEN ...
ELSE 0 END
) x
WHERE a.Email <> ''
ORDER BY a.TestBusinessContact_pk --a.FullName
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 9, 2015 at 1:49 pm
ChrisM@Work (6/9/2015)
@Jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().
I've attached some more samples and have one other rule that I'm to familiar with the data and forgot to enumerate. That rule is you can have duplicate contact in a different business. So John Snow can be in business 1, 2, 3, etc. I'm looking for unique contacts by business. I took the cross apply and muddled with it a bit. I'm not sure if it will work for me, but it is fast, so if it can be updated to suite my needs....
Here is what I have so far:
SELECT x.*, a.*, '#' AS '#', b.*
FROM [dbo].[TestBusinessContact] a
INNER JOIN [dbo].[TestBusinessContact] b
ON b.TestBusinessContact_pk > a.TestBusinessContact_pk
AND b.FullName = a.FullName
AND b.[Business_fk] = a.[Business_fk]
CROSS APPLY (
SELECT MatchType = CASE
WHEN a.[Phone] = b.[Phone] AND a.[Email] = b.[Email] THEN 1
WHEN a.[Email] = b.[Email] THEN 2
WHEN a.[Phone] = b.[Phone] THEN 3
ELSE 0 END
) x
ORDER BY a.TestBusinessContact_pk
For Business 1 I'd like to see just TestBusinessContact_pk of 2 and 3 listed in the a.* results, never showing 1 since it would be dropped as a duplicate.
For Business 2 TestBusinessContact_pk of 6 is not listed. The results should show TestBusinessContact_pk as being unique.
Here are the updated test cases:
TRUNCATE TABLE [dbo].[TestBusinessContact];
GO
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,2 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,3 ,'John', 'Snow', '222', '222', '2222', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
GO
June 10, 2015 at 8:03 am
jeff.born (6/9/2015)
ChrisM@Work (6/9/2015)
@Jeff, please can you amend your sample data set so that it contains each of the conditions you describe? Thanks. Then run this and have a look at the results. You can expand the CASE to include several conditions, then use the value constructed in the CASE in something else e.g. RANK() or ROW_NUMBER().I've attached some more samples and have one other rule that I'm to familiar with the data and forgot to enumerate. That rule is you can have duplicate contact in a different business. So John Snow can be in business 1, 2, 3, etc. I'm looking for unique contacts by business. I took the cross apply and muddled with it a bit. I'm not sure if it will work for me, but it is fast, so if it can be updated to suite my needs....
Here is what I have so far:
SELECT x.*, a.*, '#' AS '#', b.*
FROM [dbo].[TestBusinessContact] a
INNER JOIN [dbo].[TestBusinessContact] b
ON b.TestBusinessContact_pk > a.TestBusinessContact_pk
AND b.FullName = a.FullName
AND b.[Business_fk] = a.[Business_fk]
CROSS APPLY (
SELECT MatchType = CASE
WHEN a.[Phone] = b.[Phone] AND a.[Email] = b.[Email] THEN 1
WHEN a.[Email] = b.[Email] THEN 2
WHEN a.[Phone] = b.[Phone] THEN 3
ELSE 0 END
) x
ORDER BY a.TestBusinessContact_pk
For Business 1 I'd like to see just TestBusinessContact_pk of 2 and 3 listed in the a.* results, never showing 1 since it would be dropped as a duplicate.
For Business 2 TestBusinessContact_pk of 6 is not listed. The results should show TestBusinessContact_pk as being unique.
Here are the updated test cases:
TRUNCATE TABLE [dbo].[TestBusinessContact];
GO
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,2 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (4,3 ,'John', 'Snow', '222', '222', '2222', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,1 ,'John', 'Snow', '111', '111', '1111', '');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
INSERT INTO [dbo].[TestBusinessContact] ([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email]) VALUES (6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
GO
With this new requirement you might be better off changing tack and trying a different query altogether:
IF OBJECT_ID('Tempdb..#TestBusinessContact') IS NOT NULL DROP TABLE #TestBusinessContact;
CREATE TABLE #TestBusinessContact
(
[TestBusinessContact_pk] INT IDENTITY(1,1)NOT NULL,
[Business_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_Business_fk DEFAULT(0),
[DataSource_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_DataSourceEditionXRef_fk DEFAULT(0),
[FirstName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_FirstName DEFAULT(''),
[LastName]VARCHAR(50) NOT NULL CONSTRAINT DF_TestBusinessContact_LastName DEFAULT(''),
[FullName] AS [FirstName] + ' ' + [LastName],
[NPA]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NPA DEFAULT(''),
[NXX]VARCHAR(3) NOT NULL CONSTRAINT DF_TestBusinessContact_NXX DEFAULT(''),
[NXXLine]VARCHAR(4) NOT NUll CONSTRAINT DF_TestBusinessContact_NXXLine DEFAULT(''),
[Phone] AS (NPA + NXX + NXXLine),
[FormattedPhone]AS ('(' + NPA + ')' + ' ' + NXX + '-' + NXXLine),
[Email]VARCHAR(128)NOT NULL CONSTRAINT DF_TestBusinessContact_Email DEFAULT(''),
CONSTRAINT PK_TestBusinessContact PRIMARY KEY CLUSTERED
([TestBusinessContact_pk] ASC)
);
INSERT INTO #TestBusinessContact
([Business_fk], [DataSource_fk], [FirstName], [LastName], [NPA], [NXX], [NXXLine], [Email])
VALUES
(1,1 ,'John', 'Snow', '111', '111', '1111', ''),
(1,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),
(1,3 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@dayne.com'),
(2,1 ,'John', 'Snow', '111', '111', '1111', ''),
(2,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),
(2,3 ,'John', 'Snow', '222', '222', '222', 'johnsnow@got.com'),
(3,1 ,'John', 'Snow', '111', '111', '1111', ''),
(3,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),
(3,3 ,'John', 'Snow', '', '', '', 'johnsnow@got.com'),
(4,1 ,'John', 'Snow', '111', '111', '1111', ''),
(4,2 ,'John', 'Snow', '111', '111', '1111', ''),
(4,3 ,'John', 'Snow', '222', '222', '2222', ''),
(5,1 ,'John', 'Snow', '111', '111', '1111', ''),
(5,2 ,'John', 'Snow', '111', '111', '1111', 'johnsnow@got.com'),
(5,3 ,'John', 'Snow', '222', '222', '2222', 'johnsnow@got.com'),
(5,4 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com'),
(6,1 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com'),
(6,2 ,'John', 'Dayne', '222', '222', '2222', 'johnsnow@got.com');
UPDATE STATISTICS #TestBusinessContact WITH FULLSCAN
SELECT *
FROM #TestBusinessContact a
CROSS APPLY ( -- Match on phone and email address
SELECT MatchParent1 = MIN(b1.TestBusinessContact_pk)
FROM #TestBusinessContact b1
WHERE b1.FullName = a.FullName
AND b1.Business_fk = a.Business_fk
AND b1.Phone = a.Phone
AND b1.TestBusinessContact_pk < a.TestBusinessContact_pk
AND b1.Email = a.Email AND b1.Email <> ''
) m1
CROSS APPLY ( -- Otherwise, try to match on phone alone
SELECT MatchParent2 = MIN(b2.TestBusinessContact_pk)
FROM #TestBusinessContact b2
WHERE b2.FullName = a.FullName
AND b2.Business_fk = a.Business_fk
AND b2.Phone = a.Phone
AND b2.TestBusinessContact_pk <> a.TestBusinessContact_pk
AND m1.MatchParent1 IS NULL
AND NOT (b2.Email = '' AND a.Email > '')
AND NOT (b2.Email > '' AND a.Email > '' AND b2.TestBusinessContact_pk > a.TestBusinessContact_pk)
AND NOT (b2.Email = '' AND a.Email = '' AND b2.TestBusinessContact_pk > a.TestBusinessContact_pk)
) m2
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 11, 2015 at 9:54 am
SSCertifiable,
I've been playing with the cross apply and trying to get it to work with a few other requirements. Each business group I should easily be able to tell that a contact is the anchor contact and should not be deleted. In business group 1 the query you gave me has TestBusinessContact_pk of 1 and 3 matching to 2. I'd like to see that 1's parent is 2 because they share the same name and phone and 2 has the lower DataSource_fk. contacts 2 and 3 should be listed as keepers since they have different emails.
June 11, 2015 at 9:59 am
jeff.born (6/11/2015)
SSCertifiable,I've been playing with the cross apply and trying to get it to work with a few other requirements. Each business group I should easily be able to tell that a contact is the anchor contact and should not be deleted. In business group 1 the query you gave me has TestBusinessContact_pk of 1 and 3 matching to 2. I'd like to see that 1's parent is 2 because they share the same name and phone and 2 has the lower DataSource_fk. contacts 2 and 3 should be listed as keepers since they have different emails.
Have a play with this operator:
AND b1.TestBusinessContact_pk < a.TestBusinessContact_pk
Try > and <> instead of <.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply