Matching the existence of data between like columns

  • Hi all,

    I've been matching some incoming contacts to existing contacts in a database and need to update, insert, or rematch based on the ifs below. I believe there is an elegant way to accomplish this and would love some ideas!

    If name, phone, and email all provided for both, then use the highest Source.

    So if the Contact has a Source value of 5 and SourceContact has a ContactSource of 1, update Contact with SourceContact

    If name, phone and email all provided, and source the same then update to new values.

    If name and phone on SourceContact and name and Email on Contact then reset Contact_fk to -1 should not have matched, but should be an insert

    If name and email on SourceContact and name and phone on Contact then reset Contact_fk to -1 should not have matched, but should be an insert

    If name and phone on SourceContact and name and/or Phone is blank in Contact then update

    If name and email on SourceContact and name and/or email is blank in Contact then update

    If phone numbers can be different, just update record to SourceContact if it has a same or higher ContactSource

    If email address do not match then set SourceContacts to -1 Contact_fk it was computed incorrectly. Both have a non blank email

    If Contact_fk is 0 then it is a new contact and just insert it.

    IF OBJECT_ID('dbo.SourceContact', 'U') IS NOT NULL DROP TABLE [dbo].[SourceContact];

    CREATE TABLE [dbo].[SourceContact]

    (

    [SourceContact_pk]INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_SourceContact PRIMARY KEY CLUSTERED,

    [ContactLastName] VARCHAR(30) NOT NULL CONSTRAINT DF_SourceContact_ContactLastName DEFAULT (''),

    [ContactFirstName] VARCHAR(30) NOT NULL CONSTRAINT DF_SourceContact_ContactFirstName DEFAULT (''),

    [ContactPhone]VARCHAR(10)NOT NULL CONSTRAINT DF_SourceContact_ContactPhone DEFAULT (''),

    [ContactEmail]VARCHAR(128)NOT NULL CONSTRAINT DF_SourceContact_ContactEmail DEFAULT(''),

    [ContactSource]INTNOT NULL CONSTRAINT DF_SourceContact_ContactSource DEFAULT(0),

    [Contact_fk]INTNULL

    )

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (0, 'John', 'Snow', '1112223344', 'johnsnow@got.com', 1);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (0, 'Mance', 'Rayder', '1112223355', 'mance.rayder@got.com', 1);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (0, 'Arya', 'Stark', '2223334455', 'arya.stark@got.com', 1);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (10, 'Alliser', 'Thome', '6669996669', 'forthewatch@imadeadmanwalking.com', 2); -- Contact_fk should be reset to -1 since email address are different

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (20, 'Davos', 'Sea Worthy', '2221119999', '', 1); -- blank, all whitespace or null emails are the same thing, blank

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (30, 'Elaria', 'Sand', '5558889797', 'elaria.sand@got.com', 3);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (40, 'Oylenna', 'Tyrell', '2223331111', '', 4);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (50, 'Samwell', 'Tarly', '', 'samwell.taryly@mib.com', 4);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (60, 'Reek', 'Greyjoy', '9001021010', '', 1);

    INSERT INTO [dbo].[SourceContact] ([Contact_fk], [ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource]) VALUES (70, 'Ned', 'Stark', '1002003030', 'ned.stark@got.com', 3);

    IF OBJECT_ID('dbo.Contact', 'U') IS NOT NULL DROP TABLE [dbo].[Contact];

    CREATE TABLE [dbo].[Contact]

    (

    [Contact_pk]INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Contact PRIMARY KEY CLUSTERED,

    [LastName] VARCHAR(30) NOT NULL CONSTRAINT DF_Contact_LastName DEFAULT (''),

    [FirstName] VARCHAR(30) NOT NULL CONSTRAINT DF_Contact_FirstName DEFAULT (''),

    [Phone]VARCHAR(10)NOT NULL CONSTRAINT DF_Contact_Phone DEFAULT (''),

    [Email]VARCHAR(128)NOT NULL CONSTRAINT DF_Contact_Email1 DEFAULT(''),

    [Source]INTNOT NULL CONSTRAINT DF_Contact_Source DEFAULT(0)

    );

    SET IDENTITY_INSERT [dbo].[Contact] ON;

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (10, 'Alliser', 'Thome', '3334446868', 'alliser.thome@got.com', 1); -- matches 10 above, but email addresses are different and should be different contacts. Should reset Contact_fk to -1 on SourceContact

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (20, 'Davos', 'Seaworthy', '2221119999', '', 4); -- matches 20 above, and has a higher source contact number so should be updated

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (30, 'Elaria', 'Sand', '5558889797', 'elaria.sand@got.com', 3); -- matches 30 above, same source number so should be updated

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (40, 'Oylenna', 'Tyrell', '', 'oylenna.tyrell@got.com', 2); -- has an email, while match above has phone but no email, this one should reset Contact_fk to -1 should not have matched, but should be an insert

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (50, 'Samwell', 'Tarley', '9005551212', '', 1); -- matches 50 above, no phone number 50 above has an email, so should reset Contact_fk to -1 should not have matched, but should be an insert

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (60, 'Theon', 'Greyjoy', '9003021515', '', 1); -- matches 60 above, and even though phone numbers are different, update.

    INSERT INTO [dbo].[Contact] ([Contact_pk], [LastName], [FirstName], [Phone], [Email], [Source]) VALUES (70, 'Edward', 'Stark', '1002003030', 'ned.stark@got.com', 1); -- matches 70 above, but has lower source number so should not be updated

    SET IDENTITY_INSERT [dbo].[Contact] OFF;

    SET IDENTITY_INSERT [dbo].[UpdateTransaction] ON;

    IF OBJECT_ID('dbo.Contact', 'U') IS NOT NULL DROP TABLE [dbo].[UpdateTransaction];

    -- I've done this as a CTE also, but I'm talking millions of contacts and server is memory constrained on good days.

    CREATE TABLE [dbo].[UpdateTransaction]

    (

    [UpdateTransactions_pk]INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_UpdateTransaction PRIMARY KEY CLUSTERED,

    [ContactLastName] VARCHAR(30) NOT NULL CONSTRAINT DF_UpdateTransaction_ContactLastName DEFAULT (''),

    [ContactFirstName] VARCHAR(30) NOT NULL CONSTRAINT DF_UpdateTransaction_ContactFirstName DEFAULT (''),

    [ContactPhone]VARCHAR(10)NOT NULL CONSTRAINT DF_UpdateTransaction_ContactPhone DEFAULT (''),

    [ContactEmail]VARCHAR(128)NOT NULL CONSTRAINT DF_UpdateTransaction_ContactEmail DEFAULT(''),

    [ContactSource]INTNOT NULL CONSTRAINT DF_UpdateTransaction_ContactSource DEFAULT(0),

    [Contact_fk]INTNULL

    )

    SET IDENTITY_INSERT [dbo].[UpdateTransaction] OFF;

    SELECT * FROM [dbo].[SourceContact];

    SELECT * FROM [dbo].[Contact];

    SELECT * FROM [dbo].[UpdateTransaction];

    --INSERT INTO [dbo].[UpdateTransaction] ([ContactLastName], [ContactFirstName], [ContactPhone], [ContactEmail], [ContactSource])

    --SELECT -- Need ideas here

    UPDATE c SET

    [LastName]= [ContactLastName]

    ,[FirstName]= [ContactFirstName]

    ,[Phone]= [ContactPhone]

    ,[Email]= [ContactEmail]

    ,[Source]= [ContactSource]

    FROM [dbo].[Contact] c

    INNER JOIN [dbo].[UpdateTransaction] ut ON ut.[Contact_fk] = c.[Contact_pk]

    WHERE [FirstName] + [LastName] + [Phone] + [Email] <> [ContactFirstName] + [ContactLastName] + [ContactPhone] + [ContactEmail];

  • Consider this idea (if not already considered). It is my experience in dealing with such data:

    In these situations, the data quality plays a major role. First objective is to establish a Unique Key per row.

    Consider the following:

    a)First & Lat Name should be in same case (unless case sensitivity is not an issue).

    b)No abbreviation used (e.g. John Smith and J. Smith)

    c)Resolve multiple contacts having same First and Last Name.

    d)Format of Email

    e)Format of phone number (with or without hypen/spaces)

    One the above conditions are met, the SQL to update/insert/delete will be a easy task.

  • amurjani (7/16/2015)


    Consider this idea (if not already considered). It is my experience in dealing with such data:

    In these situations, the data quality plays a major role. First objective is to establish a Unique Key per row.

    Consider the following:

    a)First & Lat Name should be in same case (unless case sensitivity is not an issue).

    b)No abbreviation used (e.g. John Smith and J. Smith)

    c)Resolve multiple contacts having same First and Last Name.

    d)Format of Email

    e)Format of phone number (with or without hypen/spaces)

    One the above conditions are met, the SQL to update/insert/delete will be a easy task.

    a) database setting make already make text comparisons case insensitive, so done

    b) Can't control this unfortunately. This would make everything easier.

    c) This has already been dealt with

    d) Can't control and not worried about it.

    e) Phone number are pretty clean.

    I'm past abcd and e. I've already matched the incoming contact to the contact table and have assigned a contract key to the source contact in the Contact_fk field.

    What I need help with making sure I don't update a contact if the other contact is from a less reputable source. That by itself I know how to do, but how to also say update if the same 3 fields are provided from the same source, so First and last name and phone on the SourceContact, and same three on the contact.

    What might make this clearer is that I've already matched the contacts. If a source contact only had first and last name it might have matched to a contact with first and last name and a phone number. I don't want to create a new contact for this case and already know the SourceContact should has a Contact_fk of X.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply