How to compare data in customer table with other customers to find related cust

  • select

    Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone,

    Other_Phone, email1, email2

    from customer c

    where Active='Y' -- About 1.5 Million Records

    Here in this sql server table, I have customers table, custno is unique id.

    we are trying to grouping customers and give one ID lets call it groupingID.

    INSERT INTO customer (Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone, Other_Phone, email1, email2)

    VALUES

    (1, '123 Main St', 'New York', '123-456-7890', '987-654-3210', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'john@example.com', 'johndoe@example.com'),

    (2, '456 Elm St', 'Los Angeles', '456-789-0123', '012-345-6789', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'jane@example.com', 'janedoe@example.com'),

    (3, '123 Main St', 'New York', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'bob@example.com', 'bobsmith@example.com'),

    (4, '321 Pine St', 'Houston', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'susan@example.com', 'susansmith@example.com'),

    (5, '654 Cedar St', 'Phoenix', '901-234-5678', '890-123-4567', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'david@example.com', 'daviddoe@example.com'),

    (6, '987 Maple St', 'Philadelphia', '333-444-5555', '567-890-1234', '666-777-8888', '999-000-1111', '222-333-4444', '555-666-7777', 'emily@example.com', 'emilyjones@example.com'),

    (7, '210 Walnut St', 'San Antonio', '123-456-7890', '678-901-2345', '777-888-9999', '000-111-2222', '333-444-5555', '666-777-8888', 'michael@example.com', 'michaelbrown@example.com'),

    (8, '543 Birch St', 'San Diego', '234-567-8901', '789-012-3456', '888-999-0000', '111-222-3333', '444-555-6666', '777-888-9999', 'amanda@example.com', 'amandasmith@example.com'),

    (9, '876 Redwood St', 'Dallas', '345-678-9012', '890-123-4567', '999-000-1111', '222-333-4444', '555-666-7777', '888-999-0000', 'matthew@example.com', 'matthewwilson@example.com'),

    (10, '1091 Poplar St', 'Austin', '456-789-0123', '901-234-5678', '000-111-2222', '333-444-5555', '666-777-8888', '999-000-1111', 'olivia@example.com', 'oliviamiller@example.com'),

    (11, '121 Pineapple St', 'Seattle', '567-890-1234', '012-345-6789', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'daniel@example.com', 'danielthomas@example.com'),

    (12, '231 Cherry St', 'Portland', '678-901-2345', '123-456-7890', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'sophia@example.com', 'sophiawilson@example.com'),

    (13, '344 Oakwood St', 'Miami', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'jacob@example.com', 'jacobthomas@example.com'),

    (14, '456 Maplewood St', 'Tampa', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'emma@example.com', 'emmabrown@example.com'),

    (15, '567 Pine St', 'Orlando', '901-234-5678', '456-789-0123', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'ethan@example.com', 'ethandavis@example.com');

    Custno 1 & 3 Adress & City match, custno3(faxphone) & custno6 (resphone) matching so these three records should match and get same groupid

    custno2 record fields not match with any fields so that one will get one groupid.

    custno4 & custno5 matches with res phone & business phone so these both reecords get same groupID

    so any phone no filed matches with any other customers any other field consider as matched.

    any email1 or email2 matched with any other customer email then all matched customer no get same groupID.

    Please let me know if I need to provide any more information from myend.

    Thank you in advance

     

  • Please see code in code block

    CREATE TABLE #Customers1 (Custno int, Addr1 varchar(50), City varchar(50), Res_Phone varchar(15), Bus_Phone varchar(15), Fax_Phone varchar(15), Marine_Phone varchar(15), Pager_Phone varchar(15), Other_Phone varchar(15), email1 varchar(50), email2 varchar(50))

    INSERT INTO #Customers1 (Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone, Other_Phone, email1, email2)
    VALUES
    (1, '123 Main St', 'New York', '123-456-7890', '987-654-3210', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'john@example.com', 'johndoe@example.com'),
    (2, '456 Elm St', 'Los Angeles', '456-789-0123', '012-345-6789', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'jane@example.com', 'janedoe@example.com'),
    (3, '123 Main St', 'New York', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'bob@example.com', 'bobsmith@example.com'),
    (4, '321 Pine St', 'Houston', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'susan@example.com', 'susansmith@example.com'),
    (5, '654 Cedar St', 'Phoenix', '901-234-5678', '890-123-4567', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'david@example.com', 'daviddoe@example.com'),
    (6, '987 Maple St', 'Philadelphia', '333-444-5555', '567-890-1234', '666-777-8888', '999-000-1111', '222-333-4444', '555-666-7777', 'emily@example.com', 'emilyjones@example.com'),
    (7, '210 Walnut St', 'San Antonio', '123-456-7890', '678-901-2345', '777-888-9999', '000-111-2222', '333-444-5555', '666-777-8888', 'michael@example.com', 'michaelbrown@example.com'),
    (8, '543 Birch St', 'San Diego', '234-567-8901', '789-012-3456', '888-999-0000', '111-222-3333', '444-555-6666', '777-888-9999', 'amanda@example.com', 'amandasmith@example.com'),
    (9, '876 Redwood St', 'Dallas', '345-678-9012', '890-123-4567', '999-000-1111', '222-333-4444', '555-666-7777', '888-999-0000', 'matthew@example.com', 'matthewwilson@example.com'),
    (10, '1091 Poplar St', 'Austin', '456-789-0123', '901-234-5678', '000-111-2222', '333-444-5555', '666-777-8888', '999-000-1111', 'olivia@example.com', 'oliviamiller@example.com'),
    (11, '121 Pineapple St', 'Seattle', '567-890-1234', '012-345-6789', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'daniel@example.com', 'danielthomas@example.com'),
    (12, '231 Cherry St', 'Portland', '678-901-2345', '123-456-7890', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'sophia@example.com', 'sophiawilson@example.com'),
    (13, '344 Oakwood St', 'Miami', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'jacob@example.com', 'jacobthomas@example.com'),
    (14, '456 Maplewood St', 'Tampa', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'emma@example.com', 'emmabrown@example.com'),
    (15, '567 Pine St', 'Orlando', '901-234-5678', '456-789-0123', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'ethan@example.com', 'ethandavis@example.com');

    Thank you

    Zond

  • What ever your rules for matching are , For example address matching, you will need to join the table with itself. Something like  ? >

    Select t1.custNo, t2.*, 
    Row_number() over (partition by t1.address Order by (select 1)) as groupNum
    from customers as t1
    inner join customers as t2
    on t2.address = t1.address

     

    It always helps if you have test data and DDL scripts for test tables to help us help you.

    ----------------------------------------------------

  • Regarding your expected results, are you hoping for (GroupId, CustNo)?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Also, are the rules you mentioned to be applied in the order they are written, such that if a customer gets assigned to group X, they cannot subsequently also be assigned to group Y? Or can the same customer be in more than one group?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Good Morning Phil, Yes I am expecting GroupID, Custno columns.

     

    there is no order for example one customer email1 matches with another customer email2 or email 1 they all considered one group.

     

    or any phone no matches with any phone (for example one customer res_phone matches with another customer bus_phone or fax_phone) then it is considered same group id.

     

    Please let me know if you need any more informatimation

     

    Thank you much in advance. Please help

  • That is Correct Phil.

     

    so when ever the phones matches (any customers any phone nos ) then assign one generic no.

    if email 1 or email 2 matches with any other customer group. but for addr1 and city both has to match with other customer.

     

    Thank you Phil

    Regards

    Sita

  • Dear Phil Good Evening

     

    Please let me know if you need any more information, Thank you

  • I don't think you answered this question:

    Can the same customer be in more than one group?

    This is a very likely scenario if all of the rules are always applied to all of the customers.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • You are correct, Yes Phil that is correct. it can be possible.

     

    i tried this a day back but not moving any direction, no luck from my end, can you please help. Thanks a ton in advance

    SELECT

    s.Custno,

    CASE

    WHEN email1 IN (SELECT email1 FROM #SUBCustomers1 WHERE Custno <> s.Custno) OR

    email1 IN (SELECT email2 FROM #SUBCustomers1 WHERE Custno <> s.Custno) THEN

    (SELECT TOP 1 Custno FROM #SUBCustomers1 WHERE email1 = s.email1 OR email2 = s.email1)

    WHEN email2 IN (SELECT email1 FROM #SUBCustomers1 WHERE Custno <> s.Custno) OR

    email2 IN (SELECT email2 FROM #SUBCustomers1 WHERE Custno <> s.Custno) THEN

    (SELECT TOP 1 Custno FROM #SUBCustomers1 WHERE email1 = s.email2 OR email2 = s.email2)

    ELSE CAST(s.Custno AS varchar(10))

    END AS OutputColumn

    FROM

    #SUBCustomers1 s;

     

     

    Thank you

    Sita

  • Your rules are too complex to do this in a single query. Instead, I suggest that you build up the groups one at a time, with one query for every different rule you have. Here is some code which does the one of the groupings (any phone number matches any other). Very similar code would match the e-mail addresses.

    DROP TABLE IF EXISTS #Grouped;

    CREATE TABLE #Grouped
    (
    GroupId INT NOT NULL
    ,CustomerId INT NOT NULL
    ,
    PRIMARY KEY CLUSTERED(
    GroupId
    ,CustomerId
    )
    WITH(IGNORE_DUP_KEY = ON)
    );

    --Rule 1 - any phone number matches
    WITH AllPhones
    AS (SELECT c.Custno
    ,Phone = c.Res_Phone
    FROM #Customers1 c
    UNION --Use instead of UNION ALL to make the results distinct
    SELECT c.Custno
    ,Phone = c.Bus_Phone
    FROM #Customers1 c
    UNION
    SELECT c.Custno
    ,Phone = c.Fax_Phone
    FROM #Customers1 c
    UNION
    SELECT c.Custno
    ,Phone = c.Marine_Phone
    FROM #Customers1 c
    UNION
    SELECT c.Custno
    ,Phone = c.Pager_Phone
    FROM #Customers1 c
    UNION
    SELECT c.Custno
    ,Phone = c.Other_Phone
    FROM #Customers1 c)
    ,matches
    AS (SELECT a.Phone
    ,ct = COUNT (*)
    FROM AllPhones a
    GROUP BY a.Phone
    HAVING COUNT (*) > 1)
    INSERT #Grouped
    (
    GroupId
    ,CustomerId
    )
    SELECT GroupId = 1
    ,c.Custno
    FROM #Customers1 c
    CROSS JOIN matches m
    WHERE c.Res_Phone = m.Phone
    OR c.Bus_Phone = m.Phone
    OR c.Fax_Phone = m.Phone
    OR c.Marine_Phone = m.Phone
    OR c.Pager_Phone = m.Phone
    OR c.Other_Phone = m.Phone;

    SELECT * FROM #Grouped g

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you So much Phil. for your efforts.

    I got it bit tricky, but we are ok to generate using temp tables or CTE etc. I tried the query but since my source table has 1.5 Mill records it is taking too much time (still executing 🙁  )

     

    I think this better to do step by step first get all custno into temp table and groupid then updating each found, you know way way better you have seen in and out of sql. for me I tried with proc above step but no luck yet 🙁

     

    if possible can you please look it once agin.

    Thank you very much  Phil

  • I think this better to do step by step first get all custno into temp table and groupid then updating each found, you know way way better you have seen in and out of sql. for me I tried with proc above step but no luck yet

    This won't work, as you don't know how many rows there will be (because a customer can be in more than one group, but you won't know how many until all of the matching queries have been run).

    It may be worth you creating a cut-down version of your source table ... perhaps with just 500 or so rows ... and doing the development and testing on that.

    +

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you Phil, great advise.

    Sure I will try that. Can you provide some code based on data provided. and No to below question. please

    Can the same customer be in more than one group?

     

     

  • Zond Sita wrote:

    Thank you Phil, great advise.

    Sure I will try that. Can you provide some code based on data provided. and No to below question. please

    Can the same customer be in more than one group?

    My code was based on the data provided.

    Have you changed your mind about a customer being in more than one group?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply