April 26, 2024 at 2:43 am
select
Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone,
Other_Phone, email1, email2
from customer c
where Active='Y' -- About 1.5 Million Records
Here in this sql server table, I have customers table, custno is unique id.
we are trying to grouping customers and give one ID lets call it groupingID.
INSERT INTO customer (Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone, Other_Phone, email1, email2)
VALUES
(1, '123 Main St', 'New York', '123-456-7890', '987-654-3210', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'john@example.com', 'johndoe@example.com'),
(2, '456 Elm St', 'Los Angeles', '456-789-0123', '012-345-6789', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'jane@example.com', 'janedoe@example.com'),
(3, '123 Main St', 'New York', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'bob@example.com', 'bobsmith@example.com'),
(4, '321 Pine St', 'Houston', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'susan@example.com', 'susansmith@example.com'),
(5, '654 Cedar St', 'Phoenix', '901-234-5678', '890-123-4567', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'david@example.com', 'daviddoe@example.com'),
(6, '987 Maple St', 'Philadelphia', '333-444-5555', '567-890-1234', '666-777-8888', '999-000-1111', '222-333-4444', '555-666-7777', 'emily@example.com', 'emilyjones@example.com'),
(7, '210 Walnut St', 'San Antonio', '123-456-7890', '678-901-2345', '777-888-9999', '000-111-2222', '333-444-5555', '666-777-8888', 'michael@example.com', 'michaelbrown@example.com'),
(8, '543 Birch St', 'San Diego', '234-567-8901', '789-012-3456', '888-999-0000', '111-222-3333', '444-555-6666', '777-888-9999', 'amanda@example.com', 'amandasmith@example.com'),
(9, '876 Redwood St', 'Dallas', '345-678-9012', '890-123-4567', '999-000-1111', '222-333-4444', '555-666-7777', '888-999-0000', 'matthew@example.com', 'matthewwilson@example.com'),
(10, '1091 Poplar St', 'Austin', '456-789-0123', '901-234-5678', '000-111-2222', '333-444-5555', '666-777-8888', '999-000-1111', 'olivia@example.com', 'oliviamiller@example.com'),
(11, '121 Pineapple St', 'Seattle', '567-890-1234', '012-345-6789', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'daniel@example.com', 'danielthomas@example.com'),
(12, '231 Cherry St', 'Portland', '678-901-2345', '123-456-7890', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'sophia@example.com', 'sophiawilson@example.com'),
(13, '344 Oakwood St', 'Miami', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'jacob@example.com', 'jacobthomas@example.com'),
(14, '456 Maplewood St', 'Tampa', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'emma@example.com', 'emmabrown@example.com'),
(15, '567 Pine St', 'Orlando', '901-234-5678', '456-789-0123', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'ethan@example.com', 'ethandavis@example.com');
Custno 1 & 3 Adress & City match, custno3(faxphone) & custno6 (resphone) matching so these three records should match and get same groupid
custno2 record fields not match with any fields so that one will get one groupid.
custno4 & custno5 matches with res phone & business phone so these both reecords get same groupID
so any phone no filed matches with any other customers any other field consider as matched.
any email1 or email2 matched with any other customer email then all matched customer no get same groupID.
Please let me know if I need to provide any more information from myend.
Thank you in advance
April 26, 2024 at 11:26 am
Please see code in code block
CREATE TABLE #Customers1 (Custno int, Addr1 varchar(50), City varchar(50), Res_Phone varchar(15), Bus_Phone varchar(15), Fax_Phone varchar(15), Marine_Phone varchar(15), Pager_Phone varchar(15), Other_Phone varchar(15), email1 varchar(50), email2 varchar(50))
INSERT INTO #Customers1 (Custno, Addr1, City, Res_Phone, Bus_Phone, Fax_Phone, Marine_Phone, Pager_Phone, Other_Phone, email1, email2)
VALUES
(1, '123 Main St', 'New York', '123-456-7890', '987-654-3210', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'john@example.com', 'johndoe@example.com'),
(2, '456 Elm St', 'Los Angeles', '456-789-0123', '012-345-6789', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'jane@example.com', 'janedoe@example.com'),
(3, '123 Main St', 'New York', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'bob@example.com', 'bobsmith@example.com'),
(4, '321 Pine St', 'Houston', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'susan@example.com', 'susansmith@example.com'),
(5, '654 Cedar St', 'Phoenix', '901-234-5678', '890-123-4567', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'david@example.com', 'daviddoe@example.com'),
(6, '987 Maple St', 'Philadelphia', '333-444-5555', '567-890-1234', '666-777-8888', '999-000-1111', '222-333-4444', '555-666-7777', 'emily@example.com', 'emilyjones@example.com'),
(7, '210 Walnut St', 'San Antonio', '123-456-7890', '678-901-2345', '777-888-9999', '000-111-2222', '333-444-5555', '666-777-8888', 'michael@example.com', 'michaelbrown@example.com'),
(8, '543 Birch St', 'San Diego', '234-567-8901', '789-012-3456', '888-999-0000', '111-222-3333', '444-555-6666', '777-888-9999', 'amanda@example.com', 'amandasmith@example.com'),
(9, '876 Redwood St', 'Dallas', '345-678-9012', '890-123-4567', '999-000-1111', '222-333-4444', '555-666-7777', '888-999-0000', 'matthew@example.com', 'matthewwilson@example.com'),
(10, '1091 Poplar St', 'Austin', '456-789-0123', '901-234-5678', '000-111-2222', '333-444-5555', '666-777-8888', '999-000-1111', 'olivia@example.com', 'oliviamiller@example.com'),
(11, '121 Pineapple St', 'Seattle', '567-890-1234', '012-345-6789', '111-222-3333', '444-555-6666', '777-888-9999', '000-111-2222', 'daniel@example.com', 'danielthomas@example.com'),
(12, '231 Cherry St', 'Portland', '678-901-2345', '123-456-7890', '222-333-4444', '555-666-7777', '888-999-0000', '111-222-3333', 'sophia@example.com', 'sophiawilson@example.com'),
(13, '344 Oakwood St', 'Miami', '789-012-3456', '234-567-8901', '333-444-5555', '666-777-8888', '999-000-1111', '222-333-4444', 'jacob@example.com', 'jacobthomas@example.com'),
(14, '456 Maplewood St', 'Tampa', '890-123-4567', '345-678-9012', '444-555-6666', '777-888-9999', '000-111-2222', '333-444-5555', 'emma@example.com', 'emmabrown@example.com'),
(15, '567 Pine St', 'Orlando', '901-234-5678', '456-789-0123', '555-666-7777', '888-999-0000', '111-222-3333', '444-555-6666', 'ethan@example.com', 'ethandavis@example.com');
Thank you
Zond
April 26, 2024 at 12:05 pm
What ever your rules for matching are , For example address matching, you will need to join the table with itself. Something like ? >
Select t1.custNo, t2.*,
Row_number() over (partition by t1.address Order by (select 1)) as groupNum
from customers as t1
inner join customers as t2
on t2.address = t1.address
It always helps if you have test data and DDL scripts for test tables to help us help you.
----------------------------------------------------
April 27, 2024 at 9:29 am
Regarding your expected results, are you hoping for (GroupId, CustNo)?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 27, 2024 at 9:31 am
Also, are the rules you mentioned to be applied in the order they are written, such that if a customer gets assigned to group X, they cannot subsequently also be assigned to group Y? Or can the same customer be in more than one group?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 28, 2024 at 3:24 pm
Good Morning Phil, Yes I am expecting GroupID, Custno columns.
there is no order for example one customer email1 matches with another customer email2 or email 1 they all considered one group.
or any phone no matches with any phone (for example one customer res_phone matches with another customer bus_phone or fax_phone) then it is considered same group id.
Please let me know if you need any more informatimation
Thank you much in advance. Please help
April 28, 2024 at 10:48 pm
That is Correct Phil.
so when ever the phones matches (any customers any phone nos ) then assign one generic no.
if email 1 or email 2 matches with any other customer group. but for addr1 and city both has to match with other customer.
Thank you Phil
Regards
Sita
April 29, 2024 at 3:26 am
Dear Phil Good Evening
Please let me know if you need any more information, Thank you
April 29, 2024 at 7:00 am
I don't think you answered this question:
Can the same customer be in more than one group?
This is a very likely scenario if all of the rules are always applied to all of the customers.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 29, 2024 at 11:52 am
You are correct, Yes Phil that is correct. it can be possible.
i tried this a day back but not moving any direction, no luck from my end, can you please help. Thanks a ton in advance
SELECT
s.Custno,
CASE
WHEN email1 IN (SELECT email1 FROM #SUBCustomers1 WHERE Custno <> s.Custno) OR
email1 IN (SELECT email2 FROM #SUBCustomers1 WHERE Custno <> s.Custno) THEN
(SELECT TOP 1 Custno FROM #SUBCustomers1 WHERE email1 = s.email1 OR email2 = s.email1)
WHEN email2 IN (SELECT email1 FROM #SUBCustomers1 WHERE Custno <> s.Custno) OR
email2 IN (SELECT email2 FROM #SUBCustomers1 WHERE Custno <> s.Custno) THEN
(SELECT TOP 1 Custno FROM #SUBCustomers1 WHERE email1 = s.email2 OR email2 = s.email2)
ELSE CAST(s.Custno AS varchar(10))
END AS OutputColumn
FROM
#SUBCustomers1 s;
Thank you
Sita
April 29, 2024 at 12:43 pm
Your rules are too complex to do this in a single query. Instead, I suggest that you build up the groups one at a time, with one query for every different rule you have. Here is some code which does the one of the groupings (any phone number matches any other). Very similar code would match the e-mail addresses.
DROP TABLE IF EXISTS #Grouped;
CREATE TABLE #Grouped
(
GroupId INT NOT NULL
,CustomerId INT NOT NULL
,
PRIMARY KEY CLUSTERED(
GroupId
,CustomerId
)
WITH(IGNORE_DUP_KEY = ON)
);
--Rule 1 - any phone number matches
WITH AllPhones
AS (SELECT c.Custno
,Phone = c.Res_Phone
FROM #Customers1 c
UNION --Use instead of UNION ALL to make the results distinct
SELECT c.Custno
,Phone = c.Bus_Phone
FROM #Customers1 c
UNION
SELECT c.Custno
,Phone = c.Fax_Phone
FROM #Customers1 c
UNION
SELECT c.Custno
,Phone = c.Marine_Phone
FROM #Customers1 c
UNION
SELECT c.Custno
,Phone = c.Pager_Phone
FROM #Customers1 c
UNION
SELECT c.Custno
,Phone = c.Other_Phone
FROM #Customers1 c)
,matches
AS (SELECT a.Phone
,ct = COUNT (*)
FROM AllPhones a
GROUP BY a.Phone
HAVING COUNT (*) > 1)
INSERT #Grouped
(
GroupId
,CustomerId
)
SELECT GroupId = 1
,c.Custno
FROM #Customers1 c
CROSS JOIN matches m
WHERE c.Res_Phone = m.Phone
OR c.Bus_Phone = m.Phone
OR c.Fax_Phone = m.Phone
OR c.Marine_Phone = m.Phone
OR c.Pager_Phone = m.Phone
OR c.Other_Phone = m.Phone;
SELECT * FROM #Grouped g
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 29, 2024 at 1:13 pm
Thank you So much Phil. for your efforts.
I got it bit tricky, but we are ok to generate using temp tables or CTE etc. I tried the query but since my source table has 1.5 Mill records it is taking too much time (still executing 🙁 )
I think this better to do step by step first get all custno into temp table and groupid then updating each found, you know way way better you have seen in and out of sql. for me I tried with proc above step but no luck yet 🙁
if possible can you please look it once agin.
Thank you very much Phil
April 29, 2024 at 1:28 pm
I think this better to do step by step first get all custno into temp table and groupid then updating each found, you know way way better you have seen in and out of sql. for me I tried with proc above step but no luck yet
This won't work, as you don't know how many rows there will be (because a customer can be in more than one group, but you won't know how many until all of the matching queries have been run).
It may be worth you creating a cut-down version of your source table ... perhaps with just 500 or so rows ... and doing the development and testing on that.
+
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 29, 2024 at 1:53 pm
Thank you Phil, great advise.
Sure I will try that. Can you provide some code based on data provided. and No to below question. please
Can the same customer be in more than one group?
April 29, 2024 at 2:03 pm
Thank you Phil, great advise.
Sure I will try that. Can you provide some code based on data provided. and No to below question. please
Can the same customer be in more than one group?
My code was based on the data provided.
Have you changed your mind about a customer being in more than one group?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply