August 2, 2007 at 3:04 am
Hi all - I have a tricky problem that I'm trying to solve, and not really getting anywhere. I'm hoping someone might have some ideas.
Basically we receive data from a call centre that is quite messy and completely unnormalised (as I'm sure is usual) and I'm trying to make some use out of it. Here is a simplified form of the data:
ID CustID PhoneID
1 1 10
2 2 10
3 3 11
4 1 12
5 2 12
6 4 13
7 5 11
8 5 15
9 5 18
Some Customers are given multiple CustIDs, and multiple PhoneIDs, so I'm trying to collapse the data on both these fields to identify individual customers.
In the above example, row IDs (1, 2, 4, 5), (3, 7, 8, 9) and (6) would constitute 3 individual customers after grouping by both CustID and PhoneID
Now my question is does anyone know a way to do this with a sql script? I've tried doing an inner join on CustID OR PhoneID, but that obviously doesn't work. I can't think of any other apporaches other than going through it line by line, which I'd rather avoid given that the table has nearly a million records.
Any help or pointers or even just thoughts would be much appreciated, thanks.
Cheers,
Matt
August 2, 2007 at 8:00 am
There must be a smart way of doing this but I just cannot think of it
I tried this
SELECT A,MIN(B) AS FROM (
SELECT a.[ID] AS [A],b.[ID] AS
FROM
a
INNER JOIN
b on b.[CustID] = a.[CustID]
UNION
SELECT a.[ID],b.[ID]
FROM
a
INNER JOIN
b on b.[PhoneID] = a.[PhoneID]
UNION
SELECT a.[ID],c.[ID]
FROM
a
INNER JOIN
b on b.[CustID] = a.[CustID]
INNER JOIN
c on c.[PhoneID] = b.[PhoneID]
UNION
SELECT a.[ID],c.[ID]
FROM
a
INNER JOIN
b on b.[PhoneID] = a.[PhoneID]
INNER JOIN
c on c.[CustID] = b.[CustID]
) x GROUP BY A
ORDER BY B
which gave the grouping you wanted but I suspect performance will be attrocious
Far away is close at hand in the images of elsewhere.
Anon.
August 2, 2007 at 11:38 am
What is the end result you seek? A single line with counts for each customer? All the detail captured in the database, ordered by customer (= customer + phone)?
I've had to dabble in call center data, but it's been a couple of years, and I have no clue what you are trying to accomplish with the solution to your problem. Perhaps a little more context or sample output could give us a hint.
(I'm thinking nested GROUP BY in selection, if it's permissible to summarize)
August 2, 2007 at 11:11 pm
Hey guys
Thanks for your input so far. Sorry i thought I made it clearer in my initial post, but I'll explain it a bit further.
Basically in the output I want to know which customers are actually the same people. Obviously every record with CustID 1 is the same person, but we're also counting everyone with the same PhoneID as the same person too. Thus in many cases, the same customer will have multiple CustIDs, and often multiple PhoneIDs also.
So in this case, there should be 3 unique customers as follows:
ID CustID PhoneID ActualCust
1 1 10 A
2 2 10 A
3 3 11 B
4 1 12 A
5 2 12 A
6 4 13 C
7 5 11 B
8 5 15 B
9 5 18 B
I'm not to fussed about the actual output (more IDs or whatever), I just need ot be able to uniquely identify which rows belong to the same Customer.
I hope things makes things clearer. Thanks again for your help on this. Please let me know if you have any further questions.
Cheers,
Matt
August 3, 2007 at 1:36 am
August 3, 2007 at 9:40 am
One suggestion - I'd start by building a reference table that identifies custid, phoneid combinations that refer to a specific customer, then use that in a join to locate calls by customer. Basically, it's faster to look something up than code a complicated rule! 😉
Steve G.
August 3, 2007 at 11:59 am
One solution: build a table that will map customer 'A' to all the appropriate phone or cust ids. Then use that relationship coding to summarize in the way you want.
For instance, taking your initial data:
ID CustID PhoneID ActualCust
1 1 10 A
2 2 10 A
3 3 11 B
4 1 12 A
5 2 12 A
6 4 13 C
7 5 11 B
8 5 15 B
9 5 18 B
The ActualCust column has to be derived by the appropriate business rules. As I understand it, you have framed the rules as follows:
1. If phoneid = previous phoneid in same import table, then customer is same.
2. If custid = previous custid in same import table, then customer is same.
3. First customer record becomes 'anchor' for identifying customer. Whatever unique identifier is assigned to first record found is then assigned to all subsequent records for this customer.
So far, everything i've envisioned requires a (horrors) cursor. I'm still working on how to build a set-based solution.
August 17, 2007 at 7:14 am
You can create a subquery to pair the lowest number CustID to all CustID values from 2 to N where there is more than one row for any PhoneID. I called this OrigCustID. It will give you a single value to identify each person.
For the sample data it would look like this:
OrigCustID CustID
1 2
3 5
CREATE TABLE #CustomerCalls (
ID int,
CustID INT,
PhoneID int,
ActualCust char(1))
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (1,1,10,'A')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (2,2,10,'A')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (3,3,11,'B')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (4,1,12,'A')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (5,2,12,'A')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (6,4,13,'C')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (7,5,11,'B')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (8,5,15,'B')
insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (9,5,18,'B')
SELECT
c.ID, COALESCE(orig.OrigCustID,c.CustID) AS OrigCustID,
c.CustID, c.PhoneID, c.ActualCust
FROM #CustomerCalls c
LEFT OUTER JOIN -- inner table that maps 2nd to Nth CustID to OrigCustID
(SELECT t2.CustID AS OrigCustID, t1.CustID
FROM #CustomerCalls t1
INNER JOIN (SELECT MIN(CustID) AS CustID, PhoneID FROM #CustomerCalls GROUP BY PhoneID) t2 -- the lowest number CustID
ON t1.phoneID = t2.PhoneID
WHERE t1.CustID > t2.CustID -- only gives rows 2nd to Nth CustID
GROUP BY t2.CustID, t1.CustID -- only 1 row for CustID to Orig CustID mapping
) orig
ON c.custID = orig.custID
ORDER BY c.ID
-- drop table #CustomerCalls
If you don’t need to see the multiple CustID values you might use this to load a second table that just has and is grouped by OrigCustID and PhoneID. The rowcount would be reduced.
August 17, 2007 at 11:46 am
Here is another way. I'm not sure how efficient it is but it does cover your example and a few others I thought of.
Basically I'm looping through the table setting a parentid for each row. Then I'm using that parentid if I can and if not I use the id to create a new parentid. All you have to do then is count the parent ids.
CREATE
TABLE #CustomerCalls (
ID
int,
CustID
INT,
PhoneID
int,
ActualCust
char(1),
ParentId
Int)
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (1,1,10,'A')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (2,2,10,'A')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (3,3,11,'B')
--insert into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (4,1,12,'A')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (5,2,12,'A')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (6,4,13,'C')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (7,5,11,'B')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (8,5,15,'B')
insert
into #CustomerCalls (ID, CustID, PhoneID, ActualCust) values (9,5,18,'B')
DECLARE
@Id Int, @CustId Int, @PhoneId Int, @ParentId Int
SELECT
@Id = MIN(Id), @ParentId = MIN(Id) FROM #CustomerCalls
WHILE
@Id IS NOT NULL
BEGIN
SELECT @CustId = CustId, @PhoneId = PhoneId, @ParentId = ISNULL(ParentId, Id)
FROM #CustomerCalls WHERE Id = @id
UPDATE #CustomerCalls SET ParentId = @ParentId
FROM #CustomerCalls
WHERE (CustId = @CustId OR PhoneId = @PhoneId)
AND (@ParentId > ParentId OR ParentId IS NULL)
SELECT @id = MIN(Id) FROM #CustomerCalls WHERE Id > @Id
END
SELECT
* FROM #CustomerCalls
Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]
August 19, 2007 at 8:54 pm
DECLARE @T TABLE (ID int, CustID int, PhoneID int)
INSERT INTO @T SELECT 1, 1, 10
UNION SELECT 2, 2, 10
UNION SELECT 3, 3, 11
UNION SELECT 4, 1, 12
UNION SELECT 5, 2, 12
UNION SELECT 6, 4, 13
UNION SELECT 7, 5, 11
UNION SELECT 8, 5, 15
UNION SELECT 9, 5, 18
DECLARE @r TABLE (NewID int, ID int, CustID int, PhoneID int)
DECLARE @I int, @C int, @P int
DECLARE C CURSOR FOR
SELECT ID, CustID, PhoneID FROM @T ORDER BY ID
OPEN C
FETCH C INTO @I, @C, @P
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @r
SELECT IsNull((SELECT MAX(NewID) FROM @r WHERE CustID = @C OR PhoneID = @P),
(SELECT IsNull(MAX(NewID), 0) + 1 FROM @r)),
@I, @C, @P
FETCH C INTO @I, @C, @P
END
CLOSE C
DEALLOCATE C
SELECT * FROM @r
August 19, 2007 at 9:57 pm
Hi everyone. Thanks for your help. Ideally I wanted to avoid using a cursor or a loop, as it would require multiple passes anyway. Below is the script I ended up using, Unfortunately it only works through three levels of CustID/PhoneID/CustID - creating further would have increased the script size exponentially, and it should get most matches this way anyway.
If anyone has any further comments or suggestions I'd still be very grateful to hear from you, thanks.
select a.ID, a.CustID, a.PhoneID,
case when c.ID <= p.ID then c.ID else p.ID end as ParentID
from dbo.tbCustomer a
inner join
(select a.CustID, min(case when b.ID <= c.ID then b.ID else c.ID end) as ID
from dbo.tbCustomer a
inner join (select CustID, min(ID) ID from dbo.tbCustomer group by CustID) b
on a.CustID = b.CustID
inner join (select PhoneID, min(ID) ID from dbo.tbCustomer group by PhoneID) c
on a.PhoneID = c.PhoneID
group by a.CustID) as c
on a.CustID = c.CustID
inner join
(select a.PhoneID, min(case when b.ID <= c.ID then b.ID else c.ID end) as ID
from dbo.tbCustomer a
inner join (select CustID, min(ID) ID from dbo.tbCustomer group by CustID) b
on a.CustID = b.CustID
inner join (select PhoneID, min(ID) ID from dbo.tbCustomer group by PhoneID) c
on a.PhoneID = c.PhoneID
group by a.PhoneID) as p
on a.PhoneID = p.PhoneID
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply