October 9, 2012 at 1:45 am
Hello! I have a rather complex data integration challenge I'd love some input on. The scenario is a a system of record that has allowed for multiple duplicate records of the same customer. I matched duplicate records to each other in a single To-From table. My problem is to now single out one of the customer records as the 'parent' and map all of the others to it.
Note: the de-duping process allows for a customer to be mapped to multiple other customer records. Also, I need the leaf level records to be directly mapped to the highest level in the branch possible.
Lastly, I have solved this problem by looping through the data but I would like to make the solution more set-based, without cursors etc.
Here's some input data:
CREATE TABLE dbo.CustomerMatch (
FromCustomerID INT NOT NULL
, ToCustomerID INT NOT NULL
, Matches INT NOT NULL
, ToCustomerDateCreated DATETIME NOT NULL
)
INSERT dbo.CustomerMatch VALUES
(1, 2, 4, '20100101')
, (2, 1, 4, '20090101')
, (1, 3, 3, '20110101')
, (3, 1, 3, '20090101')
, (4, 2, 2, '20100101')
, (2, 4, 2, '20120101')
, (3, 2, 1, '20100101')
, (2, 3, 1, '20110101')
, (3, 4, 1, '20120101')
, (4, 3, 1, '20110101')
, (1, 4, 1, '20120101')
, (4, 1, 1, '20090101')
'Matches' is the number of attributes on which the two customer records were associated.
And here's the result I would like:
CustomerID ParentCustomerID Matches
----------- ---------------- -----------
1 1 4
2 1 4
3 1 3
4 1 2
Thanks for the help!
Chris
October 9, 2012 at 1:55 am
chris.ross 34852 (10/9/2012)
Hello! I have a rather complex data integration challenge I'd love some input on. The scenario is a a system of record that has allowed for multiple duplicate records of the same customer. I matched duplicate records to each other in a single To-From table. My problem is to now single out one of the customer records as the 'parent' and map all of the others to it.Note: the de-duping process allows for a customer to be mapped to multiple other customer records. Also, I need the leaf level records to be directly mapped to the highest level in the branch possible.
Does your deduping process use WHERE a.CustomerID <> b.CustomerID? If so, you could eliminate the dupes in the output by using WHERE a.CustomerID < b.CustomerID. This would result in an output like the following:
(1, 2, 4, '20100101')
, (1, 3, 3, '20110101')
, (4, 2, 2, '20100101')
, (3, 2, 1, '20100101')
, (3, 4, 1, '20120101')
, (1, 4, 1, '20120101')
you might want to include the FROMcustomerCreated to help you choose which row to keep.
Lastly, I have solved this problem by looping through the data but I would like to make the solution more set-based, without cursors etc.
Here's some input data:
CREATE TABLE dbo.CustomerMatch (
FromCustomerID INT NOT NULL
, ToCustomerID INT NOT NULL
, Matches INT NOT NULL
, ToCustomerDateCreated DATETIME NOT NULL
)
INSERT dbo.CustomerMatch VALUES
(1, 2, 4, '20100101')
, (2, 1, 4, '20090101')
, (1, 3, 3, '20110101')
, (3, 1, 3, '20090101')
, (4, 2, 2, '20100101')
, (2, 4, 2, '20120101')
, (3, 2, 1, '20100101')
, (2, 3, 1, '20110101')
, (3, 4, 1, '20120101')
, (4, 3, 1, '20110101')
, (1, 4, 1, '20120101')
, (4, 1, 1, '20090101')
'Matches' is the number of attributes on which the two customer records were associated.
And here's the result I would like:
CustomerID ParentCustomerID Matches
----------- ---------------- -----------
1 1 4
2 1 4
3 1 3
4 1 2
Thanks for the help!
Chris
There have been a couple of threads like this recently, I'll try to track 'em down.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply