January 22, 2014 at 3:03 am
When I'm performing a de-duplication exercise on a mailing list I produce a table of potential matches (i.e. records that have something in common like a postcode or company name etc). Then for each matching pair I compare all their parts to each other and produce a table like the below:
As you can see I have the IDs of the matching pair, type of match and various comparison scores.
To decide on a match I have over 30 UPDATE statements similar to the below:
UPDATE PotentialMatches
SET Individual = 'Match Type 1'
WHERE Name_Score = 1
AND Premise_Score = 1
AND Address_Score = 100
UPDATE PotentialMatches
SET Individual = 'Match Type 2'
WHERE Name_Score = 1
AND MobileTelephone-Score = 1
This was hardcoded into my stored procedure which was a pain as every time I wanted to introduce a new match rule I had to update the sproc which isn't ideal.
To get around this I created a MatchRules table like the below:
The idea is to loop through the rules and have just one UPDATE statement so that in future I can just add more match rules to this table without touching the sproc. But this isn't working as expected! Sample data provided below as well as the code that's causing the problem. if anyone can suggest a fix or an alternative method to solving this problem I would really appreciate it!
CREATE TABLE dbo.MatchResults(
Master_ID INT,
Duplicate_ID INT,
MatchKeyType INT,
Name_Score INT,
OrgName_Score INT,
Premise_Score INT,
Address_Score INT,
Postcode_Score INT,
HomeTelephone_Score INT,
MobileTelephone_Score INT,
Email_Score INT,
Individual_Match TINYINT DEFAULT(0),
Site_Match TINYINT DEFAULT(0))
INSERT INTO dbo.MatchResults (
Master_ID,
Duplicate_ID,
MatchKeyType,
Name_Score,
OrgName_Score,
Premise_Score,
Address_Score,
Postcode_Score,
HomeTelephone_Score,
MobileTelephone_Score,
Email_Score)
SELECT 247355, 152880, 11, 0, 3, 0, 50, 1, 4, 5, 5 UNION ALL
SELECT 67328, 195769, 2, 2, 1, 0, 50, 0, 0, 0, 5 UNION ALL
SELECT 157281, 178807, 10, 2, 1, 1, 70, 0, 0, 0, 5 UNION ALL
SELECT 68723, 22872, 12, 1, 3, 1, 25, 0, 0, 0, 5
CREATE TABLE MatchRules(
MatchRuleID int NOT NULL,
NameScore tinyint NULL,
CompanyScore tinyint NULL,
Company2Score tinyint NULL,
PremiseScore tinyint NULL,
AddressScore tinyint NULL,
PostcodeScore tinyint NULL,
TelephoneScore tinyint NULL,
MobileScore tinyint NULL,
EmailScore tinyint NULL,
Customer varchar(50) NULL,
MatchLevel varchar(20) NULL,
IsActive bit NULL)
INSERT INTO MatchRules (MatchRuleID, NameScore, CompanyScore, Company2Score, PremiseScore, AddressScore,
PostcodeScore, TelephoneScore, MobileScore, EmailScore, Customer, MatchLevel, IsActive)
SELECT 1,1,NULL,NULL,1,100,1,NULL,NULL,NULL, 'ALL','Individual', 1 UNION ALL
SELECT 2,2,NULL,NULL,1,100,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 3,3,NULL,NULL,1,100,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 4,1,NULL,NULL,1,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 5,2,NULL,NULL,1,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 6,3,NULL,NULL,1,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 7,1,NULL,NULL,1,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 8,2,NULL,NULL,1,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 9,3,NULL,NULL,1,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 10,1,NULL,NULL,NULL,100,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 11,2,NULL,NULL,NULL,100,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 12,3,NULL,NULL,NULL,100,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 13,1,NULL,NULL,NULL,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 14,2,NULL,NULL,NULL,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 15,3,NULL,NULL,NULL,80,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 16,1,NULL,NULL,NULL,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 17,2,NULL,NULL,NULL,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 18,3,NULL,NULL,NULL,40,1,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 19,1,NULL,NULL,1,100,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 20,2,NULL,NULL,1,100,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 21,3,NULL,NULL,1,100,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 22,1,NULL,NULL,1,80,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 23,2,NULL,NULL,1,80,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 24,3,NULL,NULL,1,80,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 25,1,NULL,NULL,1,40,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 26,2,NULL,NULL,1,40,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 27,3,NULL,NULL,1,40,NULL,NULL,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 28,1,NULL,NULL,NULL,NULL,NULL,1,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 29,2,NULL,NULL,NULL,NULL,NULL,1,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 30, 3,NULL,NULL,NULL,NULL,NULL,1,NULL,NULL,'ALL','Individual',1 UNION ALL
SELECT 31, 1,NULL,NULL,NULL,NULL,NULL,NULL,1,NULL,'ALL','Individual',1 UNION ALL
SELECT 32, 2,NULL,NULL,NULL,NULL,NULL,NULL,1,NULL,'ALL','Individual',1 UNION ALL
SELECT 33, 3,NULL,NULL,NULL,NULL,NULL,NULL,1,NULL,'ALL','Individual',1 UNION ALL
SELECT 34, 1,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1,'ALL','Individual',1 UNION ALL
SELECT 35, 2,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1,'ALL','Individual',1 UNION ALL
SELECT 36, 3,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1,'ALL','Individual',1
And this is my code:
DECLARE @MatchRuleID TINYINT
DECLARE @NameScore TINYINT
DECLARE @CompanyScore TINYINT
DECLARE @Company2Score TINYINT
DECLARE @PremiseScore TINYINT
DECLARE @AddressScore TINYINT
DECLARE @PostcodeScore TINYINT
DECLARE @TelephoneScore TINYINT
DECLARE @MobileScore TINYINT
DECLARE @EmailScore TINYINT
DECLARE MatchingRules CURSOR FOR
SELECT
MatchRuleID, NameScore, CompanyScore, Company2Score, PremiseScore, AddressScore, PostcodeScore,
TelephoneScore, MobileScore, EmailScore
FROM
dbo.MatchRules
WHERE IsActive = 1
AND MatchLevel = 'Individual'
OPEN MatchingRules
FETCH NEXT FROM MatchingRules INTO @MatchRuleID, @NameScore, @CompanyScore, @Company2Score, @PremiseScore,
@AddressScore, @PostcodeScore, @TelephoneScore, @MobileScore, @EmailScore
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE MatchResults
SET Individual_Match = @MatchRuleID
WHERE Name_Score <= ISNULL(@NameScore, 5)
AND OrgName_Score <= ISNULL(@CompanyScore, 5)
AND Premise_Score <= ISNULL(@PremiseScore, 5)
AND Address_Score >= ISNULL(@AddressScore, 0)
AND Postcode_Score <= ISNULL(@PostcodeScore, 5)
AND HomeTelephone_Score <= ISNULL(@TelephoneScore, 5)
AND MobileTelephone_Score <= ISNULL(@MobileScore, 5)
AND Email_Score <= ISNULL(@EmailScore, 5)
AND Individual_Match = 0
FETCH NEXT FROM MatchingRules INTO @MatchRuleID, @NameScore, @CompanyScore, @Company2Score, @PremiseScore,
@AddressScore, @PostcodeScore, @TelephoneScore, @MobileScore, @EmailScore
END
CLOSE MatchingRules
DEALLOCATE MatchingRules
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
January 22, 2014 at 7:23 am
;WITH result (Master_ID,MatchRuleID) AS (
SELECTmr.Master_ID,MIN(r.MatchRuleID)
FROMMatchRules r
JOIN MatchResults mr
ON mr.Name_Score <= ISNULL(r.NameScore, 5)
AND mr.OrgName_Score <= ISNULL(r.CompanyScore, 5)
AND mr.Premise_Score <= ISNULL(r.PremiseScore, 5)
AND mr.Address_Score >= ISNULL(r.AddressScore, 0)
AND mr.Postcode_Score <= ISNULL(r.PostcodeScore, 5)
AND mr.HomeTelephone_Score <= ISNULL(r.TelephoneScore, 5)
AND mr.MobileTelephone_Score <= ISNULL(r.MobileScore, 5)
AND mr.Email_Score <= ISNULL(r.EmailScore, 5)
WHEREr.IsActive = 1
ANDr.MatchLevel = 'Individual'
GROUPBY mr.Master_ID
)
UPDATEmr
SETmr.Individual_Match = result.MatchRuleID
FROMMatchResults mr
JOIN result ON result.Master_ID = mr.Master_ID
EDITED:
p.s. This is just a rewrite of the cursor.
Is there a particular problem with the matching?
Far away is close at hand in the images of elsewhere.
Anon.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply