February 19, 2013 at 5:13 am
I have a lot of respect for the optimisation gurus on this forum so when one of these guys (Jeff Moden) makes a comment like in this thread http://www.sqlservercentral.com/Forums/Topic1420602-3077-2.aspx#bm1420669 then I get really worried.
I've also found that if a function has over 200 IF statements, that it's time to go back to the drawing board. Many such "hopeless" functions frequently do have a much more effective and easy to code solution. there may also be the case where a stored procedure is more appropriate than a function
This has been on my mind since the end of last week so now I want to put this issue to sleep by opening a new discussion to see if any of you good people can help me!
The function in question is used to determine whether two records have the same Firstname, Middlename and Surname.
Consider the following example of two records:
It is clear that these two records refer to the same person but doing a self join on ContactName, Address1 and Postcode would not return these two records as duplicates because there are differences in Address1 and name.
Another approach is to join the records only on postcode then compare the name elements to see calculate a matching score for the name part.
SELECT
a.contactname as master_contactname,
b.contactname as duplicate_contactname
FROM someTable as a
inner join someTable as b
on a.Postcode = b.PostCode
WHERE a.ID < b.ID
My Name scoring function takes 8 parameters (4 for the master record and the other 4 of the potential duplicate record)
1) NormalisedName (Forname, Surname, Middlename)
2) PhoneticForename
3) PhoneticSurname
4) PhoneticMiddlename
Now this is where the 200 + IF statements start and the reason for them is as follows:
IF Master_Surname = Duplicate_Surname AND Master_Firstname = Duplicate_Firstname THEN
the ultimate outcome of the name comparison will depend on how the middle names compare BUT the middle names can have 5 different possibilities
IF Master_Middlename = Duplicate_Middlename then EXACT match
IF Master_Middlename is empty or Duplicate_Middlename is empty then LIKELY match
IF Master_Middlename SOUNDS the same as Duplicate_Middlename then LIKELY match
...etc ...etc
I also have to account for when Master_Surname SOUNDS like Duplicate Surname and for this I also have to look at the firstname and middlename
Hope this makes sense?
I can't see how else to do this without using so many IFs?
In the end I created a SQL CLR in C# which is mega fast compared to the SQL UDF equivalent.
Any suggestions?
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
February 19, 2013 at 5:26 am
I would have used a multi step UNION, with one of the columns returned the rank value you need. That would be more optimal than the UDF. I would have to test it out to see it compare against the CLR.
February 19, 2013 at 5:28 am
I don't understand what you mean by a multi step UNION. Can you give a simple example?
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
February 19, 2013 at 5:43 am
I whipped this up quick, and looking at it you would probably want to combine all the EXACTSinto one UNION etc... but I broke them out so you can see what I was doing.
Something like this:
CREATE PROCEDURE dbo.ProcName
(@ID1 INT,
@ID2 INT)
AS
BEGIN
SELECT A.id,'EXACT'
FROM sometable A
CROSS APPLY sometable B
WHERE A.ID=@id1
AND B.id=@id2
AND A.surname=B.surname
AND A.firstname=B.firstname
UNION
SELECT A.id,'EXACT'
FROM sometable A
CROSS APPLY sometable B
WHERE A.ID=@id1
AND B.id=@id2
AND A.middle=B.middle
UNION
SELECT A.id,'LIKELY'
FROM sometable A
CROSS APPLY sometable B
WHERE A.ID=@id1
AND B.id=@id2
AND (A.middle=NULL OR A.middle =''
OR A.middle=NULL OR A.middle ='')
etc .....
END
February 19, 2013 at 8:10 am
Thanks for this. I see what you're doing but I will have to play around with it to get it to fit with my name matching matrix which looks something like the below: (there are other ones for when the surnames SOUND the same, Surnames approximately the same, surnames sound approximately the same, surnames containment, surname sound containment etc....
<lastnames match="equal">
<firstnames match="equal">
<middlenames match="equal">sure</middlenames>
<middlenames match="both_empty">sure</middlenames>
<middlenames match="one_empty">sure</middlenames>
<middlenames match="approx">likely</middlenames>
<middlenames match="contains">likely</middlenames>
<middlenames match="unequal">possible</middlenames>
</firstnames>
<firstnames match="sounds_equal">
<middlenames match="equal">sure</middlenames>
<middlenames match="both_empty">likely</middlenames>
<middlenames match="one_empty">likely</middlenames>
<middlenames match="approx">possible</middlenames>
<middlenames match="contains">possible</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="both_empty">
<middlenames match="equal">zero</middlenames>
<middlenames match="both_empty">likely</middlenames>
<middlenames match="one_empty">zero</middlenames>
<middlenames match="approx">zero</middlenames>
<middlenames match="contains">zero</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="one_empty">
<middlenames match="equal">zero</middlenames>
<middlenames match="both_empty">likely</middlenames>
<middlenames match="one_empty">likely</middlenames>
<middlenames match="approx">zero</middlenames>
<middlenames match="contains">likely</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="approx">
<middlenames match="equal">likely</middlenames>
<middlenames match="both_empty">likely</middlenames>
<middlenames match="one_empty">likely</middlenames>
<middlenames match="approx">possible</middlenames>
<middlenames match="contains">possible</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="sounds_approx">
<middlenames match="equal">likely</middlenames>
<middlenames match="both_empty">possible</middlenames>
<middlenames match="one_empty">possible</middlenames>
<middlenames match="approx">zero</middlenames>
<middlenames match="contains">possible</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="contains">
<middlenames match="equal">likely</middlenames>
<middlenames match="both_empty">likely</middlenames>
<middlenames match="one_empty">likely</middlenames>
<middlenames match="approx">possible</middlenames>
<middlenames match="contains">possible</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
<firstnames match="unequal">
<middlenames match="equal">zero</middlenames>
<middlenames match="both_empty">zero</middlenames>
<middlenames match="one_empty">zero</middlenames>
<middlenames match="approx">zero</middlenames>
<middlenames match="contains">possible</middlenames>
<middlenames match="unequal">zero</middlenames>
</firstnames>
</lastnames>
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply