Function with 200+ IF statements!

  • I have a lot of respect for the optimisation gurus on this forum so when one of these guys (Jeff Moden) makes a comment like in this thread http://www.sqlservercentral.com/Forums/Topic1420602-3077-2.aspx#bm1420669 then I get really worried.

    I've also found that if a function has over 200 IF statements, that it's time to go back to the drawing board. Many such "hopeless" functions frequently do have a much more effective and easy to code solution. there may also be the case where a stored procedure is more appropriate than a function

    This has been on my mind since the end of last week so now I want to put this issue to sleep by opening a new discussion to see if any of you good people can help me!

    The function in question is used to determine whether two records have the same Firstname, Middlename and Surname.

    Consider the following example of two records:

    It is clear that these two records refer to the same person but doing a self join on ContactName, Address1 and Postcode would not return these two records as duplicates because there are differences in Address1 and name.

    Another approach is to join the records only on postcode then compare the name elements to see calculate a matching score for the name part.

    SELECT

    a.contactname as master_contactname,

    b.contactname as duplicate_contactname

    FROM someTable as a

    inner join someTable as b

    on a.Postcode = b.PostCode

    WHERE a.ID < b.ID

    My Name scoring function takes 8 parameters (4 for the master record and the other 4 of the potential duplicate record)

    1) NormalisedName (Forname, Surname, Middlename)

    2) PhoneticForename

    3) PhoneticSurname

    4) PhoneticMiddlename

    Now this is where the 200 + IF statements start and the reason for them is as follows:

    IF Master_Surname = Duplicate_Surname AND Master_Firstname = Duplicate_Firstname THEN

    the ultimate outcome of the name comparison will depend on how the middle names compare BUT the middle names can have 5 different possibilities

    IF Master_Middlename = Duplicate_Middlename then EXACT match

    IF Master_Middlename is empty or Duplicate_Middlename is empty then LIKELY match

    IF Master_Middlename SOUNDS the same as Duplicate_Middlename then LIKELY match

    ...etc ...etc

    I also have to account for when Master_Surname SOUNDS like Duplicate Surname and for this I also have to look at the firstname and middlename

    Hope this makes sense?

    I can't see how else to do this without using so many IFs?

    In the end I created a SQL CLR in C# which is mega fast compared to the SQL UDF equivalent.

    Any suggestions?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I would have used a multi step UNION, with one of the columns returned the rank value you need. That would be more optimal than the UDF. I would have to test it out to see it compare against the CLR.

  • I don't understand what you mean by a multi step UNION. Can you give a simple example?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I whipped this up quick, and looking at it you would probably want to combine all the EXACTSinto one UNION etc... but I broke them out so you can see what I was doing.

    Something like this:

    CREATE PROCEDURE dbo.ProcName

    (@ID1 INT,

    @ID2 INT)

    AS

    BEGIN

    SELECT A.id,'EXACT'

    FROM sometable A

    CROSS APPLY sometable B

    WHERE A.ID=@id1

    AND B.id=@id2

    AND A.surname=B.surname

    AND A.firstname=B.firstname

    UNION

    SELECT A.id,'EXACT'

    FROM sometable A

    CROSS APPLY sometable B

    WHERE A.ID=@id1

    AND B.id=@id2

    AND A.middle=B.middle

    UNION

    SELECT A.id,'LIKELY'

    FROM sometable A

    CROSS APPLY sometable B

    WHERE A.ID=@id1

    AND B.id=@id2

    AND (A.middle=NULL OR A.middle =''

    OR A.middle=NULL OR A.middle ='')

    etc .....

    END

  • Thanks for this. I see what you're doing but I will have to play around with it to get it to fit with my name matching matrix which looks something like the below: (there are other ones for when the surnames SOUND the same, Surnames approximately the same, surnames sound approximately the same, surnames containment, surname sound containment etc....

    <lastnames match="equal">

    <firstnames match="equal">

    <middlenames match="equal">sure</middlenames>

    <middlenames match="both_empty">sure</middlenames>

    <middlenames match="one_empty">sure</middlenames>

    <middlenames match="approx">likely</middlenames>

    <middlenames match="contains">likely</middlenames>

    <middlenames match="unequal">possible</middlenames>

    </firstnames>

    <firstnames match="sounds_equal">

    <middlenames match="equal">sure</middlenames>

    <middlenames match="both_empty">likely</middlenames>

    <middlenames match="one_empty">likely</middlenames>

    <middlenames match="approx">possible</middlenames>

    <middlenames match="contains">possible</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="both_empty">

    <middlenames match="equal">zero</middlenames>

    <middlenames match="both_empty">likely</middlenames>

    <middlenames match="one_empty">zero</middlenames>

    <middlenames match="approx">zero</middlenames>

    <middlenames match="contains">zero</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="one_empty">

    <middlenames match="equal">zero</middlenames>

    <middlenames match="both_empty">likely</middlenames>

    <middlenames match="one_empty">likely</middlenames>

    <middlenames match="approx">zero</middlenames>

    <middlenames match="contains">likely</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="approx">

    <middlenames match="equal">likely</middlenames>

    <middlenames match="both_empty">likely</middlenames>

    <middlenames match="one_empty">likely</middlenames>

    <middlenames match="approx">possible</middlenames>

    <middlenames match="contains">possible</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="sounds_approx">

    <middlenames match="equal">likely</middlenames>

    <middlenames match="both_empty">possible</middlenames>

    <middlenames match="one_empty">possible</middlenames>

    <middlenames match="approx">zero</middlenames>

    <middlenames match="contains">possible</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="contains">

    <middlenames match="equal">likely</middlenames>

    <middlenames match="both_empty">likely</middlenames>

    <middlenames match="one_empty">likely</middlenames>

    <middlenames match="approx">possible</middlenames>

    <middlenames match="contains">possible</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    <firstnames match="unequal">

    <middlenames match="equal">zero</middlenames>

    <middlenames match="both_empty">zero</middlenames>

    <middlenames match="one_empty">zero</middlenames>

    <middlenames match="approx">zero</middlenames>

    <middlenames match="contains">possible</middlenames>

    <middlenames match="unequal">zero</middlenames>

    </firstnames>

    </lastnames>

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply