Need a function to compare strings.

  • Aside from FTE & the Soundex trick, almost every other solution that you come up with here will share the same problem: they are not SARGable, so if you apply them to a table they cannot use indexes and have to scan the whole table. That's OK if you are only going to apply it to the input stream or for bulk processing, however for OLTP use it's a real killer.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Hi Bob

    Forget my approach... I just made a performance test of both functions.

    My Function

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 3360 ms, elapsed time = 3951 ms.

    Your Function

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 1438 ms, elapsed time = 1402 ms.

    The differences of my function between yesterday and today depend on two different systems.

    I notice quiet often in last days that there are many features within SSE 2005 (or 2008) which I don't know yet. (My customer project still needs SSE 2000 compatibility... 🙁 )

    Wish you best!

    Flo

  • Thanks for testing it, Flo.

    Inline table valued functions generally perform better when processed against entire sets with CROSS APPLY. The problem is that it an ITVF has to be ONE query.

    Barry, you are exactly right. I thought I mentioned that there is an older version of the function which was used for individual transactions, but it was a real drag when trying to get long batches loaded. I got acceptable timings this morning using two queries joined by a union all. (One join on name, the other joined on number.) But, it was a pain to get there.

    I had to store the results in a temp table then cross apply the StringsClose function to both the name and numbers. Haven't nailed down why yet, but multiple cross applys together with joins and DISTINCT is perhaps a bad thing. More research is needed on my part. In any event, selecting only those records from the temp table which had both a close name and a close number, into a SECOND temp table, and finally doing a select distinct gets me the correct result set. It's not pretty, but it works.

    I'm not happy with any of the approaches immediately open to me, although "Fuzzy Lookup" is on my reading list again. Do you think full-text search could handle the transposition match requirement?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Try this, see if it'll give you what you need. Returns the number of differences between the strings. I don't have time to finalize it right now, but it should get you what you need.

    with

    names (name1,name2) as

    (select 'Hovious','Hovious' union all

    select 'xHovious','Hovious' union all

    select 'Hovious','xHovious' union all

    select 'Hoviousx','Hovious' union all

    select 'Hovious','Hoviousx' union all

    select 'Xovious','Hovious' union all

    select 'Hovious','Xovious' union all

    select 'HoviouX','HoviouX' union all

    select 'HoXious','Hovious' union all

    select 'Hovious','HoXious' union all

    select 'HoviousSS','Hovious' union all

    select 'Hovious','HoviousSS' union all

    select 'HHHovious','Hovious' union all

    select 'Hovious','HHHovious' union all

    select 'OHvious','Hovious' union all

    select 'Hovious','OHvious' union all

    select 'HoIVous','Hovious' union all

    select 'Hovious','HoIVous' union all

    select 'HovioSU','Hovious' union all

    select 'Hovious','HovioSU' union all

    select 'Hovis','Hovious' union all

    select 'Hovious','Hovis' union all

    select 'HovIAs','Hovious' union all

    select 'Hovious','HovIAs' union all

    select 'HXviXus','Hovious' union all

    select 'Hovious','HXviXus')

    select Name1, Name2, Diff

    from Names

    cross apply

    (select count(*) as Diff

    from

    (select number as N1, substring(name1, number, 1) as N1Sub

    from dbo.Numbers

    where number <= len(name1)) N1

    full outer join

    (select number as N2, substring(name2, number, 1) as N2Sub

    from dbo.Numbers

    where number <= len(name2)) N2

    on n1 = n2 and n1sub = n2sub

    or n1 = n2-1 and n1sub = n2sub

    or n1 = n2+1 and n1sub = n2sub

    where n1 is null

    or n2 is null) Diffs;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks G, but it fails the data test in one respect. When two extra characters are added at the front of the string, it would correctly throw them out (Diff > 2) , but when there are two at the END, it calculates Diff = 1.

    HoviousSS Hovious 1

    Hovious HoviousSS 1

    HHHovious Hovious 11

    Hovious HHHovious 11

    I appreciate the effort. You guys are great.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob: Whats the scoring criteria that you are using?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Posted at the top and in my function.

    - one character added or omitted between strings: Hovious/Hovios

    or

    - one character mistyped: Hovious/Jovious

    or

    - one simple transposition Obvious/Bovious

    I ran G's query and looked at the results. His "Diff" column seemed to yield a Diff <= 2 for almost all of the situations where my function returns a "Y".

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I see some discrepancy: you say they are close if Diff<=2, but the comments imply that Close is Diff<=1?

    I ask because testing for Diff<=1 is pretty easy to do.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Can you add a Case statment to the Diff value? "When Diff <=2 and abs(len(Name1)-len(Name2)) <= 1 then 'Y'". Will that solve what you need?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply