March 19, 2009 at 10:17 am
Aside from FTE & the Soundex trick, almost every other solution that you come up with here will share the same problem: they are not SARGable, so if you apply them to a table they cannot use indexes and have to scan the whole table. That's OK if you are only going to apply it to the input stream or for bulk processing, however for OLTP use it's a real killer.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 19, 2009 at 10:50 am
Hi Bob
Forget my approach... I just made a performance test of both functions.
My Function
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 3360 ms, elapsed time = 3951 ms.
Your Function
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 1438 ms, elapsed time = 1402 ms.
The differences of my function between yesterday and today depend on two different systems.
I notice quiet often in last days that there are many features within SSE 2005 (or 2008) which I don't know yet. (My customer project still needs SSE 2000 compatibility... 🙁 )
Wish you best!
Flo
March 19, 2009 at 11:42 am
Thanks for testing it, Flo.
Inline table valued functions generally perform better when processed against entire sets with CROSS APPLY. The problem is that it an ITVF has to be ONE query.
Barry, you are exactly right. I thought I mentioned that there is an older version of the function which was used for individual transactions, but it was a real drag when trying to get long batches loaded. I got acceptable timings this morning using two queries joined by a union all. (One join on name, the other joined on number.) But, it was a pain to get there.
I had to store the results in a temp table then cross apply the StringsClose function to both the name and numbers. Haven't nailed down why yet, but multiple cross applys together with joins and DISTINCT is perhaps a bad thing. More research is needed on my part. In any event, selecting only those records from the temp table which had both a close name and a close number, into a SECOND temp table, and finally doing a select distinct gets me the correct result set. It's not pretty, but it works.
I'm not happy with any of the approaches immediately open to me, although "Fuzzy Lookup" is on my reading list again. Do you think full-text search could handle the transposition match requirement?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 19, 2009 at 11:50 am
Try this, see if it'll give you what you need. Returns the number of differences between the strings. I don't have time to finalize it right now, but it should get you what you need.
with
names (name1,name2) as
(select 'Hovious','Hovious' union all
select 'xHovious','Hovious' union all
select 'Hovious','xHovious' union all
select 'Hoviousx','Hovious' union all
select 'Hovious','Hoviousx' union all
select 'Xovious','Hovious' union all
select 'Hovious','Xovious' union all
select 'HoviouX','HoviouX' union all
select 'HoXious','Hovious' union all
select 'Hovious','HoXious' union all
select 'HoviousSS','Hovious' union all
select 'Hovious','HoviousSS' union all
select 'HHHovious','Hovious' union all
select 'Hovious','HHHovious' union all
select 'OHvious','Hovious' union all
select 'Hovious','OHvious' union all
select 'HoIVous','Hovious' union all
select 'Hovious','HoIVous' union all
select 'HovioSU','Hovious' union all
select 'Hovious','HovioSU' union all
select 'Hovis','Hovious' union all
select 'Hovious','Hovis' union all
select 'HovIAs','Hovious' union all
select 'Hovious','HovIAs' union all
select 'HXviXus','Hovious' union all
select 'Hovious','HXviXus')
select Name1, Name2, Diff
from Names
cross apply
(select count(*) as Diff
from
(select number as N1, substring(name1, number, 1) as N1Sub
from dbo.Numbers
where number <= len(name1)) N1
full outer join
(select number as N2, substring(name2, number, 1) as N2Sub
from dbo.Numbers
where number <= len(name2)) N2
on n1 = n2 and n1sub = n2sub
or n1 = n2-1 and n1sub = n2sub
or n1 = n2+1 and n1sub = n2sub
where n1 is null
or n2 is null) Diffs;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 19, 2009 at 12:50 pm
Thanks G, but it fails the data test in one respect. When two extra characters are added at the front of the string, it would correctly throw them out (Diff > 2) , but when there are two at the END, it calculates Diff = 1.
HoviousSS Hovious 1
Hovious HoviousSS 1
HHHovious Hovious 11
Hovious HHHovious 11
I appreciate the effort. You guys are great.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 19, 2009 at 1:50 pm
Bob: Whats the scoring criteria that you are using?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 19, 2009 at 1:54 pm
Posted at the top and in my function.
- one character added or omitted between strings: Hovious/Hovios
or
- one character mistyped: Hovious/Jovious
or
- one simple transposition Obvious/Bovious
I ran G's query and looked at the results. His "Diff" column seemed to yield a Diff <= 2 for almost all of the situations where my function returns a "Y".
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 19, 2009 at 2:04 pm
I see some discrepancy: you say they are close if Diff<=2, but the comments imply that Close is Diff<=1?
I ask because testing for Diff<=1 is pretty easy to do.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 19, 2009 at 2:29 pm
Can you add a Case statment to the Diff value? "When Diff <=2 and abs(len(Name1)-len(Name2)) <= 1 then 'Y'". Will that solve what you need?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply