January 1, 2014 at 5:05 pm
kishorreddy.yuva (12/29/2013)
Hi Jason,I just copy the code from here and try to execute in my machine, But when i check these all are function not stored procedure.
Please provide me the code either for CLR function or to convert all this functions into stored procedure
If you look through this thread you will see an example of how to call the function from a stored procedure.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 2, 2014 at 8:53 am
I've renamed the pure SQL functions somewhat, but here's an example of how I would call them from a stored procedure:
DECLARE @result FLOAT
SET @result = CalculateJaroWinkler('Bob','Bill')
PRINT @result
SET @result = CalculateJaroWinkler('Bob','Bub') --0.7999
PRINT @result
SET @result = CalculateJaroWinkler('Barnes','Banres') --0.9555
PRINT @result
Alternately, for a very primitive table compare example:
DECLARE @JaroThreshold FLOAT
SET @JaroThreshold = 0.9
SELECT top 500 LEFT(MyWord,2) AS LeftTwo, MyWord AS Word
INTO #tempA
FROM MyListOfWords
WHERE MyWord LIKE '__a%'
ORDER BY MyWord
ALTER TABLE #tempA ADD CONSTRAINT [PK_tempA] PRIMARY KEY CLUSTERED
(
[Word] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = ON, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
GO
SELECT top 500 LEFT(MyWord,2) AS LeftTwo, MyWord AS word
INTO #tempB
FROM MyListOfWords
WHERE MyWord LIKE '__a%'
ORDER BY MyWord
ALTER TABLE #tempB ADD CONSTRAINT [PK_tempB] PRIMARY KEY CLUSTERED
(
[Word] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = ON, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
GO
SELECT a.word, b.word, CalculateJaroWinkler(a.word,b.word)
FROM #tempA a
INNER JOIN #tempB b
ON b.LeftTwo = a.LeftTwo
AND b.Word <> a.Word
WHERE CalculateJaroWinkler(a.word,b.word) > @JaroThreshold
DROP TABLE #tempA
DROP TABLE #tempB
July 6, 2015 at 2:47 pm
I'm hoping someone is still looking at this post. I have implemented the algorithm and it is working just fine. However, it performs quite horribly with my table with 60k rows. Is there anything I can do to speed things up? my table has basically first name and last name and that't it.
running a query as such
select [dbo].[JaroWinkler]('smith', LAST_NAME ) from LEIE could take up to 7 minutes.
Any Idea would be most appreciative.
Thanks!
July 6, 2015 at 6:08 pm
I'd try the CLR approach.
Also it sounds like you could match directly rather than via fuzzy, and perhaps only run jaro winkler on an unmatched subset?
When I use this in exercises, it's usually the last type of matching I attempt having exhausted other identifiers or direct matching approaches.
July 6, 2015 at 6:19 pm
Thanks SSC, I'm trying to use this with a web application, which would query a DB and get potential matches. I was hoping to have an option for a fuzzy search based on last name , first name combo or one or the other.
I'm quite new to the CLR thing, do you have any resources docs on how that would work? thanks for replying, or on your suggestion with the subset how would you go about doing a subset based on a last name search with a table that has 60k rows.
Thanks again!!!
July 6, 2015 at 6:33 pm
See my code on page 4 of this thread - and it's Mike 😉
I can't go through the details of CLR with you- but basically it's c# that you publish to sql server (and you or your DBA need to do a bit of config to enable CLR on the server itself). SQL Server can then call the routine.
When I did this before it did 36k rows in under a second (according to my last post anyway!)
September 29, 2015 at 6:59 am
Hi,
I'm new in the SQL stuff and have been engaged in a project to work with strings matching. I have read the entire article on (http://www.sqlservercentral.com/articles/Fuzzy+Match/65702/) and that is really interesting for what I have been working.
I tried to create exactly the same scenario here, but in your query to call the Jaro Winkler function you mentioned the dbo.JaroWinkler, but this function name does not exist in your code.
Below is the query to match the strings.
------------------------------
SELECT NameLookup.name_group_id, NameInput.Cust_Id, NameInput.Name_Input, NameLookup.first_name ,NameLookup.first_name_normalized, NameInput.Last_Name, dbo.JaroWinkler(NameInput.Name_Input, NameLookup.first_name) AS Jaro3,
RANK() OVER (Partition BY NameLookup.name_group_id ORDER BY dbo.JaroWinkler(NameInput.Name_Input, NameLookup.first_name) Desc) Jar02
FROM NameInput CROSS JOIN NameLookup Where dbo.JaroWinkler(NameInput.Name_Input, NameLookup.first_name) > .95order by NameLookup.name_group_id
Would you mind provide me some guidance on this case?
Thanks in advance
Eric
September 29, 2015 at 10:35 am
Hey Eric, Check out the zip files at the bottom of the article, you should find it there.
Adam Sottosanti
December 2, 2015 at 4:50 pm
Hey Ira,
Thank you so much for this. I'm implementing the Jaro-Winkler algorithm to do fuzzy matching in a data warehouse. I'll be able to integrate this TSQL script into my SSIS ETL Packages
March 4, 2016 at 8:31 am
The CLR implementation found in the assembly Microsoft.MasterDataServices.DataQuality, can be used in SQL and it is about 300 times faster..
March 4, 2016 at 8:41 am
ip7915 (3/4/2016)
The CLR implementation found in the assembly Microsoft.MasterDataServices.DataQuality, can be used in SQL and it is about 300 times faster..
Yep. And if you have DQS setup it's already available for you.
-- Itzik Ben-Gan 2001
October 3, 2016 at 9:34 pm
Deleted
February 1, 2018 at 2:16 pm
Hi Ira,
Does not seem to work for some cases... For example:
select dbo.[fn_calculateJaro]('Busan-Gimhae LRT Warranty','Busan Gimhae')
returns 0 because of a dash character '-'
However...select DIFFERENCE('Busan-Gimhae LRT Warranty','Busan Gimhae')
returns 4, meaning a good match.
Viewing 13 posts - 46 through 57 (of 57 total)
You must be logged in to reply to this topic. Login to reply