Fuzzy Grouping - Question



September 6, 2006 at 10:25 am

I am experimenting with the Fuzzy Grouping data flow transformation and am getting unexpected behavior. I set the similarity threshold down to 0.25 (I have also tried lower and higher values).

The Issue:

I am not sure why it won't group 'Dan' with 'Daniel', when it correctly groups the X-RAY TECHNOLOGIST variants.

Any ideas? Thanks. Daniel

_key_in	_key_out	_score	Name	Name_clean	_Similarity_Name
1	1	1	DANIEL	DANIEL	1
3	1	0.665343	DANIELSON	DANIEL	0.6653433
2	2	1	DAN	DAN	1
4	4	1	X-RAY TECHNOLOGIST	X-RAY TECHNOLOGIST	1
5	4	0.9	XRAY TECHNOLOGIST	X-RAY TECHNOLOGIST	0.9
7	4	0.77257	X-RAY TECH	X-RAY TECHNOLOGIST	0.7725695
6	4	0.561458	XRAY TECH	X-RAY TECHNOLOGIST	0.5614583
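Reading the output: rows that share a `_key_out` value belong to the same group, and the canonical row of a group is the one whose `_key_in` equals `_key_out`. The minimal Python sketch below simply regroups the rows posted above by `_key_out` (the helper name `groups_by_key_out` is hypothetical, not part of SSIS), which makes it easy to see that DAN ended up alone in group 2.

```python
from collections import defaultdict

# Rows copied from the Fuzzy Grouping output above: (_key_in, _key_out, _score, Name)
rows = [
    (1, 1, 1.0, "DANIEL"),
    (3, 1, 0.665343, "DANIELSON"),
    (2, 2, 1.0, "DAN"),
    (4, 4, 1.0, "X-RAY TECHNOLOGIST"),
    (5, 4, 0.9, "XRAY TECHNOLOGIST"),
    (7, 4, 0.77257, "X-RAY TECH"),
    (6, 4, 0.561458, "XRAY TECH"),
]


def groups_by_key_out(rows):
    """Collect member names under each _key_out value (the group identifier)."""
    groups = defaultdict(list)
    for _key_in, key_out, _score, name in rows:
        groups[key_out].append(name)
    return dict(groups)


for key_out, names in groups_by_key_out(rows).items():
    print(key_out, names)
```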


Alexander-255849

From http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/FzDTSSQL05.asp

Fuzzy Lookup and Fuzzy Grouping use a custom, domain-independent distance function that takes into account the edit distance (for example, "hits" is distance 2 from "bit"), the number of tokens, token order, and relative frequencies.
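To make the edit-distance ingredient concrete, here is a minimal sketch of the classic Levenshtein distance in Python. It is only the plain textbook edit distance, not the SSIS function itself, which also weighs token count, token order and token frequencies in ways not documented here; `edit_distance` is a hypothetical helper name.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("hits", "bit"))          # 2
print(edit_distance("DAN", "DANIEL"))        # 3
print(edit_distance("DANIELSON", "DANIEL"))  # 3
```

This reproduces the "hits"/"bit" example from the quote and shows that both name pairs discussed in this thread are the same plain edit distance apart.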

In your case:

The edit distance from DAN to DANIEL is 3 (the characters I, E and L must be inserted).

There are no other tokens to contribute to the score.

"X-RAY TECH" and "X-RAY TECHNOLOGIST" share a common token.

However, the edit distance from DANIELSON to DANIEL is also 3 (three characters must be deleted), and that pair was grouped. I can only guess that their "custom, domain-independent distance function" puts different weights on deletions and insertions.
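The guess above can be illustrated with an edit distance that charges insertions and deletions differently. This is only a sketch of the hypothesis, not how SSIS actually scores matches: the costs used (`insert_cost=2.0`) are arbitrary hypothetical values, the direction (turning the input row into the canonical row) is an assumption, and it does not reproduce the 0.665 score in the output. `weighted_edit_distance` is a hypothetical helper name.

```python
def weighted_edit_distance(src: str, dst: str,
                           insert_cost: float = 1.0,
                           delete_cost: float = 1.0,
                           substitute_cost: float = 1.0) -> float:
    """Cost of turning src into dst when each operation has its own price."""
    # prev[j] = cost of turning the current src prefix into dst[:j]
    prev = [j * insert_cost for j in range(len(dst) + 1)]
    for i, cs in enumerate(src, start=1):
        curr = [i * delete_cost]  # delete every character of src[:i]
        for j, cd in enumerate(dst, start=1):
            curr.append(min(
                prev[j] + delete_cost,      # delete cs from src
                curr[j - 1] + insert_cost,  # insert cd into src
                prev[j - 1] + (0.0 if cs == cd else substitute_cost),
            ))
        prev = curr
    return prev[-1]


# With equal costs both pairs tie at 3.0; with pricier insertions
# (hypothetical weight) DAN -> DANIEL costs 6.0 while DANIELSON -> DANIEL stays 3.0.
print(weighted_edit_distance("DAN", "DANIEL", insert_cost=2.0))
print(weighted_edit_distance("DANIELSON", "DANIEL", insert_cost=2.0))
```

Under such a weighting, a threshold could separate the two pairs even though their plain edit distances are equal.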

DanielP

September 25, 2006 at 3:30 pm


Thanks for the info.

