July 1, 2002 at 12:07 pm
Hello all,
With some previous help from the group I have arrived at a SP that is doing what I want but very slowly. I would appreciate any advice on how to speed this up. Here is the situation:
1. I have a mailing list called AbcsMstr. The user provides a code called @abcs_cd to pull a subset of names from AbcsMstr and a cursor is opened with the result.
2. Next I look for matches in our prospect table where state = state, city = city and SOUNDEX matches for first and last name. The matches are inserted into table abcs.
3. Typically, the AbcsMstr subset will have about 10,000 records and our prospect table has about 200,000 records. This SP code is below.
4. Is there anything I can do with the logic to speed things up? It was suggested to open the cursor as FORWARD ONLY but I can't seem to find anything about it in BOL. Is this supported in SQL7?
Thanks very much for any suggestions!!!
CREATE PROCEDURE procSOUND_EX @abcs_cd char(10)
--Jonathan Matt 6/26/02
--Input Paramters: abcs_cd - example: Code=30-2
--Purpose: User supplies the abcs_cd code to populate a cursor, then each record in the cursor
--is tested for a match in the Person table. The results are placed in a table named abcs.
AS
set nocount on
DECLARE@AbcsId int,
@last_name varchar(50),
@first_name varchar(50),
@city varchar(50),
@st varchar(50),
@zip varchar(50),
@prspct_id int
DELETE FROM abcs
DECLARE name_cursor CURSOR FOR
SELECT AbcsId, Last_Name, First_Name, City, St, Zip
FROM AbcsMstr
WHERE AbcsMstr.COUP = @abcs_cd
OPEN name_cursor
WHILE @@fetch_status <> -1
BEGIN
FETCH name_cursor INTO @AbcsId, @last_name, @first_name, @city, @st, @zip
INSERT INTO abcs
SELECT @AbcsId, Person.person_last_name, Person.person_first_name, Person.prspct_id, Prspct.hsehold_addr1_name, Prspct.hsehold_city_name, Prspct.hsehold_state_cd, Prspct.hsehold_postal_zone_cd
FROM Person JOIN Prspct ON Person.prspct_id = Prspct.prspct_id
WHERE
@st = Prspct.hsehold_state_cd AND
@city = Prspct.hsehold_city_name AND
(SOUNDEX(person_last_name) = SOUNDEX (@last_name)) AND
(SOUNDEX(person_first_name) = SOUNDEX (@first_name))
END
CLOSE name_cursor
DEALLOCATE name_cursor
July 2, 2002 at 3:50 am
Hi, what you could do is move the insert statement in a separate stored procedure. And make the cursor read only ( i don't have sql 7 bol at hand so i don't know the exact syntacts for a read only cursor)
July 2, 2002 at 11:53 am
Definately upgrade to SQL Server 2000. Don't wait, do it immediately.
Several features will make your routine run instantly. You would be able to have a computed column in the table for the SOUNDEX fuction. Then create an index on this computed column.
Take care.
July 2, 2002 at 5:44 pm
I agree that an indexed computed column is a good idea. Even without it though the key is to try to do a set operation instead of a cursor operation. Have you tried writing it as a single query? If performance suffers too much (I dont imagine this runs that often or is extremely time critical?) you could add two columns for the soundex values your table and maintain them with a trigger index them. This would simulate what you'd get from a computed column with an index on it.
I'd look at those options before tinkering with the cursor.
Andy
July 3, 2002 at 8:15 am
I want to thank everyone who offered suggestions. I rewrote the procedure and got rid of the cursor. The cursor method took many hours (I never let it complete). It now runs in under 5 minutes for 10,000 input records against 250,000 prospect records! I am happy with this! My code is below. Any suggestions for further improvement gladly accepted.
Thanks again to all.
CREATE PROCEDURE procAbcs_hit @abcs_cd varchar (20)
/*Created by Jonathan Matt on 7/2/02.
Input parameter @abcs_cd identifies the particular subset of names to analyze.
Table abcs_hit holds all prospects that have same ciy and state as names being analyzed.
Table abcs holds all those prospects in abcs_hit whose SOUNDEX matches for first and last names*/
AS
SET NOCOUNT ON
DELETE FROM abcs_hit
INSERT abcs_hit
SELECT AbcsMstr.AbcsId,
Prspct.prspct_id,
AbcsMstr.Last_Name,
AbcsMstr.First_Name,
Person.person_last_name,
Person.person_first_name
FROM (AbcsMstr INNER JOIN Prspct ON (AbcsMstr.St = Prspct.hsehold_state_cd) AND (AbcsMstr.City = Prspct.hsehold_city_name)
INNER JOIN Person ON Prspct.prspct_id = Person.prspct_id)
WHERE (((AbcsMstr.Coup)='@abcs_cd') AND ((Prspct.prspct_type_cd)='HOUSEHOLD'))
ORDER BY AbcsMstr.AbcsId, Prspct.prspct_id;
DELETE FROM abcs
INSERT abcs
SELECT abcs_id,
prspct_id
FROM abcs_hit
WHERE (SOUNDEX(ab_last_name)= SOUNDEX(per_last_name)) AND
(SOUNDEX(ab_first_name)=SOUNDEX(per_first_name))
ORDER BY abcs_id, prspct_id;
July 4, 2002 at 3:28 am
I have a similar process but was disatisfied with soundex for phonetic matching, so here is an alternate algorithm:
CREATE FUNCTION dbo.DoubleMetaPhone (@str varchar(70))
RETURNS char(10)
AS
BEGIN
/*#########################################################################
Double Metaphone Phonetic Matching Function
This reduces word to approximate phonetic string. This is deliberately
not a direct phonetic
Based off original C++ code and algorithm by
Lawrence Philips (lphilips@verity.com)
Published in the C/C++ Users Journal:
http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles
Original Metaphone presented in article in "Computer Language" in 1990.
Reduces alphabet to
The 14 constonant sounds:
"sh" "p"or"b" "th"
| | |
X S K J T F H L M N P R 0 W
Drop vowels except at the beginning
Produces a char(10) string. The left(@result,5) gives the most common
pronouciation, right(@result,5) gives the commonest alternate.
Translated into t-SQL by Keith Henry (keithh@lbm-solutions.com)
#########################################################################*/
Declare@original varchar(70),
@primary varchar(70),
@secondary varchar(70),
@lengthint,
@Last int,
@current int,
@strcur1 char(1) ,
@strnext1 char(1) ,
@strprev1char(1),
@SlavoGermanic bit
set @SlavoGermanic= 0
set @primary = ''
set @secondary = ''
set @current = 1
set @length = len(@str)
set @Last = @length
set @original = lTrim(isnull(@str,'')) + ''
set @original = upper(@original)
if patindex('%[WK]%',@str) + charindex('CZ',@str) + charindex('WITZ',@str) <> 0
set @SlavoGermanic = 1
-- skip this at beginning of word
if substring(@original, 1, 2) in ('GN', 'KN', 'PN', 'WR', 'PS')
set @current = @current + 1
-- Initial 'X' is pronounced 'Z' e.g. 'Xavier'
if substring(@original, 1, 1) = 'X'
begin
set @primary = @primary + 'S' -- 'Z' maps to 'S'
set @secondary = @secondary + 'S'
set @current = @current + 1
end
if substring(@original, 1, 1) in ('A', 'E', 'I', 'O', 'U', 'Y')
begin
set @primary = @primary + 'A' -- all init vowels now map to 'A'
set @secondary = @secondary + 'A'
set @current = @current + 1
end
while @current <= @length
begin
if len(@primary) >= 5 break
set @strcur1 = substring(@original, @current, 1)
set @strnext1 = substring(@original, (@current + 1), 1)
set @strprev1 = substring(@original, (@current - 1), 1)
if @strcur1 in ('A', 'E', 'I', 'O', 'U', 'Y', ' ', '''', '-')
set @current = @current + 1
else
if @strcur1 = 'B' -- '-mb', e.g. 'dumb', already skipped over ...
begin
set @primary = @primary + 'P'
set @secondary = @secondary + 'P'
if @strnext1 = 'B'
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'Ç'
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
set @current = @current + 1
end
else
if @strcur1 = 'C'
begin
if @strnext1 = 'H'
begin
if substring(@original, @current, 4) = 'CHIA'-- italian 'chianti'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
if @current > 1-- find 'michael'
and substring(@original, @current, 4) = 'CHAE'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'X'
end
else
begin
if @current = 1-- greek roots e.g. 'chemistry', 'chorus'
and (substring(@original, @current + 1, 5) in ('HARAC', 'HARIS')
or substring(@original, @current + 1, 3) in ('HOR', 'HYM', 'HIA', 'HEM')
)
and substring(@original, 1, 5) <> 'CHORE'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
if (substring(@original, 0, 4) in ('VAN ', 'VON ')-- germanic, greek, or otherwise 'ch' for 'kh' sound
or substring(@original, 0, 3) = 'SCH'
)
or substring(@original, @current - 2, 6) in ('ORCHES', 'ARCHIT', 'ORCHID')-- 'architect' but not 'arch', orchestra', 'orchid'
or substring(@original, @current + 2, 1) in ('T', 'S')
or ((@strprev1 in ('A','O','U','E')
or @current = 1
)
and substring(@original, @current + 2, 1) in ('L','R','N','M','B','H','F','V','W',' ')-- e.g. 'wachtler', 'weschsler', but not 'tichner'
)
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
if (@current > 1)
begin
if substring(@original, 1, 2) = 'MC' -- e.g. 'McHugh'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'K'
end
end
else
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
end
end
end
end
end
set @current = @current + 2
end --ch logic
else
begin
if @strnext1 = 'C'-- double 'C', but not McClellan'
and not(@current = 1
and substring(@original, 1, 1) = 'M'
)
begin
if substring(@original, @current + 2, 1) in ('I','E','H')-- 'bellocchio' but not 'bacchus'
and substring(@original, @current + 2, 2) <> 'HU'
begin
if (@current = 2-- 'accident', 'accede', 'succeed'
and @strprev1 = 'A'
)
or substring(@original, @current - 1, 5) in ('UCCEE', 'UCCES')
begin
set @primary = @primary + 'KS'
set @secondary = @secondary + 'KS'
end
else
begin-- 'bacci', 'bertucci', other italian
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
-- e.g. 'focaccia'if substring(@original, @current, 4) = 'CCIA'
end
set @current = @current + 3
end
else
begin
set @primary = @primary + 'K'-- Pierce's rule
set @secondary = @secondary + 'K'
set @current = @current + 2
end
end
else
begin
if @strnext1 in ('K','G','Q')
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
set @current = @current + 2
end
else
begin
if @strnext1 in ('I','E','Y')
begin
if substring(@original, @current, 3) in ('CIO','CIE','CIA')-- italian vs. english
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'X'
end
else
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
set @current = @current + 2
end
else
begin
if @strnext1 = 'Z'-- e.g. 'czerny'
and substring(@original, @current -2, 4) <> 'WICZ'
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'X'
set @current = @current + 2
end
else
begin
if @current > 2 -- various gremanic
and substring(@original, @current - 2,1) not in ('A', 'E', 'I', 'O', 'U', 'Y')
and substring(@original, @current - 1, 3) = 'ACH'
and ((substring(@original, @current + 2, 1) <> 'I')
and ((substring(@original, @current + 2, 1) <> 'E')
or substring(@original, @current - 2, 6) in ('BACHER', 'MACHER')
)
)
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
set @current = @current + 2
end
else
begin
if @current = 1 -- special case 'caesar'
and substring(@original, @current, 6) = 'CAESAR'
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
set @current = @current + 2
end
else
begin-- final else
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
if substring(@original, @current + 1, 2) in (' C',' Q',' G')-- name sent in 'mac caffrey', 'mac gregor'
set @current = @current + 3
else
set @current = @current + 1
end
end
end
end
end
end
end
end
else
if @strcur1 = 'D'
begin
if @strnext1 = 'G'
begin
if substring(@original, @current + 2, 1) in ('I','E','Y')
begin
set @primary = @primary + 'J'-- e.g. 'edge'
set @secondary = @secondary + 'J'
set @current = @current + 3
end
else
begin
set @primary = @primary + 'TK'-- e.g. 'edgar'
set @secondary = @secondary + 'TK'
set @current = @current + 2
end
end
else
begin
if substring(@original, @current, 2) in ('DT','DD')
begin
set @primary = @primary + 'T'
set @secondary = @secondary + 'T'
set @current = @current + 2
end
else
begin
set @primary = @primary + 'T'
set @secondary = @secondary + 'T'
set @current = @current + 1
end
end
end
else
if @strcur1 = 'F'
begin
set @primary = @primary + 'F'
set @secondary = @secondary + 'F'
if (@strnext1 = 'F')
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'G'
begin
if (@strnext1 = 'H')
begin
if @current > 1
and @strprev1 not in ('A', 'E', 'I', 'O', 'U', 'Y')
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
if not((@current > 2-- Parker's rule (with some further refinements) - e.g. 'hugh'
and substring(@original, @current - 2, 1) in ('B','H','D')
)-- e.g. 'bough'
or (@current > 3
and substring(@original, @current - 3, 1) in ('B','H','D')
)-- e.g. 'broughton'
or (@current > 4
and substring(@original, @current - 4, 1) in ('B','H')
))
begin
if @current > 3-- e.g. 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'
and @strprev1 = 'U'
and substring(@original, @current - 3, 1) in ('C','G','L','R','T')
begin
set @primary = @primary + 'F'
set @secondary = @secondary + 'F'
end
else
begin
if @current > 1
and @strprev1 <> 'I'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
if (@current < 4)
begin
if (@current = 1)-- 'ghislane', 'ghiradelli'
begin
if (substring(@original, @current + 2, 1) = 'I')
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'J'
end
else
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
end
end
end
end
end
end
set @current = @current + 2
end
else
begin
if (@strnext1 = 'N')
begin
if @current = 1
and substring(@original, 0,1) in ('A', 'E', 'I', 'O', 'U', 'Y')
and @SlavoGermanic = 0
begin
set @primary = @primary + 'KN'
set @secondary = @secondary + 'N'
end
else
begin
-- not e.g. 'cagney'
if substring(@original, @current + 2, 2) = 'EY'
and (@strnext1 <> 'Y')
and @SlavoGermanic = 0
begin
set @primary = @primary + 'N'
set @secondary = @secondary + 'KN'
end
else
begin
set @primary = @primary + 'KN'
set @secondary = @secondary + 'KN'
end
end
set @current = @current + 2
end
else
begin
if substring(@original, @current + 1, 2) = 'LI'-- 'tagliaro'
and @SlavoGermanic = 0
begin
set @primary = @primary + 'KL'
set @secondary = @secondary + 'L'
set @current = @current + 2
end
else
begin
if @current = 1-- -ges-, -gep-, -gel- at beginning
and (@strnext1 = 'Y'
or substring(@original, @current + 1, 2) in ('ES','EP','EB','EL','EY','IB','IL','IN','IE', 'EI','ER')
)
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'J'
set @current = @current + 2
end
else
begin
if (substring(@original, @current + 1, 2) = 'ER'-- -ger-, -gy-
or @strnext1 = 'Y'
)
and substring(@original, 1, 6) not in ('DANGER','RANGER','MANGER')
and @strprev1 not in ('E', 'I')
and substring(@original, @current - 1, 3) not in ('RGY','OGY')
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'J'
set @current = @current + 2
end
else
begin
if @strnext1 in ('E','I','Y')-- italian e.g. 'biaggi'
or substring(@original, @current -1, 4) in ('AGGI','OGGI')
begin
if (substring(@original, 1, 4) in ('VAN ', 'VON ')-- obvious germanic
or substring(@original, 1, 3) = 'SCH'
)
or substring(@original, @current + 1, 2) = 'ET'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
end
else
begin
-- always soft if french ending
if substring(@original, @current + 1, 4) = 'IER '
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'J'
end
else
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'K'
end
end
set @current = @current + 2
end
else
begin-- other options exausted call it k sound
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
if (@strnext1 = 'G')
set @current = @current + 2
else
set @current = @current + 1
end
end
end
end
end
end
end
else
if @strcur1 = 'H'
begin
if (@current = 1 -- only keep if first & before vowel or btw. 2 vowels
or @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y')
)
and @strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y')
begin
set @primary = @primary + 'H'
set @secondary = @secondary + 'H'
set @current = @current + 2
end
else
set @current = @current + 1
end
else
if @strcur1 = 'J'
begin
if substring(@original, @current, 4) = 'JOSE'-- obvious spanish, 'jose', 'san jacinto'
or substring(@original, 1, 4) = 'SAN '
begin
if (@current = 1
and substring(@original, @current + 4, 1) = ' '
)
or substring(@original, 1, 4) = 'SAN '
begin
set @primary = @primary + 'H'
set @secondary = @secondary + 'H'
end
else
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'H'
end
set @current = @current + 1
end
else
begin
if @current = 1
begin
set @primary = @primary + 'J' -- Yankelovich/Jankelowicz
set @secondary = @secondary + 'A'
set @current = @current + 1
end
else
begin
if @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y') -- spanish pron. of .e.g. 'bajador'
and @SlavoGermanic = 0
and @strnext1 in ('A','O')
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'H'
set @current = @current + 1
end
else
begin
if (@current = @Last)
begin
set @primary = @primary + 'J'
set @secondary = @secondary + ''
set @current = @current + 1
end
else
begin
if @strnext1 in ('L','T','K','S','N','M','B','Z')
and @strprev1 not in ('S','K','L')
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'J'
set @current = @current + 1
end
else
begin
if (@strnext1 = 'J') -- it could happen
set @current = @current + 2
else
set @current = @current + 1
end
end
end
end
end
end
else
if @strcur1 = 'K'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
if (@strnext1 = 'K')
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'L'
begin
if (@strnext1 = 'L')
begin
if (@current = (@length - 3)-- spanish e.g. 'cabrillo', 'gallegos'
and substring(@original, @current - 1, 4) in ('ILLO','ILLA','ALLE')
)
or ((substring(@original, @Last - 1, 2) in ('AS','OS')
or substring(@original, @Last, 1) in ('A','O')
)
and substring(@original, @current - 1, 4) = 'ALLE'
)
set @primary = @primary + 'L'--Alternate is silent
else
begin
set @primary = @primary + 'L'
set @secondary = @secondary + 'L'
end
set @current = @current + 2
end
else
begin
set @current = @current + 1
set @primary = @primary + 'L'
set @secondary = @secondary + 'L'
end
end
else
if @strcur1 = 'M'
begin
set @primary = @primary + 'M'
set @secondary = @secondary + 'M'
if substring(@original, @current - 1, 3) = 'UMB'
and (@current + 1 = @Last
or substring(@original, @current + 2, 2) = 'ER'
)-- 'dumb', 'thumb'
or @strnext1 = 'M'
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 in ('N','Ñ')
begin
set @primary = @primary + 'N'
set @secondary = @secondary + 'N'
if @strnext1 in ('N','Ñ')
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'P'
begin
if (@strnext1 = 'H')
begin
set @current = @current + 2
set @primary = @primary + 'F'
set @secondary = @secondary + 'F'
end
else
begin
-- also account for 'campbell' and 'raspberry'
if @strnext1 in ('P','B')
set @current = @current + 2
else
begin
set @current = @current + 1
set @primary = @primary + 'P'
set @secondary = @secondary + 'P'
end
end
end
else
if @strcur1 = 'Q'
begin
set @primary = @primary + 'K'
set @secondary = @secondary + 'K'
if (@strnext1 = 'Q')
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'R'
begin
if @current = @Last-- french e.g. 'rogier', but exclude 'hochmeier'
and @SlavoGermanic = 0
and substring(@original, @current - 2, 2) = 'IE'
and substring(@original, @current - 4, 2) not in ('ME','MA')
set @secondary = @secondary + 'R' --set @primary = @primary + ''
else
begin
set @primary = @primary + 'R'
set @secondary = @secondary + 'R'
end
if (@strnext1 = 'R')
begin
if substring(@original, @current, 3) = 'RRI' --alternate Kerrigan, Corrigan
set @secondary = @secondary + 'R'
set @current = @current + 2
end
else
set @current = @current + 1
end
else
if @strcur1 = 'S'
begin
if substring(@original, @current - 1, 3) in ('ISL','YSL') -- special cases 'island', 'isle', 'carlisle', 'carlysle'
set @current = @current + 1--silent s
else
begin
if substring(@original, @current, 2) = 'SH'
begin
-- germanic
if substring(@original, @current + 1, 4) in ('HEIM','HOEK','HOLM','HOLZ')
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
else
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
end
set @current = @current + 2
end
else
begin
-- italian & armenian
if substring(@original, @current, 3) in ('SIO','SIA')
or substring(@original, @current, 4) in ('SIAN')
begin
if @SlavoGermanic = 0
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'X'
end
else
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
set @current = @current + 3
end
else
begin
if (@current = 1-- german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'
and @strnext1 in ('M','N','L','W')-- also, -sz- in slavic language altho in hungarian it is pronounced 's'
)
or @strnext1 = 'Z'
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'X'
if @strnext1 = 'Z'
set @current = @current + 2
else
set @current = @current + 1
end
else
begin
if substring(@original, @current, 2) = 'SC'
begin
if substring(@original, @current + 2, 1) = 'H'-- Schlesinger's rule
begin
if substring(@original, @current + 3, 2) in ('OO','ER','EN','UY','ED','EM')-- dutch origin, e.g. 'school', 'schooner'
begin
if substring(@original, @current + 3, 2) in ('ER','EN')-- 'schermerhorn', 'schenker'
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'SK'
end
else
begin
set @primary = @primary + 'SK'
set @secondary = @secondary + 'SK'
end
set @current = @current + 3
end
else
begin
if @current = 1
and substring(@original, 3,1) not in ('A', 'E', 'I', 'O', 'U', 'Y')
and substring(@original, @current + 3, 1) <> 'W'
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'S'
end
else
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
end
set @current = @current + 3
end
end
else
begin
if substring(@original, @current + 2, 1) in ('I','E','Y')
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
else
begin
set @primary = @primary + 'SK'
set @secondary = @secondary + 'SK'
end
set @current = @current + 3
end
end
else
begin
if @current = 1-- special case 'sugar-'
and substring(@original, @current, 5) = 'SUGAR'
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'S'
set @current = @current + 1
end
else
begin
if @current = @Last-- french e.g. 'resnais', 'artois'
and substring(@original, @current - 2, 2) in ('AI','OI')
set @secondary = @secondary + 'S' --set @primary = @primary + ''
else
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
if @strnext1 in ('S','Z')
set @current = @current + 2
else
set @current = @current + 1
end
end
end
end
end
end
end
else
if @strcur1 = 'T'
begin
if substring(@original, @current, 4) = 'TION'
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
set @current = @current + 3
end
else
if substring(@original, @current, 3) in ('TIA','TCH')
begin
set @primary = @primary + 'X'
set @secondary = @secondary + 'X'
set @current = @current + 3
end
else
if substring(@original, @current, 2) = 'TH'
or substring(@original, @current, 3) = 'TTH'
begin
if substring(@original, @current + 2, 2) in ('OM','AM')-- special case 'thomas', 'thames' or germanic
or substring(@original, 0, 4) in ('VAN ','VON ')
or substring(@original, 0, 3) = 'SCH'
begin
set @primary = @primary + 'T'
set @secondary = @secondary + 'T'
end
else
begin
set @primary = @primary + '0'
set @secondary = @secondary + 'T'
end
set @current = @current + 2
end
else
begin
if @strnext1 in ('T','D')
begin
set @current = @current + 2
set @primary = @primary + 'T'
set @secondary = @secondary + 'T'
end
else
begin
set @current = @current + 1
set @primary = @primary + 'T'
set @secondary = @secondary + 'T'
end
end
end
else
if @strcur1 = 'V'
if (@strnext1 = 'V')
set @current = @current + 2
else
begin
set @current = @current + 1
set @primary = @primary + 'F'
set @secondary = @secondary + 'F'
end
else
if @strcur1 = 'W'
begin
-- can also be in middle of word
if substring(@original, @current, 2) = 'WR'
begin
set @primary = @primary + 'R'
set @secondary = @secondary + 'R'
set @current = @current + 2
end
else
if @current = 1
and (@strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y')
or substring(@original, @current, 2) = 'WH'
)
begin
if @strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y')-- Wasserman should match Vasserman
begin
set @primary = @primary + 'A'
set @secondary = @secondary + 'F'
set @current = @current + 1
end
else
begin
set @primary = @primary + 'A'-- need Uomo to match Womo
set @secondary = @secondary + 'A'
set @current = @current + 1
end
end
else
if (@current = @Last -- Arnow should match Arnoff
and @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y')
)
or substring(@original, @current - 1, 5) in ('EWSKI','EWSKY','OWSKI','OWSKY')
or substring(@original, 0, 3) = 'SCH'
begin
set @secondary = @secondary + 'F'--set @primary = @primary + ''
set @current = @current + 1
end
else
if substring(@original, @current, 4) in ('WICZ','WITZ') -- polish e.g. 'filipowicz'
begin
set @primary = @primary + 'TS'
set @secondary = @secondary + 'FX'
set @current = @current + 4
end
else
set @current = @current + 1-- else skip it
end
else
if @strcur1 = 'X'
begin
if not (@current = @Last-- french e.g. breaux
and (substring(@original, @current - 3, 3) in ('IAU', 'EAU')
or substring(@original, @current - 2, 2) in ('AU', 'OU')
)
)
begin
set @primary = @primary + 'KS'
set @secondary = @secondary + 'KS'
end--else skip it
if @strnext1 in ('C','X')
set @current = @current + 2
else
set @current = @current + 1
end
else
if @strcur1 = 'Z'
begin
if (@strnext1 = 'Z')
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
set @current = @current + 2
end
else
begin
if (@strnext1 = 'H') -- chinese pinyin e.g. 'zhao'
begin
set @primary = @primary + 'J'
set @secondary = @secondary + 'J'
set @current = @current + 2
end
else
begin
if (substring(@original, @current + 1, 2) in ('ZO', 'ZI', 'ZA'))
or (@SlavoGermanic = 1
and (@current > 1
and @strprev1 <> 'T'
)
)
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'TS'
end
else
begin
set @primary = @primary + 'S'
set @secondary = @secondary + 'S'
end
end
set @current = @current + 1
end
end
else
set @current = @current + 1
end
return cast(@primary as char(5)) + cast(@secondary as char(5))
end
Then I added a Matchscore algorithm to give me an adjustable match (this bit is still being developed).
CREATE FUNCTION dbo.MetaphoneMatch(@str1 varchar(70),@str2 varchar(70))
returns tinyint
as
begin
declare@Res tinyint
set @Res = 0
if @str1 = @str2 set @Res = 10
else
begin
declare @tmpStr char(10),
@p1 varchar(5),
@s1 varchar(5),
@p2 varchar(5),
@s2 varchar(5)
set @tmpStr = dbo.DoubleMetaPhone(@str1)
set @p1 = rtrim(left(@tmpStr,5))
set @s1 = ltrim(right(@tmpStr,5))
set @tmpStr = dbo.DoubleMetaPhone(@str2)
set @p2 = rtrim(left(@tmpStr,5))
set @s2 = ltrim(right(@tmpStr,5))
if @p1 = '' set @p1 = null
if @p2 = '' set @p2 = null
if @s1 = '' set @s1 = null
if @s2 = '' set @s2 = null
if(@p1 = @s1) and (@p2 = @s2) and (@p1 = @p2) set @Res = 9--consistent match between consistent keys
else
if((@p1 = @s1) and (@p2 <> @s2)) or
((@p1 <> @s1) and (@p2 = @s2))-- one Umatched key
if (@p1 = @p2) set @Res = 8--matched primary key
else
if (@p1 = @s2) or
(@p2 = @s1) set @Res = 7--matched Secondary to primary key
else
if (@p1 <> @p2) and
(@s1 = @s2) set @Res = 6--matched Secondary key with unmatched primary
else
set @Res = 0
else
if((@p1 <> @s1) and (@p2 <> @s2)) -- both Umatched keys
if (@p1 = @p2) and
(@s1 = @s2) set @Res = 5--matched primary and matched Secondary keys
else
if (@p1 = @p2) and
(@s1 <> @s2) set @Res = 4--matched primary key with unmatched Secondary
else
if (@p1 = @s2) and
(@s1 = @p2) set @Res = 3--cross matched Secondary to primary key
else
if (@p1 = @s2) or
(@p2 = @s1) set @Res = 2--matched Secondary to primary key
else
if (@p1 <> @p2) and
(@s1 = @s2) set @Res = 1--matched Secondary key with unmatched primary
else
set @Res = 0
end
return @Res
end
This works better than soundex for surnames, but I need something else for firstnames. I have a dataset where a lot of firstnames are missing or initals.
I need a keyname list to match unphonetic names (like Robert and Bob, Anthony and Tony etc) does anyone know of a good one?
July 24, 2002 at 8:47 am
Sorry about the delay responding. Thanks a ton for the code. It will take me awhile to absorb what you sent. In the meantime, my soundex procedure is now running in about 20 seconds to compare 10,000 names with 400,000!
Sure has come along way from the first version with the cursor.
CREATE PROCEDURE procAbcs_hit_New @abcs_cd varchar (20)
--Created on 7/15/02 by Jonathan Matt. Input pararmeter @abcd_cd is the code supplied by calling app abcs_hit.mdb
AS
SET NOCOUNT ON
DELETE WORKING_abcs
INSERT WORKING_abcs (AbcsId, prspct_id)
SELECT AbcsMstr.AbcsID, Prspct.prspct_id
FROM AbcsMstr JOIN (Person JOIN Prspct ON Person.prspct_id = Prspct.prspct_id)
ON (AbcsMstr.Zip = Prspct.hsehold_postal_zone_cd)
WHERE AbcsMstr.Coup=@abcs_cd AND
Prspct.prspct_type_cd ='Household' AND
SUBSTRING(AbcsMstr.Zip, 1, 5)=SUBSTRING(Prspct.hsehold_postal_zone_cd, 1, 5) AND
SOUNDEX (AbcsMstr.Last_Name)=SOUNDEX(person.person_last_name) AND
SOUNDEX(AbcsMstr.First_Name)=SOUNDEX(person.person_first_name);
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply