Soundex and the Irish

  • O'Halfbubble here wondering how you handle Irish style names with the Soundex function.

    soundex('O''Halfbubble') returns O000, as doesn O'Anything.

    Do you perhaps drop the apostrophe to get soundex('OHalfbubble') of O411?

    Since there will likely be thousands of names to look up, I figured

    ... where soundex(lastname) = soundex(@search)

    would be a rebar operation where the server would have to evaluate soundex on all the rows, so I built a child table:

    CREATE TABLE dbo.GuestSoundex(

    GuestSoundexID uniqueidentifier NULL,

    GuestPK [char](16) NULL,

    SX char(4) NULL

    ) ON [PRIMARY]

    ALTER TABLE dbo.GuestSoundex ADD CONSTRAINT DF_GuestSoundex_GuestSoundexID DEFAULT (newid()) FOR GuestSoundexID

    and when a guest is entered, I parse out the name by non alpha's and store the soundex for each section in the GuestSoundex table.

    To do a lookup, I take what the user has typed and separate it into words the same way, and fetch all the guests that have guestsoundex records matching all the words.

    Maybe just separating the O will do: Seamus O'Halfbubble would get 3 soundex entries, one for Seamus, one for O, and one for Halfbubble, so when the user types 0'Halfbubble they'll get all the guests with an O and also a Halfbubble. If they just type Halfbubble, the result set would include O'Halfbubble. Yea, I think that will do. What do you thing?

    [font="Arial"]Halfbubble ------------
    You just can't do this stuff unless you're at least half a bubble off center.[/font]

  • soundex returns 0000 whenever a non A-Z,a-z character occurs: so commas hypens, numbers or the quote all give you misleading data.

    personally, i had always stripped out the values fromt he string that are not allowed, but i like the idea of breaking the name up too.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I've never found Soundex() to be all that useful. It won't find that "Richard" (soundex R263) is the same name as "Rick" (soundex R200) or "Dick" (soundex D200). It won't find obvious "heard it wrong on the phone" entries, like "Dan" vs "Van" (D500 vs V500).

    Names that aren't even particularly similar to the human ear, like "Robby" and "Rover", end up with a soundex "Difference" score of 3, which is the same as the score for "Robby" and "Bobby", which are nearly indistinguishable unless it's quiet (and are, indeed, functionally the same name). While "Robbie" and "Robert" have a score of 2, which means they are less similar than those, while being the same name, and hard to distinguish over a noisy environment.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I'm beginning to think you're right.

    It does spot some variations like Tom and Thom and sometimes and internal f for a v, but if you don't have the first letter right, it's lost.

    [font="Arial"]Halfbubble ------------
    You just can't do this stuff unless you're at least half a bubble off center.[/font]

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply