T-SQL Unique index act as Fuzzy Lookup

  • Hi,

    I have table name dbo.Main with ID, LastName and FirstName columns. It has a unique index on LastName and FirstName. So if I try to insert

    1 Hem Has

    2 Hem Has

    It will reject the 2nd record due to unique index. What if I have the following records and if by mistake, I misspelled the last Name. How can I avoid entering the 2nd record in the dbo.Main. I don't want to use Soundex and was thinking is there a way to index in a same way as Fuzzy Lookup( kind of fuzzy unique index).

    1 Hem Has

    2 Heme Has

  • Nope. Especially not a unique index. If it's unique, it's unique. If someone types Heme, Hem or Ham, they're all going into the database.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Not natively in SQL.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just thinking out loud here. If he could come up with some sort of formula to condense the name strings down... by omitting vowels for example... could he set up a unique constraint over a computed, persisted column? That way he'd be comparing "HMHS" to "HMHS".

    I have no idea what would be acceptable as a name variation, so don't ask me to spell out what the formula should be. 😀

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (8/4/2011)


    Just thinking out loud here. If he could come up with some sort of formula to condense the name strings down... by omitting vowels for example... could he set up a unique constraint over a computed, persisted column? That way he'd be comparing "HMHS" to "HMHS".

    I have no idea what would be acceptable as a name variation, so don't ask me to spell out what the formula should be. 😀

    That would work for sure. Would have to be very careful with the formula, as it would not be allowed to contain non-deterministic functions.

    I blogged about something similar some days ago.[/url]

    -- Gianluca Sartori

  • The Dixie Flatline (8/4/2011)


    Just thinking out loud here. If he could come up with some sort of formula to condense the name strings down... by omitting vowels for example... could he set up a unique constraint over a computed, persisted column? That way he'd be comparing "HMHS" to "HMHS".

    I have no idea what would be acceptable as a name variation, so don't ask me to spell out what the formula should be. 😀

    SoundEx()

    That's pretty much what it does.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • If you are talking about table where you are going to store personal details, then you shouldn't even think creating and validation the person names due to the following possible situations:

    1. There are genuine duplicates exist in real life, eg. John Smith (my daghter name has the same name as my wife...)

    2. How you can be sure that Rob Man and Rob Mann are not two different people with very similar surname

    3. Bill and William - what you do in this scenario?

    Untill you have something really unique for the person (finger prints, retina scan or DNK sample) it is impossible to 100% identify the person.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (8/5/2011)


    If you are talking about table where you are going to store personal details, then you shouldn't even think creating and validation the person names due to the following possible situations:

    1. There are genuine duplicates exist in real life, eg. John Smith (my daghter name has the same name as my wife...)

    2. How you can be sure that Rob Man and Rob Mann are not two different people with very similar surname

    3. Bill and William - what you do in this scenario?

    Untill you have something really unique for the person (finger prints, retina scan or DNK sample) it is impossible to 100% identify the person.

    Fair point.

    I have coded a simple data validation script some years ago to find duplicates in a marketing database and it was no fun at all.

    I had to look at the name, address, city and many other things to detect duplicates and it was a lot of trial and error.

    Many algorithms had to be tuned to leverage the duplicates/false duplicates rate.

    -- Gianluca Sartori

  • Gianluca Sartori (8/5/2011)


    ...

    I had to look at the name, address, city and many other things to detect duplicates and it was a lot of trial and error.

    Many algorithms had to be tuned to leverage the duplicates/false duplicates rate.

    Yes, and final descition should still be maid by the real user in his discretion. Usually applcation logic would apply some algoritm and return list of potential duplicates. Business user would then review the list and make the appropriate action. Auto dedupe by name, address and even DoB would present some level of risk: twins named the same (to make tax official mad) and living at the same address are rare case but possible.;-)

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • GilaMonster (8/5/2011)


    The Dixie Flatline (8/4/2011)


    Just thinking out loud here. If he could come up with some sort of formula to condense the name strings down... by omitting vowels for example... could he set up a unique constraint over a computed, persisted column? That way he'd be comparing "HMHS" to "HMHS".

    I have no idea what would be acceptable as a name variation, so don't ask me to spell out what the formula should be. 😀

    SoundEx()

    That's pretty much what it does.

    I would hate to use SoundEx() for this kind of thing. It's purpose is to suggest possibilities where names might be mispronounced or misspelled. There's precious little control over it.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply