Pattern Matching for Computer Column

  • I need my column called 'nameInternal' to derive from another column called 'nameFormal'. If the latter is "Oracle Corporation", the 'nameInternal' column would compute to 'Oracle'. In pseudo code, this might look like:

    CREATE TABLE Company

    nameInternal AS

    CASE WHEN nameFormal LIKE ('%[corporation]'

    THEN nameInternal ('%[^corporation])'--need to match chars except for 'corporation'

    END

    I've searched pattern matching and CASE but can't deduce this particular task. I'm running 2008R2 on Windows7.

  • It sounds like you are looking at a Jaro-Winkler type of requirement.

    Here is an article on that

    http://www.sqlservercentral.com/articles/Fuzzy+Match/65702/

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hi Jason: Thanks for your reply. I don't think the Jaro-Winkler approach is quite what I'm looking for. Let me explain further: Both the nameFormal and the nameInternal attributes will be used in different ways; they're not fuzzy duplicates or mistakes.

    I'm using LIKE to detect a "formal" name like 'Oracle Corporation'. And I'd like to use pattern matching to strip off certain "fluff" words such as corporation, corp, inc, ltd, and so on. In the example below, I'd like to match all the chars in 'Oracle Corporation' except for 'corporation'. Pattern matching is obviously easy with LIKE but I don't know how to use it to strip off certain chars.

    CREATE TABLE Company

    nameInternal AS

    CASE WHEN nameFormal LIKE ('%[corporation]'

    THEN nameInternal ('%[^corporation])'--need to match all chars except for 'corporation'

    END

  • DECLARE @tbl TABLE

    (

    ID INT IDENTITY(1,1),

    Name VARCHAR(100),

    Name1 AS CASE WHEN Name LIKE '%corporation' THEN SUBSTRING(Name,1,LEN(Name)-11) ELSE Name END

    )

    INSERT INTO @tbl(Name)

    SELECT 'Oracle Corporation'

    UNION

    SELECT 'Union Corporation'

    UNION

    SELECT 'Union Corporate'

    select * from @tbl

    Regards,

    Mitesh Oswal

    +918698619998

    Regards,
    Mitesh OSwal
    +918698619998

  • timSF (2/9/2014)


    Hi Jason: Thanks for your reply. I don't think the Jaro-Winkler approach is quite what I'm looking for. Let me explain further: Both the nameFormal and the nameInternal attributes will be used in different ways; they're not fuzzy duplicates or mistakes.

    I'm using LIKE to detect a "formal" name like 'Oracle Corporation'. And I'd like to use pattern matching to strip off certain "fluff" words such as corporation, corp, inc, ltd, and so on. In the example below, I'd like to match all the chars in 'Oracle Corporation' except for 'corporation'. Pattern matching is obviously easy with LIKE but I don't know how to use it to strip off certain chars.

    CREATE TABLE Company

    nameInternal AS

    CASE WHEN nameFormal LIKE ('%[corporation]'

    THEN nameInternal ('%[^corporation])'--need to match all chars except for 'corporation'

    END

    On a previous project, we did this very thing and used the Jaro-Winkler method to get to what you are seeking. We also built a "terms" table where we could strip off those things like corporation, llc, etc etc etc.

    You could also try to build something using patindex to strip off those terms that should not be returned.

    Using patindex you could search for the starting point of these terms and return everything outside of that pattern (left or right).

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Thanks Mitesh. Using Substring makes perfect sense.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply