Full text indexes

  • I'm not even sure this is doable, but I'm going to try.

    Any tips on how to use a Full text index function such as containstable in a join?

    i.e. I have defined a thesaurus as such

    <XML ID="Microsoft Search Thesaurus">

    <thesaurus xmlns="x-schema:tsSchema.xml">

    <diacritics_sensitive>0</diacritics_sensitive>

    <expansion>

    <sub>Jon</sub>

    <sub>John</sub>

    <sub>Jack</sub>

    <sub>Jonathon</sub>

    <sub>Jonathon</sub>

    </expansion>

    <expansion>

    <sub>Sue</sub>

    <sub>Susie</sub>

    <sub>Suze</sub>

    <sub>Susan</sub>

    </expansion>

    <expansion>

    <sub>Tom</sub>

    <sub>Thomas</sub>

    <sub>Thom</sub>

    </expansion>

    <expansion>

    <sub>William</sub>

    <sub>Bill</sub>

    <sub>Will</sub>

    </expansion>

    <expansion>

    <sub>Richard</sub>

    <sub>Dick</sub>

    <sub>Rich</sub>

    </expansion>

    <expansion>

    <sub>Elizabeth</sub>

    <sub>Elisabeth</sub>

    <sub>Beth</sub>

    <sub>Liz</sub>

    <sub>Lisabeth</sub>

    <sub>Lisa</sub>

    </expansion>

    </thesaurus>

    </XML>

    I'm trying to join two disparate sets of data with lists of users... one from our client, one from our own AD infrastructure.

    I defined FTI on both tables for all of the text columns, and now what I want to do is join where the lastname matches in each table and use the thesaurus to match on first names

    CREATE TABLE [dbo].[ADUsers](

    [Name] [nvarchar](255) NULL,

    [First Name] [nvarchar](255) NULL,

    [Last Name] [nvarchar](255) NULL,

    [User Logon Name] [nvarchar](255) NULL,

    [userid] [int] IDENTITY(1,1) NOT NULL,

    CONSTRAINT [pk_adusers] PRIMARY KEY CLUSTERED

    (

    [userid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    /****** Object: FullTextIndex Script Date: 02/19/2009 13:52:52 ******/

    CREATE FULLTEXT INDEX ON [dbo].[ADUsers](

    [First Name] LANGUAGE [English],

    [Last Name] LANGUAGE [English],

    [Name] LANGUAGE [English],

    [User Logon Name] LANGUAGE [English])

    KEY INDEX [pk_adusers] ON [NW]

    WITH CHANGE_TRACKING AUTO

    GO

    CREATE TABLE [dbo].[Suppliedusers](

    [CUST_FRST_NM] [nvarchar](255) NULL,

    [CUST_LAST_NM] [nvarchar](255) NULL,

    [OFFICE_NAME] [nvarchar](255) NULL,

    [OFFIC] [nvarchar](255) NULL,

    [CUST_SHR] [nvarchar](255) NULL,

    [CUST_EMAL_ID] [nvarchar](255) NULL,

    [AGENT_NBR] [nvarchar](255) NULL,

    [nvarchar](255) NULL,

    [LAST_UPDATE] [nvarchar](255) NULL,

    [TERM_DATE] [nvarchar](255) NULL,

    [DESIGNATION] [nvarchar](255) NULL,

    [userid] [int] IDENTITY(1,1) NOT NULL,

    CONSTRAINT [pk_users] PRIMARY KEY CLUSTERED

    (

    [userid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    /****** Object: FullTextIndex Script Date: 02/19/2009 13:54:10 ******/

    CREATE FULLTEXT INDEX ON [dbo].[suppliedusers](

    [CUST_EMAL_ID] LANGUAGE [English],

    [CUST_FRST_NM] LANGUAGE [English],

    [CUST_LAST_NM] LANGUAGE [English],

    [CUST_SHR] LANGUAGE [English],

    [DESIGNATION] LANGUAGE [English],

    [OFFIC] LANGUAGE [English],

    [OFFICE_NAME] LANGUAGE [English])

    KEY INDEX [pk_users] ON [NW]

    WITH CHANGE_TRACKING AUTO

    GO

    Now what I want to do is to join the tables... but heres where I'm having no luck...

    select

    a.cust_frst_nm,

    a.cust_last_nm,

    b.[first name],

    b.[last name]

    from

    suppliedusers a join containstable(adusers,[first name],'FORMSOF(THESAURUS,a.cust_frst_nm)') b on

    a.cust_last_nm = b.[last name]

    Any tips on how to craft a join using a full text index function? It seems to only accept literals, not another field... 🙁



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • I'm thinking the problem here isn't so much the necessity for a literal in the contains table 'function' (for want of a better word), but it's more of basic SQL problem at hand.

    I know this is a pretty lame example that no-one would ever do, but I think the problem comes down the same problem you would have running this query:

    Select a.ID, b.Name from tblA a

    Join

    (Select b1.ID, b1.Name from tblB b1 where a.ID = b1.ID) b

    On b.ID = a.ID

    Problem here being, a.ID can not be 'seen' by the subquery, it sorta exists 'outside its world'. I should be able to explain this better but my brain is fried at the moment 😉

    Your containstable function/statement/whatever is treated the same way as the subquery above. It's own little self-contained unit which is 'unaware' of the outside query and its fields and aliases entirely.

    Wish I could give you some better news, but I don't think this can be done the way you're hoping to do it. Some type of salvation may lie in using a cursor ... so that you can always have an actual first name 'on hand' before you go to run the contains table. But I'm not entirely sure that'll work either ... time to head home for me ...

  • Yah... that's where I am now... using a cursor to fetch the first and last names from one table.. build dynamic sql to create the containstable function dumping the results in a temp table, and repeating the process..

    it sucks.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Your solution is exactly along the lines of what I was thinking was likely necessary to pull off what you're trying to do.

    To be totally honest, I'm kinda thinking a better solution might be to just build a 'names alias' table instead of using FTI. I'm not a big expert on it, but my impression is that it's mostly suited for doing comparisons against larger bodies of text, as opposed to comparing against, basically, just one word in a field like you're using it for...

  • bvaljalo (2/20/2009)


    Your solution is exactly along the lines of what I was thinking was likely necessary to pull off what you're trying to do.

    To be totally honest, I'm kinda thinking a better solution might be to just build a 'names alias' table instead of using FTI. I'm not a big expert on it, but my impression is that it's mostly suited for doing comparisons against larger bodies of text, as opposed to comparing against, basically, just one word in a field like you're using it for...

    That's actually a good suggestion...

    I managed to make it work with the cursor for now, which got me 115 matches out of 2000 records matched against 2200 records (but the non-matches were legit).

    The query looked like this.

    create table #results

    (

    applied_first_name varchar(255),

    applied_last_name varchar(255),

    user_logon_name varchar(255),

    cust_frst_nm varchar(255),

    cust_last_nm varchar(255),

    cust_email_id varchar(255),

    )

    declare @Last varchar(255)

    declare @first varchar(255)

    declare @email varchar(255)

    declare @desig varchar(255)

    declare @cmd nvarchar(max)

    declare csr_su cursor for

    select

    distinct

    cust_frst_nm,

    cust_last_nm,

    cust_emal_id

    from

    suppliedusers

    open csr_su

    fetch next from csr_su into @first,@last,@email

    while @@fetch_status = 0

    BEGIN

    print @Last

    set @Last = replace(@last,'''','''''')

    print @Last

    set @cmd = N'insert into #results

    select

    distinct

    b.[first name],

    b.[last name],

    b.,

    a.cust_frst_nm,

    a.cust_last_nm,

    a.cust_emal_id

    FROM

    suppliedusers a JOIN CONTAINSTABLE(suppliedusers,cust_frst_nm,''FORMSOF(THESAURUS,"' + @first + '",1)'') a1 on

    a.userid = a1.

    LEFT JOIN adusers b on

    a.cust_last_nm = b.[last name]

    JOIN CONTAINSTABLE(adusers,[first name],''FORMSOF(THESAURUS,"' + @first + '",1)'') b1 on

    b.userid = b1.

    WHERE a.cust_last_nm = ''' + @Last + ''' and a.cust_frst_nm = ''' + @first + '''

    and a.cust_emal_id = ''' + @email + ''''

    exec sp_executesql @cmd

    fetch next from csr_su into @first,@last,@email

    END

    close csr_su

    deallocate csr_su

    Again... I hated the cursor but I had to build the dynamic sql using FTI. Mostly because I was convinced that the thesaurus lookups would help with the fuzzy matches... the use of the names alias table is definately a better idea.

    If they come back with better lists of users for me to bounce against each other, a names alias table will be the way to go... way simpler.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply