February 19, 2009 at 12:57 pm
I'm not even sure this is doable, but I'm going to try.
Any tips on how to use a Full text index function such as containstable in a join?
i.e. I have defined a thesaurus as such
<XML ID="Microsoft Search Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml">
<diacritics_sensitive>0</diacritics_sensitive>
<expansion>
<sub>Jon</sub>
<sub>John</sub>
<sub>Jack</sub>
<sub>Jonathon</sub>
<sub>Jonathon</sub>
</expansion>
<expansion>
<sub>Sue</sub>
<sub>Susie</sub>
<sub>Suze</sub>
<sub>Susan</sub>
</expansion>
<expansion>
<sub>Tom</sub>
<sub>Thomas</sub>
<sub>Thom</sub>
</expansion>
<expansion>
<sub>William</sub>
<sub>Bill</sub>
<sub>Will</sub>
</expansion>
<expansion>
<sub>Richard</sub>
<sub>Dick</sub>
<sub>Rich</sub>
</expansion>
<expansion>
<sub>Elizabeth</sub>
<sub>Elisabeth</sub>
<sub>Beth</sub>
<sub>Liz</sub>
<sub>Lisabeth</sub>
<sub>Lisa</sub>
</expansion>
</thesaurus>
</XML>
I'm trying to join two disparate sets of data with lists of users... one from our client, one from our own AD infrastructure.
I defined FTI on both tables for all of the text columns, and now what I want to do is join where the lastname matches in each table and use the thesaurus to match on first names
CREATE TABLE [dbo].[ADUsers](
[Name] [nvarchar](255) NULL,
[First Name] [nvarchar](255) NULL,
[Last Name] [nvarchar](255) NULL,
[User Logon Name] [nvarchar](255) NULL,
[userid] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [pk_adusers] PRIMARY KEY CLUSTERED
(
[userid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: FullTextIndex Script Date: 02/19/2009 13:52:52 ******/
CREATE FULLTEXT INDEX ON [dbo].[ADUsers](
[First Name] LANGUAGE [English],
[Last Name] LANGUAGE [English],
[Name] LANGUAGE [English],
[User Logon Name] LANGUAGE [English])
KEY INDEX [pk_adusers] ON [NW]
WITH CHANGE_TRACKING AUTO
GO
CREATE TABLE [dbo].[Suppliedusers](
[CUST_FRST_NM] [nvarchar](255) NULL,
[CUST_LAST_NM] [nvarchar](255) NULL,
[OFFICE_NAME] [nvarchar](255) NULL,
[OFFIC] [nvarchar](255) NULL,
[CUST_SHR] [nvarchar](255) NULL,
[CUST_EMAL_ID] [nvarchar](255) NULL,
[AGENT_NBR] [nvarchar](255) NULL,
[nvarchar](255) NULL,
[LAST_UPDATE] [nvarchar](255) NULL,
[TERM_DATE] [nvarchar](255) NULL,
[DESIGNATION] [nvarchar](255) NULL,
[userid] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [pk_users] PRIMARY KEY CLUSTERED
(
[userid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: FullTextIndex Script Date: 02/19/2009 13:54:10 ******/
CREATE FULLTEXT INDEX ON [dbo].[suppliedusers](
[CUST_EMAL_ID] LANGUAGE [English],
[CUST_FRST_NM] LANGUAGE [English],
[CUST_LAST_NM] LANGUAGE [English],
[CUST_SHR] LANGUAGE [English],
[DESIGNATION] LANGUAGE [English],
[OFFIC] LANGUAGE [English],
[OFFICE_NAME] LANGUAGE [English])
KEY INDEX [pk_users] ON [NW]
WITH CHANGE_TRACKING AUTO
GO
Now what I want to do is to join the tables... but heres where I'm having no luck...
select
a.cust_frst_nm,
a.cust_last_nm,
b.[first name],
b.[last name]
from
suppliedusers a join containstable(adusers,[first name],'FORMSOF(THESAURUS,a.cust_frst_nm)') b on
a.cust_last_nm = b.[last name]
Any tips on how to craft a join using a full text index function? It seems to only accept literals, not another field... 🙁
February 19, 2009 at 6:35 pm
I'm thinking the problem here isn't so much the necessity for a literal in the contains table 'function' (for want of a better word), but it's more of basic SQL problem at hand.
I know this is a pretty lame example that no-one would ever do, but I think the problem comes down the same problem you would have running this query:
Select a.ID, b.Name from tblA a
Join
(Select b1.ID, b1.Name from tblB b1 where a.ID = b1.ID) b
On b.ID = a.ID
Problem here being, a.ID can not be 'seen' by the subquery, it sorta exists 'outside its world'. I should be able to explain this better but my brain is fried at the moment 😉
Your containstable function/statement/whatever is treated the same way as the subquery above. It's own little self-contained unit which is 'unaware' of the outside query and its fields and aliases entirely.
Wish I could give you some better news, but I don't think this can be done the way you're hoping to do it. Some type of salvation may lie in using a cursor ... so that you can always have an actual first name 'on hand' before you go to run the contains table. But I'm not entirely sure that'll work either ... time to head home for me ...
February 20, 2009 at 7:38 am
Yah... that's where I am now... using a cursor to fetch the first and last names from one table.. build dynamic sql to create the containstable function dumping the results in a temp table, and repeating the process..
it sucks.
February 20, 2009 at 11:16 am
Your solution is exactly along the lines of what I was thinking was likely necessary to pull off what you're trying to do.
To be totally honest, I'm kinda thinking a better solution might be to just build a 'names alias' table instead of using FTI. I'm not a big expert on it, but my impression is that it's mostly suited for doing comparisons against larger bodies of text, as opposed to comparing against, basically, just one word in a field like you're using it for...
February 20, 2009 at 12:20 pm
bvaljalo (2/20/2009)
Your solution is exactly along the lines of what I was thinking was likely necessary to pull off what you're trying to do.To be totally honest, I'm kinda thinking a better solution might be to just build a 'names alias' table instead of using FTI. I'm not a big expert on it, but my impression is that it's mostly suited for doing comparisons against larger bodies of text, as opposed to comparing against, basically, just one word in a field like you're using it for...
That's actually a good suggestion...
I managed to make it work with the cursor for now, which got me 115 matches out of 2000 records matched against 2200 records (but the non-matches were legit).
The query looked like this.
create table #results
(
applied_first_name varchar(255),
applied_last_name varchar(255),
user_logon_name varchar(255),
cust_frst_nm varchar(255),
cust_last_nm varchar(255),
cust_email_id varchar(255),
)
declare @Last varchar(255)
declare @first varchar(255)
declare @email varchar(255)
declare @desig varchar(255)
declare @cmd nvarchar(max)
declare csr_su cursor for
select
distinct
cust_frst_nm,
cust_last_nm,
cust_emal_id
from
suppliedusers
open csr_su
fetch next from csr_su into @first,@last,@email
while @@fetch_status = 0
BEGIN
print @Last
set @Last = replace(@last,'''','''''')
print @Last
set @cmd = N'insert into #results
select
distinct
b.[first name],
b.[last name],
b.,
a.cust_frst_nm,
a.cust_last_nm,
a.cust_emal_id
FROM
suppliedusers a JOIN CONTAINSTABLE(suppliedusers,cust_frst_nm,''FORMSOF(THESAURUS,"' + @first + '",1)'') a1 on
a.userid = a1.
LEFT JOIN adusers b on
a.cust_last_nm = b.[last name]
JOIN CONTAINSTABLE(adusers,[first name],''FORMSOF(THESAURUS,"' + @first + '",1)'') b1 on
b.userid = b1.
WHERE a.cust_last_nm = ''' + @Last + ''' and a.cust_frst_nm = ''' + @first + '''
and a.cust_emal_id = ''' + @email + ''''
exec sp_executesql @cmd
fetch next from csr_su into @first,@last,@email
END
close csr_su
deallocate csr_su
Again... I hated the cursor but I had to build the dynamic sql using FTI. Mostly because I was convinced that the thesaurus lookups would help with the fuzzy matches... the use of the names alias table is definately a better idea.
If they come back with better lists of users for me to bounce against each other, a names alias table will be the way to go... way simpler.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply