May 9, 2008 at 11:41 am
I am searching a single TEXT column for 25+ keywords and I need to return the recordID and the matching keyword.
A portion of the current query:
Declare @Results table(Keyword varchar(255), RecordID int)
DECLARE curFF CURSOR Fast_Forward FOR
SELECT Keyword from KeyWords WHERE IsActive=1
OPEN curFF
FETCH NEXT FROM curFF INTO @Keyword
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @Results
SELECT @Keyword, RecordID
FROM Email
WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate
AND SenderEmailAddress Like @user-id
AND CONTAINS(HTMLBody,@Keyword)
FETCH NEXT FROM curFF INTO @Keyword
END
CLOSE curFF
DEALLOCATE curFF
It is of course MUCH faster if I don't use a cursor and change the @Keyword variable to contain all of the keywords separated by OR. However, I have not found a method that will return which keyword was the result of the match when using that method.
Any suggestions would be appreciated.
May 9, 2008 at 12:00 pm
Tim OPry (5/9/2008)
I am searching a single TEXT column for 25+ keywords and I need to return the recordID and the matching keyword.A portion of the current query:
Declare @Results table(Keyword varchar(255), RecordID int)
DECLARE curFF CURSOR Fast_Forward FOR
SELECT Keyword from KeyWords WHERE IsActive=1
OPEN curFF
FETCH NEXT FROM curFF INTO @Keyword
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @Results
SELECT @Keyword, RecordID
FROM Email
WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate
AND SenderEmailAddress Like @user-id
AND CONTAINS(HTMLBody,@Keyword)
FETCH NEXT FROM curFF INTO @Keyword
END
CLOSE curFF
DEALLOCATE curFF
It is of course MUCH faster if I don't use a cursor and change the @Keyword variable to contain all of the keywords separated by OR. However, I have not found a method that will return which keyword was the result of the match when using that method.
Any suggestions would be appreciated.
You could try:
INSERT INTO @Results
SELECT kw.Keyword, RecordID
FROM Email inner join
(SELECT Keyword from KeyWords WHERE IsActive=1) kw
on CONTAINS(HTMLBody,kw.keyword)
WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate
AND SenderEmailAddress Like @user-id
Not sure if CONTAINS can be used in the join criteria. If not then go with a cross join:
INSERT INTO @Results
SELECT kw.Keyword, RecordID
FROM Email, (SELECT Keyword from KeyWords WHERE IsActive=1) kw
WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate
AND SenderEmailAddress Like @user-id
AND CONTAINS(HTMLBody,kw.keyword)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 9, 2008 at 12:22 pm
Thank you for the suggestion, but both return: "Incorrect syntax near 'kw'.", pointing to the line with the CONTAINS keyword.
I haven't been able to create a join using CONTAINS.
May 9, 2008 at 1:38 pm
Tim OPry (5/9/2008)
Thank you for the suggestion, but both return: "Incorrect syntax near 'kw'.", pointing to the line with the CONTAINS keyword.I haven't been able to create a join using CONTAINS.
Hmm - I could swear I had done that before - but I can't seem to now, so it must be Friday afternoon:).
different approach (not sure it this helps your perf at all):
declare @keywords nvarchar(4000)
set @keywords=substring(STUFF((select ' OR "'+keyword+'"' from KeyWords where keyword not like 't%' FOR XML PATH('')),1,1,''),4,400);
INSERT INTO @Results
SELECT kw.Keyword, RecordID
FROM (select *
from Email
WHERE ReceivedTime>=@StartDate
AND ReceivedTime<@EndDate
AND SenderEmailAddress Like @user-id
AND CONTAINS(HTMLBody,@keywords)) e
inner join (SELECT Keyword
from KeyWords
WHERE IsActive=1) kw
on patindex('% '+keyword+' %',' '+htmlbody+' ')
You may need to slightly adjust the patindex depending on what type of keywords you're up against...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 9, 2008 at 2:23 pm
Thanks again - but still no joy. The join fails - should it not have a comparative after patindex, but even when I added somethingn it cannot match the HTMLBody column. (invalid column name)
May 9, 2008 at 2:47 pm
Tim OPry (5/9/2008)
Thanks again - but still no joy. The join fails - should it not have a comparative after patindex, but even when I added somethingn it cannot match the HTMLBody column. (invalid column name)
YUP - something "ate" my >0 after patindex.... hmm - this does seem to work here. I've been assuming HTMLBODY is in the email table, no?
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 12, 2008 at 6:57 am
I assumed it should have been >0 and yes, HTMLBody is the TEXT column within the Email table.
There was an issue with some of the keywords ( this is a SQL 2000 box so XML PATH is not supported and my quick equivalent...wasn't.)
I have it working now. (other than the replacment for concatenating all of the keywords into one string). Once I get that working, I will run some tests and let you know.
Thank you again for the help.
May 12, 2008 at 2:48 pm
Just wanted to give you a follow-up - while doing a single SELECT with the keywords separated by a ' OR ' is of course much faster/more efficient, ironically the JOIN using PATINDEX makes it a little slower on average than using a cursor (based upon our normal queries).
If I did not need the individual keywords that matched each record, I could speed this up considerably.
Of course I use yet another cursor to return the combined results from the @Results table object in the form of:
RecordID1 Keyword1, Keyword2, Keyword3
RecordID2 Keyword2, Keyword4
etc..
I've often struggled with the TSQL syntax in these scenarios - to NOT use a cursor.
Your solution was one I would not have thought of, so I appreciate the time and help. Too bad the Join using CONTAINS doesn't work, that was a lot cleaner/straight forward.
May 12, 2008 at 9:54 pm
How many words are in HTMLBody and can I see an example of 10 rows, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 6:49 am
The table stores the basic information of our company email, HTMLBody is the HTML contents of the email body. It can vary from a few dozen words to a few hundred K or more. Only export that will provide it in a useable format is a database backup. I copied a few non-sensitive (spam) records to an empty database and backed that up - it is attached as a zip.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply