Can this be done without a cursor?

  • I am searching a single TEXT column for 25+ keywords and I need to return the recordID and the matching keyword.

    A portion of the current query:

    Declare @Results table(Keyword varchar(255), RecordID int)

    DECLARE curFF CURSOR Fast_Forward FOR

    SELECT Keyword from KeyWords WHERE IsActive=1

    OPEN curFF

    FETCH NEXT FROM curFF INTO @Keyword

    WHILE @@FETCH_STATUS = 0

    BEGIN

    INSERT INTO @Results

    SELECT @Keyword, RecordID

    FROM Email

    WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate

    AND SenderEmailAddress Like @user-id

    AND CONTAINS(HTMLBody,@Keyword)

    FETCH NEXT FROM curFF INTO @Keyword

    END

    CLOSE curFF

    DEALLOCATE curFF

    It is of course MUCH faster if I don't use a cursor and change the @Keyword variable to contain all of the keywords separated by OR. However, I have not found a method that will return which keyword was the result of the match when using that method.

    Any suggestions would be appreciated.

  • Tim OPry (5/9/2008)


    I am searching a single TEXT column for 25+ keywords and I need to return the recordID and the matching keyword.

    A portion of the current query:

    Declare @Results table(Keyword varchar(255), RecordID int)

    DECLARE curFF CURSOR Fast_Forward FOR

    SELECT Keyword from KeyWords WHERE IsActive=1

    OPEN curFF

    FETCH NEXT FROM curFF INTO @Keyword

    WHILE @@FETCH_STATUS = 0

    BEGIN

    INSERT INTO @Results

    SELECT @Keyword, RecordID

    FROM Email

    WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate

    AND SenderEmailAddress Like @user-id

    AND CONTAINS(HTMLBody,@Keyword)

    FETCH NEXT FROM curFF INTO @Keyword

    END

    CLOSE curFF

    DEALLOCATE curFF

    It is of course MUCH faster if I don't use a cursor and change the @Keyword variable to contain all of the keywords separated by OR. However, I have not found a method that will return which keyword was the result of the match when using that method.

    Any suggestions would be appreciated.

    You could try:

    INSERT INTO @Results

    SELECT kw.Keyword, RecordID

    FROM Email inner join

    (SELECT Keyword from KeyWords WHERE IsActive=1) kw

    on CONTAINS(HTMLBody,kw.keyword)

    WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate

    AND SenderEmailAddress Like @user-id

    Not sure if CONTAINS can be used in the join criteria. If not then go with a cross join:

    INSERT INTO @Results

    SELECT kw.Keyword, RecordID

    FROM Email, (SELECT Keyword from KeyWords WHERE IsActive=1) kw

    WHERE ReceivedTime>=@StartDate AND ReceivedTime<@EndDate

    AND SenderEmailAddress Like @user-id

    AND CONTAINS(HTMLBody,kw.keyword)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thank you for the suggestion, but both return: "Incorrect syntax near 'kw'.", pointing to the line with the CONTAINS keyword.

    I haven't been able to create a join using CONTAINS.

  • Tim OPry (5/9/2008)


    Thank you for the suggestion, but both return: "Incorrect syntax near 'kw'.", pointing to the line with the CONTAINS keyword.

    I haven't been able to create a join using CONTAINS.

    Hmm - I could swear I had done that before - but I can't seem to now, so it must be Friday afternoon:).

    different approach (not sure it this helps your perf at all):

    declare @keywords nvarchar(4000)

    set @keywords=substring(STUFF((select ' OR "'+keyword+'"' from KeyWords where keyword not like 't%' FOR XML PATH('')),1,1,''),4,400);

    INSERT INTO @Results

    SELECT kw.Keyword, RecordID

    FROM (select *

    from Email

    WHERE ReceivedTime>=@StartDate

    AND ReceivedTime<@EndDate

    AND SenderEmailAddress Like @user-id

    AND CONTAINS(HTMLBody,@keywords)) e

    inner join (SELECT Keyword

    from KeyWords

    WHERE IsActive=1) kw

    on patindex('% '+keyword+' %',' '+htmlbody+' ')

    You may need to slightly adjust the patindex depending on what type of keywords you're up against...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks again - but still no joy. The join fails - should it not have a comparative after patindex, but even when I added somethingn it cannot match the HTMLBody column. (invalid column name)

  • Tim OPry (5/9/2008)


    Thanks again - but still no joy. The join fails - should it not have a comparative after patindex, but even when I added somethingn it cannot match the HTMLBody column. (invalid column name)

    YUP - something "ate" my >0 after patindex.... hmm - this does seem to work here. I've been assuming HTMLBODY is in the email table, no?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I assumed it should have been >0 and yes, HTMLBody is the TEXT column within the Email table.

    There was an issue with some of the keywords ( this is a SQL 2000 box so XML PATH is not supported and my quick equivalent...wasn't.)

    I have it working now. (other than the replacment for concatenating all of the keywords into one string). Once I get that working, I will run some tests and let you know.

    Thank you again for the help.

  • Just wanted to give you a follow-up - while doing a single SELECT with the keywords separated by a ' OR ' is of course much faster/more efficient, ironically the JOIN using PATINDEX makes it a little slower on average than using a cursor (based upon our normal queries).

    If I did not need the individual keywords that matched each record, I could speed this up considerably.

    Of course I use yet another cursor to return the combined results from the @Results table object in the form of:

    RecordID1 Keyword1, Keyword2, Keyword3

    RecordID2 Keyword2, Keyword4

    etc..

    I've often struggled with the TSQL syntax in these scenarios - to NOT use a cursor.

    Your solution was one I would not have thought of, so I appreciate the time and help. Too bad the Join using CONTAINS doesn't work, that was a lot cleaner/straight forward.

  • How many words are in HTMLBody and can I see an example of 10 rows, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The table stores the basic information of our company email, HTMLBody is the HTML contents of the email body. It can vary from a few dozen words to a few hundred K or more. Only export that will provide it in a useable format is a database backup. I copied a few non-sensitive (spam) records to an empty database and backed that up - it is attached as a zip.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply