Lots of keywords to match!

  • Hi guys,

    I was wondering if you could help me. I have a keyword matching system idea and I need to figure out if it is feasible and if so, roughly how I would go about doing it.

    Basically, I have an article being posted on my site. The hard part is that each word of that article then needs to be used as a 'keyword' to match up against users (via a comma seperated list of words in the users table) and return the list of matches.

    Is there a way of doing this? Would Full-Text Indexing help? Any comments would be greatly appreciated!!!

  • [font="Arial"]

    Hello,

    I would create a working table with a single field to store each word used. I would create an index on it and then using that table, compare the words to the user comma delimited string which should also be loaded via word token or a delimiter routine.

    There are word token subroutines available to help with loading the single field table.

    The comparison of the two table entries would be fairly simple at that point.

    Regards,

    Terry

    [/font]

  • tbeadle (2/20/2008)


    [font="Arial"]

    There are word token subroutines available to help with loading the single field table.The comparison of the two table entries would be fairly simple at that point.[/font]

    Good thoughts, Terry... but where might one find the "token subroutines" that you're talking about?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Simong,

    Here's a simple "splitter" to normalize the CSV entries you have so you do a join instead of a "LIKE"...

    --===== Create a table to hold user ids and a word list for each user

    -- THIS IS NOT A PART OF THE SOLUTION

    CREATE TABLE #UserWords (UserID INT PRIMARY KEY, WordList VARCHAR(8000))

    INSERT INTO #UserWords

    (UserID,WordList)

    SELECT 1,'this,is,a,test,for,user,one,wordlist' UNION ALL

    SELECT 2,'this,is,the,word,list,for,user2,and,its,different'

    --===== Split the information from the table above to "normalize" it.

    SELECT UserID,

    SUBSTRING(','+uw.WordList, t.N+1, CHARINDEX(',', uw.WordList+',', t.N+1)-t.N) AS Val

    FROM dbo.Tally t,

    #UserWords uw

    WHERE SUBSTRING(','+uw.WordList, t.N, 1) = ','

    AND t.N <= LEN(','+uw.WordList)

    drop table #userwords

    Just in case you don't know what a Tally table is, it's nothing more than a table with a single column of well indexed sequential numbers and it's used for counting (hence the name "Tally). It has dozens of uses that would normally require a While Loop and splitting CSV's happens to be one of them.

    Here's how to make one...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply