Search against multiple keywords

  • Hi All (SQL Server 2000)

    I am trying to search a particular table (any column) for a match on a comma seperated list of keywords.

    eg 'Computer, Hardward, Software'

    What I need is an SP / UDF which will take one or more table names and the comma seperated list of key words and return the following...

    TableName, FieldName, Record ID, Text where match found, Count of match found.

    All my table have an identity field for the primary key (Record Id) but the name varies from table to table.

    The count would work as follows... If the text in a field was 'The computer has software loaded', this would return 2 (1 for computer and 1 for software).

    If the text was 'The computer computer computer has something loaded', it would return 3, 1 for each instance of Computer.

    Anyone help me with this?

    Thanks

    CCB

     

     

  • Hmmm... This will be slow on a lot of data and very clumsy to code.

    Have you investigated Full Text Search?

    It's a bit late here so I don't have time to come up with an answer but it will definitely involve some

    * dynamic SQL

    * use of the information_schema views to get the pkeys for the tables

    * some temporary tables to spool the results

    Because it will require dynamic SQL and real temporary tables (the # type, not @ variables) a UDF is out of the question - a stored proc is the only way.

    I shall try to come up with something tomorrow night for you if no one else has by then - shouldn't take too long.

  • Thanks Ian - that would be helpful

  • I would suggest a different avenue. 

    Here is parse function that will strip your keywords out into a table, (you can change the length to whatever your needs may warrant).  It is very similar to one I have used for years as a stored procedure instead of a function:

    -- Author : Madhusudan  May 02, 2002

    -- Parses a comma delimited string and returns a table with the parsed strings as rows

    CREATE FUNCTION dbo.ParseString( @ParseString varchar(255) = NULL)

     RETURNS @ParsedStringTbl TABLE( SplitString varchar(255))

    AS

    BEGIN

     DECLARE @Position int

     DECLARE @Splitstring varchar(255)

     DECLARE @StrLen int

     SELECT @StrLen = LEN( LTRIM( @ParseString))

     IF @StrLen <> 0

     BEGIN

      WHILE @strlen > 0

       BEGIN

        SELECT @Position = CHARINDEX( ',', @ParseString)

        IF @Position = 0

        BEGIN

         INSERT INTO @ParsedStringTbl VALUES( @ParseString)

         BREAK

        END

        SELECT @splitstring = SUBSTRING( @ParseString, 1, @pos - 1)

        INSERT INTO @ParsedStringTbl VALUES( @splitstring)

     

        SELECT @StrLen= @StrLen - @Position

        SELECT @ParseString = SUBSTRING( @ParseString, @Position + 1, @StrLen)

       END

     END

    RETURN

    END

     

    Once you have this information in a table, you can check for existence in your database tables.  You should probably use sysobjects and if necessary syscomments to get your table information - hence no need for dynamic SQL. 

    Here is a quick and dirty example of how to get the count of occurrences:

    DECLARE @WordSearch TABLE( PK integer IDENTITY(1,1), 

                                               Word varchar(100))

    INSERT INTO @WordSearch

    SELECT 'The computer computer computer has something loaded'

    UNION ALL

    SELECT 'The computer computer has something loaded'

    UNION ALL

    SELECT 'The computer has something loaded'

    SELECT ws.Word, COUNT(*) AS [Total]

    FROM @WordSearch ws, @WordSearch ws2

    WHERE LEN( ws.Word) <= LEN( ws2.Word)

      AND ws.word LIKE '%computer%'

      AND ws2.word LIKE '%computer%'

    GROUP BY ws.Word

    ORDER BY COUNT(*) DESC

    I wasn't born stupid - I had to study.

  • Thanks Farrell.

    An error with the count I think - it should be

    WHERE LEN( ws.Word) >= LEN( ws2.Word) not

    WHERE LEN( ws.Word) <= LEN( ws2.Word)

    Please excuse my ignorance... I am still not sure how your UDF links into this - can you explain further. Thanks.

  • Sorry I missed that - good catch.  (although, this is somewhat recursive, so I am not sure it will have an adverse impact...  dunno and haven't tested it)

    The parse string works around CHARINDEX.  The function basically finds the position of a specified character - in this case, the comma (when we use this, we usually try and define a pipe | delimited string as that is a highly unusual character).  It finds that place and then substrings the string at position one to the comma position minus one.  The next time it adds one space. 

    CHARINDEX is great, but you have to remember that it brings back the position of the character found, so if you are stripping with it, you need to either add or subtract one position each time... 

    Glad this is showing good results for you. 

    (forgot, you will need to define the @ParsedStringTbl table variable in advance to hold the data...)

    I wasn't born stupid - I had to study.

  • I just checked, (was busy with someone else's crisis) and in the words of Ed MacMann (sp) from the Johnny Carson "Tonigh Show"

    "You are correct, sir!"  I did have the greater than / less than logic reversed.  Doh! 

    I wasn't born stupid - I had to study.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply