implementing keyword search in a database of books

  • You mentioned you had duplicates using the first FreeTextTable. UNION eliminates dupes, so I'm not sure what you mean.

    1. To combine the result sets and eliminate duplicate rows (that is, two rows that match column for column), combine the queries with UNION.

    2. To combine the results sets without eliminating duplicate rows, use UNION ALL.

    Result sets combined with UNION or UNION ALL must consist of the same number of columns with matching data types

    ----------------------------------------------------------------------

    There are many ways to accomplish multiple search criteria. One way is to search for rows where the keyword is present in both Author AND Title at the same time, INNER JOIN two CONTAINSTABLE() result sets.

    -- Example

    DECLARE @KW varchar(100)

    SET @KW = '"sql server"'

    SELECT FT_TBL.Title

         , FT_TBL.[First Name - 1]

         , FT_TBL.[Last Name - 1]

         , KEY1.RANK AS RankAuthor

         , KEY2.RANK AS RankTitle

      FROM tbl_books AS FT_TBL

      JOIN CONTAINSTABLE (tbl_books, author, @KW) AS KEY1

        ON FT_TBL.ID = KEY1.

      JOIN CONTAINSTABLE (tbl_books, title, @KW) AS KEY2

        ON FT_TBL.ID = KEY2.

     ORDER BY CASE

                WHEN KEY1.RANK >= KEY2.RANK THEN KEY1.RANK

                ELSE KEY2.RANK

              END

           DESC

  • That works perfectly. However, for the ordering by rank, I have created a new column called Rank = RankAuthor + RankTitle and then it works even better. Ok, so for using AND, I use JOIN, for OR if I am not mistaken LEFT OUTER JOIN, but what can I use for NOT? Example: Author but NOT title. Also, I have fixed the author Query, now it works pretty well. The one last problem I am having is that there is no match within those separate joins, the rankX is just an empty field. Therefore when I add all the Ranks together to form the final rank, if there was any empty rank, the final rank is going to be empty too. Is there are ny how to specify at the begining that the Rank would be equal to 0 or something? I am so happy, almost done with this.

    SELECT FT_TBL.Title,

    FT_TBL.[Last Name - 1], FT_TBL.[First Name - 1], FT_TBL.[Series],

    KEY1.RANK AS RANK1, KEY2.RANK AS RANK2, KEY3.RANK AS RANK3,

    KEY4.RANK AS RANK4,

    (KEY1.RANK + KEY2.RANK + KEY2.RANK + KEY3.RANK) AS RANK

    FROM tbl_books AS FT_TBL " +

    JOIN FreeTextTable(tbl_books, [Last Name - 1], @KW) AS KEY1

    ON FT_TBL.ID = KEY1.[Key]

    LEFT OUTER JOIN FreeTextTable(tbl_books, [First Name - 1], @KW) AS KEY2

    ON FT_TBL.ID = KEY2.[Key]

    LEFT OUTER JOIN FreeTextTable(tbl_books, [Last Name - 2], @KW) AS KEY3

    ON FT_TBL.ID = KEY3.[Key]

    LEFT OUTER JOIN FreeTextTable(tbl_books, [First Name - 2], @KW) AS KEY4

    ON FT_TBL.ID = KEY4.[Key]

    ORDER BY RANK DESC

  • I solved the rank counting problem, I have used the COALESCE() function for assigning a value to a field in case that it is null. so the line which was previously:

    (KEY1.RANK + KEY2.RANK + KEY2.RANK + KEY3.RANK) AS RANK

    is now:

    (COALESCE(KEY1.RANK,0) + COALESCE(KEY2.RANK,0) + COALESCE(KEY2.RANK,0) + COALESCE(KEY3.RANK,0)) AS RANK

    If anybody knows about nicer solution, I would love to know it, as this one does not look too nice...but it works

  • It appears you're finally getting close to your solution. Regarding the question about NOT ( your example: Author but NOT title ), since the NOT column (title in this case) will not return any rank info, why not use

    WHERE NOT Contains(title, ----)

  • That seems fairly easy, I will give it a try. I guess there si no easy solution for using NOT SOME_OPERATION containstable(title, ...). And for ordering, if I am correct, I have to do the joins first and then do the NOT part? And in case I have multiple NOT search fields, can I use WHERE NOT (Contains(title, ...) or Contains(author,...))?

  • JOIN is part of the FROM clause. That is followed by the WHERE clause, which can have multiple NOT Contains().  Whether you connected them with OR or with AND would depend on the particular query.

     

  • So when I though that I am finally getting all the SQL right, I ran into another problem. I am trying to search for author OR title, while using the FULL OUTER JOIN. However, when I handle the author first and then title, all the fields of the table which came from the title query are empty, except for the RANK. When I do it the other way(title OR author), everything works perfectly. I have no clue what is wrong and I need this, because I am generating the queries in a script and no hand generating them. Here are the queries:

    Author OR Title:

    "SELECT FT_TBL.Title

    , FT_TBL.[Series]

    ,FT_TBL.[First Name - 1]

    , FT_TBL.[Last Name - 1]

    , KEY1.RANK AS RANK1

    ,(COALESCE(KEY1.RANK,0) + COALESCE(KEY1_A1.RANK,0) + COALESCE(KEY2_A1.RANK,0) + COALESCE(KEY3_A1.RANK,0) + COALESCE(KEY4_A1.RANK,0) + COALESCE(KEY5_A1.RANK,0) + COALESCE(KEY6_A1.RANK,0)) AS RANK

    FROM tbl_books AS FT_TBL

    JOIN

    FreeTextTable(tbl_books, [Last Name - 1], 'jamieson OR sewall OR dock OR suhrie') AS KEY1_A1 ON FT_TBL.ID = KEY1_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 1], 'jamieson OR sewall OR dock OR suhrie') AS KEY2_A1 ON FT_TBL.ID = KEY2_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [Last Name - 2], 'jamieson OR sewall OR dock OR suhrie') AS KEY3_A1 ON FT_TBL.ID = KEY3_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 2], 'jamieson OR sewall OR dock OR suhrie') AS KEY4_A1 ON FT_TBL.ID = KEY4_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [Last Name- 3], 'jamieson OR sewall OR dock OR suhrie') AS KEY5_A1 ON FT_TBL.ID = KEY5_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 3], 'jamieson OR sewall OR dock OR suhrie') AS KEY6_A1 ON FT_TBL.ID = KEY6_A1.[Key]

    FULL OUTER JOIN

    FREETEXTTABLE (tbl_books, Title,'Nursing education in the South') AS KEY1 ON FT_TBL.ID = KEY1."

    Title and Author:

    "SELECT FT_TBL.Title

    , FT_TBL.[Series]

    ,FT_TBL.[First Name - 1]

    , FT_TBL.[Last Name - 1]

    , KEY1.RANK AS RANK1

    ,(COALESCE(KEY1.RANK,0) + COALESCE(KEY1_A1.RANK,0) + COALESCE(KEY2_A1.RANK,0) + COALESCE(KEY3_A1.RANK,0) + COALESCE(KEY4_A1.RANK,0) + COALESCE(KEY5_A1.RANK,0) + COALESCE(KEY6_A1.RANK,0)) AS RANK

    FROM tbl_books AS FT_TBL

    JOIN

    FREETEXTTABLE (tbl_books, Title,'Nursing education in the South') AS KEY1 ON FT_TBL.ID = KEY1.

    FULL OUTER JOIN

    FreeTextTable(tbl_books, [Last Name - 1], 'jamieson OR sewall OR dock OR suhrie') AS KEY1_A1 ON FT_TBL.ID = KEY1_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 1], 'jamieson OR sewall OR dock OR suhrie') AS KEY2_A1 ON FT_TBL.ID = KEY2_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [Last Name - 2], 'jamieson OR sewall OR dock OR suhrie') AS KEY3_A1 ON FT_TBL.ID = KEY3_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 2], 'jamieson OR sewall OR dock OR suhrie') AS KEY4_A1 ON FT_TBL.ID = KEY4_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [Last Name- 3], 'jamieson OR sewall OR dock OR suhrie') AS KEY5_A1 ON FT_TBL.ID = KEY5_A1.[Key] LEFT OUTER JOIN

    FreeTextTable(tbl_books, [First Name - 3], 'jamieson OR sewall OR dock OR suhrie') AS KEY6_A1 ON FT_TBL.ID = KEY6_A1.[Key] "

  • Try this - it's a bit like a dynamic SQL solution, but CONTAINSTABLE takes a string param so it's perhaps acceptable. It might give you some ideas, anyway. 

     
    set nocount on

    --declarations

    declare

    @words varchar(1000), @wordcount int, @i int

    declare

    @sql varchar(8000), @sql2 varchar(8000), @sql3 varchar(8000)

    declare

    @word table(id int identity, word varchar(40))

    declare

    @expr table(rank int, wordid int, expr varchar(8000),length as len(expr))

    --initialisation

    select

    @words = 'rim,steel,aluminium,wheel,strong,spokes'

    ,

    @sql = 'select ''"' + replace(replace(@words, char(39), char(39)+char(39)),',','"'' union all select ''"') + '"'''

    insert

    @word(word) exec(@sql)

    insert

    @expr(rank, wordid, expr) select 1, id, word from @word

    select

    @wordcount = @@rowcount, @i = 2

    --building NEAR expressions

    while

    @i <= @wordcount

    begin

    insert @expr(rank,wordid, expr)

    select @i, w.id, e.expr + ' NEAR ' + w.word

    from @expr e join @word w
    on charindex(w.word,e.expr) = 0
    and w.id > e.wordid
    where e.rank = @i-1
    select @i = @i + 1

    end

    --building ISABOUT() call
    --use rank to generate WEIGHT param,
    --complex issue - could be refined.
    --each combination will be accompanied by all its subsets.

    declare

    @STR varchar(8000)

    select

    @STR = 'ISABOUT('

    select

    @STR = @STR + expr + ' weight('+str(1.000/((@wordcount-rank)+1.0),3,1)+ '),

    '

    from

    @expr

    where

    expr is not null

    select

    @STR = substring(@STR,1,len(@STR)-3) + ')'

    --debug: print ISABOUT() call

    print

    @STR

    --pass ISABOUT string to CONTAINSTABLE
    --I think you need containstable to get the ranking info.

    select

    CT.RANK,p.description

    from

    Production.ProductDescription p

    join

    containstable(

    Production

    .ProductDescription

    ,

    Description

    ,

    @STR

    )

    as CT

    on

    CT. = p.ProductDescriptionID

    order

    by CT.RANK desc

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Your second query

    is (in relevant respects) of the form:

    --Title and Author:

    select

    1

    from

    tbl_books B

    join

    (select ID from tbl_books where col1 = 1) B1 on B1.ID = B.ID

    full

    outer join (select ID from tbl_books where col1 = 2) B2 on B2.ID = B1.ID

    left

    join (select ID from tbl_books where col1 = 3) B3 on B3.ID = B2.ID

    left

    join (select ID from tbl_books where col1 = 4) B4 on B4.ID = B3.ID

    left

    join (select ID from tbl_books where col1 = 5) B5 on B5.ID = B4.ID

    left

    join (select ID from tbl_books where col1 = 6) B6 on B6.ID = B5.ID

    left

    join (select ID from tbl_books where col1 = 7) B7 ON B7.ID = B6.ID

     
    so the first join limits the results,
    meaning that the full outer join may produce nulls in the columns of B and B1
    if it produces any results not already in the first derived table (B1).
    This means that you won't get the data from B for any row that doesn't match
    your first (inner joined) condition.
    The rest of the joins cannot change the number of rows returned
    (assuming no duplicate IDs which could multiply results!)
    so the query is limited to those rows matching the first two conditions.
    To make you existing queries better (leaving aside any performance issues,
    and subject to refinement):
     

    --Title and Author 2:

    select

    1

    from

    tbl_books B

    left

    join (select ID from tbl_books where col1 = 1) B1 on B1.ID = B.ID

    left

    join (select ID from tbl_books where col1 = 2) B2 on B2.ID = B1.ID

    left

    join (select ID from tbl_books where col1 = 3) B3 on B3.ID = B2.ID

    left

    join (select ID from tbl_books where col1 = 4) B4 on B4.ID = B3.ID

    left

    join (select ID from tbl_books where col1 = 5) B5 on B5.ID = B4.ID

    left

    join (select ID from tbl_books where col1 = 6) B6 on B6.ID = B5.ID

    left

    join (select ID from tbl_books where col1 = 7) B7 ON B7.ID = B6.ID

    where

    (B2.ID is not null

    or B3.ID is not null
    or B4.ID is not null
    or B5.ID is not null
    or B6.ID is not null
    or B7.ID is not null)
     

    However

    , it would probably be preferable if you didn't need to have 7 'keys' tables

    being generated, then joined. One problem is your database design - the names
    fields probably (arguably) violate 1st normal form - i.e. that each row contain only
    one value per attribute. Even if you had only one author per book,
    the fact that you have firstname and lastname, rather than an AuthorID,
    mean (again arguably) that it also violates 3rd normal form - all non-key attributes
    are mutually independent.
    It might be better to move the authors into a new table, so you would only need to
    search two columns instead of six, and generally simplify things.
    Even if you don't do that, you could specify all 6 columns in a single

    FREETEXT

    , FREETEXTTABLE, CONTAINS, or CONTAINSTABLE statement.

    I would

    use CONTAINSTABLE - since the FREETEXT variants don't allow NEAR and WEIGHT,

    and CONTAINS doesn't return a RANK.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Finally after all troubles I got it work! The search is pretty good, I know that my query is probably violating all the rules, but unfortunately, I cannot influence the table design. I was able to get all the logic working(AND, OR, NOT) so I am very happy right now. Thank you all for your help!

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply