Full Text Search does not return expected results

  • On a Full-Text Search catalog I have rebuilt it using a finite set of stop list words:

    a, an, and, la, le, les, and the

    When I execute search on the catalog it works great for all words but for some reason any search with following words:

    from, here, there, and out

    ... returns nothing.

    I executed

    SELECT *

    FROM dbo.tBooks

    WHERE contains(tBooks.fTitle, 'from')

    and

    SELECT *

    FROM dbo.tBooks

    WHERE contains(tBooks.fTitle, ' "from" ')

    Both of these return zero results.

    I executed ...

    SELECT *

    FROM dbo.tBooks

    WHERE tBooks.fTitle LIKE '%from%'

    I get 55 rows back.

    So why is "from" being excluded from the Full Text Search even though it is not in my stop list? Similar results are returned with other words that I discovered so far there might be more; any hits or suggestions?

    I am using SQL Server 2005 Enterprise Edition, this is 3rd party application (SydneyPlus).

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Further research it appears it excludes other preposition keywords; however it does index many keywords. I found following keywords are also being excluded:

    about, after, as, at, before, between, but, by, except, for

    in, into, like, mid, of, on, onto, opposite, over, per, than, via

    Reading various articles I cannot find why these preposition keywords are being excluded when many other search-able; I confirmed there are records in the table with these keywords and those records are search-able if I use another keyword.

    I am initiating call with Microsoft to see if I can get an answer...

    Mohit.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Have you tried looking at the internal noise word list, called noise.dat?

    I think it's under:

    C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\config

  • Interesting I have never read anything on that file; how is that file applied in the Index build or search functionality?

    My issue turned out to be simple; my automation process that changes the stop list on the fly and rebuilds the index doesn't quite gets the timing right. So after I changed the stop list and manually rebuilt the index, everything worked nicely :). So I'll be putting a bit of delay in my automation and see how it works :).

    I'll ask the Microsoft guy about the internal stop list, but considering he didn't tell me anything over 4 hour conversation. I don't know how much I'll find out ;-).

    Thanks again for reply.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • benyos (9/10/2009)


    Have you tried looking at the internal noise word list, called noise.dat?

    I think it's under:

    C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\config

    It's under $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\ and the name depends on the language.

    See Noise Words for full details

    (Noise words are replaced by stop lists and stop words in 2008).

  • Thanks Paul.

    What Tal said is also correct, I did actually found a noise.dat file in C:\Windows\System32 that was hidden. In addition to the langauge files; I don't know the purpose of noise.dat file. I have asked Microsoft guy to shed up light on this if I hear back I'll let you know :).

    Thanks.

    -Mohit.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • I did actually found a noise.dat file in C:\Windows\System32 that was hidden. In addition to the langauge files; I don't know the purpose of noise.dat file.

    The general .dat file under system32 is used for neutral language configured columns. The enu is for US-English, eng is for British and so on...

  • Hi ๐Ÿ™‚

    I got response from Microsoft, it is for Natural Langauge processing and was used in SQL 7.0 and 2000 but starting 2005 it is not used by SQL Server any more...

    Microsoft's Response ...

    ... when we shipped SQL 2000 and SQL 7.0 we used to create Noise.Dat file under \Mssql\Ftdata\Sqlserver\Config folder. Noise.dat was actually meant for Neutral Language noise words. But with the inception of SQL Server 2005, we donโ€™t use Noise.dat anymore; rather we have replaced this file with Noiseneu.txt (For Neutral language noise words).

    Now, Windows Server family also uses Noise.Dat Files for its indexing functionality which is used by component called MSSearch. This file is usually located under \System32\ folder and is hidden. This file is not used by SQL Server and is for exclusive use of MSSearch service. I am not sure how MSSearch use this file as this is taken care by the Windows Team and SQL has no reference for this file. So if you try to find this file on your server, there will only be one file with this name and that too under \System32\ folder NOT under MSSQL.x\MSSQL\FTData folder.

    Please let me know if I have been able to provide the answer that you were looking for. Also, please let me know if you any concerns or questions regarding the stop list\contains clause\ inflectional clause or anything related to Fulltext Search, I would try my best to provide you the explanations.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Mohit-652702 (9/4/2009)


    On a Full-Text Search catalog I have rebuilt it using a finite set of stop list words:

    a, an, and, la, le, les, and the

    When I execute search on the catalog it works great for all words but for some reason any search with following words:

    from, here, there, and out

    ... returns nothing.

    We are using full-text (i.e., CONTAINS) under SQL Server 2005 with multiple languages with no problems. We recently cleared (set to empty file) all of our noise (stop) word list files due to many problems with noise words.

    Personally, I'd clear all of the noise*.txt files. Most "search" systems, e.g., Google, do not use stop (noise) words.

    Note that both the full-text indexer and run-time query will use the "default" language LCID of the instance and/or database unless otherwise specified for the column or with the CONTAINS query. See the LANGUAGE modifier on the CONTAINS verb in BOL for further details.

    You indicated that you modified a noise word file, but is it for the correct language? I suspect that it is using a noise word file that you didn't customize.

    Try an experiment without any noise words.

    Check the following items:

    1. Make sure that the noise*.txt files are present and empty. Note that you need to find the correct folder for the specific SQL Server instance. This is found in the Registry.

    Note that you cannot just delete the noise word files. If they are not present, Microsoft update will put the default files back without telling you!

    2. Restart SQL Server and the full-text service

    3. Rebuild the full-text index(s).

    4. Check the full-text indexer crawl logs (SQLFTnnnnnnnnn.LOG) located in: C:\Program Files\Microsoft SQL Server\instance ID\MSSQL\LOG for any errors.

    Note that after any change is made to the noise word files, the index must be rebuilt.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply