June 9, 2014 at 4:29 am
So I'm trying out full-text indexing for the first time and, in particular, FileTables in SQL Server 2012. I've followed a Microsoft walkthrough and everything seems to be ok. However, when I query the table using the CONTAINS keyword, I get no results (a regular query to make sure there are records in the table returns the expected number of results).
I'm now trying to troubleshoot, and have been using the FULLTEXTCATALOGPROPERTY function, but I don't understand the results.
If I run SELECT FULLTEXTCATALOGPROPERTY(N'CatlogName',N'ItemCount'), I get a result of 51. There are 96 documents in the NTFS folder where the documents are stored, and the table has 96 records, so I don't know where 51 is coming from. 55 of the documents are .DOC files, the rest are .PDF, and some (or maybe all) of the PDFs are scanned images of documents, which I don't expect to be indexed, so maybe that explains it. And in another thread in these forums, a poster suggests that the result for this function should be either 0 or 1, with 0 meaning that no documents are pending indexing, but maybe I've misunderstood that.
If I run SELECT FULLTEXTCATALOGPROPERTY(N'CatalogName',N'UniqueKeyCount'), I get a result of 2. I have got two full-text indexes in this catalog (one on the FileTable, one on a regular table with FT enabled). Is this result therefore expected? Again, reading online seems to suggest that a result of 0 is desirable, but I don't understand why, and if it is I don't understand why my result is 2!
EDIT: I've now also run SELECT* FROM sys.dm_fts_index_keywords(DB_ID('DatabaseName'), Object_ID('dbo.FileTableName)), which I believe is supposed to list all of the indexed words from the table specified. I get one row returned, as follows:
keyword: 0xFF
display_term: END OF FILE
column_id: 2
document_count: 40
So basically, it's not indexed any words at all. And why is the document count only 40 when there are 96 documents in the folder and table?
June 10, 2014 at 4:33 am
Ok, I now know what's causing this problem, but I have no idea how to fix it, or if a fix is even possible.
I discovered that for every document in my File Table, the crawl log contained the following error:
2014-06-10 10:43:17.21 spid21s Error '0x8004fd02: The filter daemon MSFTEFD failed to load an IFilter interface for document, so it can't be indexed.' occurred during full-text index population for table or indexed view '[FilestreamTestDB].[dbo].[filetable2]' (table or indexed view ID '18099105', database ID '5'), full-text key value '/7651524419116.116877223393438.3622501745/'. Attempt will be made to reindex it.
After doing some research, I came across this MSDN thread, where another user experiencing the same problem had been in touch with Microsoft and established that this error was the result of a known problem with the iFilter for .doc files.
Having learned this, I tried two things. Firstly, I added a much more recently-created .doc file to my File Table (the ones I'd been testing with were about 5 years old and probably created in Word 2003/2007). Secondly, I opened one of the old files that I'd been testing with in Word 2010 and resaved it as type .doc (Word 97-2003) - exactly the same as it was before, in theory. I then forced my full-text index to apply tracked changes and, lo and behold, those two documents were both indexed correctly.
So, I've established that I've done everything right, that the problem is actually a known issue with Microsoft's .doc. iFilter, and that I can cause an old document to be indexed by opening and re-saving it in a newer version of Word. Problem is, I have over 100,000 Word documents I need to index - clearly I'm not going to be able to open and re-save all of them!
Any ideas? Anyone come across this before?
June 11, 2014 at 2:53 am
Not much of a response here, but the bottom line is that this is a known issue with Microsoft's IFilter for Word. I'm going to try and work around it for now by using Microsoft's Office Compatibility Pack to convert all of my old .doc files to .docx, since they seem to get indexed just fine.
If you arrive here because you've having the same problem as me, please go here to add your voice to the bug report to Microsoft. The guy who originated this bug report has been in direct contact with Microsoft, who have confirmed that they know of the issue and, quite shockingly, that they don't have any plans to fix it.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply