June 18, 2013 at 8:13 am
I've scoured all the posts on this and followed all the advice and examples and still can't seem to get SQL to index pdfs or office documents so I figure I must be missing something really basic!
I have set up a database table for the documents and checked the various filters are installed and enabled (see code below).
I know that full text is installed and working, as if I upload a text document via a webpage it indexes fine and a CONTAINSTABLE query picks up the indexed words. If I do the same with a pdf or Word doc then there are no errors, and the full-text properties say that the document has been added, but no index terms appear (using SELECT display_term, column_id, document_count FROM sys.dm_fts_index_keywords (DB_ID('test'), OBJECT_ID('documents'))).
Any help at all greatly appreciated as I'm losing marbles over this!
Ta,
Jeff
/* code so far*/
/*not sure of the order some of these statements should appear in but have tried various permutations.. clearly not the right one! */
CREATE TABLE [dbo].[Documents]
(
[ID] INT IDENTITY(1000000,1) ,
[Extension] [VARCHAR] (10) NOT NULL ,
[Content] [VARBINARY] (MAX) NOT NULL ,
[FileSize] [INT] NOT NULL ,
[FileName] [NVARCHAR] (500) NOT NULL ,
[Stamp] [TIMESTAMP] NOT NULL
)
ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Documents] WITH NOCHECK
ADD CONSTRAINT [PK_Documents] PRIMARY KEY CLUSTERED ([ID])
GO
Exec sp_fulltext_service 'load_os_resources',1
Exec sp_fulltext_service 'verify_signature',0
EXEC sp_fulltext_service 'update_languages'
reconfigure with override
CREATE FULLTEXT CATALOG testcatalog
GO
CREATE FULLTEXT INDEX ON [dbo].[Documents]
(
content TYPE COLUMN extension Language 1033
)
KEY INDEX pk_documents
ON testcatalog;
GO
if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1
exec sp_fulltext_database N'enable'
GO
if not exists (select * from dbo.sysfulltextcatalogs where name = N'Documents')
BEGIN
SELECT 'Creating new FT Catalogue'
exec sp_fulltext_catalog N'Documents', N'create'
end
GO
exec sp_fulltext_table N'[dbo].[Documents]', N'activate'
GO
/*
check adobe filter installed
EXEC sp_help_fulltext_system_components 'filter' --pdf and doc filters show up paths correct!!
SELECT * from sys.fulltext_document_types
*/
EXEC sp_fulltext_service 'restart_all_fdhosts' --tried out of desperation - no luck!
June 18, 2013 at 9:15 am
ok, slight clarification..
when running the command:
SELECT display_term, column_id, document_count FROM sys.dm_fts_index_keywords (DB_ID('test'), OBJECT_ID('documents'))
There are initially no entries, as expected, but after the first pdf / doc or whatever file is added there is a single entry with the display term 'END OF FILE'.
When adding a txt or csv file then the display terms get populated as expected.
..any ideas?
Thanks!
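A per-document view of the index can make this symptom easier to pin down. The sketch below assumes the 'test' database and the dbo.Documents table from the first post, and that it is run from inside that database so OBJECT_ID resolves. It counts the real terms indexed for each row, ignoring the 'END OF FILE' marker, so rows whose filter produced nothing show a count of 0.

```sql
-- Sketch only: assumes database 'test' and table dbo.Documents as defined in the first post.
-- document_id matches the full-text key (the integer ID column here).
SELECT d.ID,
       d.FileName,
       d.Extension,
       COUNT(k.display_term) AS term_count   -- 0 means only 'END OF FILE' (or nothing) was indexed
FROM dbo.Documents AS d
LEFT JOIN sys.dm_fts_index_keywords_by_document(DB_ID('test'), OBJECT_ID('dbo.Documents')) AS k
       ON k.document_id = d.ID
      AND k.display_term <> 'END OF FILE'
GROUP BY d.ID, d.FileName, d.Extension
ORDER BY term_count, d.ID;
```

Grouping the result by Extension in your head (or in the query) should show quickly whether the problem is tied to particular file types.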
June 19, 2013 at 2:22 am
Are the extensions visible in the output of the following:
SELECT * FROM sys.fulltext_document_types
If not, the filters need installing.
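A narrower version of that check, as a rough sketch: it lists only the extensions discussed in this thread along with the filter DLL path and version SQL Server has registered for each. The extension list is just an illustration, so adjust it to the file types you store.

```sql
-- Sketch: show which filter DLL is registered for each extension of interest.
-- document_type values include the leading dot (e.g. '.pdf').
SELECT document_type,
       path,          -- full path of the iFilter DLL SQL Server will load
       version,
       manufacturer
FROM sys.fulltext_document_types
WHERE document_type IN ('.pdf', '.doc', '.docx', '.txt')
ORDER BY document_type;
```

A missing row means no filter is registered for that extension. A row with an odd-looking path suggests the filter is registered but may fail to load.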
June 19, 2013 at 2:41 am
Hi Howard..
Yes, the iFilters appear to be installed.. The iFilter for pdfs is C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll.
I've added this to the PATH environment variable and also checked that the dll actually exists in that location.. All fine apart from the fact it doesn't work.
Thanks for the reply, had almost given up!
Jeff
June 19, 2013 at 3:06 am
Not sure if this is useful information or not, but it appears that 'doc' files ARE being indexed.. just not docX files. Hadn't noticed before, as all my previous attempts used docX files..
All a bit more to type into Google I guess.
June 19, 2013 at 3:23 am
Aha..!!
Just checked the docX filter a bit more carefully (using EXEC sp_help_fulltext_system_components 'filter' ) and it reads:
C:\Windows\system32\"C:\Program Files\Windows NT\Accessories\WordpadFilter.dll"
..instead of the correct path:
C:\Program Files\Windows NT\Accessories\WordpadFilter.dll
If I can work out how to change this then I hope the docX problem may be solved.. any ideas!?
Sadly no such fix for the pdf issue as the paths, version numbers etc are all correct.
June 19, 2013 at 4:14 am
For anyone stumbling upon this post, the fix for the 'docX' problem is to download the latest Office iFilters from here
Then reload the filters and restart the service..
EXEC sp_fulltext_service 'load_os_resources',1
EXEC sp_fulltext_service 'verify_signature',0
EXEC sp_fulltext_service 'restart_all_fdhosts'
GO
All docX (and presumably xlsX etc) now indexing ok..
Still no pdfs though!
Grr!
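One follow-up that may be needed after reloading the filters: rows that were crawled while a filter was broken are not necessarily re-read on their own, so a full population may be required before the older documents show up. The sketch below assumes the dbo.Documents table from the first post. The status query can be re-run until the population goes idle.

```sql
-- Sketch: force a re-crawl of every row, then check how the crawl went.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
GO

SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextPopulateStatus') AS populate_status, -- 0 = idle
       OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextDocsProcessed')  AS docs_processed,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextFailCount')      AS rows_failed;
GO
```

A non-zero rows_failed value is a hint to look at the full-text crawl log for the catalog.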
June 19, 2013 at 5:48 am
After finding this post:
I followed the advice and uninstalled Adobe iFilter 11 and installed version 9 instead.. After updating the path variable (to point at the iFilter) and a quick restart, all is well and pdfs are now indexing fine!
The post above is from the beginning of April and as yet no fix from Adobe, in fact not even an acknowledgement there may be a compatibility issue.
If you're having problems I'd recommend installing V.9 and waiting for a new version of 11 or 12 to come out.
Hope this ends up being useful to someone!!
Jeff
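For anyone wanting to confirm the end result, a quick search along these lines should return the pdf rows once they are indexed. This is only a sketch against the dbo.Documents table from the first post, and the search word 'invoice' is a made-up placeholder: replace it with a word you know appears in one of your pdfs.

```sql
-- Sketch: search the indexed Content column and show only pdf rows.
-- 'invoice' is a hypothetical search term, not something from this thread.
SELECT d.ID,
       d.FileName,
       ft.[RANK]
FROM CONTAINSTABLE(dbo.Documents, Content, N'"invoice"') AS ft
JOIN dbo.Documents AS d
     ON d.ID = ft.[KEY]
WHERE d.Extension LIKE '%pdf'      -- matches 'pdf' or '.pdf', whichever form was stored
ORDER BY ft.[RANK] DESC;
```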