April 25, 2008 at 1:43 am
Hi,
my problem is the following:
I need to get a textual representation of the text stored in binary files, mostly MS Office documents, PDFs, ...
Put simply, I want to be able to extract the text (and store it in a text column) from every file type that full-text search supports.
Is there some easy way to do this?
Note: I need to get text from documents written in Czech, ...
Thanks for any response.
May 21, 2008 at 1:56 am
SQL Server is not a text editor. 🙂
Seriously, your best option is to use a client application. You could also resort to SQL CLR, and design a procedure/function, but that would require access to *managed* (and allowed) libraries capable of handling each individual binary type you want to support.
My vote goes to the client application.
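As a rough illustration of the client-application route, here is a minimal Python sketch that pulls the paragraph text out of a .docx file using only the standard library. It assumes the Office Open XML format (.docx); legacy .doc/.ppt files and PDFs would need other libraries, and the function name docx_to_text is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the body XML inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file given as raw bytes.

    Hypothetical helper: it ignores headers, footers, footnotes and
    does not special-case text boxes or other nested content.
    """
    # A .docx file is a ZIP archive; the main body is word/document.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is split across <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

The bytes can come straight from the binary column, and the returned string is ordinary Unicode, so Czech characters survive as long as the target column is NVARCHAR rather than VARCHAR.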
ML
---
Matija Lah, SQL Server MVP
http://milambda.blogspot.com
May 22, 2008 at 10:52 am
In reply to jumarrr (4/25/2008):
There is no need to extract the text from a Microsoft Office document in order to perform full-text searches.
SQL Server 2005 can full-text index Microsoft Office documents stored in an IMAGE (or, better, VARBINARY(MAX)) column. It also uses the document's language information to control how words are tokenized for that language.
We do it for Word, PowerPoint and PDF.
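For reference, the setup behind this could look roughly like the sketch below. It is a small Python helper that only builds the T-SQL text, so the statement can be sent through whatever driver the client uses. The table, column, index and catalog names are hypothetical. The TYPE COLUMN clause names a column holding each document's file extension (for example .doc or .pdf), which SQL Server uses to pick the matching filter; PDF usually needs a separately installed filter. LANGUAGE 0 selects the neutral word breaker, and sys.fulltext_languages lists what a given instance actually supports, so check there before assuming Czech is available.

```python
def fulltext_index_ddl(table, content_col, type_col, key_index, catalog, lcid=0):
    """Build T-SQL that full-text indexes a binary document column.

    All names are assumed to be trusted identifiers chosen by the caller;
    nothing here is quoted or validated.
    """
    return (
        f"CREATE FULLTEXT INDEX ON {table}\n"
        # type_col holds the file extension used to select the filter.
        f"    ({content_col} TYPE COLUMN {type_col} LANGUAGE {int(lcid)})\n"
        # key_index must be a unique, single-column, non-nullable index.
        f"    KEY INDEX {key_index}\n"
        f"    ON {catalog}\n"
        f"    WITH CHANGE_TRACKING AUTO;"
    )


# Hypothetical schema: dbo.Documents(Content VARBINARY(MAX), FileExtension ...)
print(fulltext_index_ddl("dbo.Documents", "Content", "FileExtension",
                         "PK_Documents", "DocsCatalog"))
```

The full-text catalog (DocsCatalog here) has to exist first, for example via CREATE FULLTEXT CATALOG. Once the index is populated, the column can be searched with CONTAINS or FREETEXT without the text ever being extracted into a separate column.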
If, on the other hand, you wish to display the text as plain text, then you will have to extract it via an application.