Extract text from image data type (Office files, PDF files)

  • Hi,

    my problem is the following:

    I need to get a textual representation of the text stored in binary files, mostly MS Office documents, PDFs, ...

    Simply put, I want to be able to get the text (and store it in a text column) from each file type that is supported by full-text search.

    Is there some easy way to do this?

    Note: I need to get text from documents written in Czech, ...

    Thanks for any response.

  • SQL Server is not a text editor. 🙂

    Seriously, your best option is to use a client application. You could also resort to SQL CLR, and design a procedure/function, but that would require access to *managed* (and allowed) libraries capable of handling each individual binary type you want to support.

    My vote goes to the client application.
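
    As a rough illustration of the client-side route, here is a minimal Python sketch that pulls plain text out of a Word document using only the standard library. It assumes the file is in the Office Open XML (.docx) format; legacy .doc, PowerPoint and PDF files would each need their own parser, typically a third-party library. The helper name `docx_to_text` is hypothetical, and reading the bytes from the IMAGE column and writing the result back are left to whatever database driver the client uses.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data: bytes) -> str:
    """Return the plain text of a .docx file given as raw bytes.

    Hypothetical helper (not part of any library). Emits one line per
    paragraph of the main document part; headers, footers and footnotes
    live in other parts of the archive and are ignored here.
    """
    # A .docx file is a ZIP archive; the body text is in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text is stored in <w:t> elements; join the runs of each paragraph.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

    The part is parsed as XML with its declared encoding, so Czech characters come through as ordinary Unicode text; storing the result in a Unicode column (NVARCHAR rather than VARCHAR) would keep them intact.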

    ML

    ---
    Matija Lah, SQL Server MVP
    http://milambda.blogspot.com

  • In reply to jumarrr (4/25/2008):

    There is no need to extract the text from a Microsoft Office document in order to perform full-text searches.

    SQL Server 2005 can full-text index Microsoft Office documents stored in an IMAGE (or, better, VARBINARY(MAX)) column. It will also use the language information in the document to control how words are tokenized.

    We do it for Word, PowerPoint and PDF.

    If, on the other hand, you wish to display the content as plain text, then you will have to extract it with an application.


    (PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
