Search Microsft Word Documents in SQL 2005

  • Hi,
     
    Does anyone know how to search Microsoft word documents stored as BLOBs for particular keywords?
     
    Thanks

    Regards,Iain

  • I use the VB/ADO code to get the contents of a BLOB into a file. After that you can fire up Word using OLE Automation and do the search. This latter part I have never actually done, i.e. use the find function progammatically. But I see no reason why this should not be possible.

    Public Sub BlobToFile(adoField As ADODB.Field, strFilename As String)

    Const lngChunkSize As Long = 8192

    Dim intF As Integer

    Dim lngBytesLeft As Long

    Dim lngBytes As Long

    Dim byteArray() As Byte

    intF = FreeFile

    Open strFilename For Binary As #intF

    lngBytesLeft = adoField.ActualSize

    Do While lngBytesLeft

      lngBytes = lngBytesLeft

      If lngBytes > lngChunkSize Then lngBytes = lngChunkSize

      byteArray = adoField.GetChunk(lngBytes)

      Put #intF, , byteArray

      lngBytesLeft = lngBytesLeft - lngBytes

    Loop

    Close #intF

     

    End Sub

  • Thanks Michael,
     
    A colleague came up with a very good solution....
     

    Searching Word Docs:

    1. Create two columns e.g:

    DocumentData Image

    FileExtension varchar(4)

    Store the document in the DocData column and it's file extension (.doc or .rtf or .txt etc) in the FileExtension Column

    2. Create a full text index - I right clicked on the table and choose Full Text Index -> Define Full Text Index to open the SQL 05 Full Text wizard.

    3. Choose the DocData column as the column to index and specify the FileExtension Column as it's 'Type' Column.

    4. Use CONTAINS or FREETEXT to search e.g.

    SELECT * FROM MyDocs WHERE CONTAINS (DocumentData, 'Dear Sir'

    This works with Word documents, text files and rich text files (didnt try excel etc but I assume it works with those too?) I read somewhere that Adobe PDF files can be indexed too if you have the correct filter installed on the server - available from adobe i assume?

    This seems to work well.

     

    Regards,Iain

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply