webMethods

Extract text from MS Word Document

  • 1.  Extract text from MS Word Document

    Posted Tue March 22, 2005 02:58 PM

    Hi,

    I uploaded an MS Word document using Tamino 4.1.4 and the Tamino API for Java.

    The document is already stored in a collection, but I need to search it based on the text inside the document.

    How can I do this? Is there a utility for it, or do I have to extract the text of the document and add it as another element in a schema?

    Thanks in advance,



    #webMethods-Tamino-XML-Server-APIs
    #webMethods
    #API-Management


  • 2.  RE: Extract text from MS Word Document

    Posted Fri June 10, 2005 02:31 AM

    Hi Maria,

    You don't say how you loaded the Word document: as non-XML, or as XML (using the "Save As XML" option in Word 2003). I assume you loaded it as non-XML.

    This is addressed in Tamino 4.2.1, which includes a non-XML indexer server extension that allows non-XML data, including Word documents, to be loaded. The extension creates a shadow document (an XML document) into which the content (metadata) of the non-XML Word document is extracted.

    When you search the non-XML document, you are really searching the shadow document. The two documents act as one: all searches go against the shadow document, and when you then retrieve the Word document (using ino:id or ino:docname), Tamino returns the original non-XML (Word) document.

    I know I have not answered your question with regard to 4.1.4, but I wanted to let you know what is available in v4.2.1, in case you can upgrade.
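    For 4.1.4, the workaround the question proposes (extract the text yourself and store it as an element Tamino can index) might look roughly like the minimal Java sketch below. It assumes the document was saved with Word 2003's "Save As XML" option (WordprocessingML, where text runs are w:t elements inside w:p paragraphs); binary .doc files would need a separate extraction library, not shown here. The class name WordXmlText and the element names WordDoc, fileName and text are hypothetical and should be matched to your own Tamino schema. Storing the result through the Tamino API for Java and defining a text index on the element are also not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls plain text out of a Word 2003 XML file and wraps it
// in a small XML document that could be stored in a Tamino collection.
public class WordXmlText {
    // Namespace written by Word 2003 "Save As XML" (WordprocessingML).
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Concatenates the text runs (w:t) of each paragraph (w:p), one line per paragraph.
    public static String extractText(String wordXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(wordXml)));
        StringBuilder out = new StringBuilder();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                line.append(runs.item(j).getTextContent());
            }
            if (i > 0) out.append('\n');
            out.append(line);
        }
        return out.toString();
    }

    // Builds <WordDoc><fileName>..</fileName><text>..</text></WordDoc>; element names are assumptions.
    public static String wrap(String fileName, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("WordDoc");
        Element name = doc.createElement("fileName");
        name.setTextContent(fileName);
        Element body = doc.createElement("text");
        body.setTextContent(text);
        root.appendChild(name);
        root.appendChild(body);
        doc.appendChild(root);
        // Serialize via the JDK transformer so special characters are escaped correctly.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<w:wordDocument xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Second &amp; last</w:t></w:r></w:p>"
                + "</w:body></w:wordDocument>";
        System.out.println(wrap("report.xml", extractText(sample)));
    }
}
```

    Serializing through a DOM rather than concatenating strings keeps characters such as & and < correctly escaped in the stored text.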

