IBM webMethods Hybrid Integration


Extract text from MS Word Document


1. Extract text from MS Word Document

webMethods Community Member
Posted Tue March 22, 2005 02:58 PM

Hi,

I uploaded an MS Word document using Tamino 4.1.4 and the Tamino API for Java.

The document is already in a collection, but I need to search by the text inside the document.

How can I do this? Is there a utility for it? Or do I have to extract the text of the document and add it as another element in a schema?

Thanks in advance,

#webMethods-Tamino-XML-Server-APIs
#webMethods
#API-Management
2. RE: Extract text from MS Word Document

webMethods Community Member
Posted Fri June 10, 2005 02:31 AM

Hi Maria,

You don’t say how you loaded the Word doc: as non-XML, or as XML (the “save-as XML” option in Word 2003). I assume you loaded it as non-XML.

This is resolved in Tamino 4.2.1. Version 4.2.1 includes a non-XML indexer server extension that supports loading non-XML data, including Word docs. The server extension creates a shadow document into which the content (metadata) of the non-XML Word doc is extracted.

When you search the non-XML document you are really searching the shadow document (an XML doc). The two documents act as one: all searches go against the shadow doc, and when you then request the Word doc (using ino:id or ino:docname), Tamino returns the non-XML (Word) document.
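To make the search-then-fetch flow concrete, here is a toy in-memory model of the shadow-document idea in plain Java. It is only a sketch and does not use the Tamino API or the indexer extension: the class `ShadowStore`, its methods, and the integer ids standing in for ino:id are all hypothetical, and the extracted text is passed in by hand rather than pulled out of a real Word file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: NOT the Tamino API. Every name here is hypothetical.
class ShadowStore {
    // Original binary documents, keyed by an id that stands in for ino:id.
    private final Map<Integer, byte[]> binaries = new LinkedHashMap<>();
    // Searchable shadow text, stored under the same id as its binary.
    private final Map<Integer, String> shadows = new LinkedHashMap<>();
    private int nextId = 1;

    // Stores the binary and its shadow entry together.
    // extractedText stands in for whatever an indexer would extract.
    int load(byte[] wordBytes, String extractedText) {
        int id = nextId++;
        binaries.put(id, wordBytes.clone());
        shadows.put(id, extractedText.toLowerCase(Locale.ROOT));
        return id;
    }

    // Searches only the shadow text and returns the ids of matching documents.
    List<Integer> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : shadows.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Fetching by id returns the original binary, never the shadow text.
    // Returns null if no document has this id.
    byte[] fetch(int id) {
        byte[] data = binaries.get(id);
        return data == null ? null : data.clone();
    }
}
```

In this model, `load` is called once per document, `search("sales")` returns the ids whose shadow text contains the term, and `fetch(id)` hands back the untouched Word bytes, which mirrors the "two documents act as one" behaviour described above.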

I know I have not answered your question with regard to 4.1.4, but I wanted to let you know what is available, even if it is only in v4.2.1, in case you can upgrade.

#webMethods-Tamino-XML-Server-APIs
#API-Management
#webMethods
