IBM webMethods Hybrid Integration

Jakarta-poi

  • 1.  Jakarta-poi

    Posted Fri September 26, 2003 02:41 PM

    Hello guys,
    this is my question:
    could I extract document content directly using the Jakarta POI API? Have you done it?

    Alfonso.


    #API-Management
    #webMethods
    #Tamino


  • 2.  RE: Jakarta-poi

    Posted Fri September 26, 2003 05:59 PM

    Hi,

    The non-XML indexer internally uses Jakarta POI to extract metadata (not content) from MS documents. The metadata is stored as an XML document with the same internal id (ino:id) as the non-XML document.
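To make the idea of a separate metadata document concrete, here is a minimal Java sketch. It assumes the properties have already been pulled out of the MS document (the step POI handles) and only shows turning them into an XML document that carries the same id as the original. The class `MetadataXml`, the `docId` attribute and the `<metadata>`/`<property>` element names are hypothetical illustrations, not Tamino's actual metadata schema or the indexer's code.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: serializes extracted document properties as a separate XML metadata document. */
final class MetadataXml {
    private MetadataXml() {}

    /**
     * Builds <metadata docId="..."><property name="...">value</property>...</metadata>.
     * docId stands in for the ino:id shared with the original non-XML document;
     * the element and attribute names are illustrative, not Tamino's real schema.
     */
    static String toXml(String docId, Map<String, String> properties) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("metadata");
            root.setAttribute("docId", docId); // same id as the non-XML document
            doc.appendChild(root);
            for (Map.Entry<String, String> p : properties.entrySet()) {
                Element prop = doc.createElement("property");
                prop.setAttribute("name", p.getKey()); // attribute, so any property name is legal
                prop.setTextContent(p.getValue());      // serializer escapes special characters such as & and <
                root.appendChild(prop);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build metadata XML", e);
        }
    }
}
```

Using an attribute for the property name keeps arbitrary names such as "Last Saved By" legal, since they would not be valid XML element names.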

    Hope this helps.

    Stuart Fyffe-Collins
    Software AG (UK) Ltd.


  • 3.  RE: Jakarta-poi

    Posted Mon September 29, 2003 05:27 PM

    Hi,
    I’m trying the Tamino non-XML indexer, but I’m really only interested in the “generated” by the indexer.

    Alfonso.


  • 4.  RE: Jakarta-poi

    Posted Tue September 30, 2003 07:50 AM

    Hi Alfonso,

    The nonXMLIndexer generates . For Excel and MS Word content, POI is used internally. Of course, all formatting is lost, but you can run text queries, for example "Find all Word documents containing ‘Tamino’".
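A text query of this kind boils down to matching words in the text extracted from each document. The sketch below is a naive in-memory stand-in, assuming the extracted text is already available per document; `ExtractedDoc` and `TextQuery` are hypothetical names, and a real Tamino query runs against the server's own text index rather than a linear scan like this.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of what an indexer keeps per document: id, MIME type and extracted plain text. */
final class ExtractedDoc {
    final String id;
    final String mimeType;
    final String text;

    ExtractedDoc(String id, String mimeType, String text) {
        this.id = id;
        this.mimeType = mimeType;
        this.text = text;
    }
}

/** Hypothetical linear-scan text query over extracted document text. */
final class TextQuery {
    private TextQuery() {}

    static final String MS_WORD = "application/msword";

    /** True if any token of the text equals the word, ignoring case. */
    static boolean containsWord(String text, String word) {
        // Split on runs of anything that is not a letter or digit.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    /** Returns ids of documents of the given MIME type whose text contains the word. */
    static List<String> findContaining(List<ExtractedDoc> docs, String mimeType, String word) {
        List<String> hits = new ArrayList<>();
        for (ExtractedDoc d : docs) {
            if (d.mimeType.equals(mimeType) && containsWord(d.text, word)) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

Matching whole tokens rather than raw substrings means a search for "Tamino" does not also hit "Taminos".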

    Regards,
    Martin


  • 5.  RE: Jakarta-poi

    Posted Thu October 02, 2003 11:39 AM

    Hi Martin,
    I only want the content, if possible, without using the indexer.

    IndexedDocument.getContent();

    Regards,
    Alfonso


  • 6.  RE: Jakarta-poi

    Posted Thu October 02, 2003 11:58 AM

    Hi Alfonso,

    then you have to write your own indexer (or content extractor). What do you want to do with that content?
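If you write your own content extractor, one way to structure it is a small interface per document type plus a lookup by MIME type. The sketch below assumes that design; `ContentExtractor`, `PlainTextExtractor` and `ExtractorRegistry` are hypothetical names. Only a plain-text extractor is shown so the example runs without extra libraries; for Word and Excel files you would plug in an implementation built on POI (later POI releases ship `WordExtractor` and `ExcelExtractor` classes for this).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contract for a home-grown content extractor: raw bytes in, plain text out. */
interface ContentExtractor {
    String extractText(InputStream in) throws IOException;
}

/** Trivial extractor for documents that are already plain UTF-8 text. */
final class PlainTextExtractor implements ContentExtractor {
    @Override
    public String extractText(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}

/** Hypothetical registry that picks an extractor by the document's MIME type. */
final class ExtractorRegistry {
    private final Map<String, ContentExtractor> byMimeType = new HashMap<>();

    void register(String mimeType, ContentExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** Returns the document's text, or throws if no extractor is registered for the type. */
    String extract(String mimeType, byte[] content) {
        ContentExtractor extractor = byMimeType.get(mimeType);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for " + mimeType);
        }
        try (InputStream in = new ByteArrayInputStream(content)) {
            return extractor.extractText(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the registry separate from the extractors means a POI-based Word or Excel extractor can be added later without touching the calling code.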

    regards,
    Martin


  • 7.  RE: Jakarta-poi

    Posted Thu October 02, 2003 12:02 PM


  • 8.  RE: Jakarta-poi

    Posted Thu October 02, 2003 01:22 PM

    That's what the nonXMLIndexer is designed for.

    Regards,
    Martin


  • 9.  RE: Jakarta-poi

    Posted Thu October 02, 2003 04:16 PM

    My problem is that the metadata (XML) is added for the content; I only want to use one collection.


  • 10.  RE: Jakarta-poi

    Posted Thu October 02, 2003 04:37 PM

    The non-XML indexer writes the metadata (XML for properties AND content) to the same collection (even the same schema) as the document itself.
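Conceptually, this layout keeps both forms of a document side by side in one collection under a single id, so no second collection is needed. The following in-memory model only illustrates that idea under stated assumptions: `SingleCollection`, its `Entry` type and the pluggable `indexer` function are hypothetical and do not reflect Tamino's storage API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for one collection holding a document and its generated XML. */
final class SingleCollection {
    /** One stored item: the original non-XML bytes plus the XML the indexer generated for them. */
    static final class Entry {
        final byte[] original;
        final String metadataXml;

        Entry(byte[] original, String metadataXml) {
            this.original = original;
            this.metadataXml = metadataXml;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private final Function<byte[], String> indexer; // hypothetical: bytes -> generated XML
    private int nextId = 1;

    SingleCollection(Function<byte[], String> indexer) {
        this.indexer = indexer;
    }

    /** Stores a non-XML document, runs the indexer on it and files both under one shared id. */
    String store(byte[] nonXmlDocument) {
        String id = Integer.toString(nextId++);
        entries.put(id, new Entry(nonXmlDocument.clone(), indexer.apply(nonXmlDocument)));
        return id;
    }

    /** Looks up both forms of a document by its shared id (null if absent). */
    Entry get(String id) {
        return entries.get(id);
    }

    int size() {
        return entries.size();
    }
}
```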

    Regards,
    Martin

