Currently Software AG doesn't describe how non-XML is stored.
But essentially you can think of it as a blob.
What I do to index HTML is as follows:
The tool that loads the non-XML reads each HTML file, extracts its metadata and content words, and creates a metadata record. Both the metadata and the non-XML are stored.
The metadata references the documents.
The application queries the metadata and gets Tamino URLs back. The application then uses these URLs to fetch the documents.
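
The extraction step can be sketched roughly like this. It is not my actual loader, just a minimal illustration using only the Python standard library; the class and function names (MetaExtractor, extract) are made up for the example, and the rule of "content word = three or more letters" is an assumption.

```python
from html.parser import HTMLParser
import re


class MetaExtractor(HTMLParser):
    """Collects the <title>, <meta name=... content=...> pairs and content words."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.words = set()
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and stylesheet text
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data.strip()
        # Assumption: a "content word" is any run of 3+ ASCII letters.
        self.words.update(w.lower() for w in re.findall(r"[A-Za-z]{3,}", data))


def extract(html_text):
    """Return (metadata dict, sorted list of content words) for one HTML page."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.meta, sorted(parser.words)
```

The metadata dict and the word list are what would go into the metadata record; the HTML itself is stored unchanged as non-XML.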
The metadata implementation is based on RDF and implements the Dublin Core document metadata standard.
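
To give an idea of what such a record looks like, here is a small sketch that builds an RDF/XML description with Dublin Core elements. The two namespace URIs are the standard RDF and Dublin Core 1.1 ones; the function name dublin_core_record and the document URL in the usage line are placeholders, not taken from my implementation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(doc_url, fields):
    """Build an rdf:RDF document describing doc_url with dc:* elements.

    fields maps Dublin Core element names (title, creator, subject, ...)
    to string values.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about carries the reference from the metadata to the stored document.
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": doc_url}
    )
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document URL, for illustration only.
record = dublin_core_record(
    "http://example.org/docs/page1.html",
    {"title": "Tamino Notes", "creator": "Jane", "format": "text/html"},
)
```

The rdf:about attribute is where the URL of the stored non-XML document goes, which is how a query on the metadata can hand back the URL.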
The method is also applicable to other non-XML formats - you just need to build a component that extracts the metadata for each document type.
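
One way to organise the per-type components is a small registry keyed on file extension. This is only a sketch of the idea, assuming extension-based dispatch; the names (EXTRACTORS, extractor, extract_metadata) and the plain-text extractor are hypothetical.

```python
import os

# Maps a lowercase file extension to the function that extracts its metadata.
EXTRACTORS = {}


def extractor(*extensions):
    """Decorator that registers a function for one or more file extensions."""
    def register(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return register


@extractor(".txt")
def extract_plain_text(data):
    """Toy extractor: use the first non-empty line of the text as the title."""
    text = data.decode("utf-8", errors="replace").strip()
    first_line = text.splitlines()[0] if text else ""
    return {"title": first_line, "format": "text/plain"}


def extract_metadata(filename, data):
    """Pick the extractor by extension; unknown types yield no metadata."""
    ext = os.path.splitext(filename)[1].lower()
    func = EXTRACTORS.get(ext)
    return func(data) if func else {}
```

A document type with no registered extractor simply gets an empty record, which can then be filled in by hand.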
The indexing method is uniform for all possible metadata vocabularies so there is no fiddling about with Tamino Schemas.
You can also extend the metadata for individual instances manually - for example if some of it is missing for an instance, or if the non-XML document carries no metadata at all.
This should also be implementable as a server extension but I haven’t done that yet.
If you want a copy of the implementation, just ask.
#Tamino#API-Management#webMethods