When processing documents in a retrieval-augmented generation (RAG) use case, you can now test a new document text extraction API in watsonx.ai on IBM Cloud. You can use the new API to simplify complex business documents into a JSON file format that can be easily processed by foundation models as part of a generative AI workflow.

The text extraction API can extract text from visual elements such as images, diagrams, and tables that are in your documents. These visual elements are often difficult to correctly interpret programmatically. After the API completes the extraction process, you can use the simplified JSON representation of the document content to enhance the contextual information for a foundation model prompt in a RAG use case.

The document text extraction API can process input from the following file types: GIF, JPG, PDF, PNG, and TIFF. The API can also extract text from documents written in several languages. For details, see Extracting text from documents.
For an example of how to run a text extraction job by using the watsonx.ai Python library, see the sample Python notebook on GitHub.
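
As a rough illustration of the steps around an extraction job, the following sketch filters candidate files by the input types listed above and then assembles prompt context from an extraction result. It does not call the watsonx.ai API. The helper names `is_supported` and `build_context` are hypothetical, and the result layout (a top-level `elements` list whose items carry a `text` field) is an assumption made for illustration only, not the documented output schema.

```python
from pathlib import Path

# Input file types listed in this announcement.
SUPPORTED_TYPES = {"gif", "jpg", "pdf", "png", "tiff"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches one of the listed input types."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_TYPES


def build_context(extracted: dict, max_chars: int = 2000) -> str:
    """Join text snippets from an extraction result into prompt context.

    ASSUMPTION: the result has a top-level "elements" list whose items may
    carry a "text" field. The real JSON layout may differ.
    """
    snippets = [
        element["text"].strip()
        for element in extracted.get("elements", [])
        if element.get("text")
    ]
    # Truncate so the context stays within a chosen prompt budget.
    return "\n".join(snippets)[:max_chars]


if __name__ == "__main__":
    candidates = ["invoice.pdf", "chart.PNG", "notes.docx"]
    print([name for name in candidates if is_supported(name)])

    # Hypothetical extraction result, used only to exercise build_context.
    sample = {
        "elements": [
            {"text": "Q3 revenue: 4.2M"},
            {"text": "  Region: EMEA "},
            {"type": "image"},
        ]
    }
    print(build_context(sample))
```

The extension check is case-insensitive and strict: only the five listed types pass, so variants such as `.jpeg` or `.tif` would need to be added explicitly if the service accepts them.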