Local Classification of PDF Documents with Granite Vision 3.2, Ollama and Docling

By Patrick Meyer posted Sun May 04, 2025 10:56 AM

  

This step-by-step guide shows you how to set up a local PDF document classification system using three complementary technologies: Granite Vision 3.2 for visual analysis, Ollama for running AI models locally, and Docling for accurate document conversion. This fully on-premises solution keeps your data private while providing advanced document analysis capabilities. The solution runs locally on Windows 11 or macOS (Intel/Apple Silicon).

Introduction to Local PDF Document Processing

In a world where data privacy is becoming paramount, analyzing sensitive documents without transmitting them to external cloud services is a huge advantage. The three technologies we'll combine in this guide offer a complete solution:

  • Granite Vision 3.2: IBM's multimodal model for understanding visual documents
  • Ollama: a platform facilitating the local execution of language models
  • Docling: an IBM library for converting and enriching PDF, Word (DOCX), Excel (XLSX), PowerPoint (PPTX), text, HTML, images, and more

This approach allows you to maintain full control over your data, operate without an internet connection (once the model is downloaded), and avoid the cost of subscribing to cloud inference services.

Ollama Installation

Ollama serves as the foundation of our system by enabling the local execution of AI models. It is an open-source platform that makes it easy to run language models locally on your computer, without relying on the cloud, and it can use a GPU to accelerate performance. It simplifies the deployment and execution of models such as LLaMA, Mistral, or Gemma through simple command-line instructions and provides a local API for integrating them into applications.

Here's how to install it according to your operating system.

Windows:

  1. Download the installer from the ollama.com website
  2. Run the OllamaSetup.exe file and follow the instructions
  3. Check the installation in PowerShell:

powershell

ollama --version

macOS:

  1. Download the Ollama-darwin.zip file from the official website. This installation requires macOS 11 Big Sur or higher.
  2. Double-click on the .zip file and drag and drop the application into the Applications folder
  3. Launch Ollama from the Launchpad
  4. Check the installation in Terminal:

bash

ollama --version

Configuring Granite Vision 3.2 in Ollama

Granite Vision 3.2 is a multimodal model developed by IBM, designed specifically for understanding visual documents. It has been trained to extract information from tables, charts, and other visual elements.

Download the model

To download and configure the model:

bash or powershell

ollama run granite3.2-vision

This command downloads the model from the Ollama model library (on first run) and opens an interactive session with it.
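Once the interactive session starts, you can exit it with /bye. To confirm that the model is now available locally:

bash or powershell

ollama list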

Initial model testing

To verify that the model is working correctly, send it a question at the Ollama prompt:

ollama command line

>>> What can you do to analyze PDF documents?
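You can also check that Ollama's local REST API responds, since this is the same endpoint the Python modules below will call. Here is a quick sketch with curl (on Windows PowerShell, use curl.exe or Invoke-RestMethod with equivalent parameters):

bash

curl http://localhost:11434/api/generate -d '{
  "model": "granite3.2-vision",
  "prompt": "In one sentence, what can you do?",
  "stream": false
}'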

Docling Installation and Configuration

Docling is an IBM tool that specializes in converting PDF, Word, Excel, PowerPoint, HTML, text, and other documents. It preserves the structure and integrity of documents, even with complex layouts.

Installation via pip

bash or powershell

pip install docling

pip install pdf2image

pip install flask
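Note that pdf2image, used later to render the first page of each PDF, relies on the Poppler utilities, which pip does not install. On macOS you can install them with Homebrew (on Windows, download the Poppler binaries and add their bin folder to your PATH):

bash

brew install poppler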

To verify the installation:

bash or powershell

python -c "import importlib.metadata; print(importlib.metadata.version('docling'))"
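As an optional smoke test, you can convert a document and print its Markdown rendering. This is a minimal sketch; sample.pdf is a placeholder for any PDF you have at hand:

python

from docling.document_converter import DocumentConverter

# Convert a PDF (placeholder path) and print its Markdown rendering
result = DocumentConverter().convert("sample.pdf")
print(result.document.export_to_markdown())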

Development of a complete classification pipeline

Now that we have all of our tools installed, let's create a complete pipeline for PDF document classification.

Module 1: PDF Conversion with Docling

Let's first create a Python module to convert PDFs into a usable structure:

python

File: pdf_converter.py

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat
import json

def convert_pdf_to_json(pdf_path, output_file=None):
    """
    Converts a PDF document to a rich JSON structure with Docling.

    Args:
        pdf_path: Path to the PDF
        output_file: Path to save the JSON (optional)

    Returns:
        Structured Docling document
    """
    print(f"Converting document: {pdf_path}")

    # Configuring pipeline options for Docling
    pipeline_options = PdfPipelineOptions()
    # Optional enrichments (disabled here to keep conversion fast; set to True to enable)
    pipeline_options.do_picture_classification = False
    pipeline_options.do_picture_description = False
    pipeline_options.do_code_enrichment = False
    pipeline_options.do_formula_enrichment = False

    # Creating the converter with the configured options
    converter = DocumentConverter(format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    })

    # Convert the PDF document
    result = converter.convert(pdf_path)
    doc = result.document

    # Save the document in JSON format if requested
    if output_file:
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(doc.model_dump(), f, ensure_ascii=False, indent=2)
        print(f"JSON document saved in: {output_file}")

    return doc

This module uses Docling to extract the structure of the PDF document (text, tables, and images) into a rich JSON representation. The optional enrichments (picture classification and description, code and formula recognition) are disabled here to keep conversion fast, but they can be switched on for deeper analysis.
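Before wiring up the rest of the pipeline, you can exercise this module on its own; the file names below are placeholders:

python

from pdf_converter import convert_pdf_to_json

# Convert a PDF and keep the JSON next to it for inspection
doc = convert_pdf_to_json("report.pdf", output_file="report_converted.json")
print(f"Extracted {len(doc.texts)} text items and {len(doc.tables)} tables")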

Module 2: Analysis with Granite Vision via Ollama

Now let's create a module that will use Granite Vision to analyze the information extracted from the document:

python

File: document_analyzer.py

import requests
import json
import base64
from PIL import Image
import io
from pdf2image import convert_from_path

def analyze_document_with_granite(json_path, pdf_path, output_file=None):
    """
    Analyzes a converted PDF document using Granite Vision via Ollama.

    Args:
        json_path: Path to the JSON file generated by Docling
        pdf_path: Path to the original PDF document
        output_file: Path to save the analysis (optional)

    Returns:
        Dictionary containing the analysis results
    """
    print("Document analysis with Granite Vision...")

    # Function to send a request to Ollama
    def query_ollama(prompt, images=None):
        api_url = "http://localhost:11434/api/generate"
        request_data = {
            "model": "granite3.2-vision",
            "prompt": prompt,
            "stream": False
        }

        if images:
            request_data["images"] = images

        response = requests.post(api_url, json=request_data)
        if response.status_code != 200:
            raise Exception(f"Ollama API error: {response.text}")

        return response.json()["response"]

    # Load the JSON document generated by Docling
    with open(json_path, "r", encoding="utf-8") as f:
        doc_data = json.load(f)

    # Extract text from the document
    doc_content = ""
    for item in doc_data["texts"]:
        # if item["type"] == "TextItem":
        doc_content += item.get("text", "") + "\n"

    # Limit the length for the initial analysis
    summary_content = doc_content[:2000] + "..." if len(doc_content) > 2000 else doc_content

    # Textual classification of the document
    classification_prompt = f"""
    Analyze the following content from a PDF document and answer the questions:
    {summary_content}

    1. What is the type of document (technical report, scientific article, documentation, etc.)?
    2. What are the main themes covered?
    3. Who is this document intended for?
    4. What is the general structure of the document?
    """

    classification_result = query_ollama(classification_prompt)
    print("Text classification complete")

    # Visual analysis of the first page
    try:
        # Convert the first page to an image
        images = convert_from_path(pdf_path, first_page=1, last_page=1)
        first_page = images[0]

        # Resize if necessary
        max_size = (1200, 1600)
        if first_page.width > max_size[0] or first_page.height > max_size[1]:
            first_page.thumbnail(max_size, Image.LANCZOS)

        # Convert to base64
        buffered = io.BytesIO()
        first_page.save(buffered, format="PNG")
        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")

        # Visual analysis
        visual_prompt = """
        Analyze the layout and visual structure of this document page:
        1. How is the layout (columns, sections) organized?
        2. What visuals are present (tables, figures, diagrams)?
        3. How is the visual hierarchy (titles, subtitles) structured?
        4. Are there any distinctive or special elements?
        """

        visual_analysis = query_ollama(visual_prompt, [img_base64])
        print("Visual analysis complete")
    except Exception as e:
        visual_analysis = f"Error during visual analysis: {str(e)}"
        print(f"Warning: {visual_analysis}")

    # Extract tables for specific analysis
    tables = doc_data.get("tables", [])
    html_output = ""
    for table in tables:
        data = table.get("data", {})
        num_rows = data.get("num_rows", 0)
        num_cols = data.get("num_cols", 0)
        grid = data.get("grid", [])
        # Initialize an empty grid for the final render
        final_grid = [[None for _ in range(num_cols)] for _ in range(num_rows)]
        # Mark cells according to their positions to handle spans
        for row in grid:
            for cell in row:
                r_start = cell["start_row_offset_idx"]
                c_start = cell["start_col_offset_idx"]
                rowspan = cell["row_span"]
                colspan = cell["col_span"]
                text = cell["text"].strip()
                # Avoid overwriting cells that have already been filled in (duplicates in the data)
                if final_grid[r_start][c_start] is None:
                    final_grid[r_start][c_start] = {
                        "text": text,
                        "rowspan": rowspan,
                        "colspan": colspan,
                        "is_header": cell.get("column_header", False) or cell.get("row_header", False),
                        "row_section": cell.get("row_section", False)
                    }
                    # Mark the cells covered by the spans as occupied
                    for r in range(r_start, r_start + rowspan):
                        for c in range(c_start, c_start + colspan):
                            if r == r_start and c == c_start:
                                continue
                            final_grid[r][c] = "SPAN"
        # HTML generation
        html_output += "<table border='1'>\n"
        for row in final_grid:
            html_output += "<tr>\n"
            for cell in row:
                if cell is None or cell == "SPAN":
                    continue
                tag = "th" if cell["is_header"] else "td"
                rowspan = f" rowspan='{cell['rowspan']}'" if cell["rowspan"] > 1 else ""
                colspan = f" colspan='{cell['colspan']}'" if cell["colspan"] > 1 else ""
                section_class = " class='section'" if cell["row_section"] else ""
                html_output += f"  <{tag}{rowspan}{colspan}{section_class}>{cell['text']}</{tag}>\n"
            html_output += "</tr>\n"
        html_output += "</table>\n"

    table_analysis = None
    if tables:
        table_prompt = f"""
        The document contains {len(tables)} table(s). Here they are in HTML format:
        {html_output}
        Analyze these tables and describe their content and purpose in the document.
        """

        table_analysis = query_ollama(table_prompt)
        print("Table analysis complete")

    # Combine all results
    final_analysis = {
        "classification_textuelle": classification_result,
        "analyse_structure_visuelle": visual_analysis,
        "analyse_tableaux": table_analysis
    }

    # Save the results if requested
    if output_file:
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(final_analysis, f, ensure_ascii=False, indent=2)
        print(f"Analysis results saved in: {output_file}")

    return final_analysis

This module uses Granite Vision's multimodal capabilities to analyze both the textual and visual content of the document.
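The module can also be tested on its own, given the JSON produced by Module 1 and the original PDF (file names are placeholders); Ollama must be running locally with granite3.2-vision pulled:

python

from document_analyzer import analyze_document_with_granite

# Analyze the converted document and print the textual classification
analysis = analyze_document_with_granite("report_converted.json", "report.pdf",
                                         output_file="report_analysis.json")
print(analysis["classification_textuelle"])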

Main Module: Pipeline Orchestration

Finally, let's create a main script that orchestrates the entire pipeline:

python

File: classify_pdf.py

import os
import argparse
from datetime import datetime

# Import functions from the previous modules
from pdf_converter import convert_pdf_to_json
from document_analyzer import analyze_document_with_granite

def classify_pdf_document(pdf_path, output_dir=None):
    """
    Full pipeline for PDF document classification.

    Args:
        pdf_path: Path to the PDF document to be analyzed
        output_dir: Output directory for the results (optional)

    Returns:
        dict: Results of the analysis
    """
    # Create an output directory if necessary
    if output_dir is None:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_dir = f"pdf_analysis_{timestamp}"

    os.makedirs(output_dir, exist_ok=True)

    # Set the output paths
    json_output = os.path.join(output_dir, "document_converted.json")
    analysis_output = os.path.join(output_dir, "document_analysis.json")
    html_output = os.path.join(output_dir, "rapport_analyse.html")

    print("PDF Document Analysis Pipeline")
    print("==============================")
    print(f"Document: {pdf_path}")
    print(f"Output folder: {output_dir}")
    print()

    # Step 1: Convert the PDF with Docling
    print("Step 1/3: Converting the PDF with Docling...")
    document = convert_pdf_to_json(pdf_path, output_file=json_output)
    print("  Conversion complete.")

    # Step 2: Analysis with Granite Vision
    print("Step 2/3: Analysis with Granite Vision...")
    analysis = analyze_document_with_granite(json_output, pdf_path, output_file=analysis_output)
    print("  Analysis complete.")

    # Step 3: Generate the HTML report
    print("Step 3/3: Generating the report...")
    generate_html_report(pdf_path, analysis, html_output)
    print(f"  Report generated: {html_output}")
    return {
        "document_path": pdf_path,
        "json_path": json_output,
        "analysis_path": analysis_output,
        "report_path": html_output
    }

def generate_html_report(pdf_path, analysis, output_file):
    """
    Generates an HTML report from the analysis results.
    """
    filename = os.path.basename(pdf_path)

    html_content = f"""<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document Analysis - {filename}</title>
    <style>
        body {{ font-family: Arial, sans-serif; line-height: 1.6; max-width: 1000px; margin: 0 auto; padding: 20px; }}
        h1, h2, h3 {{ color: #2c3e50; }}
        .header {{ background-color: #3498db; color: white; padding: 20px; border-radius: 5px; margin-bottom: 20px; }}
        .section {{ background-color: #f9f9f9; padding: 20px; border-radius: 5px; margin-bottom: 20px; }}
        .footer {{ text-align: center; margin-top: 30px; font-size: 0.8em; color: #7f8c8d; }}
        pre {{ background-color: #f5f5f5; padding: 15px; border-radius: 5px; overflow-x: auto; white-space: pre-wrap; }}
    </style>
</head>
<body>
    <div class="header">
        <h1>PDF Document Analysis Report</h1>
        <p>Document: {filename}</p>
        <p>Analysis date: {datetime.now().strftime("%d/%m/%Y %H:%M:%S")}</p>
    </div>

    <div class="section">
        <h2>Document Classification</h2>
        <pre>{analysis['classification_textuelle']}</pre>
    </div>

    <div class="section">
        <h2>Visual Structure Analysis</h2>
        <pre>{analysis['analyse_structure_visuelle']}</pre>
    </div>
"""
    # Add the table analysis if available
    if analysis.get('analyse_tableaux'):
        html_content += f"""
    <div class="section">
        <h2>Table Analysis</h2>
        <pre>{analysis['analyse_tableaux']}</pre>
    </div>
"""

    html_content += """
    <div class="footer">
        <p>Analysis performed with Ollama, Granite Vision 3.2 and Docling</p>
    </div>
</body>
</html>
"""

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(html_content)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Local classification of PDF documents")
    parser.add_argument("pdf_path", help="Path to the PDF document to be analyzed")
    parser.add_argument("--output", "-o", help="Output directory for the results")

    args = parser.parse_args()

    results = classify_pdf_document(args.pdf_path, args.output)
    print(f"\nAnalysis complete. Report available at: {results['report_path']}")

Execution

Create an 'outputs' directory in your tree.

bash or powershell

python classify_pdf.py <path_to_pdf> --output outputs
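Based on the script's own print statements, you should see output along these lines (module-level messages abridged; paths will vary):

PDF Document Analysis Pipeline
==============================
Document: mydoc.pdf
Output folder: outputs

Step 1/3: Converting the PDF with Docling...
  Conversion complete.
Step 2/3: Analysis with Granite Vision...
  Analysis complete.
Step 3/3: Generating the report...
  Report generated: outputs/rapport_analyse.html

Analysis complete. Report available at: outputs/rapport_analyse.html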

Web interface for ease of use

To make our solution more accessible, let's create a simple web interface with Flask:

python

File: app.py

from flask import Flask, request, render_template, redirect, url_for, send_from_directory
import os
import uuid
from werkzeug.utils import secure_filename
import threading

from classify_pdf import classify_pdf_document

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = os.path.join(os.getcwd(), 'uploads')
app.config['RESULTS_FOLDER'] = os.path.join(os.getcwd(), 'results')

# Create the necessary folders
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
os.makedirs(app.config['RESULTS_FOLDER'], exist_ok=True)

# Dictionary to track the status of each job
processing_status = {}

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return redirect(url_for('index'))

    file = request.files['file']
    if file.filename == '':
        return redirect(url_for('index'))

    if file and file.filename.lower().endswith('.pdf'):
        # Generate a unique identifier for this job
        process_id = str(uuid.uuid4())
        process_folder = os.path.join(app.config['RESULTS_FOLDER'], process_id)
        os.makedirs(process_folder, exist_ok=True)

        # Save the file
        filename = secure_filename(file.filename)
        pdf_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(pdf_path)

        # Initialize the processing status
        processing_status[process_id] = {
            'status': 'processing',
            'filename': filename,
            'message': 'Processing...'
        }

        # Start processing in a separate thread
        threading.Thread(
            target=process_document,
            args=(pdf_path, process_folder, process_id)
        ).start()
        return redirect(url_for('show_results', process_id=process_id))

    return "Unsupported file type. Please upload a PDF file."

def process_document(pdf_path, output_folder, process_id):
    """Processes the document in a separate thread"""
    try:
        # Update the status
        processing_status[process_id]['message'] = 'Analyzing the document...'

        # Start processing
        results = classify_pdf_document(pdf_path, output_folder)

        # Update the status on completion
        processing_status[process_id]['status'] = 'completed'
        processing_status[process_id]['message'] = 'Analysis complete'
        processing_status[process_id]['report_path'] = results['report_path']

        # Delete the original file to save space
        os.remove(pdf_path)
    except Exception as e:
        # In case of error
        processing_status[process_id]['status'] = 'failed'
        processing_status[process_id]['message'] = f'Error: {str(e)}'

@app.route('/results/<process_id>')
def show_results(process_id):
    # Check that the job exists
    if process_id not in processing_status:
        return "Processing not found"

    status = processing_status[process_id]

    # If processing is complete, serve the report
    if status['status'] == 'completed':
        return send_from_directory(
            os.path.join(app.config['RESULTS_FOLDER'], process_id),
            "rapport_analyse.html"
        )

    # Otherwise, show the waiting page
    return render_template(
        'processing.html',
        process_id=process_id,
        filename=status['filename'],
        message=status['message']
    )

@app.route('/status/<process_id>')
def get_status(process_id):
    """API endpoint to retrieve the current status"""
    if process_id not in processing_status:
        return {"status": "not_found"}

    return processing_status[process_id]

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

You need to create a "templates" folder with two HTML files.
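For example:

bash or powershell

mkdir templates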

html

File: templates/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF Document Classification</title>
    <style>
        body { font-family: Arial, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; }
        h1 { color: #2c3e50; text-align: center; }
        .upload-box { border: 2px dashed #3498db; border-radius: 10px; padding: 40px; text-align: center; margin: 30px 0; }
        .btn { background-color: #3498db; color: white; border: none; padding: 10px 20px; cursor: pointer; font-size: 16px; border-radius: 5px; }
        .info { background-color: #f8f9fa; padding: 20px; border-radius: 5px; margin-top: 30px; }
    </style>
</head>
<body>
    <h1>Local PDF Document Classification</h1>

    <div class="upload-box">
        <h2>Upload your PDF</h2>
        <form method="post" action="/upload" enctype="multipart/form-data">
            <input type="file" name="file" accept=".pdf"><br><br>
            <button type="submit" class="btn">Analyze the document</button>
        </form>
    </div>

    <div class="info">
        <h3>About this tool</h3>
        <p>
            This application uses three cutting-edge technologies to analyze your PDF documents locally:
        </p>
        <ul>
            <li><strong>Ollama</strong>: a platform for running AI models locally</li>
            <li><strong>Granite Vision 3.2</strong>: IBM model specialized in understanding visual documents</li>
            <li><strong>Docling</strong>: IBM tool for accurately converting PDF documents</li>
        </ul>
        <p>
            All analyses are performed locally on your server, ensuring the confidentiality of your sensitive documents. No data is sent to external services.
        </p>
    </div>
</body>
</html>

html

File: templates/processing.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Processing</title>
    <style>
        body { font-family: Arial, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; text-align: center; }
        h1 { color: #2c3e50; }
        .loader { border: 16px solid #f3f3f3; border-top: 16px solid #3498db; border-radius: 50%; width: 80px; height: 80px; animation: spin 2s linear infinite; margin: 40px auto; }
        @keyframes spin { 0% { transform: rotate(0deg); } 100% { transform: rotate(360deg); } }
        .status { background-color: #f8f9fa; padding: 20px; border-radius: 5px; margin-top: 30px; }
    </style>
    <script>
        // Check the status every 3 seconds
        function checkStatus() {
            fetch('/status/{{ process_id }}')
                .then(response => response.json())
                .then(data => {
                    document.getElementById('message').textContent = data.message;
                    if (data.status === 'completed') {
                        window.location.reload();
                    }
                });
        }

        // Start polling on page load
        document.addEventListener('DOMContentLoaded', function() {
            setInterval(checkStatus, 3000);
        });
    </script>
</head>
<body>
    <h1>Processing</h1>
    <p>Your document <strong>{{ filename }}</strong> is being analyzed</p>

    <div class="loader"></div>

    <div class="status">
        <h3>Current status</h3>
        <p id="message">{{ message }}</p>
        <p>This page will automatically refresh when processing is complete.</p>
    </div>
</body>
</html>

Execution

bash or powershell

python .\app.py

Then open http://127.0.0.1:5000/ in a web browser to access the application.

Use cases and customization

This pipeline can be customized for different use cases:

Automatic classification of document archives

To automatically process an entire PDF folder:

python

File: batch_processor.py

import os
import argparse
import concurrent.futures

from classify_pdf import classify_pdf_document

def process_directory(input_dir, output_dir, max_workers=2):
    """
    Processes all PDF files in a directory.

    Args:
        input_dir: Directory containing the PDFs to be processed
        output_dir: Directory to store the results
        max_workers: Maximum number of concurrent processes
    """
    os.makedirs(output_dir, exist_ok=True)

    # Retrieve all PDF files from the directory
    pdf_files = [
        os.path.join(input_dir, f) for f in os.listdir(input_dir)
        if f.lower().endswith('.pdf')
    ]

    print(f"Processing {len(pdf_files)} PDF files with {max_workers} workers...")

    # Process the files in parallel
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = {}
        for pdf_file in pdf_files:
            # Create a subfolder for each PDF
            filename = os.path.basename(pdf_file).replace('.pdf', '')
            pdf_output_dir = os.path.join(output_dir, filename)

            # Submit the task
            future = executor.submit(classify_pdf_document, pdf_file, pdf_output_dir)
            futures[future] = pdf_file

        # Process the results as they complete
        for future in concurrent.futures.as_completed(futures):
            pdf_file = futures[future]
            try:
                results = future.result()
                print(f"Processing completed for {os.path.basename(pdf_file)}")
            except Exception as e:
                print(f"Error processing {os.path.basename(pdf_file)}: {str(e)}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Batch processing of PDF documents")
    parser.add_argument("input_dir", help="Directory containing the PDFs to be processed")
    parser.add_argument("--output", "-o", help="Directory for the results", default="batch_results")
    parser.add_argument("--workers", "-w", type=int, help="Number of concurrent processes", default=2)

    args = parser.parse_args()
    process_directory(args.input_dir, args.output, args.workers)

Execution

bash or powershell

python .\batch_processor.py <input_dir> --output <output_dir> --workers 2

Targeted information extraction

To extract specific information from a document, you can add a dedicated function:

python

File: extract_processor.py

import argparse
import json
import requests

def extract_specific_information(json_path, info_type):
    """
    Extracts specific information from a PDF document.

    Args:
        json_path: Path to the JSON file generated by Docling
        info_type: Type of information to be extracted (e.g. 'dates', 'amounts', 'names')

    Returns:
        List of extracted information
    """
    with open(json_path, "r", encoding="utf-8") as f:
        doc_data = json.load(f)

    # Retrieve the full text
    doc_content = ""
    for item in doc_data["texts"]:
        # if item["type"] == "TextItem":
        doc_content += item.get("text", "") + "\n"

    # Create a specific query for Granite Vision
    prompt = f"""
    Extract all the "{info_type}" information from the following text and present it as a list:

    {doc_content}
    """

    # Call Ollama
    api_url = "http://localhost:11434/api/generate"
    request_data = {
        "model": "granite3.2-vision",
        "prompt": prompt,
        "stream": False
    }

    response = requests.post(api_url, json=request_data)

    return response.json()["response"]

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extracts information from the JSON document produced by Docling.")
    parser.add_argument("input_path", help="Path to the JSON file to be processed")
    parser.add_argument("info_type", help="Type of information to be extracted, such as 'dates', 'amounts', or 'names'")

    args = parser.parse_args()
    response = extract_specific_information(args.input_path, args.info_type)
    print(response)

Execution

The program extracts data (names, dates, amounts, etc.) from the JSON produced by Docling.

bash or powershell

python .\extract_processor.py <path_to_json_file> dates

Conclusion and outlook

We have built a complete pipeline for classifying and analyzing PDF documents locally by combining three complementary technologies: Granite Vision 3.2, Ollama, and Docling.

This approach has several major advantages:

  1. Guaranteed confidentiality: All data is processed locally, without transit through external services.
  2. Full control: You can customize each step of the process to your specific needs.
  3. Independence: Once the models are downloaded, the system works without an internet connection.
  4. Savings: No subscription fees for cloud services.

This pipeline can be expanded to meet a variety of needs, such as integration with electronic document management systems, automating forms processing, or extracting structured data to populate databases.

The combined use of Docling for document structure extraction and Granite Vision for visual and textual analysis provides capabilities that were previously reserved for expensive proprietary solutions, all in an environment that is fully controlled by you.


#IBMChampion
#champions-blog-feed
