 OCRPL confidence levels

Duncan Shields posted Mon February 02, 2026 06:02 AM

Hi everyone,

I've been using OCRPL with a customer that processes a number of similar payment slips, some machine-printed and some handwritten.

The level and accuracy of capture are pretty impressive, but they are let down by what I consider to be overly high confidence levels.

Nearly every character has 100% confidence, which is artificially high, especially for handwritten fields. That basically makes it impossible to use confidence levels to decide whether a slip should be sent for manual validation.

There is nothing worse than a false positive, where the captured field is wrong but the confidence is 100%.

I'm using a variety of methods to get the values, mostly PopulateZNField, as all the fields have been captured using the OCR_PL Recognize() action. Where a field hasn't been populated, it falls back to RecognizeFieldOCRPL(). I have noticed that two different values are sometimes captured for the same field, yet both have 100% confidence.

I've also noticed that RecognizeFieldOCRPL() occasionally appears to capture the field upside down and back to front, so 950.00 comes out as 00.056 (still with 100% confidence).

Has anyone else had similar experiences with OCRPL? Is there anything I can do to get more realistic confidence levels in the results?

Has anyone tried DC9.1.10 with OCRPL? Have there been any significant changes?

As an aside, OCRA's results are much less accurate than OCRPL's, but when it does get something wrong, its confidence levels reflect this.

Thanks,

Duncan 

 

Suman Suhag
import re
from abbyy_sdk import AbbyyAPI  # Hypothetical module and class; replace with your OCR engine's actual client

# A 180-degree rotation reverses the character order and swaps 6 and 9
ROTATE_DIGITS = str.maketrans('69', '96')

def recognize_field_with_validation(image_path, field_name):
    # All api.* calls below are placeholders, not a documented SDK
    api = AbbyyAPI(app_id='your_app_id', password='your_password')

    # Preprocess: detect and correct orientation
    processed_image = api.process_image(image_path, options={'orientation': 'auto'})

    # Primary recognition
    result = api.recognize_field(processed_image, field_name)
    value = result['text']
    confidence = result['confidence']  # Often inflated

    # Custom validation for numeric fields (e.g., currency)
    if re.match(r'^\d+\.\d{2}$', value):  # Normal format
        return value, confidence
    if re.match(r'^\d{2}\.\d+$', value):  # Rotated? e.g., 00.056 -> 950.00
        corrected_value = value[::-1].translate(ROTATE_DIGITS)
        return corrected_value, confidence * 0.8  # Arbitrary penalty for the correction

    # Fallback to RecognizeFieldOCRPL with retries
    for _ in range(3):
        alt_result = api.recognize_field_ocr_pl(processed_image, field_name, options={'retries': 1})
        alt_value = alt_result['text']
        if re.match(r'^\d+\.\d{2}$', alt_value):
            return alt_value, alt_result['confidence'] * 0.9  # Slight, equally arbitrary penalty
    return None, 0  # Failed

# Usage
value, conf = recognize_field_with_validation('path/to/image.jpg', 'amount_field')
print(f"Recognized: {value}, Adjusted Confidence: {conf}")
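
Since two different reads of the same field can both come back at 100%, it may be more reliable to ignore the engine's confidence and instead check that two independent reads validate and agree. Here is a minimal, self-contained sketch of that idea. It does not call any OCR engine: `normalize_amount` and `needs_manual_validation` are hypothetical helper names, the amount format (digits, a dot, two decimals) is an assumption based on the 950.00 example, and the 6/9 swap assumes the field was read rotated by 180 degrees.

```python
import re

AMOUNT_RE = re.compile(r"^\d+\.\d{2}$")    # normal form, e.g. 950.00
ROTATED_RE = re.compile(r"^\d{2}\.\d+$")   # read upside down, e.g. 00.056
ROTATE_DIGITS = str.maketrans("69", "96")  # 6 and 9 swap under a 180-degree turn


def normalize_amount(text):
    """Return the amount in normal form, or None if it cannot be validated."""
    text = text.strip()
    if AMOUNT_RE.match(text):
        return text
    if ROTATED_RE.match(text):
        # Undo the rotation: reverse the characters, then swap 6 and 9
        return text[::-1].translate(ROTATE_DIGITS)
    return None


def needs_manual_validation(first_read, second_read):
    """Route to manual validation unless both reads validate and agree."""
    first = normalize_amount(first_read)
    second = normalize_amount(second_read)
    return first is None or second is None or first != second
```

In practice the two reads would be the value from Recognize()/PopulateZNField and the value from RecognizeFieldOCRPL(), or an OCRPL read and an OCRA read. Note that a rotated amount such as 00.01 still looks like a valid amount, so this check cannot catch every rotation on its own.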