Solution Overview
The core problem is that Datacap's image-based fingerprint matching breaks when scan dimensions vary. The fix is to shift from pixel-space comparison to structural/semantic comparison using layout XML files generated by the Datacap recognition engine (FastDoc/RRS). Because this approach does not depend on absolute pixel dimensions, it should hold up much better when scan sizes differ.
Detailed Technical Solution
1. Fingerprint XML Structure (replaces TIF image)
Instead of storing a TIF snapshot, you store a normalized layout descriptor XML per document class. Here is an example instance (only four of the twelve blocks are shown):
<!-- fingerprint_invoice_v1.xml -->
<DocumentFingerprint id="INV-001" version="1.0" created="2024-01-15">
<Metadata>
<DocumentClass>Invoice</DocumentClass>
<Language>ENG</Language>
<PageCount>1</PageCount>
<LayoutHash>a3f7c2d9</LayoutHash>
</Metadata>
<BlockStructure totalBlocks="12" textBlocks="9" tableBlocks="2" imageBlocks="1">
<!-- Each block uses relative coordinates (0.0–1.0) to be dimension-agnostic -->
<Block id="B001" type="TEXT" role="HEADER">
<RelativePosition x="0.05" y="0.03" w="0.60" h="0.06"/>
<TextAnchor pattern="INVOICE" confidence="HIGH"/>
<FieldMapping fieldName="DocumentTitle"/>
</Block>
<Block id="B002" type="TEXT" role="LABEL_VALUE">
<!-- Label block: anchors the value block to its right and carries no field of its own -->
<RelativePosition x="0.60" y="0.08" w="0.14" h="0.04"/>
<TextAnchor pattern="Invoice No\." confidence="HIGH"/>
<AdjacentBlock direction="RIGHT" id="B003"/>
</Block>
<Block id="B003" type="TEXT" role="VALUE">
<RelativePosition x="0.75" y="0.08" w="0.20" h="0.04"/>
<FieldMapping fieldName="InvoiceNumber"/>
</Block>
<Block id="B010" type="TABLE" role="LINE_ITEMS">
<RelativePosition x="0.05" y="0.35" w="0.90" h="0.40"/>
<ColumnHeaders count="5">
<Column index="0" anchor="Description"/>
<Column index="1" anchor="Qty"/>
<Column index="2" anchor="Unit Price"/>
<Column index="3" anchor="Tax"/>
<Column index="4" anchor="Amount"/>
</ColumnHeaders>
<FieldMapping fieldName="LineItems" type="REPEATING"/>
</Block>
</BlockStructure>
<!-- Topology graph: which blocks are spatially related -->
<Topology>
<Relation type="ABOVE" from="B001" to="B002"/>
<Relation type="LEFT_OF" from="B002" to="B003"/>
<Relation type="ABOVE" from="B003" to="B010"/>
</Topology>
</DocumentFingerprint>
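To show how such a fingerprint file could be consumed, here is a minimal sketch that reads the Block elements and their relative positions with the JDK's built-in DOM parser. The class name FingerprintReader and its nested Block type are hypothetical names chosen for illustration; they are not part of any Datacap API, and the element and attribute names are simply those used in the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of any Datacap API.
class FingerprintReader {

    /** One <Block> element reduced to its id, type, and relative rectangle. */
    static final class Block {
        final String id;
        final String type;
        final double x, y, w, h;

        Block(String id, String type, double x, double y, double w, double h) {
            this.id = id;
            this.type = type;
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }
    }

    /** Parses every <Block> that has a <RelativePosition> child. */
    static List<Block> readBlocks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("Block");
            List<Block> blocks = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element block = (Element) nodes.item(i);
                NodeList positions = block.getElementsByTagName("RelativePosition");
                if (positions.getLength() == 0) {
                    continue; // skip blocks that carry no coordinates
                }
                Element pos = (Element) positions.item(0);
                blocks.add(new Block(
                        block.getAttribute("id"),
                        block.getAttribute("type"),
                        Double.parseDouble(pos.getAttribute("x")),
                        Double.parseDouble(pos.getAttribute("y")),
                        Double.parseDouble(pos.getAttribute("w")),
                        Double.parseDouble(pos.getAttribute("h"))));
            }
            return blocks;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers do not need checked exceptions
            throw new IllegalArgumentException("Could not parse fingerprint XML", e);
        }
    }
}
```

Calling FingerprintReader.readBlocks with the contents of a fingerprint file returns one entry per block, which a matcher can then compare against the blocks of an incoming page.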
2. Layout XML Generator - Improved Repeatability
This is the most critical piece. You must post-process the raw Datacap/RRS layout XML output to normalize it so that structurally identical documents produce the same fingerprint regardless of DPI or scan size. Relative coordinates alone do not compensate for skew, so deskew the page before recognition (see section 6).
// LayoutNormalizer.java - Datacap custom task action
public class LayoutNormalizer {
/**
* Converts absolute pixel coordinates from RRS output
* to relative (0.0–1.0) coordinates based on page dimensions.
* This eliminates sensitivity to scan resolution.
*/
public NormalizedBlock normalizeBlock(RRSBlock rawBlock, PageDimensions page) {
NormalizedBlock nb = new NormalizedBlock();
nb.setId(rawBlock.getId());
nb.setType(rawBlock.getBlockType());
// Convert absolute → relative coordinates
nb.setRelX(rawBlock.getX() / (double) page.getWidth());
nb.setRelY(rawBlock.getY() / (double) page.getHeight());
nb.setRelW(rawBlock.getWidth() / (double) page.getWidth());
nb.setRelH(rawBlock.getHeight() / (double) page.getHeight());
// Extract a stable text anchor (first significant word or regex pattern)
nb.setTextAnchor(extractAnchor(rawBlock.getOCRText()));
return nb;
}
/**
* Assign stable, repeatable block IDs based on spatial grid position,
* NOT on the arbitrary order RRS emits them.
* Grid: divide page into 10×10 cells; block ID = cell(row,col) + type + seq
*/
public String assignStableBlockId(NormalizedBlock block, int sequenceInCell) {
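// Clamp to 0-9 so a block at the very bottom or right edge (relative coordinate 1.0) stays in the grid
int gridRow = Math.min(9, Math.max(0, (int) (block.getRelY() * 10)));
int gridCol = Math.min(9, Math.max(0, (int) (block.getRelX() * 10)));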
return String.format("B%02d%02d_%s_%d",
gridRow, gridCol, block.getType().substring(0,1), sequenceInCell);
}
/**
* Builds topology edges between spatially related blocks.
* These edges are the "structural fingerprint" used for matching.
*/
public List<TopologyEdge> buildTopology(List<NormalizedBlock> blocks) {
List<TopologyEdge> edges = new ArrayList<>();
for (NormalizedBlock a : blocks) {
for (NormalizedBlock b : blocks) {
if (a.getId().equals(b.getId())) continue;
double vertGap = b.getRelY() - (a.getRelY() + a.getRelH());
double horizGap = b.getRelX() - (a.getRelX() + a.getRelW());
if (Math.abs(vertGap) < 0.02 && Math.abs(a.getRelX() - b.getRelX()) < 0.05)
edges.add(new TopologyEdge("ABOVE", a.getId(), b.getId()));
else if (Math.abs(horizGap) < 0.02 && Math.abs(a.getRelY() - b.getRelY()) < 0.02)
edges.add(new TopologyEdge("LEFT_OF", a.getId(), b.getId()));
}
}
return edges;
}
private String extractAnchor(String ocrText) {
if (ocrText == null || ocrText.isBlank()) return "";
// Take first 3 significant tokens as the anchor pattern
return Arrays.stream(ocrText.trim().split("\\s+"))
.filter(w -> w.length() > 2)
.limit(3)
.collect(Collectors.joining(" "));
}
}
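The effect of the normalization can be checked with a small standalone sketch: the same block scanned at 200 DPI and at 300 DPI should yield the same relative coordinates and the same grid-based ID. RelativeCoords and its Rel type are hypothetical names, and the pixel values are made-up numbers for a Letter-size page, not output from RRS.

```java
import java.util.Locale;

// Hypothetical standalone illustration of relative coordinates and grid-based IDs.
class RelativeCoords {

    /** A block rectangle in page-relative units (0.0 to 1.0). */
    static final class Rel {
        final double x, y, w, h;

        Rel(double x, double y, double w, double h) {
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }
    }

    /** Divides pixel coordinates by the page size; rejects empty pages. */
    static Rel toRelative(int px, int py, int pw, int ph, int pageW, int pageH) {
        if (pageW <= 0 || pageH <= 0) {
            throw new IllegalArgumentException("Page dimensions must be positive");
        }
        return new Rel(px / (double) pageW, py / (double) pageH,
                       pw / (double) pageW, ph / (double) pageH);
    }

    /** Builds an ID from the 10x10 grid cell, the first letter of the type, and a sequence number. */
    static String stableId(Rel r, String type, int seq) {
        int row = Math.min(9, Math.max(0, (int) (r.y * 10)));
        int col = Math.min(9, Math.max(0, (int) (r.x * 10)));
        return String.format(Locale.ROOT, "B%02d%02d_%s_%d", row, col, type.substring(0, 1), seq);
    }
}
```

For example, a block at pixel (1280, 180) on a 1700×2200 page and the same block at (1920, 270) on a 2550×3300 page both land in grid row 0, column 7. A block that sits close to a cell boundary can still flip cells between scans, which is why the matcher needs some tolerance on IDs.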
3. Structural Similarity Engine (Matching Algorithm)
This replaces Datacap's TIF pixel-comparison with a weighted structural score:
// StructuralSimilarityEngine.java
public class StructuralSimilarityEngine {
// Configurable weights for each comparison dimension
private static final double W_BLOCK_COUNT = 0.20;
private static final double W_TOPOLOGY = 0.35; // most important
private static final double W_TEXT_ANCHOR = 0.30;
private static final double W_RELATIVE_POS = 0.15;
/**
* Returns a score between 0.0 (no match) and 1.0 (perfect match).
* Threshold for acceptance: typically 0.75+
*/
public double computeSimilarity(DocumentLayout incoming, DocumentFingerprint candidate) {
double blockCountScore = scoreBlockCount(incoming, candidate);
double topologyScore = scoreTopology(incoming, candidate);
double textAnchorScore = scoreTextAnchors(incoming, candidate);
double relativePosScore = scoreRelativePositions(incoming, candidate);
return (W_BLOCK_COUNT * blockCountScore)
+ (W_TOPOLOGY * topologyScore)
+ (W_TEXT_ANCHOR * textAnchorScore)
+ (W_RELATIVE_POS * relativePosScore);
}
/** Score how closely the number of text and table blocks matches (image blocks are not scored here) */
private double scoreBlockCount(DocumentLayout inc, DocumentFingerprint fp) {
int incText = inc.countByType("TEXT");
int fpText = fp.countByType("TEXT");
int incTable = inc.countByType("TABLE");
int fpTable = fp.countByType("TABLE");
// Clamp each ratio to [0, 1] before weighting so the result always stays in that range
double textRatio = Math.max(0.0, 1.0 - Math.abs(incText - fpText) / (double) Math.max(fpText, 1));
double tableRatio = Math.max(0.0, 1.0 - Math.abs(incTable - fpTable) / (double) Math.max(fpTable, 1));
return (textRatio * 0.7) + (tableRatio * 0.3);
}
/** Score topology edge overlap - same ABOVE/LEFT_OF relationships */
private double scoreTopology(DocumentLayout inc, DocumentFingerprint fp) {
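Set<String> fpEdgeKeys = fp.getTopologyEdgeKeys(); // e.g. "ABOVE:B0101_T_0:B0201_T_0"
Set<String> incEdgeKeys = inc.getTopologyEdgeKeys();
// Count fingerprint edges that have a counterpart in the incoming layout,
// so the ratio cannot exceed 1.0 when the incoming page has extra edges
long matched = fpEdgeKeys.stream()
.filter(e -> fuzzyEdgeMatch(e, incEdgeKeys))
.count();
return fpEdgeKeys.isEmpty() ? 0 : (double) matched / fpEdgeKeys.size();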
}
/** Check if a text anchor in an incoming block fuzzy-matches a fingerprint anchor */
private double scoreTextAnchors(DocumentLayout inc, DocumentFingerprint fp) {
int matchCount = 0;
int total = fp.getBlocks().size();
for (FingerprintBlock fpBlock : fp.getBlocks()) {
String fpAnchor = fpBlock.getTextAnchor();
if (fpAnchor == null || fpAnchor.isBlank()) { total--; continue; }
boolean found = inc.getBlocks().stream()
.filter(b -> spatiallyClose(b, fpBlock, 0.08)) // within 8% of page
.anyMatch(b -> fuzzyTextMatch(b.getOCRPreview(), fpAnchor, 0.80));
if (found) matchCount++;
}
return total == 0 ? 1.0 : (double) matchCount / total;
}
/** Score how closely relative block positions match */
private double scoreRelativePositions(DocumentLayout inc, DocumentFingerprint fp) {
List<Double> distances = new ArrayList<>();
for (FingerprintBlock fpB : fp.getBlocks()) {
inc.getBlocks().stream()
.min(Comparator.comparingDouble(b -> blockDistance(b, fpB)))
.ifPresent(closest -> distances.add(blockDistance(closest, fpB)));
}
double avgDist = distances.stream().mapToDouble(d -> d).average().orElse(1.0);
return Math.max(0, 1.0 - (avgDist / 0.15)); // 0.15 = max tolerated avg deviation
}
private double blockDistance(LayoutBlock a, FingerprintBlock b) {
double dx = a.getRelX() - b.getRelX();
double dy = a.getRelY() - b.getRelY();
return Math.sqrt(dx*dx + dy*dy);
}
private boolean spatiallyClose(LayoutBlock a, FingerprintBlock b, double tolerance) {
return Math.abs(a.getRelX() - b.getRelX()) < tolerance &&
Math.abs(a.getRelY() - b.getRelY()) < tolerance;
}
private boolean fuzzyEdgeMatch(String edge, Set<String> candidates) {
// Allow block ID mismatch up to 1 grid cell - same topology type must match exactly
return candidates.stream().anyMatch(c -> edgeTypeMatches(edge, c) && edgeProximityMatches(edge, c));
}
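private boolean fuzzyTextMatch(String a, String b, double threshold) {
// Levenshtein ratio >= threshold; a missing string never matches
if (a == null || b == null) return false;
// Assumes Apache Commons Text (org.apache.commons.text.similarity.LevenshteinDistance)
int dist = LevenshteinDistance.getDefaultInstance().apply(a.toLowerCase(), b.toLowerCase());
int maxLen = Math.max(a.length(), b.length());
return maxLen == 0 || (1.0 - (double) dist / maxLen) >= threshold;
}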
}
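The engine calls edgeTypeMatches and edgeProximityMatches without defining them. One possible reading, sketched below, parses edge keys of the form TYPE:fromId:toId and treats two block IDs as close when they share the type letter and their grid cells differ by at most one row and one column. The class name EdgeKeys and the exact tolerance are assumptions; the ID layout is the one produced by assignStableBlockId.

```java
// Hypothetical helpers for comparing topology edge keys such as "ABOVE:B0101_T_0:B0201_T_0".
class EdgeKeys {

    /** True when both keys carry the same relation type (the text before the first colon). */
    static boolean edgeTypeMatches(String a, String b) {
        return a.split(":")[0].equals(b.split(":")[0]);
    }

    /** True when both endpoints of the two edges fall within one grid cell of each other. */
    static boolean edgeProximityMatches(String a, String b) {
        String[] pa = a.split(":");
        String[] pb = b.split(":");
        if (pa.length != 3 || pb.length != 3) {
            return false; // malformed key
        }
        return idsNear(pa[1], pb[1]) && idsNear(pa[2], pb[2]);
    }

    /** Compares two IDs of the form BrrCC_T_seq: same type letter, cells at most one apart. */
    static boolean idsNear(String idA, String idB) {
        if (idA.length() < 7 || idB.length() < 7) {
            return false; // too short to hold row, column, and type
        }
        int rowA = Integer.parseInt(idA.substring(1, 3));
        int colA = Integer.parseInt(idA.substring(3, 5));
        int rowB = Integer.parseInt(idB.substring(1, 3));
        int colB = Integer.parseInt(idB.substring(3, 5));
        boolean sameType = idA.charAt(6) == idB.charAt(6);
        return sameType && Math.abs(rowA - rowB) <= 1 && Math.abs(colA - colB) <= 1;
    }
}
```

The sequence number at the end of each ID is deliberately ignored, because two scans of the same form may enumerate blocks within a cell in a different order.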
4. Datacap Task Action Integration
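Register this as a custom Datacap task action that replaces the standard fingerprint search task. The annotation, interface, and DCO method names below are illustrative rather than a documented Datacap API (Datacap custom actions are normally built as .NET libraries), so read the code as a sketch of the control flow: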
// XMLFingerprintSearchAction.java - Datacap task action
@DatacapTaskAction(name = "XMLFingerprintSearch", category = "Classification")
public class XMLFingerprintSearchAction implements TaskAction {
@Override
public int execute(DCO dco, TaskActionConfig config) throws DatacapException {
String layoutXmlPath = dco.getVariable("LayoutXMLPath");
String fingerprintDir = config.getProperty("FingerprintDirectory");
double threshold = Double.parseDouble(config.getProperty("MatchThreshold", "0.75"));
// 1. Parse incoming layout XML
DocumentLayout incomingLayout = LayoutParser.parse(layoutXmlPath);
// 2. Normalize (relative coords + stable block IDs)
// normalize(...) is assumed to apply normalizeBlock and assignStableBlockId to every block
LayoutNormalizer normalizer = new LayoutNormalizer();
incomingLayout = normalizer.normalize(incomingLayout);
// 3. Load all fingerprint XMLs from store
List<DocumentFingerprint> fingerprints = FingerprintStore.loadAll(fingerprintDir);
// 4. Score each fingerprint
StructuralSimilarityEngine engine = new StructuralSimilarityEngine();
DocumentFingerprint bestMatch = null;
double bestScore = 0.0;
for (DocumentFingerprint fp : fingerprints) {
double score = engine.computeSimilarity(incomingLayout, fp);
if (score > bestScore) {
bestScore = score;
bestMatch = fp;
}
}
// 5. Apply result to DCO
if (bestMatch != null && bestScore >= threshold) {
dco.setDocumentClass(bestMatch.getDocumentClass());
dco.setVariable("FingerprintMatchScore", String.valueOf(bestScore));
dco.setVariable("MatchedFingerprintId", bestMatch.getId());
// Store block-to-field mapping for extraction
dco.setVariable("BlockFieldMap", bestMatch.serializeBlockFieldMap());
return TASK_RESULT_SUCCESS;
}
dco.setDocumentClass("UNKNOWN");
return TASK_RESULT_NONFATAL_ERROR;
}
}
5. Field Extraction by Block ID (replaces zone-based extraction)
// BlockBasedFieldExtractor.java
@DatacapTaskAction(name = "ExtractByBlockID", category = "Extraction")
public class BlockBasedFieldExtractor implements TaskAction {
@Override
public int execute(DCO dco, TaskActionConfig config) throws DatacapException {
String blockFieldMapJson = dco.getVariable("BlockFieldMap");
String layoutXmlPath = dco.getVariable("LayoutXMLPath");
Map<String, String> blockToField = BlockFieldMap.deserialize(blockFieldMapJson);
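DocumentLayout layout = LayoutParser.parse(layoutXmlPath);
// Reload the matched fingerprint so the fallback below can read each block's stored position.
// FingerprintStore.load(dir, id) is a hypothetical helper, like the other types in this sketch.
DocumentFingerprint fingerprint = FingerprintStore.load(
config.getProperty("FingerprintDirectory"), dco.getVariable("MatchedFingerprintId"));
for (Map.Entry<String, String> entry : blockToField.entrySet()) {
String blockId = entry.getKey(); // e.g. "B0208_T_0"
String fieldName = entry.getValue(); // e.g. "InvoiceNumber"
LayoutBlock block = layout.findBlockById(blockId);
FingerprintBlock fpBlock = fingerprint.getBlock(blockId);
if (block == null && fpBlock != null) {
// Fallback: find block at same relative position ± tolerance
block = layout.findBlockNear(fpBlock.getRelX(), fpBlock.getRelY(), 0.05);
}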
if (block != null) {
String value = block.getOCRText().trim();
dco.setFieldValue(fieldName, value);
dco.setFieldConfidence(fieldName, block.getOCRConfidence());
}
}
return TASK_RESULT_SUCCESS;
}
}
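The positional fallback used by the extractor can be sketched on its own: among the blocks whose relative position lies within the tolerance on both axes, pick the one closest to the expected point. NearestBlock and its Block type are hypothetical stand-ins for the layout classes referenced above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of a "find block near relative position" lookup.
class NearestBlock {

    /** Minimal stand-in for a layout block: ID, relative origin, and OCR text. */
    static final class Block {
        final String id;
        final double relX, relY;
        final String text;

        Block(String id, double relX, double relY, String text) {
            this.id = id;
            this.relX = relX;
            this.relY = relY;
            this.text = text;
        }
    }

    /** Returns the block nearest to (x, y) among those within tol on both axes, if any. */
    static Optional<Block> findBlockNear(List<Block> blocks, double x, double y, double tol) {
        return blocks.stream()
                .filter(b -> Math.abs(b.relX - x) <= tol && Math.abs(b.relY - y) <= tol)
                .min(Comparator.comparingDouble(b -> Math.hypot(b.relX - x, b.relY - y)));
    }
}
```

Returning an Optional makes the "no block close enough" case explicit, so the caller can leave the field empty instead of extracting text from an unrelated block.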
6. Improved Repeatability - Key Techniques
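To make the RRS layout XML generation consistent across near-identical documents, apply these in your Datacap profile (setting names and availability may differ between recognition engine versions, so verify them against your installation):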
| Technique | Implementation |
|---|---|
| Deskew before recognition | Enable AutoDeskew=True in RRS profile settings |
| Normalize DPI | Resample all pages to 300 DPI before sending to RRS |
| Block merging threshold | Set BlockMergeGap=5px to prevent fragmented text runs |
| Relative coordinates only | Always divide by PageWidth/PageHeight post-parse |
| Stable block ID assignment | Use spatial grid (10×10 cells) instead of emission order |
| Suppress orphan blocks | Filter out blocks with RelW < 0.01 or RelH < 0.005 |
| Anchor normalization | Lowercase, strip punctuation, trim before comparing |
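The last two rows of the table are simple enough to sketch directly. The snippet below shows one way to normalize an anchor string and to flag orphan blocks using the thresholds listed above; the class and method names are hypothetical, and collapsing internal whitespace is an extra step assumed here so that removed punctuation does not leave double spaces.

```java
import java.util.Locale;

// Hypothetical helpers for anchor normalization and orphan-block filtering.
class AnchorNormalizer {

    /** Lowercases, strips ASCII punctuation, collapses whitespace, and trims. */
    static String normalizeAnchor(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** A block is treated as an orphan when it is narrower than 1% or shorter than 0.5% of the page. */
    static boolean isOrphan(double relW, double relH) {
        return relW < 0.01 || relH < 0.005;
    }
}
```

With this, "Invoice No.:" and "invoice no" normalize to the same string, so minor OCR differences in punctuation or case no longer affect anchor comparison.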
Summary
The solution replaces Datacap's TIF fingerprint with a three-part structural XML pipeline: a normalized layout XML stored as the fingerprint (using relative coordinates and stable block IDs), a weighted structural similarity engine scoring block count, topology, text anchors, and relative position, and a block ID-based field extractor that replaces hardcoded zone positions. This makes classification largely independent of scan resolution and physical dimensions: if the recognition engine segments two scans of the same form into similar blocks, they should score above the match threshold.
------------------------------
Dibyansh Pandey
------------------------------
Original Message:
Sent: Mon April 20, 2026 08:20 AM
From: Martin Pistora
Subject: Datacap - searching for fingerprints instead of images based on layout XML
Datacap uses TIF image matching for fingerprint search. If we do not control the creation of scans, the dimensions of the images may be so different that the fingerprint cannot be found. Perhaps a structural comparison of layout XML files would give better results:
- instead of an image, a layout XML file would be stored in the "fingerprint"
- instead of a zone position, a block identification would be stored
- when searching for a "fingerprint", a layout XML file with the most similar block structure would be searched for
- when loading field values, a value from the corresponding block would be used instead of zones
For this to work, the repeatability of generating layout XML files would probably have to be improved so that the structure from almost identical patterns would be almost the same.
------------------------------
Martin Pistora
------------------------------