Content Management and Capture

  • 1.  Datacap - searching for fingerprints instead of images based on layout XML

    Posted 26 days ago

    Datacap uses TIF image matching for fingerprint search. If we do not control the creation of scans, the dimensions of the images may be so different that the fingerprint cannot be found. Perhaps a structural comparison of layout XML files would give better results:

    • instead of an image, a layout XML file would be stored in the "fingerprint"
    • instead of a zone position, a block identification would be stored
    • when searching for a "fingerprint", a layout XML file with the most similar block structure would be searched for
    • when loading field values, a value from the corresponding block would be used instead of zones

    For this to work, layout XML generation would probably have to become more repeatable, so that nearly identical documents produce nearly identical block structures.



    ------------------------------
    Martin Pistora
    ------------------------------


  • 2.  RE: Datacap - searching for fingerprints instead of images based on layout XML

    Posted 25 days ago

    Hi Martin, enChoice is an IBM Business Partner with deep expertise in the IBM Datacap platform. We've seen similar challenges and have tools that can replace traditional fingerprinting with very high accuracy! Feel free to reach out to me at aandrien@enchoice.com and I can connect you with our Datacap team who can walk you through this in more detail.



    ------------------------------
    Ashley Andrien
    ------------------------------



  • 3.  RE: Datacap - searching for fingerprints instead of images based on layout XML

    Posted 24 days ago

    Hi Martin, we have extensive knowledge of IBM Datacap but also have a data capture solution that is 100% AI based without the need for fingerprinting and the challenges that come with it.  I'd be happy to connect you with someone if you'd like to understand more.  Please reach out and we can discuss - fkuhn@intellichief.com



    ------------------------------
    Francis Kuhn
    ------------------------------



  • 4.  RE: Datacap - searching for fingerprints instead of images based on layout XML

    Posted 23 days ago


    Solution Overview

    The core problem is that Datacap's image-based fingerprint matching breaks when scan dimensions vary. The fix is to shift from pixel-space comparison to structural/semantic comparison using layout XML files generated by the Datacap recognition engine (FastDoc/RRS). This approach is dimension-agnostic and far more robust.


    Detailed Technical Solution

    1. Fingerprint XML Structure (replaces TIF image)

    Instead of storing a TIF snapshot, you store a normalized layout descriptor XML per document class. Here is an example descriptor:

    <!-- fingerprint_invoice_v1.xml -->
    <DocumentFingerprint id="INV-001" version="1.0" created="2024-01-15">
      <Metadata>
        <DocumentClass>Invoice</DocumentClass>
        <Language>ENG</Language>
        <PageCount>1</PageCount>
        <LayoutHash>a3f7c2d9</LayoutHash>
      </Metadata>
    
      <BlockStructure totalBlocks="12" textBlocks="9" tableBlocks="2" imageBlocks="1">
    
        <!-- Each block uses relative coordinates (0.0–1.0) to be dimension-agnostic -->
        <Block id="B001" type="TEXT" role="HEADER">
          <RelativePosition x="0.05" y="0.03" w="0.60" h="0.06"/>
          <TextAnchor pattern="INVOICE" confidence="HIGH"/>
          <FieldMapping fieldName="DocumentTitle"/>
        </Block>
    
        <Block id="B002" type="TEXT" role="LABEL_VALUE">
          <RelativePosition x="0.60" y="0.08" w="0.35" h="0.04"/>
          <TextAnchor pattern="Invoice No\." confidence="HIGH"/>
          <FieldMapping fieldName="InvoiceNumber"/>
          <AdjacentBlock direction="RIGHT" id="B003"/>
        </Block>
    
        <Block id="B003" type="TEXT" role="VALUE">
          <RelativePosition x="0.75" y="0.08" w="0.20" h="0.04"/>
          <FieldMapping fieldName="InvoiceNumberValue"/>
        </Block>
    
        <Block id="B010" type="TABLE" role="LINE_ITEMS">
          <RelativePosition x="0.05" y="0.35" w="0.90" h="0.40"/>
          <ColumnHeaders count="5">
            <Column index="0" anchor="Description"/>
            <Column index="1" anchor="Qty"/>
            <Column index="2" anchor="Unit Price"/>
            <Column index="3" anchor="Tax"/>
            <Column index="4" anchor="Amount"/>
          </ColumnHeaders>
          <FieldMapping fieldName="LineItems" type="REPEATING"/>
        </Block>
    
      </BlockStructure>
    
      <!-- Topology graph: which blocks are spatially related -->
      <Topology>
        <Relation type="ABOVE" from="B001" to="B002"/>
        <Relation type="LEFT_OF" from="B002" to="B003"/>
        <Relation type="ABOVE" from="B003" to="B010"/>
      </Topology>
    
    </DocumentFingerprint>
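
    As a minimal sketch of how such a descriptor could be read back, the following uses only the JDK's DOM parser to build a block-ID-to-field-name map. The class FingerprintXmlReader and its method readBlockFieldMap are hypothetical names introduced for illustration; the element and attribute names follow the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Datacap: reads the example fingerprint format above.
class FingerprintXmlReader {

    /** Returns blockId -> fieldName for every <Block> that has a <FieldMapping> child. */
    static Map<String, String> readBlockFieldMap(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Fingerprint XML could not be parsed", e);
        }

        Map<String, String> map = new LinkedHashMap<>();
        NodeList blocks = doc.getElementsByTagName("Block");
        for (int i = 0; i < blocks.getLength(); i++) {
            Element block = (Element) blocks.item(i);
            NodeList mappings = block.getElementsByTagName("FieldMapping");
            if (mappings.getLength() > 0) {
                Element mapping = (Element) mappings.item(0);
                map.put(block.getAttribute("id"), mapping.getAttribute("fieldName"));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String xml =
            "<DocumentFingerprint id=\"INV-001\">"
          + "<BlockStructure>"
          + "<Block id=\"B002\" type=\"TEXT\"><FieldMapping fieldName=\"InvoiceNumber\"/></Block>"
          + "<Block id=\"B003\" type=\"TEXT\"><FieldMapping fieldName=\"InvoiceNumberValue\"/></Block>"
          + "</BlockStructure>"
          + "</DocumentFingerprint>";
        System.out.println(readBlockFieldMap(xml));
    }
}
```

    A map like this is the kind of data the BlockFieldMap variable in the later steps would carry from classification to extraction.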
    

    2. Layout XML Generator - Improved Repeatability

    This is the most critical piece. You must post-process the raw Datacap/RRS layout XML output to normalize it so that structurally identical documents always produce the same fingerprint regardless of DPI, scan size, or skew.

    // LayoutNormalizer.java - Datacap custom task action
    public class LayoutNormalizer {
    
        /**
         * Converts absolute pixel coordinates from RRS output
         * to relative (0.0–1.0) coordinates based on page dimensions.
         * This eliminates sensitivity to scan resolution.
         */
        public NormalizedBlock normalizeBlock(RRSBlock rawBlock, PageDimensions page) {
            NormalizedBlock nb = new NormalizedBlock();
            nb.setId(rawBlock.getId());
            nb.setType(rawBlock.getBlockType());
    
            // Convert absolute → relative coordinates
            nb.setRelX(rawBlock.getX() / (double) page.getWidth());
            nb.setRelY(rawBlock.getY() / (double) page.getHeight());
            nb.setRelW(rawBlock.getWidth() / (double) page.getWidth());
            nb.setRelH(rawBlock.getHeight() / (double) page.getHeight());
    
            // Extract a stable text anchor (first significant word or regex pattern)
            nb.setTextAnchor(extractAnchor(rawBlock.getOCRText()));
    
            return nb;
        }
    
        /**
         * Assign stable, repeatable block IDs based on spatial grid position,
         * NOT on the arbitrary order RRS emits them.
         * Grid: divide page into 10×10 cells; block ID = cell(row,col) + type + seq
         */
        public String assignStableBlockId(NormalizedBlock block, int sequenceInCell) {
            // Clamp to 0..9 so a coordinate of exactly 1.0 stays in the last cell
            int gridRow = Math.min(9, (int)(block.getRelY() * 10));
            int gridCol = Math.min(9, (int)(block.getRelX() * 10));
            return String.format("B%02d%02d_%s_%d",
                gridRow, gridCol, block.getType().substring(0,1), sequenceInCell);
        }
    
        /**
         * Builds topology edges between spatially related blocks.
         * These edges are the "structural fingerprint" used for matching.
         */
        public List<TopologyEdge> buildTopology(List<NormalizedBlock> blocks) {
            List<TopologyEdge> edges = new ArrayList<>();
            for (NormalizedBlock a : blocks) {
                for (NormalizedBlock b : blocks) {
                    if (a.getId().equals(b.getId())) continue;
                    double vertGap = b.getRelY() - (a.getRelY() + a.getRelH());
                    double horizGap = b.getRelX() - (a.getRelX() + a.getRelW());
    
                    if (Math.abs(vertGap) < 0.02 && Math.abs(a.getRelX() - b.getRelX()) < 0.05)
                        edges.add(new TopologyEdge("ABOVE", a.getId(), b.getId()));
                    else if (Math.abs(horizGap) < 0.02 && Math.abs(a.getRelY() - b.getRelY()) < 0.02)
                        edges.add(new TopologyEdge("LEFT_OF", a.getId(), b.getId()));
                }
            }
            return edges;
        }
    
        private String extractAnchor(String ocrText) {
            if (ocrText == null || ocrText.isBlank()) return "";
            // Take first 3 significant tokens as the anchor pattern
            return Arrays.stream(ocrText.trim().split("\\s+"))
                         .filter(w -> w.length() > 2)
                         .limit(3)
                         .collect(Collectors.joining(" "));
        }
    }
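
    To illustrate why relative coordinates remove the dependence on resolution, here is a small self-contained sketch. RelativeCoordinateDemo and its methods are hypothetical, and the pixel values are made-up examples for one block on an A4 page scanned at 300 DPI and at 200 DPI.

```java
// Hypothetical demo, not Datacap API: shows that relative coordinates and
// grid-based IDs come out the same for one block scanned at two resolutions.
class RelativeCoordinateDemo {

    /** Converts an absolute pixel offset to a 0.0-1.0 fraction of the page extent. */
    static double toRelative(int pixels, int pageExtent) {
        return pixels / (double) pageExtent;
    }

    /** Maps a relative coordinate to one of 10 grid cells, clamping 1.0 into the last cell. */
    static int gridCell(double relative) {
        return Math.min(9, Math.max(0, (int) (relative * 10)));
    }

    /** Builds an ID in the same "B<row><col>_<type>_<seq>" form as assignStableBlockId. */
    static String stableId(int x, int y, int pageW, int pageH, String type, int seq) {
        int row = gridCell(toRelative(y, pageH));
        int col = gridCell(toRelative(x, pageW));
        return String.format("B%02d%02d_%s_%d", row, col, type.substring(0, 1), seq);
    }

    public static void main(String[] args) {
        // Same A4 page: 2480x3508 px at 300 DPI, 1654x2339 px at 200 DPI.
        String at300 = stableId(1860, 281, 2480, 3508, "TEXT", 0);
        String at200 = stableId(1240, 187, 1654, 2339, "TEXT", 0);
        System.out.println(at300 + " " + at200); // prints "B0007_T_0 B0007_T_0"
    }
}
```

    A block that sits close to a grid cell boundary can still land in a neighbouring cell from one scan to the next, which is why the matching step tolerates a one-cell difference in block IDs.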
    

    3. Structural Similarity Engine (Matching Algorithm)

    This replaces Datacap's TIF pixel-comparison with a weighted structural score:

    // StructuralSimilarityEngine.java
    public class StructuralSimilarityEngine {
    
        // Configurable weights for each comparison dimension
        private static final double W_BLOCK_COUNT    = 0.20;
        private static final double W_TOPOLOGY       = 0.35;  // most important
        private static final double W_TEXT_ANCHOR    = 0.30;
        private static final double W_RELATIVE_POS   = 0.15;
    
        /**
         * Returns a score between 0.0 (no match) and 1.0 (perfect match).
         * Threshold for acceptance: typically 0.75+
         */
        public double computeSimilarity(DocumentLayout incoming, DocumentFingerprint candidate) {
    
            double blockCountScore   = scoreBlockCount(incoming, candidate);
            double topologyScore     = scoreTopology(incoming, candidate);
            double textAnchorScore   = scoreTextAnchors(incoming, candidate);
            double relativePosScore  = scoreRelativePositions(incoming, candidate);
    
            return (W_BLOCK_COUNT   * blockCountScore)
                 + (W_TOPOLOGY      * topologyScore)
                 + (W_TEXT_ANCHOR   * textAnchorScore)
                 + (W_RELATIVE_POS  * relativePosScore);
        }
    
        /** Score how closely the number of text/table/image blocks matches */
        private double scoreBlockCount(DocumentLayout inc, DocumentFingerprint fp) {
            int incText  = inc.countByType("TEXT");
            int fpText   = fp.countByType("TEXT");
            int incTable = inc.countByType("TABLE");
            int fpTable  = fp.countByType("TABLE");
    
            // Clamp each ratio at 0 so a large mismatch in one type cannot cancel out the other
            double textRatio  = Math.max(0, 1.0 - Math.abs(incText  - fpText)  / (double) Math.max(fpText,  1));
            double tableRatio = Math.max(0, 1.0 - Math.abs(incTable - fpTable) / (double) Math.max(fpTable, 1));
    
            return (textRatio * 0.7) + (tableRatio * 0.3);
        }
    
        /** Score topology edge overlap - same ABOVE/LEFT_OF relationships */
        private double scoreTopology(DocumentLayout inc, DocumentFingerprint fp) {
            Set<String> fpEdgeKeys  = fp.getTopologyEdgeKeys();   // e.g. "ABOVE:B0101_T:B0201_T"
            Set<String> incEdgeKeys = inc.getTopologyEdgeKeys();
    
            // Count fingerprint edges that have a counterpart in the incoming layout,
            // so the ratio below cannot exceed 1.0
            long matched = fpEdgeKeys.stream()
                .filter(e -> fuzzyEdgeMatch(e, incEdgeKeys))
                .count();
    
            return fpEdgeKeys.isEmpty() ? 0 : (double) matched / fpEdgeKeys.size();
        }
    
        /** Check if a text anchor in an incoming block fuzzy-matches a fingerprint anchor */
        private double scoreTextAnchors(DocumentLayout inc, DocumentFingerprint fp) {
            int matchCount = 0;
            int total = fp.getBlocks().size();
    
            for (FingerprintBlock fpBlock : fp.getBlocks()) {
                String fpAnchor = fpBlock.getTextAnchor();
                if (fpAnchor == null || fpAnchor.isBlank()) { total--; continue; }
    
                boolean found = inc.getBlocks().stream()
                    .filter(b -> spatiallyClose(b, fpBlock, 0.08))  // within 8% of page
                    .anyMatch(b -> fuzzyTextMatch(b.getOCRPreview(), fpAnchor, 0.80));
    
                if (found) matchCount++;
            }
            return total == 0 ? 1.0 : (double) matchCount / total;
        }
    
        /** Score how closely relative block positions match */
        private double scoreRelativePositions(DocumentLayout inc, DocumentFingerprint fp) {
            List<Double> distances = new ArrayList<>();
            for (FingerprintBlock fpB : fp.getBlocks()) {
                inc.getBlocks().stream()
                   .min(Comparator.comparingDouble(b -> blockDistance(b, fpB)))
                   .ifPresent(closest -> distances.add(blockDistance(closest, fpB)));
            }
            double avgDist = distances.stream().mapToDouble(d -> d).average().orElse(1.0);
            return Math.max(0, 1.0 - (avgDist / 0.15));  // 0.15 = max tolerated avg deviation
        }
    
        private double blockDistance(LayoutBlock a, FingerprintBlock b) {
            double dx = a.getRelX() - b.getRelX();
            double dy = a.getRelY() - b.getRelY();
            return Math.sqrt(dx*dx + dy*dy);
        }
    
        private boolean spatiallyClose(LayoutBlock a, FingerprintBlock b, double tolerance) {
            return Math.abs(a.getRelX() - b.getRelX()) < tolerance &&
                   Math.abs(a.getRelY() - b.getRelY()) < tolerance;
        }
    
        private boolean fuzzyEdgeMatch(String edge, Set<String> candidates) {
            // Allow block ID mismatch up to 1 grid cell - same topology type must match exactly
            return candidates.stream().anyMatch(c -> edgeTypeMatches(edge, c) && edgeProximityMatches(edge, c));
        }
    
        private boolean fuzzyTextMatch(String a, String b, double threshold) {
            if (a == null || b == null) return false;
            // Levenshtein ratio >= threshold (LevenshteinDistance is assumed to be the Apache Commons Text class)
            int dist = LevenshteinDistance.getDefaultInstance().apply(a.toLowerCase(), b.toLowerCase());
            int maxLen = Math.max(a.length(), b.length());
            return maxLen == 0 || (1.0 - (double) dist / maxLen) >= threshold;
        }
    }
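
    The engine above calls edgeTypeMatches and edgeProximityMatches without defining them. One possible reading, sketched under the assumption that edge keys have the form "TYPE:fromId:toId" and block IDs the form "B<row><col>_<type>...", is shown below. EdgeKeyMatcher and blockWithinOneCell are hypothetical names.

```java
// Hypothetical helpers, not Datacap API: one possible reading of the fuzzy edge match.
class EdgeKeyMatcher {

    /** True when both keys carry the same relation type, e.g. "ABOVE". */
    static boolean edgeTypeMatches(String a, String b) {
        return a.split(":")[0].equals(b.split(":")[0]);
    }

    /** True when the "from" and "to" blocks of both edges lie within one grid cell of each other. */
    static boolean edgeProximityMatches(String a, String b) {
        String[] pa = a.split(":");
        String[] pb = b.split(":");
        return blockWithinOneCell(pa[1], pb[1]) && blockWithinOneCell(pa[2], pb[2]);
    }

    /** Compares two IDs of the form "B<rr><cc>_<T>..." by block type and grid distance. */
    static boolean blockWithinOneCell(String idA, String idB) {
        int rowA = Integer.parseInt(idA.substring(1, 3));
        int colA = Integer.parseInt(idA.substring(3, 5));
        int rowB = Integer.parseInt(idB.substring(1, 3));
        int colB = Integer.parseInt(idB.substring(3, 5));
        boolean sameType = idA.charAt(6) == idB.charAt(6);
        return sameType && Math.abs(rowA - rowB) <= 1 && Math.abs(colA - colB) <= 1;
    }

    public static void main(String[] args) {
        String fingerprintEdge = "ABOVE:B0101_T:B0201_T";
        String shiftedEdge     = "ABOVE:B0102_T:B0202_T";   // one column to the right
        String otherRelation   = "LEFT_OF:B0101_T:B0201_T";
        System.out.println(edgeTypeMatches(shiftedEdge, fingerprintEdge)
                && edgeProximityMatches(shiftedEdge, fingerprintEdge));   // prints "true"
        System.out.println(edgeTypeMatches(otherRelation, fingerprintEdge)); // prints "false"
    }
}
```

    Comparing grid cells instead of exact IDs deliberately ignores the trailing sequence number, since that number depends on how many other blocks fall into the same cell.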
    

    4. Datacap Task Action Integration

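    Register this as a custom Datacap task action that replaces the standard fingerprint search task. The @DatacapTaskAction annotation, the TaskAction interface, and the DCO methods below are illustrative and need to be adapted to the custom action API of your Datacap version: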

    // XMLFingerprintSearchAction.java - Datacap task action
    @DatacapTaskAction(name = "XMLFingerprintSearch", category = "Classification")
    public class XMLFingerprintSearchAction implements TaskAction {
    
        @Override
        public int execute(DCO dco, TaskActionConfig config) throws DatacapException {
    
            String layoutXmlPath   = dco.getVariable("LayoutXMLPath");
            String fingerprintDir  = config.getProperty("FingerprintDirectory");
            double threshold       = Double.parseDouble(config.getProperty("MatchThreshold", "0.75"));
    
            // 1. Parse incoming layout XML
            DocumentLayout incomingLayout = LayoutParser.parse(layoutXmlPath);
    
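            // 2. Normalize (relative coords + stable block IDs); normalize(...) is assumed to
            //    apply normalizeBlock and assignStableBlockId from section 2 to every block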
            LayoutNormalizer normalizer = new LayoutNormalizer();
            incomingLayout = normalizer.normalize(incomingLayout);
    
            // 3. Load all fingerprint XMLs from store
            List<DocumentFingerprint> fingerprints = FingerprintStore.loadAll(fingerprintDir);
    
            // 4. Score each fingerprint
            StructuralSimilarityEngine engine = new StructuralSimilarityEngine();
            DocumentFingerprint bestMatch = null;
            double bestScore = 0.0;
    
            for (DocumentFingerprint fp : fingerprints) {
                double score = engine.computeSimilarity(incomingLayout, fp);
                if (score > bestScore) {
                    bestScore = score;
                    bestMatch = fp;
                }
            }
    
            // 5. Apply result to DCO
            if (bestMatch != null && bestScore >= threshold) {
                dco.setDocumentClass(bestMatch.getDocumentClass());
                dco.setVariable("FingerprintMatchScore", String.valueOf(bestScore));
                dco.setVariable("MatchedFingerprintId", bestMatch.getId());
                // Store block-to-field mapping for extraction
                dco.setVariable("BlockFieldMap", bestMatch.serializeBlockFieldMap());
                return TASK_RESULT_SUCCESS;
            }
    
            dco.setDocumentClass("UNKNOWN");
            return TASK_RESULT_NONFATAL_ERROR;
        }
    }
    

    5. Field Extraction by Block ID (replaces zone-based extraction)

    // BlockBasedFieldExtractor.java
    @DatacapTaskAction(name = "ExtractByBlockID", category = "Extraction")
    public class BlockBasedFieldExtractor implements TaskAction {
    
        @Override
        public int execute(DCO dco, TaskActionConfig config) throws DatacapException {
    
            String blockFieldMapJson = dco.getVariable("BlockFieldMap");
            String layoutXmlPath    = dco.getVariable("LayoutXMLPath");
    
            Map<String, String> blockToField = BlockFieldMap.deserialize(blockFieldMapJson);
            DocumentLayout layout = LayoutParser.parse(layoutXmlPath);
            // Normalize as in the search action so the stable block IDs line up
            layout = new LayoutNormalizer().normalize(layout);
    
            for (Map.Entry<String, String> entry : blockToField.entrySet()) {
                String blockId   = entry.getKey();   // e.g. "B0208_T_0"
                String fieldName = entry.getValue();  // e.g. "InvoiceNumber"
    
                LayoutBlock block = layout.findBlockById(blockId);
                if (block == null) {
                    // Fallback: find block at same relative position ± tolerance
                    FingerprintBlock expected = layout.getFingerprint().getBlock(blockId);
                    block = layout.findBlockNear(expected.getRelX(), expected.getRelY(), 0.05);
                }
    
                if (block != null) {
                    String value = block.getOCRText().trim();
                    dco.setFieldValue(fieldName, value);
                    dco.setFieldConfidence(fieldName, block.getOCRConfidence());
                }
            }
    
            return TASK_RESULT_SUCCESS;
        }
    }
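
    The findBlockNear fallback used above is not defined in this post. A minimal sketch of how it could work is shown below: it picks the block closest to the expected relative position and ignores anything outside the tolerance box. NearestBlockFinder and its nested Block type are hypothetical stand-ins, and the sample values are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Datacap API: one way the findBlockNear fallback could work.
class NearestBlockFinder {

    /** Minimal stand-in for a normalized layout block. */
    static final class Block {
        final String id;
        final double relX;
        final double relY;
        final String text;

        Block(String id, double relX, double relY, String text) {
            this.id = id;
            this.relX = relX;
            this.relY = relY;
            this.text = text;
        }
    }

    /** Returns the block closest to (relX, relY), ignoring blocks outside the tolerance box. */
    static Optional<Block> findBlockNear(List<Block> blocks, double relX, double relY, double tolerance) {
        return blocks.stream()
                .filter(b -> Math.abs(b.relX - relX) <= tolerance
                          && Math.abs(b.relY - relY) <= tolerance)
                .min(Comparator.comparingDouble(b -> Math.hypot(b.relX - relX, b.relY - relY)));
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block("B0007_T_0", 0.74, 0.085, "INV-0042"),   // made-up sample values
            new Block("B0305_T_0", 0.55, 0.300, "Total"));
        // The fingerprint expects the value block at roughly (0.75, 0.08).
        System.out.println(findBlockNear(blocks, 0.75, 0.08, 0.05)
                .map(b -> b.text)
                .orElse("(none)"));   // prints "INV-0042"
    }
}
```

    Returning an Optional makes the "no block within tolerance" case explicit, so the extractor can leave the field empty instead of reading from an unrelated block.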
    

    6. Improved Repeatability - Key Techniques

    To make the RRS layout XML generation consistent across near-identical documents, apply these in your Datacap profile:

    Technique                   Implementation
    --------------------------  ------------------------------------------------------------
    Deskew before recognition   Enable AutoDeskew=True in RRS profile settings
    Normalize DPI               Resample all pages to 300 DPI before sending to RRS
    Block merging threshold     Set BlockMergeGap=5px to prevent fragmented text runs
    Relative coordinates only   Always divide by PageWidth/PageHeight post-parse
    Stable block ID assignment  Use spatial grid (10×10 cells) instead of emission order
    Suppress orphan blocks      Filter out blocks with RelW < 0.01 or RelH < 0.005
    Anchor normalization        Lowercase, strip punctuation, trim before comparing
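
    The last two rows can be sketched in a few lines of plain Java. AnchorNormalizer is a hypothetical helper; the orphan thresholds are the ones given in the table above.

```java
import java.util.Locale;

// Hypothetical helper, not Datacap API: anchor normalization and orphan-block filtering.
class AnchorNormalizer {

    /** Lowercases, replaces ASCII punctuation with spaces, collapses whitespace, and trims. */
    static String normalize(String anchor) {
        if (anchor == null) return "";
        return anchor.toLowerCase(Locale.ROOT)
                     .replaceAll("\\p{Punct}", " ")
                     .replaceAll("\\s+", " ")
                     .trim();
    }

    /** True for blocks too small to be meaningful (RelW < 0.01 or RelH < 0.005). */
    static boolean isOrphan(double relW, double relH) {
        return relW < 0.01 || relH < 0.005;
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Invoice No.: "));   // prints "invoice no"
        System.out.println(isOrphan(0.005, 0.04));           // prints "true"
        System.out.println(isOrphan(0.20, 0.04));            // prints "false"
    }
}
```

    Replacing punctuation with a space instead of deleting it keeps "INVOICE-NO" and "Invoice No." comparable, since both normalize to "invoice no".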

    Summary

    The solution replaces Datacap's TIF fingerprint with a three-part structural XML pipeline: a normalized layout XML stored as the fingerprint (using relative coordinates and stable block IDs), a weighted structural similarity engine scoring block count, topology, text anchors, and relative position, and a block ID-based field extractor that replaces hardcoded zone positions. This makes classification independent of scan resolution and physical dimensions: a document with the same structure as a stored fingerprint should score above the match threshold, provided the layout XML is generated repeatably.



    ------------------------------
    Dibyansh Pandey
    ------------------------------



  • 5.  RE: Datacap - searching for fingerprints instead of images based on layout XML

    Posted 23 days ago

    Thanks, I suspected there were such solutions.
    My post was meant more as a challenge to IBM to include this in Datacap, so that with just a small modification of the application it would be possible to use "structural fingerprints", mark field zones in them with Datacap Navigator, and so on, just as easily as the application can be switched to another OCR engine.



    ------------------------------
    Martin Pistora
    ------------------------------