Robotic Process Automation (RPA)

  • 1.  WDG PDF table extraction OCR capabilities

    Posted Tue November 10, 2020 11:02 AM
    I need to extract table data from some PDF invoices, and from what I can see, the available OCR commands can only really pull out plain text (words, phrases, etc.).

    So the question is, am I missing something that is available in the toolbox?

    ------------------------------
    Justin Phillips
    ------------------------------


  • 2.  RE: WDG PDF table extraction OCR capabilities

    Posted Thu November 12, 2020 11:04 AM
    Hi Justin!

    I'm not 100% sure what you're looking for, but you could use the Get Text from PDF command from the toolbox to extract all the data (text, numbers, special characters) from the PDF. The table data will be spread across separate lines of the extracted text, and of course you would need to parse that text to find what you need, but that's one of the options here.
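
    To illustrate the parsing step, here is a minimal Python sketch that runs outside the WDG toolbox. It assumes the extracted text has already been saved as a string and that each line item sits on one line in the form "description, quantity, unit price, line total". That layout, the sample text, and the names `LineItem`, `ROW_PATTERN` and `parse_line_items` are all hypothetical; a real invoice will almost certainly need a different pattern.

```python
import re
from typing import List, NamedTuple


class LineItem(NamedTuple):
    description: str
    quantity: int
    unit_price: float
    total: float


# Hypothetical row layout: free-text description, integer quantity,
# then two amounts with exactly two decimals (unit price, line total).
ROW_PATTERN = re.compile(
    r"^(?P<description>.+?)\s+"
    r"(?P<quantity>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+"
    r"(?P<total>\d+\.\d{2})$"
)


def parse_line_items(text: str) -> List[LineItem]:
    """Return the lines of `text` that match the assumed row layout."""
    items = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # header, footer, or any line that is not a table row
        items.append(
            LineItem(
                description=match["description"],
                quantity=int(match["quantity"]),
                unit_price=float(match["unit_price"]),
                total=float(match["total"]),
            )
        )
    return items


# Made-up sample standing in for text extracted from a PDF invoice.
SAMPLE_TEXT = """Invoice 0001
Description Qty Unit Price Total
Widget A 2 10.00 20.00
Service fee 1 15.50 15.50
Grand total 35.50
"""

if __name__ == "__main__":
    for item in parse_line_items(SAMPLE_TEXT):
        print(item)
```

    Lines that do not fit the pattern (headers, totals) are skipped rather than raising an error, which keeps the sketch tolerant of the surrounding invoice text.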

    // Jukka

    ------------------------------
    Jukka Juselius
    WebSphere Solution Architect
    IBM
    Helsinki
    +358 50 317 6611
    ------------------------------



  • 3.  RE: WDG PDF table extraction OCR capabilities

    Posted Fri November 13, 2020 04:45 AM
    Hi, yes, I have been using the Get Text from PDF command with regex to extract the data. That works well, but I was hoping to use the OCR command to extract table data in a structured form. However, I am having trouble getting it to pull out even field data from the invoices I have, so I will raise a ticket for that.

    ------------------------------
    Justin Phillips
    ------------------------------



  • 4.  RE: WDG PDF table extraction OCR capabilities

    Posted Tue November 17, 2020 03:48 AM
    Hi Justin!

    Just FYI, in case you have not watched it yet, there's quite a nice video going through the basic OCR capabilities within the tool: https://youtu.be/_CfB-YtwawI

    ------------------------------
    Jukka Juselius
    Senior Solution Architect - IBM EMEA
    WDG Automation Program Lead for EMEA
    IBM Digital Business Automation
    ------------------------------



  • 5.  RE: WDG PDF table extraction OCR capabilities

    Posted Fri July 30, 2021 08:50 AM
    Hey Justin,
    I'm also trying to get line items from a PDF invoice, so I was wondering whether you found a way to do it?

    ------------------------------
    xavier nicolas
    ------------------------------