Robotic Process Automation (RPA)

  • 1.  IBM RPA Extraction of data from scanned PDF and put it excel/csv file

    Posted Thu July 24, 2025 09:29 AM

    Hello team,

    I am stuck and need your help.

    I am extracting data from a PDF and inserting it into an Excel/CSV file.

    Reading the document from the PDF works fine. However, I am facing challenges writing the data to the CSV file.

    The way I have instructed my robot to insert the details seems to be wrong.

    Can anyone help, please?

    Let me share my script:

    defVar --name PdfFile --type Pdf
    defVar --name Name --type String
    defVar --name DateOfBirth --type String
    defVar --name IdNo --type String
    defVar --name PhoneNumber --type String
    defVar --name PolicyNumber --type String
    defVar --name receive --type DataTable
    defVar --name claim --type Excel

    pdfOpen --file "C:\\Users\\Admin\\Desktop\\Bitbiz\\IBM RPA\\claimformTyped.pdf" PdfFile=value
    pdfRegionText --language "en-US" --region "211,217,87,37" --useocr  --ocrprovider "Google" --page 1 --dpix 110 --dpiy 110 --file ${PdfFile} Name=value
    pdfRegionText --language "en-US" --region "261,276,168,56" --useocr  --ocrprovider "Google" --page 1 --dpix 110 --dpiy 110 --file ${PdfFile} DateOfBirth=value
    pdfRegionText --language "en-US" --region "190,348,93,49" --useocr  --ocrprovider "Abbyy" --page 1 --dpix 110 --dpiy 110 --file ${PdfFile} IdNo=value
    pdfRegionText --language "en-US" --region "220,417,259,38" --useocr  --ocrprovider "Google" --page 1 --dpix 110 --dpiy 110 --file ${PdfFile} PhoneNumber=value
    pdfRegionText --language "en-US" --region "240,474,229,58" --useocr  --ocrprovider "Google" --page 1 --dpix 110 --dpiy 110 --file ${PdfFile} PolicyNumber=value

    excelOpen --file "C:\\Users\\Admin\\Desktop\\Bitbiz\\IBM RPA\\Claims Request Form.xlsx" claim=value
    // Note: ${receive} is declared above but never populated with the extracted values, so the table written here is empty.
    excelSetTable --dataTable ${receive} --file ${claim} --sheet 1 --row 1 --column 1
    excelClose --file ${claim}

    Besides using the bot to extract text from the PDF document, I feel the process is still manual because I have hard-coded the file path of the PDF. I want it automated so that I don't have to supply the file path or declare the region (margins) of each text field. Ideally, whenever a scanned attachment is downloaded to a folder, the bot would detect it, extract the data, and save it to a file. I have attached the scanned PDF documents, both the handwritten one and the typed one.

    I know I have bundled many questions together, but I would really appreciate any help.



    ------------------------------
    John Okasiba
    ------------------------------

    Attachment(s)

    claimform.pdf (17 KB)
    claimformTyped.pdf (15 KB)


  • 2.  RE: IBM RPA Extraction of data from scanned PDF and put it excel/csv file

    Posted Mon July 28, 2025 12:58 PM

    PDFs can be awkward depending on the quality of the text.

    I use get text from PDF, write the result to a temporary text file, and then read that text file line by line into a datatable or whatever other format I need. It is probably not the most elegant solution, but it works, and I am using it in a large-scale project extracting many thousands of PDF documents.
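
    As a rough illustration of the line-by-line step, here is a minimal sketch in Python rather than IBM RPA script. It assumes the extracted text contains one "Label: value" pair per line; the sample text and the lines_to_rows helper are hypothetical and do not come from the attached forms.

```python
import tempfile
from pathlib import Path


def lines_to_rows(text_path):
    """Read a text dump line by line and split each labelled line at the first colon."""
    rows = []
    with open(text_path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or ":" not in line:
                continue  # skip blank lines and lines without a label
            label, value = line.split(":", 1)
            rows.append({"field": label.strip(), "value": value.strip()})
    return rows


# Hypothetical text standing in for what a PDF text-extraction step produced.
sample_text = "Name: Jane Doe\n\nDate of Birth: 01/02/1990\nPolicy Number: PN-12345\n"

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "extracted.txt"
    dump.write_text(sample_text, encoding="utf-8")  # the temporary text file
    rows = lines_to_rows(dump)                      # the in-memory "table"
```

    Each entry in rows plays the role of one datatable row; real OCR output is usually messier and would need more forgiving parsing.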

    For the file issue, I read the files in the folder into a list, making sure to include only the right file type with the right naming convention, and then process them one by one.
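
    The filtering idea could look something like the following Python sketch. The claimform*.pdf naming convention and the pending_pdfs helper are assumptions made for illustration only; in IBM RPA the equivalent would be whatever file-listing command and filter you already use.

```python
import fnmatch
import tempfile
from pathlib import Path


def pending_pdfs(folder, pattern="claimform*.pdf"):
    """Return files in `folder` whose names match the pattern, ignoring case."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and fnmatch.fnmatch(p.name.lower(), pattern.lower())
    )


# Demo with a throwaway folder containing a mix of wanted and unwanted files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["claimform.pdf", "claimformTyped.pdf", "notes.txt", "invoice.pdf"]:
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in pending_pdfs(tmp)]
```

    Only the two claim forms end up in found; the text file and the unrelated PDF are ignored.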

    One thing I always do is write the list to a control file and keep track in that file of whether each file was processed successfully. That way, if there is a failure, it is noted in the control file, and when the script is restarted it can start at the next file to be processed. Let's face it, these scripts sometimes do fail.
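
    A control file of that kind might be sketched as below, again in Python and under my own assumptions: a CSV with "file" and "status" columns, and a rerun that skips files marked "done" and retries anything marked "failed". The function names and the simulated failure are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def load_control(control_path):
    """Return {file_name: status}, or an empty dict if no control file exists yet."""
    if not Path(control_path).exists():
        return {}
    with open(control_path, newline="", encoding="utf-8") as handle:
        return {row["file"]: row["status"] for row in csv.DictReader(handle)}


def save_control(control_path, statuses):
    """Rewrite the whole control file from the in-memory status map."""
    with open(control_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "status"])
        writer.writerows(statuses.items())


def run(file_names, control_path, process):
    """Process each file once, recording success or failure after every file."""
    statuses = load_control(control_path)
    for name in file_names:
        if statuses.get(name) == "done":
            continue  # already handled in an earlier run
        try:
            process(name)
            statuses[name] = "done"
        except Exception:
            statuses[name] = "failed"
        save_control(control_path, statuses)  # persist progress immediately
    return statuses


attempts = []


def flaky_process(name):
    """Stand-in for the real extraction; fails the first time it sees b.pdf."""
    attempts.append(name)
    if name == "b.pdf" and attempts.count("b.pdf") == 1:
        raise ValueError("simulated OCR failure")


with tempfile.TemporaryDirectory() as tmp:
    control = Path(tmp) / "control.csv"
    first = run(["a.pdf", "b.pdf", "c.pdf"], control, flaky_process)
    second = run(["a.pdf", "b.pdf", "c.pdf"], control, flaky_process)
```

    On the second run only b.pdf is attempted again, because the other two are already marked as done in the control file.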

    Hope that helps.



    ------------------------------
    Sheridan Hindle
    ------------------------------



  • 3.  RE: IBM RPA Extraction of data from scanned PDF and put it excel/csv file

    Posted Mon July 28, 2025 02:29 PM

    Hi John!

    Let me try and give you some ideas on how this can be approached to make it more dynamic/robust.

    Regarding automatically identifying files, I suggest using a command like Get Files, which allows you to specify a folder to retrieve all the files that exist within it. In that case, you would need to inform all users that they should put the files in that folder and, ideally, it would only contain files that the robot should process.

    As to automatically extracting text from the PDF, my suggestion would be to get all the text from the document using a command like Get PDF Text by OCR, and then post-process it to extract the relevant values using text manipulation such as regex or string splitting. For the PDF files that you attached, I would say this could work reasonably well. Things could get more complicated if you needed to extract fields from a table using OCR, for example, and in that case more specialized solutions or even an LLM would make things easier for you.
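
    To make the post-processing idea concrete, here is a small Python sketch (not IBM RPA script) that pulls five fields out of a block of OCR text with regular expressions and writes them as one CSV row. The field keys mirror the variable names in John's script, but the label wording, the separators, and the sample text are assumptions; the real forms may be laid out differently.

```python
import csv
import io
import re

# Assumed label wording for each output column (hypothetical, adjust to the real form).
FIELD_LABELS = {
    "Name": r"Name",
    "DateOfBirth": r"Date\s+of\s+Birth",
    "IdNo": r"ID\s*No\.?",
    "PhoneNumber": r"Phone\s+Number",
    "PolicyNumber": r"Policy\s+Number",
}


def extract_fields(ocr_text):
    """Find 'Label: value' or 'Label - value' at the start of a line for each field."""
    fields = {}
    for key, label in FIELD_LABELS.items():
        pattern = rf"^[ \t]*{label}[ \t]*[:\-][ \t]*(.+)$"
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE | re.MULTILINE)
        fields[key] = match.group(1).strip() if match else ""  # blank if not found
    return fields


# Hypothetical OCR output for a typed claim form.
sample_ocr = (
    "CLAIM FORM\n"
    "Name: Jane Doe\n"
    "Date of Birth: 01/02/1990\n"
    "ID No: 12345678\n"
    "Phone Number: 0712345678\n"
    "Policy Number - PN-12345\n"
)

fields = extract_fields(sample_ocr)

# Write a header plus one row; an in-memory buffer stands in for the CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(FIELD_LABELS))
writer.writeheader()
writer.writerow(fields)
csv_text = buffer.getvalue()
```

    Because the values are located by their labels rather than by page coordinates, this style of extraction does not depend on declaring a region for every field, although handwritten forms will likely produce noisier text than typed ones.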

    Hope this helps you in getting closer to a solution!



    ------------------------------
    Vinicius Marques
    Data Engineer
    Music.AI
    Rio Grande
    ------------------------------