I need to get the content of a PDF file as a string. I haven't found an app or Python code that handles this case. Can you help me?
Hi, while we don't have an app to get the content of a PDF as a string, you may find the Image OCR Functions for IBM SOAR app helpful as it's able to interpret text from image files.
Sadly, there is no ready-made package for extracting the contents of a PDF into a file.
However, you could obtain the contents of a PDF file (or any attachment, really) using the REST API functionality:
First, query to get all Incident Attachments using:
Obtain the ID of the desired attachment, and then get the content using:
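The two steps above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not a confirmed recipe: the endpoint paths (`/rest/orgs/{org_id}/incidents/{incident_id}/attachments` and `.../{attachment_id}/contents`), the `id` and `name` fields in the attachment listing, and the authentication headers are all assumptions that should be checked against the REST API documentation of your own SOAR instance. The helper names are hypothetical.

```python
import json
import urllib.request


def attachments_url(base_url, org_id, incident_id):
    """Build the ASSUMED URL that lists all attachments of an incident."""
    return (f"{base_url.rstrip('/')}/rest/orgs/{org_id}"
            f"/incidents/{incident_id}/attachments")


def contents_url(base_url, org_id, incident_id, attachment_id):
    """Build the ASSUMED URL that returns one attachment's raw contents."""
    return (f"{attachments_url(base_url, org_id, incident_id)}"
            f"/{attachment_id}/contents")


def find_attachment_id(attachments, filename):
    """Return the id of the first attachment named `filename`, else None.

    Assumes each listing entry is a dict with "id" and "name" keys.
    """
    for attachment in attachments:
        if attachment.get("name") == filename:
            return attachment.get("id")
    return None


def fetch(url, headers):
    """Send a GET request and return the response body as bytes."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(request) as response:
        return response.read()


def get_attachment_bytes(base_url, org_id, incident_id, filename, headers):
    """Step 1: list the attachments. Step 2: download the one we want."""
    listing = json.loads(
        fetch(attachments_url(base_url, org_id, incident_id), headers))
    attachment_id = find_attachment_id(listing, filename)
    if attachment_id is None:
        raise LookupError(f"No attachment named {filename!r}")
    return fetch(
        contents_url(base_url, org_id, incident_id, attachment_id), headers)
```

Note that the bytes returned by the second call are the raw PDF file, not its text. Turning them into a plain string would still require a PDF parsing step, which is outside this sketch.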
That said, you should be careful with this approach. Malicious code can be injected into PDF files and then executed when their contents are read. I'm not entirely sure whether this can be triggered by GET actions specifically for PDFs, but you should still be cautious if you are going to read external or untrusted files.
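As one illustration of that caution, the downloaded bytes could be given a quick look before they are handed to any parser. The sketch below is a naive triage only, with hypothetical helper names: it checks for the `%PDF-` header and searches for a few PDF name tokens that are commonly associated with active content. It is not a malware scanner. Tokens can be obfuscated or hidden inside compressed streams, so an empty result does not mean the file is safe.

```python
# PDF name tokens often associated with active or embedded content.
# The list is illustrative, not exhaustive.
SUSPICIOUS_TOKENS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/Launch",
    b"/EmbeddedFile",
)


def looks_like_pdf(data):
    """Return True if the PDF header appears near the start of the data."""
    return b"%PDF-" in data[:1024]


def suspicious_tokens(data):
    """Return the tokens from SUSPICIOUS_TOKENS found in the raw bytes."""
    return [token.decode("ascii")
            for token in SUSPICIOUS_TOKENS if token in data]


if __name__ == "__main__":
    sample = (b"%PDF-1.7\n1 0 obj << /OpenAction 2 0 R >> endobj\n"
              b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n")
    print(looks_like_pdf(sample))
    print(suspicious_tokens(sample))
```

A file that fails the header check or contains any of these tokens could be set aside for manual review instead of being processed automatically.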
Hello Pol, I was testing with the indicated function. I cannot obtain the content of the PDF file in plain text, but I can obtain it in JSON. I am attaching screenshots of the Playbook error and the configuration of the "Call REST API" function.
Thanks and regards!
Hi Pol and team,
Is there any news on this topic?
I look forward to your reply. Regards!