Hi Vinicius
You are right, extracting one page at a time clearly shows the empty page (which I need to detect).
I was extracting the complete PDF (all pages) with a single pdfText command and counting the empty lines in the resulting variable to keep track of the pages. With that approach, I was unable to detect the empty page.
Extracting the PDF page by page makes more sense, so thanks a lot for this solution.
------------------------------
nordine vandezande
------------------------------
Original Message:
Sent: Tue March 02, 2021 07:49 AM
From: Vinicius Pinto Dutra
Subject: Convert PDF to Text: missing empty pages
Hi,
I can't find any problem with the text extraction in my tests. By the way, page 6 is indeed empty in your PDF file.
My script extracts one page at a time; I believe this approach will make it easier to solve your problem:
defVar --name pdf --type Pdf
defVar --name counter --type Numeric
defVar --name text --type String
pdfOpen --file "C:\\Users\\ViniciusPintoDutra\\Downloads\\PDF File.pdf" pdf=value
for --variable ${counter} --from 1 --to ${pdf.NumberOfPages} --step 1
    pdfText --range "${counter}" --file ${pdf} text=value
    writeToFile --value "Page ${counter}:\r\n\r\n${text}\r\n------------------------------------" --file "C:\\Users\\ViniciusPintoDutra\\Downloads\\PDF File.txt" --encoding "Default" --writeasnewline
next
pdfClose --file ${pdf}
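Once each page is extracted separately, finding the page that holds a given sentence (and spotting empty pages) becomes a simple per-page check. The sketch below illustrates that logic in Python rather than in IBM RPA script, assuming the page texts have already been collected into a list in page order. The function name `scan_pages` and the sample data are hypothetical and only serve the illustration.

```python
from typing import List, Tuple


def scan_pages(page_texts: List[str], sentence: str) -> Tuple[List[int], List[int]]:
    """Return (pages containing the sentence, empty pages), both 1-based."""
    hits: List[int] = []
    empty: List[int] = []
    # Collapse whitespace so a sentence wrapped across lines still matches.
    needle = " ".join(sentence.split()).lower()
    for number, text in enumerate(page_texts, start=1):
        flat = " ".join(text.split()).lower()
        if not flat:
            # Nothing but whitespace was extracted: treat the page as empty.
            empty.append(number)
        elif needle in flat:
            hits.append(number)
    return hits, empty


# Hypothetical sample: page 2 is empty, the sentence sits on page 3.
pages = ["Introduction\r\nSome text", "  \r\n", "The total amount\r\nis due on receipt."]
print(scan_pages(pages, "total amount is due"))  # prints ([3], [2])
```

Because every page is handled on its own, the page number comes from the loop counter, so an empty page can no longer shift the count.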
------------------------------
Vinicius Pinto Dutra
IBM
Original Message:
Sent: Mon March 01, 2021 06:22 AM
From: nordine vandezande
Subject: Convert PDF to Text: missing empty pages
Hi
I need to find the exact page number where a specific sentence occurs in a PDF file. I am using the pdfText command to convert the file to a text variable, which I then parse. The conversion seems to replace each page break with an empty line, so I can use that to keep track of the processed pages. However, I noticed that no empty line is produced when the PDF file contains an empty page. Is this a bug?
I attached an example PDF and the corresponding text conversion to this post, which hopefully makes this clear. In this case, the empty page is page 6, but it can occur at other positions too.
------------------------------
nordine vandezande
------------------------------