Hi Ryan, ABBYY OCR returns Unicode text, so the line breaks in the output can be \u2028 (line separator) or \u2029 (paragraph separator) rather than \n. Include both of them when you split.
Here is a sample that replaces every kind of line break with "|" and then splits the text into a list:
replaceText --texttoparse "${pdfText}" --useregex --pattern "\\r\\n?|\\n|\\u2028|\\u2029" --regexOptions "0" --replacement "|" pdfText=value
splitString --text "${pdfText}" --delimiteroption "CustomDelimiter" --customdelimiter "|" --count 3000 listText=value
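If you want to check the pattern outside the bot, here is a minimal Python sketch of the same idea. The function name and the sample string are made up for illustration (they are not real OCR output), and it splits directly on the pattern instead of replacing with "|" first, which avoids problems when the text itself contains a "|".

```python
import re

# Same alternation as the regex in the replaceText step above:
# CRLF or a lone CR, LF, U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR).
LINE_BREAKS = re.compile("\r\n?|\n|\u2028|\u2029")


def split_ocr_lines(text):
    """Split text on any of the line-break characters listed above."""
    return LINE_BREAKS.split(text)


# Hypothetical sample that mixes the different break characters.
sample = "Invoice 42\u2028Total: 10\u2029Thanks\r\nBye"
lines = split_ocr_lines(sample)
print(lines)  # ['Invoice 42', 'Total: 10', 'Thanks', 'Bye']
```

Splitting only on "\n" would leave the first three pieces of that sample joined together, which matches the behaviour described in the question.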
------------------------------
Angelo Alves
------------------------------
Original Message:
Sent: Fri March 12, 2021 11:42 AM
From: Ryan Freedman
Subject: Delimiter in Output from Recognize Image Text or PDF
Hi,
I am reading a PDF document page by page and extracting the text from each page. The output text looks like it contains newlines that I could use as a delimiter, but when I try to split the text on \n it doesn't work. Has anyone seen this before, or does anyone have advice on how to split the text extracted from the PDF?
Thanks,
Ryan
------------------------------
Ryan Freedman
------------------------------