Convert-PDFToText - possible bug #51

PrzemyslawKlys · 2024-04-28T07:05:05Z

Reported on LinkedIn; to be verified.

It seems that the function Convert-PDFToText is working slightly incorrectly. I have to test further, but for the moment (in my environment) it behaves like this:

Assuming that a PDF has multiple pages with text PageText1, PageText2, ..., PageTextN, after running the function I get a result where the text of every subsequent page also contains all the text from the previous pages, something like "PageText1PageText1PageText2PageText1PageText2PageText3" for a PDF of 3 pages.

It seems that (in my environment) I could fix it by explicitly creating a new text extraction strategy for every call of GetTextFromPage.

So, on line 1754,

[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, $iTextExtractionStrategy)

was changed to

[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, [iText.Kernel.Pdf.Canvas.Parser.Listener.LocationTextExtractionStrategy]::new())

After this fix, extraction worked as expected.
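
A minimal sketch of the per-page approach described in this report, not the module's actual implementation. It assumes the iText 7 assemblies are already loaded (for example by importing PSWritePDF); $FilePath, $Reader, $Document, $Strategy and $PagesText are hypothetical names used only for illustration. The duplication is presumably caused by a single strategy object retaining the text it collected on earlier pages, so the sketch creates a new one for every page:

```powershell
# Assumption: iText 7 assemblies are already loaded, e.g. via Import-Module PSWritePDF.
# $FilePath is a hypothetical input path used only for illustration.
$FilePath = 'C:\Temp\Example.pdf'

$Reader = [iText.Kernel.Pdf.PdfReader]::new($FilePath)
$Document = [iText.Kernel.Pdf.PdfDocument]::new($Reader)
try {
    # Collect one string per page; the for loop's output is captured as an array.
    $PagesText = for ($Page = 1; $Page -le $Document.GetNumberOfPages(); $Page++) {
        $ExtractedPage = $Document.GetPage($Page)
        # Create a fresh strategy for every page so that text gathered
        # on earlier pages is not carried over into this page's result.
        $Strategy = [iText.Kernel.Pdf.Canvas.Parser.Listener.LocationTextExtractionStrategy]::new()
        [iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, $Strategy)
    }
} finally {
    # Closing the document also releases the underlying reader.
    $Document.Close()
}

# Each element should now hold the text of exactly one page.
$PagesText
```

With a 3-page PDF, the expectation under these assumptions is three separate strings rather than the cumulative output described above.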

PrzemyslawKlys added the "bug" label (Something isn't working) on Apr 28, 2024
