Improve fixing spaces and add new argument #30

mdbraber · 2022-01-30T20:55:36Z

Improve space fixing when the same character appears consecutively
Add an argument to force space fixing
Strip possible newlines from the end result
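The new argument and the newline stripping might look roughly like the following. This is a minimal sketch only: the flag name `--fix-spaces`, the positional `pdf` argument and the `finalize_title` helper are hypothetical and not necessarily what this PR adds to pdftitle.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: a positional PDF path plus a flag that forces space fixing."""
    parser = argparse.ArgumentParser(description="Extract a title from a PDF (sketch).")
    parser.add_argument("pdf", help="path to the PDF file")
    parser.add_argument(
        "--fix-spaces",  # hypothetical flag name, not taken from the PR
        action="store_true",
        help="force the space-fixing pass",
    )
    return parser


def finalize_title(title: str) -> str:
    """Strip leading/trailing whitespace, including newlines, from the result."""
    return title.strip()


args = build_parser().parse_args(["paper.pdf", "--fix-spaces"])
print(args.fix_spaces)                    # True
print(finalize_title("Een scheiding\n"))  # Een scheiding
```

Using `action="store_true"` keeps the default behaviour unchanged: without the flag, `args.fix_spaces` is `False`.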

metebalci · 2022-02-01T11:17:16Z

pdftitle.py

            result += first_page[p]
            t += 1

+        # if the current character is the same as the previous


Why is this needed?

mdbraber · 2022-02-01

Suppose I'm looking for the title "Een scheiding" and the page contains the string "Armoede\nEen scheiding". Iterating over the page characters, the matcher builds "e E" as a temporary result (replacing the newline with a presumed space), but when it hits the following 'e' it concludes this is not the title we're looking for ("eee" != "een"). That conclusion is wrong: we can still be on track to find the title, but we should shift the window one character to the right and compare again, which is what this change does (by not doing t+=1). Maybe I'm overlooking something, but this solved the use case for me (see the trouw.nl PDF I sent separately as a test).
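A minimal Python sketch of the failure described in the reply and one way around it, assuming a simplified matcher that ignores whitespace and case. This is not pdftitle's actual loop; `naive_find` and `windowed_find` are hypothetical names. The naive version drops a partial match on a mismatch and only re-checks the current character, so the candidate starting at "E" (inside the failed partial match) is never tried; the windowed version restarts the comparison one character to the right of the previous start.

```python
def naive_find(page: str, title: str) -> bool:
    """Hypothetical single-pass matcher (the buggy pattern).

    Whitespace in the page is skipped and case is ignored. On a mismatch
    it drops the partial match and only re-checks the current character,
    so a match that starts inside the failed partial match is missed.
    """
    target = title.replace(" ", "").lower()
    if not target:
        return False
    t = 0  # number of target characters matched so far
    for ch in page:
        if ch.isspace():
            continue  # newlines and spaces never need to match
        c = ch.lower()
        if c == target[t]:
            t += 1
            if t == len(target):
                return True
        else:
            # Restart, but only from the current character.
            t = 1 if c == target[0] else 0
    return False


def windowed_find(page: str, title: str) -> int:
    """Return the page index where the title starts, or -1.

    Every start position is tried in turn, so after a failed comparison
    the window simply moves one character to the right.
    """
    target = title.replace(" ", "").lower()
    # (original index, lowercased char) for every non-whitespace character
    chars = [(i, c.lower()) for i, c in enumerate(page) if not c.isspace()]
    if not target:
        return -1
    for start in range(len(chars) - len(target) + 1):
        if all(chars[start + k][1] == target[k] for k in range(len(target))):
            return chars[start][0]
    return -1


page = "Armoede\nEen scheiding"
print(naive_find(page, "Een scheiding"))     # False
print(windowed_find(page, "Een scheiding"))  # 8, the index of "Een"
```

The brute-force window costs O(n·m) comparisons, which is fine for one page of text; a KMP-style failure table would avoid re-scanning at the cost of more code.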

Commit 8c58452: Improve fixing spaces and add new argument
