
PyPDF2#15

Open
TomVeeDee wants to merge 2 commits into rschroll:master from TomVeeDee:master

Conversation

@TomVeeDee

I modified the code to use PyPDF2 instead of pypdf (which seems to be unmaintained). This was necessary to make the code run on my Gentoo system.
I thought you could be interested in this code.

Further minor modifications are:

  • I also increased the Python recursion limit, because one of my documents would not process correctly without it.
  • I also removed the initialize() call, because it is not needed (anymore?).

@rschroll
Owner

Thanks! I didn't know about PyPDF2.

It looks like PyPDF2 is mostly API-compatible with pyPdf. What I'd
like to do is try importing PyPDF2, and fall back to pyPdf if it's not
found. The PdfFileReader initializers take different kwargs, so we'd
need a helper function that gets this right. Would you be interested
in doing this? If not, I'll give it a go, but it might not happen
right away.
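
A minimal sketch of the import fallback and kwarg-filtering helper described above. The names `import_first_available`, `call_with_supported_kwargs`, and `open_pdf_reader` are hypothetical and do not come from the project. It is written for Python 3 purely as an illustration, and it assumes both libraries expose a `PdfFileReader` class whose initializers differ only in which keyword arguments they accept.

```python
import importlib
import inspect


def import_first_available(names):
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %s" % ", ".join(names))


def call_with_supported_kwargs(factory, *args, **kwargs):
    """Call `factory`, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(factory).parameters
    # If the callable takes **kwargs itself, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return factory(*args, **kwargs)
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return factory(*args, **accepted)


def open_pdf_reader(stream, **kwargs):
    """Hypothetical helper: build a reader from whichever library is installed.

    Assumption: PyPDF2 is preferred and pyPdf is the fallback, as proposed above.
    """
    pdflib = import_first_available(["PyPDF2", "pyPdf"])
    return call_with_supported_kwargs(pdflib.PdfFileReader, stream, **kwargs)
```

Filtering by signature avoids hard-coding which library accepts which argument, at the cost of silently ignoring options the installed library does not understand.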

Also, the README should be updated with the new dependency information.

Re the initialize() call: Maybe there's no point anymore, but I was
trying to support older versions of PdfMiner (which was made difficult
by that author breaking the API in a minor-version update). If it's
not causing any problems, I'd prefer to leave it in. If it is
problematic, we should remove all of the old PdfMiner support.
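
One possible middle ground, sketched here under assumptions: call `initialize()` only when the document object actually has it, so older PdfMiner versions still get the call and versions without the method skip it. The helper name `initialize_if_supported` and the `password` argument are illustrative assumptions, not taken from the project or verified against any particular PdfMiner release.

```python
def initialize_if_supported(document, password=""):
    """Call document.initialize(password) only when the method exists.

    Returns True if the call was made, False if it was skipped.
    Assumption: older document objects expose initialize(); newer ones do not.
    """
    initialize = getattr(document, "initialize", None)
    if callable(initialize):
        initialize(password)
        return True
    return False
```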

The recursion issue is odd. I don't think we're doing anything
recursive, but I guess one of the libraries could be. Would you mind
splitting that out into a separate pull request? If you could post a
stack trace from an example problem, I'd be interested in seeing what
went wrong, and if there's a non-recursive solution we should be using.
(This could be large, I imagine. A pastebin would be fine.)

Thanks again!

Owner

I don't think this import is needed. Perhaps it's left over from diagnostics you were doing?

I have also removed the import of the write statements.
@TomVeeDee
Author

Okay, in my latest commit I have removed the recursion statement and the diagnostics issue you pointed out in your earlier message. This should thus only contain the upgrade to PyPDF2. I have kept the removal of the .initialize() statement, because the code just crashes on my machine (i.e. with a newer version of pdfminer, currently at 20140328).

I don't know whether I should create a new pull request, or whether you can import my commits from here. (I'm new to GitHub.)

@TomVeeDee
Author

I actually just got a DPT-S1, which is far superior for annotating PDF files. I therefore have no further need for the prsannots program, and I am not inclined to write the helper function.

@TomVeeDee
Author

For one of the documents, I had trouble with the recursion issue: the code crashed in a pdfminer statement (I don't have the details anymore). It said the recursion limit had been exceeded, so I just increased it.
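
For reference, a minimal sketch of raising the limit only around the call that needs it, instead of changing it globally. The context manager name `recursion_limit` is hypothetical; `sys.getrecursionlimit` and `sys.setrecursionlimit` are standard library functions. This works around the symptom only and does not address whatever is recursing so deeply.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit to at least `limit`."""
    previous = sys.getrecursionlimit()
    # Never lower the limit: keep whichever value is larger.
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        # Restore the original limit even if the wrapped code raises.
        sys.setrecursionlimit(previous)
```

Usage would look like `with recursion_limit(5000): process(document)`, where `process` stands in for whatever call hits the limit and 5000 is an arbitrary example value.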
