@jtkiley

jtkiley commented Aug 8, 2014

Fixes #24.

Obviously, this is the simple fix. When I looked at stopping Image from inheriting from Paragraph, I didn't get errors (and without this change, I still got the image hex in files). I'm still a little fuzzy on the finer points of the RTF spec and the reader's logic, so I probably need to clear that up before working on Image.

@watercrossing
Contributor

That will do the trick, even though it's a bit hackish... I don't know if people would like this, but it might be useful to include a snippet such as {Image stripped, 123 bytes}, or some other information, in the text file to explain that an image used to be here.

@jtkiley
Author

jtkiley commented Aug 19, 2014

I agree that it's a specific and not-at-all pretty fix. I'm just not familiar enough with pyth and the finer points of the RTF format to intelligently make changes to the design.

As for the snippet, I do a lot of content analysis, and I use pyth to process RTFs into plain text. It's probably my specific research use case, but I'm wary of adding text into a document. Also, the images in my documents are an artifact of the data provider (not the original data). It may be a good option, though. If I were looking at documents with "real" embedded images, being able to capture that fact might lead to interesting results. I would guess that a lot of use cases would similarly be interested in at least knowing about images.
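
To make the two options concrete, here is a rough sketch of how an opt-in placeholder could behave. It is not pyth's reader code or API: `strip_pict_groups` and its `placeholder` parameter are hypothetical names, and the function works on raw RTF text with simple brace matching. The default adds no text at all, which suits the content-analysis case; passing a format string records where an image was. The byte count is approximate, since any hex-like text in nested groups is counted too.

```python
import re

HEX_DIGITS = set("0123456789abcdefABCDEF")


def strip_pict_groups(rtf, placeholder=None):
    """Remove {\\pict ...} groups from raw RTF text (hypothetical helper).

    If `placeholder` is a format string containing {n}, it replaces each
    removed group, with n set to the approximate image size in bytes.
    """
    out = []
    i = 0
    while i < len(rtf):
        # Match "{\pict" only when "pict" is the whole control word.
        if rtf.startswith("{\\pict", i) and not rtf[i + 6:i + 7].isalpha():
            depth = 0
            j = i
            while j < len(rtf):
                ch = rtf[j]
                if ch == "\\":
                    j += 2  # skip the backslash and the character it escapes
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            group = rtf[i:j + 1]
            if placeholder is not None:
                # Drop control words, then count the remaining hex digits.
                data = re.sub(r"\\[A-Za-z]+-?\d* ?", "", group)
                n_hex = sum(1 for c in data if c in HEX_DIGITS)
                out.append(placeholder.format(n=n_hex // 2))
            i = j + 1
        else:
            out.append(rtf[i])
            i += 1
    return "".join(out)


sample = "{\\rtf1 Hello {\\pict\\pngblip\\picw1\\pich1 89504e47}world}"
print(strip_pict_groups(sample))
print(strip_pict_groups(sample, "[Image stripped, {n} bytes]"))
```

Square brackets are used in the placeholder here because curly braces are group delimiters in raw RTF; a placeholder added after conversion to plain text would not have that constraint.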

Successfully merging this pull request may close these issues.

Hex (encoded images) in \pict control groups is not removed.
