Implement pdf content extraction #170

Closed
dhx wants to merge 1 commit into nextcloud:main from dhx:enh/pdf_content_extraction

dhx commented Dec 20, 2024

Currently, PDF files can't be used as a text source in the assistant (although they work in context chat).

dhx force-pushed the enh/pdf_content_extraction branch 2 times, most recently from f616b07 to a3ea3a5, on December 20, 2024 21:56
Signed-off-by: dhx <dh.tx.dev@dhx.at>
dhx force-pushed the enh/pdf_content_extraction branch from a3ea3a5 to 69c6d11 on December 20, 2024 21:57
dhx (Author) commented Dec 27, 2024

Hi @julien-nc, I see you've added RTF support, so could you take a look at whether this PDF implementation looks good to you, or whether something is missing?

julien-nc (Member) left a comment

Looks good!

Can you rebase on the main branch?
There's a conflict with composer.lock. Just discard your version, rebase on main, install smalot/pdfparser, and make a new commit.

julien-nc (Member) commented

I made another PR with some adjustments. See #204
Thanks for your PR though!

dhx (Author) commented Mar 8, 2025

@julien-nc thanks for taking it over!
