Extract text from Moodle files

### Describe the feature

Before indexing, we must extract text from PDFs and other formats. It would be good to keep some structure, such as which page each piece of text came from.

### Suggested solution

We can do this quickly with [pymupdf4llm](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/) or [pypdf](https://github.com/py-pdf/pypdf). By default, this extracts only the text, without processing text in images (OCR) and so on.
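
A minimal sketch of per-page extraction with pypdf, assuming we only need the text layer plus the page number it came from. The `PageText` record and the `build_page_records` and `extract_pdf_pages` helpers are hypothetical names for illustration, not part of pypdf or of this project:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PageText:
    """Hypothetical record: one page of extracted text plus its origin."""
    source: str  # name of the file the text came from
    page: int    # 1-based page number
    text: str    # plain text from the text layer (no OCR)


def build_page_records(source: str, page_texts: list[str]) -> list[PageText]:
    """Attach the source and 1-based page numbers, skipping empty pages."""
    records = []
    for number, text in enumerate(page_texts, start=1):
        cleaned = (text or "").strip()
        if cleaned:
            records.append(PageText(source=source, page=number, text=cleaned))
    return records


def extract_pdf_pages(path: str) -> list[PageText]:
    """Read each page's text layer with pypdf and keep the page numbers."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    reader = PdfReader(path)
    page_texts = [page.extract_text() for page in reader.pages]
    return build_page_records(Path(path).name, page_texts)
```

Calling `extract_pdf_pages("lecture.pdf")` (a placeholder file name) would return one record per non-empty page, which the indexer could store alongside the text so that search results can point back to a page. Pages that contain only scanned images come back empty here, since no OCR is performed.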

There are also more accurate (but slower) solutions:
https://github.com/dantetemplar/pdf-extraction-agenda/

### Additional context

_No response_

Extract text from moodle files #83

Describe the feature

Suggested solution

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Extract text from moodle files #83

Description

Describe the feature

Suggested solution

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions