Scrape All Available PDF Metadata #35

@kylebd99

Currently, the embedding pipeline only scrapes a minimal subset of the metadata available in PDFs. We may as well retrieve all of the available data:

Image

This should just be a matter of updating the SQLite metadata table definition, updating the embedding pipeline to write the additional fields to metadata.json, and updating the generate_index_metadata.py script to insert them.
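
A rough sketch of what those three steps might look like, in case it helps scope the work. Everything here is an assumption rather than the project's actual code: the field list is just the standard PDF Info dictionary keys, and the table name `metadata`, the `doc_id` column, the helper names, and the shape of metadata.json (a mapping from document ID to fields) are all hypothetical. The raw dict stands in for whatever the PDF reader returns, with keys in the `/Title` style.

```python
import json
import sqlite3

# Standard PDF Info dictionary keys. The lowercase column names are hypothetical.
PDF_INFO_FIELDS = [
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
]


def normalize_pdf_info(raw_info):
    """Map raw Info keys such as '/Title' to lowercase names; missing fields become None."""
    cleaned = {str(key).lstrip("/"): value for key, value in raw_info.items()}
    return {
        field.lower(): None if cleaned.get(field) is None else str(cleaned[field])
        for field in PDF_INFO_FIELDS
    }


def create_metadata_table(conn):
    """Step 1: table definition with one TEXT column per metadata field."""
    columns = ", ".join(f"{field.lower()} TEXT" for field in PDF_INFO_FIELDS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS metadata (doc_id TEXT PRIMARY KEY, {columns})"
    )


def insert_metadata(conn, doc_id, info):
    """Step 3: insert one document's fields, as the index script might do."""
    names = [field.lower() for field in PDF_INFO_FIELDS]
    placeholders = ", ".join("?" for _ in range(len(names) + 1))
    conn.execute(
        f"INSERT OR REPLACE INTO metadata (doc_id, {', '.join(names)}) "
        f"VALUES ({placeholders})",
        [doc_id] + [info[name] for name in names],
    )


# Step 2: the pipeline serializes the normalized fields (hypothetical JSON layout).
raw = {"/Title": "Example", "/Author": "A. Author", "/Producer": "pdfTeX"}
record = normalize_pdf_info(raw)
metadata_json = json.dumps({"example.pdf": record})

conn = sqlite3.connect(":memory:")
create_metadata_table(conn)
for doc_id, info in json.loads(metadata_json).items():
    insert_metadata(conn, doc_id, info)
conn.commit()
```

Driving both the table definition and the insert from one field list keeps the three places in sync, so adding another field later is a one-line change. Fields a PDF does not provide are stored as NULL rather than dropped.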
