CiteWright

Anybody else have a huge folder full of files with names like 235680_download.PDF and smith_et_al_2008_full.pdf(2)?

... yeah.

I wrote this because I got tired of mass-downloading papers from Sci-Hub and then staring at a folder of cryptic filenames, wondering which one was the paper about transformer attention mechanisms and which one was about soil bacteria. Life's too short.

What It Does

Strips text from documents and uses arXiv, Semantic Scholar, Crossref, PubMed, OpenLibrary, and Unpaywall to find the actual source
Renames files to Author_Year_Title.ext like a civilized person
Handles PDF, TXT, Markdown, DOC/DOCX, and Python files - throw it at it, let's find out
Maintains a BibTeX database so you don't have to
Logs everything, doesn't break anything, asks before doing anything destructive
Optionally uses a local LLM (Ollama) or cloud providers (OpenAI, Anthropic, Gemini) if the free APIs come up empty
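
To make the Author_Year_Title.ext scheme concrete, here is a minimal sketch of how such a name could be assembled once the metadata is known. It is not CiteWright's actual code: the build_filename helper, its max_title_words parameter, and the cleanup rules (fold accents to ASCII, drop apostrophes, turn other punctuation into word breaks, use the last word of the author string as the surname) are all assumptions made for illustration.

```python
import re
import unicodedata


def build_filename(author: str, year: int, title: str, ext: str,
                   max_title_words: int = 8) -> str:
    """Hypothetical helper: build an Author_Year_Title.ext style name."""

    def clean(text: str) -> str:
        # Fold accented characters to their closest ASCII form.
        ascii_text = (
            unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Drop apostrophes, then turn any other punctuation into spaces.
        ascii_text = ascii_text.replace("'", "")
        return re.sub(r"[^A-Za-z0-9]+", " ", ascii_text).strip()

    author_words = clean(author).split()
    # Assumption: the last word of the author string is the surname.
    surname = author_words[-1] if author_words else "Unknown"

    # Cap the title length so filenames stay manageable.
    title_words = clean(title).split()[:max_title_words]
    title_part = "_".join(title_words) if title_words else "Untitled"

    return f"{surname}_{year}_{title_part}.{ext.lstrip('.').lower()}"


print(build_filename("Ashish Vaswani", 2017, "Attention Is All You Need", ".PDF"))
```

The title cap is there because full paper titles can blow past filesystem limits; the real tool may pick a different cutoff or none at all.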

The Philosophy

I built this with a "try the free stuff first" approach. Why pay for API calls when Crossref is right there?

Tier	What Happens
1	Check if the PDF already has metadata embedded. Usually garbage, but sometimes you get lucky.
2	Extract DOIs, arXiv IDs, ISBNs from the text and look them up. This is where the magic happens.
3	Search academic APIs using whatever title/author text it can scrape. Works more often than you'd think.
4	(Optional) Throw the text at an LLM and ask nicely. Costs money unless you're running Ollama locally.
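
Tier 2 is the easiest one to picture in code. The sketch below shows one way to pull DOIs and arXiv IDs out of extracted text with regular expressions. The patterns and the find_identifiers function are assumptions for illustration, not CiteWright's own implementation; ISBN matching is left out to keep it short, and old-style arXiv IDs (such as hep-th/9901001) are not covered.

```python
import re

# Assumed patterns, not taken from CiteWright's source.
# DOI: "10." + 4-9 digit registrant code + "/" + suffix without whitespace.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
# New-style arXiv ID: "arXiv:YYMM.NNNNN" with an optional version suffix.
ARXIV_RE = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect candidate DOIs and arXiv IDs from text."""
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    dois = [m.group(0).rstrip(".,;)") for m in DOI_RE.finditer(text)]
    # Keep the base ID and ignore the version suffix.
    arxiv_ids = [m.group(1) for m in ARXIV_RE.finditer(text)]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return {
        "doi": list(dict.fromkeys(dois)),
        "arxiv": list(dict.fromkeys(arxiv_ids)),
    }


sample = (
    "See https://doi.org/10.1000/xyz123. "
    "Also arXiv:1706.03762v5 and arXiv:1706.03762."
)
print(find_identifiers(sample))
```

Each identifier found this way can then be handed to the matching lookup service (Crossref for DOIs, arXiv for arXiv IDs), which is why this tier tends to give the most reliable matches.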

Installation

git clone https://github.com/lukeslp/citewright.git
cd citewright
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install .

Want the LLM-powered features and media processing?

pip install ".[all]"

Usage

Preview what would happen (dry run, safe):

citewright pdf ~/papers

Actually rename things:

citewright pdf ~/papers --execute

Go recursive and spit out a BibTeX file:

citewright pdf ~/papers -r --execute --bibtex library.bib

Let the LLM analyze the stubborn ones:

citewright pdf ~/papers --ai --execute

Rename photos and videos too (uses EXIF data):

citewright media ~/photos --execute

Use vision models to describe images:

citewright media ~/photos --ai --execute

Oh no go back:

citewright undo
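
For a sense of how an undo like this can work, here is a rough sketch of a rename log that can be replayed backwards. It is an assumption about the general technique, not a description of CiteWright's internals: the rename_with_log and undo_all functions, the JSON log format, and the log file location are all hypothetical.

```python
import json
from pathlib import Path


def rename_with_log(src: Path, dst: Path, log_path: Path) -> None:
    """Hypothetical helper: rename a file and record the move in a JSON log."""
    if dst.exists():
        # Refuse to overwrite; a rename should never destroy another file.
        raise FileExistsError(dst)
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    src.rename(dst)
    entries.append({"from": str(src), "to": str(dst)})
    log_path.write_text(json.dumps(entries, indent=2))


def undo_all(log_path: Path) -> int:
    """Hypothetical helper: reverse logged renames, newest first."""
    if not log_path.exists():
        return 0
    entries = json.loads(log_path.read_text())
    restored = 0
    for entry in reversed(entries):
        current, original = Path(entry["to"]), Path(entry["from"])
        # Skip entries whose file has moved or whose old name is taken again.
        if current.exists() and not original.exists():
            current.rename(original)
            restored += 1
    log_path.unlink()
    return restored
```

Replaying the log in reverse matters when a file was renamed more than once: the latest rename has to be undone before the earlier one can be.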

Configuration

Config lives at ~/.config/citewright/config.json, or use the CLI:

citewright config --show
citewright config --ai-provider openai  # Select LLM provider
citewright config --ai-enabled
citewright config --unpaywall-email "you@example.com"

The Unpaywall email is optional but they appreciate it. Be cool.

License

MIT. Do whatever.

Author

Luke Steuber https://github.com/lukeslp luke@dr.eamer.dev
