Alpha software — This project is in early development. Features may change, break, or be removed without notice. Use at your own risk and please report issues.
Convert academic PDFs to Obsidian-compatible Markdown.
Uses Marker for PDF conversion, then post-processes the output for Obsidian: proper [^n] footnotes, cleaned page artifacts, normalised headings, smart quotes, and YAML frontmatter.
pip install litmd
Requires Marker installed separately:
pip install marker-pdf
# Convert a PDF by path
litmd paper.pdf
# Convert by citation key (searches Zotero storage)
litmd Smith2024Example
# Multiple inputs
litmd Paper1.pdf Paper2.pdf Smith2024
# With local LLM for better accuracy
litmd --llm paper.pdf
# Best quality (larger model, slower)
litmd --best paper.pdf
# Convert by file path
litmd ./path/to/paper.pdf
# Convert by Zotero citation key (BetterBibTeX)
litmd SmithYear
# Multiple files
litmd paper1.pdf paper2.pdf Smith2024
Local LLMs improve accuracy for complex layouts, tables, and figures:
# Use local LLM (auto-detects LM Studio or Ollama)
litmd --llm paper.pdf
# Best quality (larger model)
litmd --best paper.pdf
# Disable LLM (overrides config default)
litmd --no-llm paper.pdf
For long documents like books:
# Convert specific pages (1-indexed, inclusive)
litmd --pages 1-50 book.pdf
litmd --pages 10-20,30-40 book.pdf # Multiple ranges
litmd --pages 50- book.pdf # Page 50 to end
# Split into chapters with labels
litmd --chapters "1-25:Intro,26-100:Methods" book.pdf
# Auto-detect chapters from PDF bookmarks
litmd --extract-chapters book.pdf
# Convert only chapters 1-10 (from bookmarks)
litmd --extract-chapters --from-chapter 1 --to-chapter 10 book.pdf
# Preview what would be converted (dry run)
litmd --dry-run --extract-chapters book.pdf
# Override page limits
litmd --no-limit large-document.pdf
The --dry-run flag shows the available chapters and what would be converted without running the conversion, which is useful for exploring a PDF's structure first.
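As a rough illustration of the --pages and --chapters syntax above (1-indexed, inclusive ranges, comma-separated, with an open-ended form such as 50-), the specs could be parsed along these lines. This is a minimal sketch, not litmd's actual pages.py: the function names parse_pages and parse_chapters, the support for a single page number, and the clamping to the document length are all assumptions made for the example.

```python
def parse_pages(spec: str, total_pages: int) -> list[int]:
    """Expand a spec like "10-20,30-40" or "50-" into 1-indexed page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        if not sep:
            end = start            # single page such as "7" (assumed extension)
        elif end_text == "":
            end = total_pages      # open-ended range such as "50-"
        else:
            end = int(end_text)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Ranges are inclusive; clamp to the document length (assumption).
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)


def parse_chapters(spec: str) -> list[tuple[int, int, str]]:
    """Split "1-25:Intro,26-100:Methods" into (start, end, label) tuples."""
    chapters = []
    for part in spec.split(","):
        page_range, _, label = part.partition(":")
        start_text, _, end_text = page_range.partition("-")
        chapters.append((int(start_text), int(end_text), label.strip()))
    return chapters
```

An open-ended range needs the total page count, which is why parse_pages takes it as a second argument.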
# Force OCR on all pages
litmd --ocr scanned-paper.pdf
# Show marker's progress bars
litmd --progress paper.pdf
# Specify output directory
litmd --output ~/Notes/Papers paper.pdf
# List all BetterBibTeX-named PDFs in Zotero
litmd list
# Filter with grep
litmd list | grep -i ethics
# Re-run postprocessing on already converted files
litmd lint Smith2024 # By citekey
litmd lint ~/Notes/paper.md # By path
litmd lint Smith2024 Jones2023 Paper.md # Multiple files
This is useful after changing postprocess settings in the config.
Configuration lives at ~/.config/litmd/config.toml.
# Show current config
litmd config
# Initialise config with defaults
litmd config --init
# Set a value
litmd config paths.output ~/Notes/Papers
litmd config llm.enabled true
litmd config llm.backend lmstudio
[paths]
output = "~/Vault/References" # Where to write output files
zotero_storage = "~/Zotero/storage" # Zotero's storage folder
attachments = "" # Image folder (empty = <citekey>_attachments)
[llm]
backend = "auto" # auto | lmstudio | ollama | none
enabled = false # Use LLM by default
default_quality = "fast" # fast | best
model_fast = "Qwen/Qwen3-VL-8B-Instruct-GGUF" # Smaller, faster model
model_best = "Qwen/Qwen3-VL-32B-Instruct-GGUF" # Larger, more accurate model
lmstudio_url = "http://localhost:1234/v1"
ollama_model = "llama3.2-vision:11b"
[postprocess]
footnotes = true # Convert <sup> tags to [^n]
remove_artifacts = true # Remove page break logos
normalise_headings = true # Fix heading levels (#### → ##)
clean_metadata = true # Remove author blocks from body
detect_plain_footnotes = true # "text.14" → "text.[^14]"
compact_newlines = true # Remove excessive blank lines
smart_quotes = true # Straight → curly quotes
abstract_callout = false # Style abstract as Obsidian callout
title_case_headings = true # Convert ALL CAPS to Title Case
clean_heading_formatting = true # Remove **bold**/italics from headings
[frontmatter]
enabled = true
fields = ["title", "author", "citekey", "published"]
tags = ["literature"]
# Keywords from PDF metadata
keywords_property = "keywords" # Property to write keywords to
keywords_mode = "append" # "append" or "write" (overwrite)
keywords_case = "lower" # "lower", "upper", "title", "preserve"
keywords_separator = "-" # Replace spaces in keywords
remove_keywords_from_body = true
# Conversion timestamp (empty = disabled)
converted_property = "" # e.g., "converted" for tracking
[limits]
max_pages = 100 # Refuse above this (0 = no limit)
warn_pages = 50 # Warn above this threshold
chapter_folders = false # Create subfolders for chapters
| Issue | Solution |
|---|---|
| <sup>n</sup> footnotes | Obsidian [^n] format with definitions at end |
| Plain-text footnote numbers | Detected and converted (e.g., text.14 → text.[^14]) |
| Page break logos | Removed (Springer, etc.) |
| #### section headings | Normalised to ## |
| ALL CAPS HEADINGS | Converted to Title Case |
| Bold headings | Bold formatting removed from all headings |
| Italic h1/h2 headings | Italic formatting removed from h1 and h2 |
| Author/address blocks | Cleaned from body |
| Escaped brackets in citations | Fixed |
| Excessive blank lines | Compacted (page breaks) |
| Straight quotes | Converted to curly quotes (via smartypants) |
| Spaces before footnotes | Removed (. [^1] → .[^1]) |
| PDF keywords | Extracted and written to frontmatter |
| Keywords in body | Optionally removed (first occurrence only) |
| Abstract section | Optionally styled as Obsidian callout |
| Images not saved | Copied to attachments folder with updated references |
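To make the footnote rows of the table concrete, here is a minimal sketch of how the inline markers might be rewritten with regular expressions. It is an assumption about the approach, not litmd's actual postprocess.py: the function name convert_footnotes and the three patterns are hypothetical, the plain-number rule is a heuristic that will occasionally misfire (for example on "Fig.2"), and collecting the footnote definitions at the end of the file is not shown.

```python
import re

# <sup>14</sup> -> [^14]
SUP_TAG = re.compile(r"<sup>(\d+)</sup>")
# A 1-3 digit number glued to punctuation after a letter, e.g. "text.14".
# Requiring a letter before the punctuation leaves decimals like 3.14 alone.
PLAIN_NUMBER = re.compile(r"(?<=[A-Za-z][.,;:!?])(\d{1,3})(?=\s|$)")
# ". [^1]" -> ".[^1]" (spaces and tabs only, so line breaks are preserved)
SPACE_BEFORE = re.compile(r"([.,;:!?])[ \t]+(\[\^\d+\])")


def convert_footnotes(text: str) -> str:
    """Rewrite inline footnote markers into Obsidian's [^n] form."""
    text = SUP_TAG.sub(r"[^\1]", text)
    text = PLAIN_NUMBER.sub(r"[^\1]", text)
    text = SPACE_BEFORE.sub(r"\1\2", text)
    return text
```

The order matters in this sketch: tags are converted first so that the spacing rule can then tidy markers produced by either of the earlier steps.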
Italics: PDF→Markdown conversion often loses italics because PDFs store them as font variants rather than explicit markers. Using --llm or --ocr may help in some cases.
Long documents: By default, litmd warns for PDFs over 50 pages and refuses over 100 pages. Use --pages, --chapters, or --no-limit for longer documents.
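The warn and refuse thresholds described above could be expressed as a small check like the following. This is a sketch under stated assumptions rather than litmd's real logic: the function name check_page_limit is hypothetical, the defaults mirror warn_pages and max_pages from the config, and it assumes that --no-limit skips both checks and that a warn_pages of 0 disables the warning.

```python
def check_page_limit(
    page_count: int,
    warn_pages: int = 50,
    max_pages: int = 100,
    no_limit: bool = False,
) -> str:
    """Return "ok", "warn", or "refuse" for a document of page_count pages."""
    if no_limit:
        return "ok"  # assumption: --no-limit bypasses both thresholds
    if max_pages > 0 and page_count > max_pages:
        return "refuse"  # max_pages = 0 means no hard limit
    if warn_pages > 0 and page_count > warn_pages:
        return "warn"
    return "ok"
```

Both thresholds are exclusive here, so a document of exactly 50 pages passes silently and one of exactly 100 pages only warns.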
For better conversion accuracy, litmd supports local LLMs via LM Studio or Ollama.
- Download and open LM Studio
- Load a vision model (e.g., qwen3-vl-8b-instruct)
- Start the local server
- Run litmd --llm paper.pdf
ollama pull llama3.2-vision:11b
litmd --llm paper.pdf
The --llm flag auto-detects which backend is available, preferring LM Studio if running.
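One plausible way to implement this kind of auto-detection is to probe each backend's local HTTP endpoint in order of preference. The sketch below is illustrative only and is not taken from litmd's llm.py: the names is_reachable and detect_backend are hypothetical, the LM Studio URL is derived from the lmstudio_url config default, and the Ollama URL assumes Ollama's default port of 11434.

```python
import urllib.error
import urllib.request
from typing import Callable

LMSTUDIO_URL = "http://localhost:1234/v1/models"  # based on lmstudio_url in the config
OLLAMA_URL = "http://localhost:11434/api/tags"    # assumes Ollama's default port


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET request to url succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def detect_backend(probe: Callable[[str], bool] = is_reachable) -> str:
    """Pick a backend, preferring LM Studio over Ollama."""
    if probe(LMSTUDIO_URL):
        return "lmstudio"
    if probe(OLLAMA_URL):
        return "ollama"
    return "none"
```

Passing the probe as a parameter keeps the preference order testable without a running server, for example detect_backend(lambda url: False) returns "none".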
litmd config llm.enabled true
litmd config llm.default_quality fast # or "best"
litmd/
├── cli.py # Command-line interface
├── convert.py # Core conversion pipeline
├── postprocess.py # Markdown cleanup and formatting
├── config.py # TOML configuration management
├── pages.py # Page range parsing and chapter utilities
├── metadata.py # PDF metadata extraction
├── llm.py # LLM backend detection (LM Studio, Ollama)
└── sources/ # Input handlers
    ├── path.py # File path input
    └── zotero.py # Zotero storage lookup
- Input resolution: Determine if input is a file path or citation key
- PDF metadata extraction: Extract title, keywords, creation date
- Page limit checking: Warn or refuse if document exceeds limits
- Marker conversion: Run marker_single to convert PDF → Markdown
- Post-processing: Clean up artifacts, convert footnotes, add smart quotes
- Frontmatter generation: Add YAML frontmatter with metadata
- Output: Write to configured output directory
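The first step of the pipeline above, input resolution, might look roughly like this. It is a sketch, not the code in sources/path.py or sources/zotero.py: the function name resolve_input is hypothetical, and it assumes that PDFs in Zotero storage are named after their BetterBibTeX citation key, so that a key can be found by file name.

```python
from pathlib import Path


def resolve_input(arg: str, zotero_storage: Path) -> Path:
    """Resolve a CLI argument to a PDF path (file path first, then citekey)."""
    candidate = Path(arg).expanduser()
    if candidate.suffix.lower() == ".pdf":
        # Anything ending in .pdf is treated as a file path.
        if candidate.is_file():
            return candidate
        raise FileNotFoundError(f"PDF not found: {candidate}")
    # Otherwise treat the argument as a citation key and search the
    # Zotero storage folder for a PDF whose name starts with it (assumption).
    matches = sorted(Path(zotero_storage).expanduser().rglob(f"{arg}*.pdf"))
    if not matches:
        raise FileNotFoundError(f"no PDF found for citation key {arg!r}")
    return matches[0]
```

With the config defaults, a call such as resolve_input("Smith2024Example", Path("~/Zotero/storage")) would search the storage subfolders for a matching file.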
# Clone the repository
git clone https://github.com/yourusername/litmd.git
cd litmd
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
- Marker: the PDF conversion engine
- obsidian-marker: Obsidian plugin (UI-based, no post-processing)
- Zotero: reference manager
- Better BibTeX: citation key management for Zotero
- LM Studio: local LLM runner
- Ollama: open-source LLM runner
MIT