litmd

Alpha software: this project is in early development. Features may change, break, or be removed without notice. Use at your own risk, and please report issues.

Convert academic PDFs to Obsidian-compatible Markdown.

litmd uses Marker for PDF conversion, then post-processes the output for Obsidian: proper [^n] footnotes, cleaned page artifacts, normalised headings, smart quotes, and YAML frontmatter.

Installation

pip install litmd

litmd requires Marker, which must be installed separately:

pip install marker-pdf

Quick start

# Convert a PDF by path
litmd paper.pdf

# Convert by citation key (searches Zotero storage)
litmd Smith2024Example

# Multiple inputs
litmd Paper1.pdf Paper2.pdf Smith2024

# With local LLM for better accuracy
litmd --llm paper.pdf

# Best quality (larger model, slower)
litmd --best paper.pdf

Usage

Basic conversion

# Convert by file path
litmd ./path/to/paper.pdf

# Convert by Zotero citation key (BetterBibTeX)
litmd SmithYear

# Multiple files
litmd paper1.pdf paper2.pdf Smith2024

LLM-enhanced conversion

A local LLM can improve accuracy for complex layouts, tables, and figures:

# Use local LLM (auto-detects LM Studio or Ollama)
litmd --llm paper.pdf

# Best quality (larger model)
litmd --best paper.pdf

# Disable LLM (overrides config default)
litmd --no-llm paper.pdf

Page ranges and chapters

For long documents like books:

# Convert specific pages (1-indexed, inclusive)
litmd --pages 1-50 book.pdf
litmd --pages 10-20,30-40 book.pdf    # Multiple ranges
litmd --pages 50- book.pdf             # Page 50 to end

# Split into chapters with labels
litmd --chapters "1-25:Intro,26-100:Methods" book.pdf

# Auto-detect chapters from PDF bookmarks
litmd --extract-chapters book.pdf

# Convert only chapters 1-10 (from bookmarks)
litmd --extract-chapters --from-chapter 1 --to-chapter 10 book.pdf

# Preview what would be converted (dry run)
litmd --dry-run --extract-chapters book.pdf

# Override page limits
litmd --no-limit large-document.pdf

The --dry-run flag lists the available chapters and shows what would be converted, without running the conversion. This is useful for exploring a PDF's structure first.
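
The page-range syntax shown above (1-indexed, inclusive, comma-separated, with an open-ended form such as 50-) can be sketched in Python as follows. This is a minimal illustration rather than litmd's actual parser: the function name parse_page_ranges, the total_pages argument, the clamping to the document length, and the acceptance of a single page number are all assumptions.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Expand a spec like "10-20,30-40" or "50-" into 1-indexed page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An empty end ("50-") means "through the last page".
            end = int(end_text) if end_text else total_pages
        else:
            # Assumption: a bare number selects a single page.
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Ranges are inclusive; clamp the end to the document length.
        pages.extend(range(start, min(end, total_pages) + 1))
    return pages
```

For a 60-page book, parse_page_ranges("50-", 60) would yield pages 50 through 60.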

Other options

# Force OCR on all pages
litmd --ocr scanned-paper.pdf

# Show marker's progress bars
litmd --progress paper.pdf

# Specify output directory
litmd --output ~/Notes/Papers paper.pdf

Listing available PDFs

# List all BetterBibTeX-named PDFs in Zotero
litmd list

# Filter with grep
litmd list | grep -i ethics

Re-linting existing files

# Re-run postprocessing on already converted files
litmd lint Smith2024                      # By citekey
litmd lint ~/Notes/paper.md               # By path
litmd lint Smith2024 Jones2023 Paper.md   # Multiple files

This is useful after changing the [postprocess] settings in the config file.

Configuration

Configuration lives at ~/.config/litmd/config.toml.

# Show current config
litmd config

# Initialise config with defaults
litmd config --init

# Set a value
litmd config paths.output ~/Notes/Papers
litmd config llm.enabled true
litmd config llm.backend lmstudio

Full config reference

[paths]
output = "~/Vault/References"          # Where to write output files
zotero_storage = "~/Zotero/storage"    # Zotero's storage folder
attachments = ""                        # Image folder (empty = <citekey>_attachments)

[llm]
backend = "auto"                        # auto | lmstudio | ollama | none
enabled = false                         # Use LLM by default
default_quality = "fast"                # fast | best
model_fast = "Qwen/Qwen3-VL-8B-Instruct-GGUF"   # Smaller, faster model
model_best = "Qwen/Qwen3-VL-32B-Instruct-GGUF"  # Larger, more accurate model
lmstudio_url = "http://localhost:1234/v1"
ollama_model = "llama3.2-vision:11b"

[postprocess]
footnotes = true                        # Convert <sup> tags to [^n]
remove_artifacts = true                 # Remove page break logos
normalise_headings = true               # Fix heading levels (#### → ##)
clean_metadata = true                   # Remove author blocks from body
detect_plain_footnotes = true           # "text.14" → "text.[^14]"
compact_newlines = true                 # Remove excessive blank lines
smart_quotes = true                     # Straight → curly quotes
abstract_callout = false                # Style abstract as Obsidian callout
title_case_headings = true              # Convert ALL CAPS to Title Case
clean_heading_formatting = true         # Remove **bold**/italics from headings

[frontmatter]
enabled = true
fields = ["title", "author", "citekey", "published"]
tags = ["literature"]

# Keywords from PDF metadata
keywords_property = "keywords"          # Property to write keywords to
keywords_mode = "append"                # "append" or "write" (overwrite)
keywords_case = "lower"                 # "lower", "upper", "title", "preserve"
keywords_separator = "-"                # Replace spaces in keywords
remove_keywords_from_body = true

# Conversion timestamp (empty = disabled)
converted_property = ""                 # e.g., "converted" for tracking

[limits]
max_pages = 100                         # Refuse above this (0 = no limit)
warn_pages = 50                         # Warn above this threshold
chapter_folders = false                 # Create subfolders for chapters
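
As an illustration of the keyword settings under [frontmatter], the sketch below applies keywords_case and keywords_separator to a keyword and merges new keywords into an existing list in append or write mode. The function names, the whitespace collapsing, and the de-duplication behaviour are assumptions made for the example; litmd's own handling may differ.

```python
def normalise_keyword(keyword: str, case: str = "lower", separator: str = "-") -> str:
    """Apply the keywords_case and keywords_separator settings to one keyword."""
    # Collapse runs of whitespace so "machine  learning" has a single gap.
    text = " ".join(keyword.split())
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    elif case == "title":
        text = text.title()
    elif case != "preserve":
        raise ValueError(f"unknown keywords_case: {case!r}")
    return text.replace(" ", separator)


def merge_keywords(existing: list[str], new: list[str], mode: str = "append") -> list[str]:
    """Combine keywords: "append" keeps existing values, "write" replaces them."""
    if mode == "append":
        merged = list(existing)
    elif mode == "write":
        merged = []
    else:
        raise ValueError(f"unknown keywords_mode: {mode!r}")
    for keyword in new:
        # Assumption: duplicates are skipped and first-seen order is kept.
        if keyword not in merged:
            merged.append(keyword)
    return merged
```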

What it fixes

Issue	Solution
`<sup>n</sup>` footnotes	Obsidian `[^n]` format with definitions at end
Plain-text footnote numbers	Detected and converted (e.g., `text.14` → `text.[^14]`)
Page break logos	Removed (Springer, etc.)
`####` section headings	Normalised to `##`
ALL CAPS HEADINGS	Converted to Title Case
Bold headings	Bold formatting removed from all headings
Italic h1/h2 headings	Italic formatting removed from h1 and h2
Author/address blocks	Cleaned from body
Escaped brackets in citations	Fixed
Excessive blank lines	Compacted (typically left behind by page breaks)
Straight quotes	Converted to curly quotes (via smartypants)
Spaces before footnotes	Removed (`. [^1]` → `.[^1]`)
PDF keywords	Extracted and written to frontmatter
Keywords in body	Optionally removed (first occurrence only)
Abstract section	Optionally styled as Obsidian callout
Images not saved	Copied to attachments folder with updated references
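
The three footnote rows in the table can be approximated with regular expressions. The sketch below is a rough illustration and not litmd's postprocess code: the pattern names and the function convert_footnote_refs are hypothetical, the plain-number heuristic is deliberately crude (it would also rewrite something like "p.14"), and collecting footnote definitions at the end of the file is not covered.

```python
import re

# <sup>12</sup>  ->  [^12]
SUP_FOOTNOTE = re.compile(r"<sup>(\d+)</sup>")

# "text.14" -> "text.[^14]": digits directly after a lowercase letter plus
# punctuation, followed by whitespace or the end of the text.
PLAIN_FOOTNOTE = re.compile(r"(?<=[a-z][.,;:!?])(\d{1,3})(?=\s|$)")

# ". [^1]" -> ".[^1]"; the (?!:) guard leaves "[^1]: ..." definitions alone.
SPACE_BEFORE_FOOTNOTE = re.compile(r"([.,;:!?])[ \t]+(\[\^\d+\])(?!:)")


def convert_footnote_refs(text: str) -> str:
    """Rewrite footnote references into Obsidian's [^n] form."""
    text = SUP_FOOTNOTE.sub(r"[^\1]", text)
    text = PLAIN_FOOTNOTE.sub(r"[^\1]", text)
    text = SPACE_BEFORE_FOOTNOTE.sub(r"\1\2", text)
    return text
```

Requiring a lowercase letter before the punctuation keeps decimals such as "2.5" untouched, at the cost of missing footnotes that follow a capital letter or a closing bracket.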

Known limitations

Italics: PDF→Markdown conversion often loses italics because PDFs store them as font variants rather than explicit markers. Using --llm or --ocr may help in some cases.

Long documents: By default, litmd warns for PDFs over 50 pages and refuses to convert PDFs over 100 pages (limits.warn_pages and limits.max_pages in the config). Use --pages, --chapters, or --no-limit for longer documents.
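
The warn and refuse thresholds can be expressed as a small decision function. This is a sketch under stated assumptions, not litmd's implementation: LimitDecision and check_page_limit are hypothetical names, --no-limit is assumed to lift only the refusal, and a warn_pages value of 0 is assumed to disable the warning in the same way that max_pages = 0 disables the limit.

```python
from enum import Enum


class LimitDecision(Enum):
    OK = "ok"
    WARN = "warn"
    REFUSE = "refuse"


def check_page_limit(
    page_count: int,
    warn_pages: int = 50,
    max_pages: int = 100,
    no_limit: bool = False,
) -> LimitDecision:
    """Decide whether to convert, warn, or refuse, based on page count."""
    # max_pages == 0 means "no limit"; no_limit models the --no-limit flag.
    if not no_limit and max_pages > 0 and page_count > max_pages:
        return LimitDecision.REFUSE
    # Assumption: 0 also disables the warning threshold.
    if warn_pages > 0 and page_count > warn_pages:
        return LimitDecision.WARN
    return LimitDecision.OK
```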

Local LLM support

For better conversion accuracy, litmd supports local LLMs via LM Studio or Ollama.

LM Studio (recommended for Apple Silicon)

1. Download and open LM Studio
2. Load a vision model (e.g., qwen3-vl-8b-instruct)
3. Start the local server
4. Run litmd --llm paper.pdf

Ollama

ollama pull llama3.2-vision:11b
litmd --llm paper.pdf

The --llm flag auto-detects which backend is available, preferring LM Studio if running.
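
One way to picture the auto-detection is a pair of HTTP probes tried in order of preference. The sketch below is an assumption about how this could work, not a description of litmd's llm.py: is_reachable and detect_backend are hypothetical names, the LM Studio URL is the lmstudio_url default from the config reference, and the Ollama address assumes Ollama's default port 11434 and its model-listing endpoint.

```python
import urllib.error
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1"  # lmstudio_url default from the config
OLLAMA_URL = "http://localhost:11434"      # assumed: Ollama's default address


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET request to the URL succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def detect_backend(probe=is_reachable) -> str:
    """Pick a backend name, preferring LM Studio when both servers respond."""
    if probe(f"{LMSTUDIO_URL}/models"):
        return "lmstudio"
    if probe(f"{OLLAMA_URL}/api/tags"):
        return "ollama"
    return "none"
```

Passing the probe as a parameter keeps the preference order testable without a running server, for example detect_backend(lambda url: False).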

Making LLM the default

litmd config llm.enabled true
litmd config llm.default_quality fast  # or "best"

Architecture

litmd/
├── cli.py          # Command-line interface
├── convert.py      # Core conversion pipeline
├── postprocess.py  # Markdown cleanup and formatting
├── config.py       # TOML configuration management
├── pages.py        # Page range parsing and chapter utilities
├── metadata.py     # PDF metadata extraction
├── llm.py          # LLM backend detection (LM Studio, Ollama)
└── sources/        # Input handlers
    ├── path.py     # File path input
    └── zotero.py   # Zotero storage lookup

Conversion pipeline

1. Input resolution: Determine if input is a file path or citation key
2. PDF metadata extraction: Extract title, keywords, creation date
3. Page limit checking: Warn or refuse if document exceeds limits
4. Marker conversion: Run marker_single to convert PDF → Markdown
5. Post-processing: Clean up artifacts, convert footnotes, add smart quotes
6. Frontmatter generation: Add YAML frontmatter with metadata
7. Output: Write to configured output directory
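
The input-resolution step can be sketched as follows. This is an illustration only: resolve_input is a hypothetical name, and the lookup assumes that Zotero keeps each attachment in its own subfolder of the storage directory and that the PDF's filename starts with the citation key. litmd's sources/path.py and sources/zotero.py may work differently.

```python
from pathlib import Path


def resolve_input(arg: str, zotero_storage: Path) -> Path:
    """Resolve a command-line argument to a PDF path.

    Arguments that look like a file (a .pdf suffix or an existing path) are
    treated as paths; anything else is treated as a citation key.
    """
    candidate = Path(arg).expanduser()
    if candidate.suffix.lower() == ".pdf" or candidate.exists():
        if not candidate.is_file():
            raise FileNotFoundError(f"no such PDF: {candidate}")
        return candidate

    # Assumed layout: <zotero_storage>/<item folder>/<citekey>....pdf
    matches = sorted(zotero_storage.expanduser().glob(f"*/{arg}*.pdf"))
    if not matches:
        raise FileNotFoundError(f"no PDF found for citation key {arg!r}")
    return matches[0]
```

When several files match a key, this sketch simply takes the first in sorted order; a real implementation would probably want a stricter match.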

Development

# Clone the repository
git clone https://github.com/gavriilfakih/litmd.git
cd litmd

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

Licence

MIT

gavriilfakih/litmd

Folders and files

Latest commit

History

Repository files navigation

litmd

Installation

Quick start

Usage

Basic conversion

LLM-enhanced conversion

Page ranges and chapters

Other options

Listing available PDFs

Re-linting existing files

Configuration

Full config reference

What it fixes

Known limitations

Local LLM support

LM Studio (recommended for Apple Silicon)

Ollama

Making LLM the default

Architecture

Conversion pipeline

Development

See also

Licence

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages