gavriilfakih/litmd


litmd

Alpha software — This project is in early development. Features may change, break, or be removed without notice. Use at your own risk and please report issues.

Convert academic PDFs to Obsidian-compatible Markdown.

Uses Marker for PDF conversion, then post-processes the output for Obsidian: proper [^n] footnotes, cleaned page artifacts, normalised headings, smart quotes, and YAML frontmatter.

Installation

pip install litmd

Marker must be installed separately:

pip install marker-pdf

Quick start

# Convert a PDF by path
litmd paper.pdf

# Convert by citation key (searches Zotero storage)
litmd Smith2024Example

# Multiple inputs
litmd Paper1.pdf Paper2.pdf Smith2024

# With local LLM for better accuracy
litmd --llm paper.pdf

# Best quality (larger model, slower)
litmd --best paper.pdf

Usage

Basic conversion

# Convert by file path
litmd ./path/to/paper.pdf

# Convert by Zotero citation key (BetterBibTeX)
litmd SmithYear

# Multiple files
litmd paper1.pdf paper2.pdf Smith2024

LLM-enhanced conversion

Local LLMs improve accuracy for complex layouts, tables, and figures:

# Use local LLM (auto-detects LM Studio or Ollama)
litmd --llm paper.pdf

# Best quality (larger model)
litmd --best paper.pdf

# Disable LLM (overrides config default)
litmd --no-llm paper.pdf

Page ranges and chapters

For long documents like books:

# Convert specific pages (1-indexed, inclusive)
litmd --pages 1-50 book.pdf
litmd --pages 10-20,30-40 book.pdf    # Multiple ranges
litmd --pages 50- book.pdf             # Page 50 to end

# Split into chapters with labels
litmd --chapters "1-25:Intro,26-100:Methods" book.pdf

# Auto-detect chapters from PDF bookmarks
litmd --extract-chapters book.pdf

# Convert only chapters 1-10 (from bookmarks)
litmd --extract-chapters --from-chapter 1 --to-chapter 10 book.pdf

# Preview what would be converted (dry run)
litmd --dry-run --extract-chapters book.pdf

# Override page limits
litmd --no-limit large-document.pdf

The --dry-run flag shows available chapters and what would be converted without actually running the conversion — useful for exploring a PDF's structure first.
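
The page-range and chapter syntax shown above can be illustrated with a small parser. This is only a sketch of how such specs might be interpreted, not litmd's actual code: the function names parse_pages and parse_chapters and the total_pages parameter are hypothetical.

```python
def parse_pages(spec: str, total_pages: int) -> list[int]:
    """Expand a spec such as "10-20,30-40" or "50-" into 1-indexed page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "50-" runs to the last page.
            end = int(end_text) if end_text.strip() else total_pages
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Ranges are inclusive on both ends; clamp to the document length.
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)


def parse_chapters(spec: str) -> list[tuple[int, int, str]]:
    """Split "1-25:Intro,26-100:Methods" into (start, end, label) tuples."""
    chapters = []
    for part in spec.split(","):
        page_range, _, label = part.partition(":")
        start_text, _, end_text = page_range.partition("-")
        chapters.append((int(start_text), int(end_text), label.strip()))
    return chapters
```

For example, parse_pages("50-", 60) yields pages 50 through 60, and parse_chapters("1-25:Intro,26-100:Methods") yields one tuple per labelled chapter.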

Other options

# Force OCR on all pages
litmd --ocr scanned-paper.pdf

# Show marker's progress bars
litmd --progress paper.pdf

# Specify output directory
litmd --output ~/Notes/Papers paper.pdf

Listing available PDFs

# List all BetterBibTeX-named PDFs in Zotero
litmd list

# Filter with grep
litmd list | grep -i ethics

Re-linting existing files

# Re-run postprocessing on already converted files
litmd lint Smith2024                      # By citekey
litmd lint ~/Notes/paper.md               # By path
litmd lint Smith2024 Jones2023 Paper.md   # Multiple files

Useful after changing postprocess settings in config.

Configuration

Configuration lives at ~/.config/litmd/config.toml.

# Show current config
litmd config

# Initialise config with defaults
litmd config --init

# Set a value
litmd config paths.output ~/Notes/Papers
litmd config llm.enabled true
litmd config llm.backend lmstudio
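
To show how a dotted key such as llm.enabled could map onto the nested TOML sections, here is a minimal sketch. The helpers set_dotted and coerce are hypothetical names, and the string-to-value coercion rules are an assumption, not litmd's documented behaviour.

```python
from typing import Any


def coerce(raw: str) -> Any:
    """Turn a command-line string into a bool or int where it clearly is one."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def set_dotted(config: dict, dotted_key: str, raw_value: str) -> dict:
    """Set config["section"]["key"] from a "section.key" path, creating sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = coerce(raw_value)
    return config
```

Under these assumptions, set_dotted({}, "llm.enabled", "true") produces a config with an llm section whose enabled key is the boolean True.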

Full config reference

[paths]
output = "~/Vault/References"          # Where to write output files
zotero_storage = "~/Zotero/storage"    # Zotero's storage folder
attachments = ""                        # Image folder (empty = <citekey>_attachments)

[llm]
backend = "auto"                        # auto | lmstudio | ollama | none
enabled = false                         # Use LLM by default
default_quality = "fast"                # fast | best
model_fast = "Qwen/Qwen3-VL-8B-Instruct-GGUF"   # Smaller, faster model
model_best = "Qwen/Qwen3-VL-32B-Instruct-GGUF"  # Larger, more accurate model
lmstudio_url = "http://localhost:1234/v1"
ollama_model = "llama3.2-vision:11b"

[postprocess]
footnotes = true                        # Convert <sup> tags to [^n]
remove_artifacts = true                 # Remove page break logos
normalise_headings = true               # Fix heading levels (#### → ##)
clean_metadata = true                   # Remove author blocks from body
detect_plain_footnotes = true           # "text.14" → "text.[^14]"
compact_newlines = true                 # Remove excessive blank lines
smart_quotes = true                     # Straight → curly quotes
abstract_callout = false                # Style abstract as Obsidian callout
title_case_headings = true              # Convert ALL CAPS to Title Case
clean_heading_formatting = true         # Remove **bold**/italics from headings

[frontmatter]
enabled = true
fields = ["title", "author", "citekey", "published"]
tags = ["literature"]

# Keywords from PDF metadata
keywords_property = "keywords"          # Property to write keywords to
keywords_mode = "append"                # "append" or "write" (overwrite)
keywords_case = "lower"                 # "lower", "upper", "title", "preserve"
keywords_separator = "-"                # Replace spaces in keywords
remove_keywords_from_body = true

# Conversion timestamp (empty = disabled)
converted_property = ""                 # e.g., "converted" for tracking

[limits]
max_pages = 100                         # Refuse above this (0 = no limit)
warn_pages = 50                         # Warn above this threshold
chapter_folders = false                 # Create subfolders for chapters

What it fixes

| Issue | Solution |
| --- | --- |
| <sup>n</sup> footnotes | Obsidian [^n] format with definitions at end |
| Plain-text footnote numbers | Detected and converted (e.g., text.14 → text.[^14]) |
| Page break logos | Removed (Springer, etc.) |
| #### section headings | Normalised to ## |
| ALL CAPS HEADINGS | Converted to Title Case |
| Bold headings | Bold formatting removed from all headings |
| Italic h1/h2 headings | Italic formatting removed from h1 and h2 |
| Author/address blocks | Cleaned from body |
| Escaped brackets in citations | Fixed |
| Excessive blank lines | Compacted (page breaks) |
| Straight quotes | Converted to curly quotes (via smartypants) |
| Spaces before footnotes | Removed (. [^1] → .[^1]) |
| PDF keywords | Extracted and written to frontmatter |
| Keywords in body | Optionally removed (first occurrence only) |
| Abstract section | Optionally styled as Obsidian callout |
| Images not saved | Copied to attachments folder with updated references |
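
The three footnote fixes in the table can be approximated with regular expressions. The sketch below is an illustration under simplifying assumptions, not the code in postprocess.py: the function names are hypothetical, the plain-number rule is a heuristic that can misfire on text such as "Fig.2", and collecting footnote definitions at the end of the file is not shown.

```python
import re


def convert_sup_footnotes(text: str) -> str:
    # <sup>14</sup> becomes [^14]
    return re.sub(r"<sup>\s*(\d+)\s*</sup>", r"[^\1]", text)


def convert_plain_footnotes(text: str) -> str:
    # "text.14" becomes "text.[^14]": up to three digits glued to punctuation
    # that directly follows a letter. Decimals such as 3.14 are left alone
    # because the character before the point is a digit.
    return re.sub(r"(?<=[A-Za-z])([.,;:!?])(\d{1,3})(?=\s|$)", r"\1[^\2]", text)


def tighten_footnote_spacing(text: str) -> str:
    # ". [^1]" becomes ".[^1]"; the (?!:) guard skips footnote definitions.
    return re.sub(r"([.,;:!?])[ \t]+(\[\^\d+\])(?!:)", r"\1\2", text)
```

Running the three functions in this order on a paragraph leaves every marker in the [^n] form directly after its punctuation.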

Known limitations

Italics: PDF→Markdown conversion often loses italics because PDFs store them as font variants rather than explicit markers. Using --llm or --ocr may help in some cases.

Long documents: By default, litmd warns for PDFs over 50 pages and refuses to convert PDFs over 100 pages (the warn_pages and max_pages settings under [limits]). Use --pages, --chapters, or --no-limit for longer documents.
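
The warn and refuse thresholds can be pictured as a small decision function. This is a sketch under assumptions: check_page_limit is a hypothetical name, and treating --no-limit as disabling both checks is a guess about the flag's behaviour.

```python
def check_page_limit(
    page_count: int,
    warn_pages: int = 50,
    max_pages: int = 100,
    no_limit: bool = False,
) -> str:
    """Return "ok", "warn", or "refuse" for a document of page_count pages."""
    if no_limit:
        # Assumption: --no-limit skips the warning as well as the refusal.
        return "ok"
    # max_pages = 0 means "no limit", as in the config reference.
    if max_pages > 0 and page_count > max_pages:
        return "refuse"
    if warn_pages > 0 and page_count > warn_pages:
        return "warn"
    return "ok"
```

With the default thresholds, a 51-page PDF produces a warning and a 101-page PDF is refused.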

Local LLM support

For better conversion accuracy, litmd supports local LLMs via LM Studio or Ollama.

LM Studio (recommended for Apple Silicon)

  1. Download and open LM Studio
  2. Load a vision model (e.g., qwen3-vl-8b-instruct)
  3. Start the local server
  4. Run litmd --llm paper.pdf

Ollama

ollama pull llama3.2-vision:11b
litmd --llm paper.pdf

The --llm flag auto-detects which backend is available, preferring LM Studio if it is running.
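
Auto-detection with a preference order can be sketched as probing each server in turn. The code below is an assumption about how this could work, not the contents of llm.py: it checks LM Studio's OpenAI-compatible model listing at the lmstudio_url from the config reference, then Ollama's model listing on its default port 11434. The names is_reachable and detect_backend are hypothetical.

```python
import urllib.error
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1"  # lmstudio_url from the config reference
OLLAMA_URL = "http://localhost:11434"      # Ollama's default address (assumed unchanged)


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def detect_backend(probe=is_reachable) -> str:
    """Prefer LM Studio, fall back to Ollama, otherwise report "none"."""
    if probe(f"{LMSTUDIO_URL}/models"):
        return "lmstudio"
    if probe(f"{OLLAMA_URL}/api/tags"):
        return "ollama"
    return "none"
```

Passing the probe as a parameter keeps the preference logic testable without either server running.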

Making LLM the default

litmd config llm.enabled true
litmd config llm.default_quality fast  # or "best"

Architecture

litmd/
├── cli.py          # Command-line interface
├── convert.py      # Core conversion pipeline
├── postprocess.py  # Markdown cleanup and formatting
├── config.py       # TOML configuration management
├── pages.py        # Page range parsing and chapter utilities
├── metadata.py     # PDF metadata extraction
├── llm.py          # LLM backend detection (LM Studio, Ollama)
└── sources/        # Input handlers
    ├── path.py     # File path input
    └── zotero.py   # Zotero storage lookup

Conversion pipeline

  1. Input resolution: Determine if input is a file path or citation key
  2. PDF metadata extraction: Extract title, keywords, creation date
  3. Page limit checking: Warn or refuse if document exceeds limits
  4. Marker conversion: Run marker_single to convert PDF → Markdown
  5. Post-processing: Clean up artifacts, convert footnotes, add smart quotes
  6. Frontmatter generation: Add YAML frontmatter with metadata
  7. Output: Write to configured output directory

Development

# Clone the repository
git clone https://github.com/gavriilfakih/litmd.git
cd litmd

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

Licence

MIT
