intake-document

A Python application to convert documents into markdown format using the OCR capabilities provided by Mistral.ai.

Features

- Uses Mistral.ai's OCR API to analyze document content and convert it to markdown
- Processes documents using Mistral's batch mode to reduce cost
- Uploads documents to Mistral as a separate step
- Maintains document structure and hierarchy
- Preserves formatting, including:
  - Headers and subheadings
  - Paragraphs
  - Lists (ordered and unordered)
  - Tables with headers and data
- The Mistral API returns results as clean markdown, ready for parsing and rendering
- Handles complex layouts, including multi-column text and mixed content
- Keeps non-text images as references that can be downloaded separately
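
Batch mode submits many OCR requests as a single job. The sketch below shows one way to prepare a batch input file, assuming Mistral's batch format of one JSON object per line with a `custom_id` and a request `body`, and assuming the OCR request body takes a `document` of type `document_url`. The helper `build_batch_lines` is hypothetical and not taken from this project; check the Mistral batch documentation for the exact schema.

```python
import json
from typing import Iterable, List


def build_batch_lines(document_urls: Iterable[str]) -> List[str]:
    """Build JSONL lines for a batch OCR job (hypothetical helper).

    Assumes each line is {"custom_id": ..., "body": {...}}, where the body
    mirrors a single OCR request.
    """
    lines = []
    for index, url in enumerate(document_urls):
        request = {
            # custom_id lets each result be matched back to its input document
            "custom_id": str(index),
            "body": {
                # Assumed OCR request shape: a document referenced by URL
                "document": {"type": "document_url", "document_url": url},
            },
        }
        lines.append(json.dumps(request))
    return lines


if __name__ == "__main__":
    urls = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
    print("\n".join(build_batch_lines(urls)))
```

Writing these lines to a `.jsonl` file would give the kind of input file uploaded for a batch job; the results can then be matched to their source documents by `custom_id`.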

Installation

Requirements

- Python 3.10 or higher
- Mistral.ai API key

Install from source

# Clone the repository
git clone https://github.com/yourusername/intake-document.git
cd intake-document

# Install in development (editable) mode with uv's pip interface
uv pip install -e .

Environment setup

Set your Mistral.ai API key as an environment variable:

export MISTRAL_API_KEY="your-api-key-here"
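
The key can also be set as `api_key` in the `[mistral]` section of the configuration file. A minimal sketch of resolving it, assuming the environment variable takes precedence over the file; the helper `resolve_api_key` and this precedence order are illustrative assumptions, not necessarily what `intake-document` does:

```python
import os
from typing import Optional


def resolve_api_key(config_value: Optional[str] = None) -> str:
    """Return the Mistral API key (hypothetical helper).

    Assumed precedence: the MISTRAL_API_KEY environment variable first,
    then the value read from the config file.
    """
    key = os.environ.get("MISTRAL_API_KEY") or config_value
    if not key:
        raise RuntimeError(
            "No API key found: set MISTRAL_API_KEY or api_key in the [mistral] section"
        )
    return key
```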

Configuration

The application follows the XDG Base Directory specification for configuration:

Configuration: ~/.config/intake-document/init.cfg
Data: ~/.local/share/intake-document/
Cache: ~/.cache/intake-document/
State: ~/.local/state/intake-document/
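
The project lists `xdg-base-dirs` for resolving these locations. To show how the specification works, here is a stdlib-only sketch: each `XDG_*_HOME` variable overrides the default when it is set to an absolute path (the spec says relative paths must be ignored), otherwise the `$HOME`-relative default applies. The helper `app_dirs` is illustrative and not part of the project:

```python
import os
from pathlib import Path
from typing import Dict

APP_NAME = "intake-document"

# Variable name -> default location relative to $HOME, per the XDG Base Directory spec
_XDG_DEFAULTS = {
    "XDG_CONFIG_HOME": ".config",
    "XDG_DATA_HOME": ".local/share",
    "XDG_CACHE_HOME": ".cache",
    "XDG_STATE_HOME": ".local/state",
}


def _xdg_base(var: str) -> Path:
    value = os.environ.get(var, "")
    # Unset, empty, or relative values fall back to the default
    if value and os.path.isabs(value):
        return Path(value)
    return Path.home() / _XDG_DEFAULTS[var]


def app_dirs() -> Dict[str, Path]:
    """Resolve the application's config, data, cache and state directories."""
    return {
        "config": _xdg_base("XDG_CONFIG_HOME") / APP_NAME,
        "data": _xdg_base("XDG_DATA_HOME") / APP_NAME,
        "cache": _xdg_base("XDG_CACHE_HOME") / APP_NAME,
        "state": _xdg_base("XDG_STATE_HOME") / APP_NAME,
    }
```

With no overrides set, `app_dirs()["config"] / "init.cfg"` resolves to the configuration path listed above.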

Configuration file example

[mistral]
api_key = your-api-key-here
batch_size = 5
max_retries = 3
timeout = 60

[app]
output_dir = ./output
log_level = INFO
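
The file uses standard INI syntax, so Python's built-in `configparser` (also named in the dependency list) can read it. A minimal sketch, assuming the section and key names from the example above; the `Settings` class and `load_settings` helper are hypothetical, and the fallback values are illustrative (only `./output` and `ERROR` match documented CLI defaults):

```python
import configparser
from dataclasses import dataclass


@dataclass
class Settings:
    # Field names mirror the keys in the example configuration file
    api_key: str
    batch_size: int
    max_retries: int
    timeout: int
    output_dir: str
    log_level: str


def load_settings(text: str) -> Settings:
    """Parse INI text into typed settings, using fallbacks for missing keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return Settings(
        api_key=parser.get("mistral", "api_key", fallback=""),
        batch_size=parser.getint("mistral", "batch_size", fallback=5),
        max_retries=parser.getint("mistral", "max_retries", fallback=3),
        timeout=parser.getint("mistral", "timeout", fallback=60),
        output_dir=parser.get("app", "output_dir", fallback="./output"),
        log_level=parser.get("app", "log_level", fallback="ERROR"),
    )
```

Typed getters such as `getint` turn `batch_size = 5` into the integer `5` rather than the string `"5"`, and `fallback` also covers a missing section.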

Usage

The application can be used as a command-line tool to process documents:

# Process a single file
intake-document --input path/to/document.pdf --output-dir path/to/output

# Process all documents in a directory
intake-document --input path/to/document/folder --output-dir path/to/output

# Show the current configuration
intake-document --show-config

# Display help
intake-document --help
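
When `--input` points at a directory, every document in it is processed. One plausible way to gather the inputs and pair each with a markdown output path is sketched below; the accepted extensions, the non-recursive scan, and the `<stem>.md` naming are all assumptions, since they are not specified here:

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical set of accepted input types; the real tool may differ
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_inputs(input_path: Path) -> List[Path]:
    """Return the documents to process for a file or directory input."""
    if input_path.is_file():
        return [input_path]
    # Non-recursive scan, sorted for a stable processing order
    return sorted(
        p for p in input_path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def plan_outputs(inputs: Iterable[Path], output_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each input with an assumed output path: <output_dir>/<stem>.md."""
    return [(src, output_dir / f"{src.stem}.md") for src in inputs]
```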

Command Line Arguments

-i, --input PATH                  Path to input file or directory
-o, --output-dir PATH             Output directory (default: ./output)
-c, --config PATH                 Path to config file 
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}  Set logging level (default: ERROR)
--show-config                     Show current configuration and exit
-h, --help                        Show help message and exit
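
The project builds its CLI with Typer. Purely to make the table above concrete, here is an equivalent parser written with the standard library's `argparse`; it is a stand-in, not the project's actual implementation, and the `--config` default of `None` (meaning "use the XDG location") is an assumption:

```python
import argparse
from typing import List, Optional

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    # argparse adds -h/--help automatically
    parser = argparse.ArgumentParser(prog="intake-document")
    parser.add_argument("-i", "--input", metavar="PATH",
                        help="Path to input file or directory")
    parser.add_argument("-o", "--output-dir", metavar="PATH", default="./output",
                        help="Output directory (default: ./output)")
    # Assumed: None means "use the default XDG config location"
    parser.add_argument("-c", "--config", metavar="PATH", default=None,
                        help="Path to config file")
    parser.add_argument("--log-level", choices=LOG_LEVELS, default="ERROR",
                        help="Set logging level (default: ERROR)")
    parser.add_argument("--show-config", action="store_true",
                        help="Show current configuration and exit")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```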

Dependencies

mistralai - Official Python library for Mistral.ai
pydantic - Data validation and settings management
rich - Rich text and formatting for the terminal
structlog - Structured logging
typer - CLI creation with type hints
configparser - Configuration file parsing
xdg-base-dirs - XDG Base Directory specification

Development

Testing

# Install development dependencies
uv pip install pytest

# Run tests
pytest

Formatting and Linting

# Install development tools
uv pip install ruff mypy

# Run formatter
ruff format src/ tests/

# Run linter
ruff check src/ tests/

# Run type checking
mypy src/

License

This project is licensed under the MIT License - see the LICENSE file for details.

API Documentation

The Mistral.ai OCR API is documented at: https://docs.mistral.ai/capabilities/document/
