Converts Trade Republic statements to CSV using an Ollama vision model. Built for Trade Republic PDFs; other layouts may work but are unsupported/untested.
- Overview
- Platform Support
- Disclaimers
- Requirements
- Installation
- Configuration
- Usage
- Recommended Models
- Output
- Troubleshooting
- Notes & Limitations
- License
- Vision-first pipeline: renders each PDF page to an image, enhances it, and asks an Ollama vision model to return table rows.
- Language-flexible: keeps the headers/values exactly as they appear in the PDF (no translations); works across Trade Republic locales.
- Summary-aware: skips account overviews, rollups, and liquidity/market-value summaries; focuses on the transaction tables.
- Single file per PDF (CSV, XLSX, or JSON): `<pdf_name>.<ext>` with all rows in the original column order (headers normalized to lowercase).
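
As a rough illustration of the last step of the pipeline above, the sketch below builds a request for Ollama's `/api/generate` endpoint with one page image attached. It is not the project's actual code: the function names `build_vision_request` and `ask_model` are hypothetical, the default values are taken from the configuration table in this README, and the rendering and enhancement steps are left out because they need third-party imaging libraries.

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "ministral-3:8b",
                         temperature: float = 0,
                         num_ctx: int = 8192,
                         max_tokens: int = 8000) -> dict:
    """Build the JSON body for Ollama's /api/generate with one attached image.

    Hypothetical helper: the defaults mirror the README's settings table.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {
            "temperature": temperature,
            "num_ctx": num_ctx,
            "num_predict": max_tokens,
        },
    }


def ask_model(payload: dict, ollama_url: str = "http://localhost:11434") -> str:
    """Send the payload to a running Ollama server and return the model text."""
    request = urllib.request.Request(
        f"{ollama_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

`ask_model` only works while `ollama serve` is running and the model has been pulled; `build_vision_request` can be inspected offline.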
| OS | Status | Notes |
|---|---|---|
| Linux | Supported / tested | Primary development platform |
| Windows | Supported / tested | - |
| macOS (Apple Silicon) | Not yet tested | Planned |
- Built and tested only on Trade Republic statements. Other banks may partly work if you adapt the prompts; no guarantees.
- Personal utility: no warranties; use at your own risk.
- Install Python 3.10+ from python.org
- Install Git from git-scm.com
- Install Ollama from ollama.com

On Linux or macOS, clone the repository, create a virtual environment, and install the dependencies:

    git clone https://github.com/kalix127/tradesight.git
    cd tradesight
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

Poppler is required for PDF processing. Check whether it is installed by running `pdftoppm -h` in your terminal.

Ubuntu/Debian:

    sudo apt-get install poppler-utils

Arch Linux:

    sudo pacman -S poppler

macOS:

    brew install poppler

Run setup to choose/pull a model and create `settings.json` (optional but recommended):

    python3 setup.py

Ensure Ollama is running:

    ollama serve

On Windows, the equivalent steps are:

    git clone https://github.com/kalix127/tradesight.git
    cd tradesight
    python -m venv venv
    .\venv\Scripts\activate.bat
    python -m pip install -r .\requirements.txt

Poppler is required for PDF processing.

Windows:
- Download the latest Poppler package from @oschwartz10612's version, which is the most up-to-date.
- Move the extracted directory to the desired place on your system.
- Add the `bin/` directory to your PATH.
- Test that all went well by opening cmd and making sure that you can call `pdftoppm -h`.


Run the setup wizard to configure and pull a model:

    python setup.py

The setup will automatically detect the Ollama installation on Windows, even if it's not in your PATH.

Ensure Ollama is running before processing PDFs:

    ollama serve

- `python3 main.py`: reads all PDFs in `input_dir` and writes `<filename>.<ext>` to `output_dir`.
- `python3 main.py --debug`: prints raw model responses and column info; slower but useful for tuning.
- `python3 main.py --debug --page 3`: skips pages before the given (1-indexed) page; handy when focusing on a problem page without reprocessing the whole PDF.
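
The `--debug` and `--page` flags could be parsed roughly as in the sketch below. This is only an illustration of the 1-indexed page skipping described above, not the code in `main.py`; the helper names `parse_args` and `pages_to_process` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the usage examples (illustrative only)."""
    parser = argparse.ArgumentParser(description="Convert statement PDFs to tables.")
    parser.add_argument("--debug", action="store_true",
                        help="print raw model responses and column info")
    parser.add_argument("--page", type=int, default=1,
                        help="first page to process (1-indexed)")
    return parser.parse_args(argv)


def pages_to_process(total_pages: int, start_page: int = 1) -> list[int]:
    """Return the 1-indexed page numbers from start_page to the last page."""
    if start_page < 1:
        raise ValueError("page numbers are 1-indexed")
    return list(range(start_page, total_pages + 1))
```

With `--page 3` on a five-page PDF, pages 1 and 2 are skipped and pages 3, 4, and 5 are processed.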
All config lives in settings.json. Keys:
| Key | Type | Default | Notes |
|---|---|---|---|
| `input_dir` | string | `input` | Folder for PDFs. |
| `output_dir` | string | `output` | Folder for outputs. |
| `output_format` | string | `csv` | `csv` (default), `xlsx`, or `json`; configurable via setup. |
| `max_response_chars` | int | `8000` | Truncate/retry if the model response exceeds this many characters. |
| `max_tokens` | int | `8000` | Upper bound passed to Ollama (`num_predict`) to limit generation length. |
| `save_page_images` | bool | `true` | When true, saves rendered page PNGs to `<pdf>_images/` for debugging. |
| `image.dpi` | int | `300` | Render DPI (higher = sharper, slower). |
| `image.brightness` | float | `1.5` | Image brightening factor. |
| `image.contrast` | float | `1.5` | Image contrast factor. |
| `image.scale` | float | `0.5` | Scale factor (1.0 = no change; >1 upscale, <1 downscale); useful for small numerals. |
| `parser.model` | string | `ministral-3:8b` | Ollama model name. |
| `parser.ollama_url` | string | `http://localhost:11434` | Ollama endpoint. |
| `parser.temperature` | float | `0` | LLM randomness (lower = more deterministic). |
| `parser.num_ctx` | int | `8192` | Context window passed to Ollama (`num_ctx`). |
| `balance_tolerance` | float | `0.00` | Allowed running-balance delta when validating swaps; 0.00 = strict. |
| `include_error_column` | bool | `true` | Include the `error` column (recommended; flags rows missing required fields). |
| `include_balance_check_column` | bool | `false` | Include the `balance_check` column (shows swapped/mismatch flags from balance validation; edit `settings.json` to enable). |
| `prompts/system_prompt.txt` | file | editable | User-facing prompt; can tweak wording for new layouts/languages. |

You can override the model/URL at runtime: `python3 main.py --model <name> --ollama-url <url>`.
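
One way to picture how these keys and the runtime overrides fit together is the sketch below. It assumes that dotted keys such as `image.dpi` correspond to nested objects in `settings.json`, which this README does not state explicitly, and the names `DEFAULTS`, `deep_merge`, and `load_settings` are hypothetical rather than taken from the project.

```python
import json
from copy import deepcopy
from pathlib import Path

# Defaults copied from the table above; the nested layout is an assumption.
DEFAULTS = {
    "input_dir": "input",
    "output_dir": "output",
    "output_format": "csv",
    "max_response_chars": 8000,
    "max_tokens": 8000,
    "save_page_images": True,
    "image": {"dpi": 300, "brightness": 1.5, "contrast": 1.5, "scale": 0.5},
    "parser": {
        "model": "ministral-3:8b",
        "ollama_url": "http://localhost:11434",
        "temperature": 0,
        "num_ctx": 8192,
    },
    "balance_tolerance": 0.0,
    "include_error_column": True,
    "include_balance_check_column": False,
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(path="settings.json", model=None, ollama_url=None) -> dict:
    """Load settings.json over the defaults, then apply CLI-style overrides."""
    file_path = Path(path)
    user = json.loads(file_path.read_text(encoding="utf-8")) if file_path.exists() else {}
    settings = deep_merge(DEFAULTS, user)
    if model:
        settings["parser"]["model"] = model
    if ollama_url:
        settings["parser"]["ollama_url"] = ollama_url
    return settings
```

Merging over defaults means a `settings.json` that only sets `image.dpi` still gets every other key, and the `--model` / `--ollama-url` values win over both.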
- Recommended: `ministral-3:8b`, which gives the highest accuracy for Trade Republic PDFs; expect ~6 GB of VRAM (or ample system RAM with swap, slower). Use this to avoid messy/incorrect outputs.
- One file per PDF in the chosen format only (`.csv`, `.xlsx`, or `.json` array of rows).
- Headers are normalized to lowercase but keep the PDF order; case-duplicate headers are collapsed.
- Amount columns (`in entrata`, `in uscita`, `saldo`) are numeric strings without currency symbols.
- If `include_error_column` is enabled (recommended), `error = yes` marks rows missing required fields (data, tipo, descrizione, saldo, and at least one of in entrata/in uscita).
- If `include_balance_check_column` is enabled, `balance_check` shows `swapped`, `swapped_in_out`, or `mismatch` when balance validation intervened.
- Rows exclude summary/overview/liquidity/portfolio tables; only transaction tables remain.
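
The two optional columns can be pictured with the sketch below. It is a guess at the kind of check involved, not the project's implementation: it assumes amounts use a dot as the decimal separator, that each row's `saldo` equals the previous balance plus `in entrata` minus `in uscita`, and that rows without an error leave the column empty. It models only the `swapped_in_out` and `mismatch` flags, since the exact meaning of `swapped` is not described here. The function names are hypothetical.

```python
from decimal import Decimal

REQUIRED_FIELDS = ("data", "tipo", "descrizione", "saldo")
AMOUNT_FIELDS = ("in entrata", "in uscita")


def error_flag(row: dict) -> str:
    """Return "yes" if a required field is missing, else "" (assumed blank)."""
    has_required = all(str(row.get(key, "")).strip() for key in REQUIRED_FIELDS)
    has_amount = any(str(row.get(key, "")).strip() for key in AMOUNT_FIELDS)
    return "" if has_required and has_amount else "yes"


def balance_check(prev_saldo: str, row: dict, tolerance: str = "0.00") -> str:
    """Compare the row's balance with the running balance (assumed rule)."""
    tol = Decimal(tolerance)
    prev = Decimal(prev_saldo)
    inflow = Decimal(row.get("in entrata") or "0")
    outflow = Decimal(row.get("in uscita") or "0")
    saldo = Decimal(row["saldo"])
    if abs(prev + inflow - outflow - saldo) <= tol:
        return ""  # balance adds up as read
    if abs(prev + outflow - inflow - saldo) <= tol:
        return "swapped_in_out"  # adds up only with the two amounts exchanged
    return "mismatch"
```

Using `Decimal` rather than `float` keeps a strict tolerance of `0.00` meaningful, because cent values are compared exactly.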
- Missing small amounts: Increase `image.dpi` (e.g., 550–600) and `image.scale` (e.g., 1.6–1.8); keep contrast ≥1.3. Use a stronger model if available.
- Wrong/extra columns: Ensure `system_prompt.txt` wasn't edited incorrectly; headers are kept in PDF order and lowercased.
- Summary rows showing up: Verify the PDF matches the Trade Republic layout; liquidity/portfolio/overview blocks should be skipped by prompt and code. If a new layout appears, update `system_prompt.txt` accordingly.
- Ollama not responding: Confirm `ollama serve` is running and the model is pulled.
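
For the last item, a quick check along the following lines can tell an unreachable server apart from a model that has not been pulled. It is a minimal sketch that queries Ollama's `/api/tags` endpoint, which lists locally available models; the helper names `model_is_listed` and `check_ollama` are hypothetical and not part of this project.

```python
import json
import urllib.error
import urllib.request


def model_is_listed(tags_json: dict, model: str) -> bool:
    """Check whether a model name appears in an /api/tags response."""
    names = {entry.get("name", "") for entry in tags_json.get("models", [])}
    # Ollama stores untagged names as "<name>:latest".
    return model in names or f"{model}:latest" in names


def check_ollama(ollama_url: str = "http://localhost:11434",
                 model: str = "ministral-3:8b") -> str:
    """Return "ok" or a short hint describing what is wrong."""
    try:
        with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=5) as response:
            tags = json.load(response)
    except (urllib.error.URLError, OSError) as exc:
        return f"Ollama not reachable at {ollama_url}: {exc}"
    if not model_is_listed(tags, model):
        return f"Model {model!r} is not pulled; try: ollama pull {model}"
    return "ok"
```

Calling `print(check_ollama())` before a long run gives an early hint instead of a failure partway through a PDF.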