Convert Trade Republic PDF statements to CSV format for easy financial analysis and tracking

kalix127/tradesight

Converts Trade Republic statements to CSV using an Ollama vision model. Built for Trade Republic PDFs; other layouts may work but are unsupported/untested.

🚀 Overview

  • Vision-first pipeline: renders each PDF page to an image, enhances it, and asks an Ollama vision model to return table rows.
  • Language-flexible: keeps the headers/values exactly as they appear in the PDF (no translations); works across Trade Republic locales.
  • Summary-aware: skips account overviews, rollups, and liquidity/market-value summaries; focuses on the transaction tables.
  • Single file per PDF (CSV, XLSX, or JSON): <pdf_name>.<ext> with all rows in the original column order (headers normalized to lowercase).

💻 Platform Support

| OS | Status | Notes |
|---|---|---|
| Linux | Supported / tested | Primary development platform |
| Windows | Supported / tested | - |
| macOS (Apple Silicon) | Not yet tested | Planned |

โš ๏ธ Disclaimers

  • Built and tested only on Trade Republic statements. Other banks may partly work if you adapt the prompts; no guarantees.
  • Personal utility: no warranties; use at your own risk.

🧰 Requirements

  1. Install Python 3.10+ from python.org
  2. Install Git from git-scm.com
  3. Install Ollama from ollama.com

๐Ÿ› ๏ธ Installation (Unix/macOS)

git clone https://github.com/kalix127/tradesight.git
cd tradesight
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Installing Poppler

Poppler is required for PDF processing. Check whether it is installed by running pdftoppm -h in your terminal.

Ubuntu/Debian:

sudo apt-get install poppler-utils

Arch Linux:

sudo pacman -S poppler

macOS:

brew install poppler
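If you prefer to run the pdftoppm check from Python, for example in your own setup script, a minimal sketch follows. It only looks for pdftoppm on PATH and tries to launch it; the function name poppler_available is hypothetical and not part of tradesight.

```python
import shutil
import subprocess


def poppler_available() -> bool:
    """Return True if Poppler's pdftoppm is on PATH and can be launched."""
    exe = shutil.which("pdftoppm")
    if exe is None:
        return False
    try:
        # Only a failure to start the binary counts as "missing"; the exit
        # code of `-h` is deliberately not relied on.
        subprocess.run([exe, "-h"], capture_output=True, timeout=10, check=False)
    except (OSError, subprocess.SubprocessError):
        return False
    return True


if __name__ == "__main__":
    if poppler_available():
        print("Poppler found")
    else:
        print("pdftoppm not found: install Poppler and add its bin/ directory to PATH")
```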

Run setup to choose/pull a model and create settings.json (optional but recommended):

python3 setup.py

Ensure Ollama is running:

ollama serve

๐Ÿ› ๏ธ Installation (Windows)

Installation Steps

git clone https://github.com/kalix127/tradesight.git
cd tradesight
python -m venv venv
.\venv\Scripts\activate.bat
python -m pip install -r .\requirements.txt

Installing Poppler

Poppler is required for PDF processing.

Windows:

  1. Download the latest Poppler package from @oschwartz10612's build, which is the most up to date
  2. Extract the archive and move the resulting directory to the desired place on your system
  3. Add its bin/ directory to your PATH
  4. Verify the installation by opening a new cmd window and running pdftoppm -h

Setup Configuration

Run the setup wizard to configure and pull a model:

python setup.py

The setup detects the Ollama installation on Windows automatically, even if Ollama is not on your PATH.

Start Ollama

Ensure Ollama is running before processing PDFs:

ollama serve

โ–ถ๏ธ Usage

Basic run (default)

python3 main.py
  • Reads all PDFs in input_dir.
  • Writes <filename>.<ext> to output_dir.
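The input/output mapping of a basic run can be sketched roughly as follows. This assumes a flat input_dir (no recursion into subfolders) and the default folder names from settings.json. plan_outputs is a hypothetical helper, not tradesight's code.

```python
from pathlib import Path


def plan_outputs(input_dir: str, output_dir: str, ext: str = "csv") -> list[tuple[Path, Path]]:
    """Pair every PDF in input_dir with its <filename>.<ext> target in output_dir."""
    src = Path(input_dir)
    if not src.is_dir():
        return []
    dst = Path(output_dir)
    pairs = []
    for pdf in sorted(src.iterdir()):
        # Match the suffix case-insensitively so "Statement.PDF" is included.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            pairs.append((pdf, dst / f"{pdf.stem}.{ext}"))
    return pairs


if __name__ == "__main__":
    for pdf, target in plan_outputs("input", "output"):
        print(f"{pdf} -> {target}")
```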

Debug mode

python3 main.py --debug
  • Prints raw model responses and column info; slower but useful for tuning.

Start from a specific page (debug only)

python3 main.py --debug --page 3
  • Skips pages before the given (1-indexed) page; handy when focusing on a problem page without reprocessing the whole PDF.

โš™๏ธ Configuration

All config lives in settings.json. Keys:

| Key | Type | Default | Notes |
|---|---|---|---|
| input_dir | string | input | Folder for PDFs. |
| output_dir | string | output | Folder for outputs. |
| output_format | string | csv | csv (default), xlsx, or json; configurable via setup. |
| max_response_chars | int | 8000 | Truncate/retry if the model response exceeds this many characters. |
| max_tokens | int | 8000 | Upper bound passed to Ollama as num_predict to limit generation length. |
| save_page_images | bool | true | When true, saves rendered page PNGs to <pdf>_images/ for debugging. |
| image.dpi | int | 300 | Render DPI (higher = sharper but slower). |
| image.brightness | float | 1.5 | Brightness factor. |
| image.contrast | float | 1.5 | Contrast factor. |
| image.scale | float | 0.5 | Resize factor (1.0 = no change, >1 upscales, <1 downscales); upscaling helps with small numerals. |
| parser.model | string | ministral-3:8b | Ollama model name. |
| parser.ollama_url | string | http://localhost:11434 | Ollama endpoint. |
| parser.temperature | float | 0 | LLM randomness (lower = more deterministic). |
| parser.num_ctx | int | 8192 | Context window passed to Ollama (num_ctx). |
| balance_tolerance | float | 0.00 | Allowed running-balance difference when validating swaps; 0.00 = strict. |
| include_error_column | bool | true | Adds the error column, which flags rows missing required fields (recommended). |
| include_balance_check_column | bool | false | Adds the balance_check column, which shows swapped/mismatch flags from balance validation; enable it by editing settings.json. |
| prompts/system_prompt.txt | file | (editable) | Separate file, not a settings.json key: the user-editable prompt. Tweak its wording for new layouts or languages. |
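The dotted keys in the table (image.dpi, parser.model, and so on) suggest nested JSON objects in settings.json. Assuming that layout, a loader that fills in the documented defaults for anything missing might look like this. DEFAULTS, merge, and load_settings are hypothetical names, not tradesight's code.

```python
import copy
import json
from pathlib import Path

# Defaults taken from the configuration table; nesting is an assumption.
DEFAULTS = {
    "input_dir": "input",
    "output_dir": "output",
    "output_format": "csv",
    "max_response_chars": 8000,
    "max_tokens": 8000,
    "save_page_images": True,
    "image": {"dpi": 300, "brightness": 1.5, "contrast": 1.5, "scale": 0.5},
    "parser": {
        "model": "ministral-3:8b",
        "ollama_url": "http://localhost:11434",
        "temperature": 0,
        "num_ctx": 8192,
    },
    "balance_tolerance": 0.00,
    "include_error_column": True,
    "include_balance_check_column": False,
}


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay override onto a copy of base (nested dicts merge key by key)."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_settings(path: str = "settings.json") -> dict:
    """Load settings.json, falling back to DEFAULTS for any missing key."""
    p = Path(path)
    if not p.exists():
        return copy.deepcopy(DEFAULTS)
    with p.open(encoding="utf-8") as fh:
        return merge(DEFAULTS, json.load(fh))
```

With this approach, a settings.json containing only {"image": {"dpi": 600, "scale": 1.6}} would change those two values and keep every other default.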

You can override the model/URL at runtime: python3 main.py --model <name> --ollama-url <url>.
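For reference, here is roughly how the parser.* settings and max_tokens map onto a request to Ollama's /api/generate endpoint, which accepts base64-encoded images for vision models. This is a standard-library sketch: build_payload and ask_model are hypothetical, and tradesight may call Ollama differently (for example through /api/chat or the ollama Python package).

```python
import base64
import json
import urllib.request


def build_payload(png_bytes: bytes, prompt: str, model: str, temperature: float,
                  num_ctx: int, max_tokens: int) -> dict:
    """Build a non-streaming /api/generate request carrying one page image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,
        "options": {
            "temperature": temperature,
            "num_ctx": num_ctx,
            "num_predict": max_tokens,  # max_tokens is passed as num_predict
        },
    }


def ask_model(ollama_url: str, payload: dict, timeout: float = 300) -> str:
    """POST the payload and return the model's text response."""
    req = urllib.request.Request(
        ollama_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A --model or --ollama-url override would simply replace the values passed to build_payload and ask_model for that run.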

🤖 Recommended Models

  • Recommended: ministral-3:8b, which gives the highest accuracy on Trade Republic PDFs. Expect about 6 GB of VRAM (or ample system RAM with swap, which is slower). Use it to avoid messy or incorrect output.

📤 Output

  • One file per PDF in the chosen format only (.csv, .xlsx, or .json array of rows).
  • Headers are normalized to lowercase but keep the PDF order; case-duplicate headers are collapsed.
  • Amount columns (in entrata, in uscita, saldo, i.e. incoming, outgoing, and balance on an Italian statement) are numeric strings without currency symbols.
  • If include_error_column is enabled (recommended), error = yes marks rows missing required fields: data (date), tipo (type), descrizione (description), saldo (balance), and at least one of in entrata/in uscita.
  • If include_balance_check_column is enabled, balance_check shows swapped, swapped_in_out, or mismatch when balance validation intervened.
  • Rows exclude summary/overview/liquidity/portfolio tables; only transaction tables remain.
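The header and error-column rules described in this section can be sketched as follows. Assumptions: rows arrive as dicts keyed by the PDF headers, a row without problems gets an empty error cell (the README only specifies yes for flagged rows), and the Italian required-field names are hard-coded for illustration. normalize_headers, error_flag, and to_csv are hypothetical names.

```python
import csv
import io

REQUIRED = ("data", "tipo", "descrizione", "saldo")
AMOUNTS = ("in entrata", "in uscita")


def normalize_headers(headers: list[str]) -> list[str]:
    """Lowercase headers, keep PDF order, and collapse case-duplicates (first wins)."""
    seen, result = set(), []
    for h in headers:
        key = h.strip().lower()
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def error_flag(row: dict) -> str:
    """Return "yes" if a required field is blank or both amount columns are blank."""
    missing = any(not str(row.get(col, "")).strip() for col in REQUIRED)
    no_amount = not any(str(row.get(col, "")).strip() for col in AMOUNTS)
    return "yes" if missing or no_amount else ""


def to_csv(headers: list[str], rows: list[dict], include_error_column: bool = True) -> str:
    """Write rows as CSV text using the normalized headers."""
    cols = normalize_headers(headers)
    if include_error_column:
        cols = cols + ["error"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols, extrasaction="ignore")
    writer.writeheader()
    for raw in rows:
        row = {k.strip().lower(): v for k, v in raw.items()}
        if include_error_column:
            row["error"] = error_flag(row)
        writer.writerow(row)  # missing columns are written as empty cells
    return buf.getvalue()
```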

๐Ÿ› ๏ธ Troubleshooting

  • Missing small amounts: Increase image.dpi (e.g., 550–600) and image.scale (e.g., 1.6–1.8); keep contrast ≥1.3. Use a stronger model if available.
  • Wrong/extra columns: Check that your edits to prompts/system_prompt.txt did not break it; headers are kept in PDF order and lowercased.
  • Summary rows showing up: Verify that the PDF matches the Trade Republic layout; liquidity/portfolio/overview blocks should be skipped by the prompt and the code. If a new layout appears, update system_prompt.txt accordingly.
  • Ollama not responding: Confirm ollama serve is running and the model is pulled.

📄 License

MIT
