whatsthedamage

An opinionated open source tool written in Python to process bank account transaction exports in CSV files.

Efforts were made to be able to customize the behavior and potentially work with any CSV format finance companies may produce.

The project contains a command line tool as well as a web interface for easier usage.

An experimental Machine Learning model is also available to help reducing the burden of writing regular expressions.

The slang phrase "what's the damage?" is often used to ask about the cost or price of something, typically in a casual or informal context. The phrase is commonly used in social settings, especially when discussing expenses or the results of an event.

Features

Multi-account support: Process CSV exports containing multiple accounts, each with separate currency metadata.
Categorizes transactions into well known accounting categories.
Categorizes transactions into custom categories by using regular expressions.
Calculator pattern for extensible custom transaction calculations.
Transactions can be filtered by start and end dates. If no filter is set, grouping is based on the number of months.
Shows a report about the summarized amounts grouped by transaction categories, including Total Spendings.
Reports can be saved into CSV or HTML files with interactive DataTable visualization (sorting, searching).
Localization support. Currently English (default) and Hungarian languages are supported.
Web interface for easier use.
REST API for programmatic access and integrations.

Example output on console. The values in the following example are arbitrary.

                         January          February
Balance            129576.00 HUF    1086770.00 HUF
Vehicle           -106151.00 HUF     -54438.00 HUF
Clothes            -14180.00 HUF          0.00 HUF
Deposit            725313.00 HUF    1112370.00 HUF
Fee                 -2494.00 HUF      -2960.00 HUF
Grocery           -172257.00 HUF    -170511.00 HUF
Health             -12331.00 HUF     -25000.00 HUF
Home Maintenance        0.00 HUF     -43366.00 HUF
Interest                5.00 HUF          8.00 HUF
Loan               -59183.00 HUF     -59183.00 HUF
Other              -86411.00 HUF     -26582.00 HUF
Payment            -25500.00 HUF     583580.00 HUF
Refund                890.00 HUF        890.00 HUF
Transfer                0.00 HUF          0.00 HUF
Utility            -68125.00 HUF     -78038.00 HUF
Withdrawal         -50000.00 HUF    -150000.00 HUF

Calculator Pattern for Extensibility

whatsthedamage now supports a calculator pattern that allows you to define custom transaction calculations beyond the built-in categorization. This makes the tool extensible for specific business logic or custom reporting needs.

For implementation examples, see calculator_pattern_example.py in the documentation.

Machine Learning categorization (experimental)

Writing regular expressions might be easy for IT professionals, but it is definitely hard or even impossible for others. Maintaining them can also be challenging, even for professionals.

Using a machine learning model can automatically learn patterns from a given transaction history, making categorization faster and probably more accurate without manual rule creation.

If you want to read more about the ML model used by whatsthedamage, check out its own README.md file.

The repository has an experimental pre-built model.

The model currently relies on the English language. Language-agnostic models are planned for the future.

Warning

The model is expected to be opinionated. Predicted categories could be completely wrong.
The model is currently persisted using 'joblib', which may pose a security risk of executing arbitrary code upon loading. Use the model you trust; use it at your own risk.

Try experimenting with it by providing the --ml command line argument to whatsthedamage.

Architecture Overview

whatsthedamage provides three interfaces for different use cases:

Command-Line Interface (CLI) - For local, interactive use and automation scripts
Web Interface - Browser-based UI for users who prefer forms over terminal commands
REST API - Programmatic access for integrations, CI/CD pipelines, and external applications

All three interfaces share the same core business logic through a well-defined service layer (including ProcessingService, ValidationService, ConfigurationService, and others), ensuring consistent transaction processing regardless of how you access the tool. The architecture was introduced in version 0.8.0 and further enhanced in 0.9.0 with v2 processing pipeline, multi-account support, and performance optimizations. The unified data format (DataTablesResponse) ensures consistency across all clients: CLI, Web, and API.

Caching Strategy

The web interface implements a 10-minute caching strategy to improve performance and user experience:

Note: The current implementation does not automatically reprocess files on cache miss. Users need to re-upload and process their files if the cache expires. The file references are stored for a later automatic reprocessing. It is not decided yet.

Interface Comparison

Feature	CLI	Web UI	REST API
Access Method	Terminal commands	Browser forms	HTTP requests
Authentication	None	None	None
Input	File paths	File upload	Multipart form data
Output	Console/CSV/HTML	HTML page	JSON
Use Case	Local analysis, scripts	Ad-hoc exploration	Automation, integrations
Requires Server	❌ No	✅ Yes	✅ Yes
Interactive	✅ Yes	✅ Yes	❌ No (stateless)

When to Use What?

Use the CLI when:

Running locally on your machine
Automating with shell scripts
Processing files in batch
Integrating with terminal workflows
You prefer command-line tools

Use the Web UI when:

You prefer graphical interfaces
Sharing access with non-technical users
Quick ad-hoc analysis without installing anything
You want interactive table features (sorting, searching)

Use the REST API when:

Integrating with other applications
Building custom frontends
Automating in CI/CD pipelines
Processing transactions from external systems
Need programmatic access with JSON responses

For complete REST API documentation, see API.md.

Install

This chapter describes how to install whatsthedamage in production. For development purposes check out the Development chapter.

Manual install

The package is published to https://pypi.org/project/whatsthedamage/ therefore you can use pip / pipx to install it.

$ pipx install whatsthedamage
$ pip install --user whatsthedamage

The web interface requires you to start WSGI server (ie. gunicorn) manually.

Gunicorn requires either a configuration file or proper command line arguments passed when invoked from command line.

The repository contains an example gunicorn_conf.py you can use out of the box.

$ cd
$ gunicorn --config gunicorn_conf.py whatsthedamage.app:app

Docker image

There is also an experimental Docker image you can use hosted on GitHub.

$ docker run --rm -ti --publish 5000:5000/tcp ghcr.io/abalage/whatsthedamage:latest

You can access the web interface on http://localhost:5000.

Usage:

usage: whatsthedamage [-h] [--start-date START_DATE] [--end-date END_DATE] [--verbose] [--version] [--config CONFIG] [--category CATEGORY] [--output OUTPUT] [--output-format OUTPUT_FORMAT] [--nowrap]
                      [--filter FILTER] [--lang LANG] [--training-data] [--ml]
                      filename

A CLI tool to process bank account transaction exports in CSV files.

positional arguments:
  filename              The CSV file to read.

options:
  -h, --help            show this help message and exit
  --start-date START_DATE
                        Start date (e.g. YYYY.MM.DD.)
  --end-date END_DATE   End date (e.g. YYYY.MM.DD.)
  --verbose, -v         Print categorized rows for troubleshooting.
  --version             Show the version of the program.
  --config, -c CONFIG   Path to the configuration file.
  --category CATEGORY   The attribute to categorize by. (default: category)
  --output, -o OUTPUT   Save the result into a CSV file with the specified filename.
  --output-format OUTPUT_FORMAT
                        Supported formats are: html, csv. (default: csv).
  --nowrap, -n          Do not wrap the output text. Useful for viewing the output without line wraps.
  --filter, -f FILTER   Filter by category. Use it in conjunction with --verbose.
  --lang, -l LANG       Language for localization.
  --training-data       Print training data in JSON format to STDERR. Use 2> redirection to save it to a file.
  --ml                  Use machine learning for categorization instead of regular expressions. (experimental)

Configuration File

The config file format and syntax has considerably changed in v0.6.0 (JSON to YAML). Please refer to the default config file for details.

A default configuration file is provided as config.yml.default.

If you do not want to create a configuration file then you can try the experimental Machine Learning mode to categorize transactions.

Troubleshooting

To troubleshoot why a transaction was assigned to a particular category, enable verbose mode using the -v or --verbose command line option.
By default, only the attributes (columns) specified by selected_attributes in the configuration file are displayed. The category attribute is generated by the tool.

Should you want to check your regular expressions then you can use a handy online tool like https://regex101.com/.

Note: Regexp values are not stored as raw strings, so watch out for possible backslashes. For more information, see What exactly is a raw string regex and how can you use it?.

Transaction categories

This is the list of transaction categories whatsthedamage uses by default.

Balance: Your total balance per time period. Basically the sum of all deposits minus the sum of all your purchases.
Clothes: Clothing related purchases.
Deposit: Money added to the account, such as direct deposits from employers, cash deposits, or transfers from other accounts.
Fee: Charges applied by the bank, such as monthly maintenance fees, overdraft fees, or ATM fees.
Grocery: Everything considered to sustain your life. Mostly food and other basic things required by your household.
Health: Medicines, vising a doctor, etc.
Home Maintenance: Spendings on your housing, maintencance, reconstruction, etc.
Interest: Earnings on the account balance, typically seen in savings accounts or interest-bearing checking accounts.
Loan: Any type of loans, mortgage.
Other: Any transactions which do not fit into any of the other categories.
Payment: Scheduled payments for bills or loans, which can be set up as automatic payments.
Refund: Money returned to the account, often from returned purchases or corrections of previous transactions.
Sports Recreation: Spending related to sports and recreations like massage, going into a bar or cinema.
Transfer: Movements of money between accounts, either within the same bank or to different banks.
Utility: Regular, monthly recurring payments for stuff like Rent, Electricity, Gas, Water, Phone bills, etc.
Vehicle: All purchases - except Insurance - related to owning a vehicle.
Withdrawal: Money taken out of the account, including ATM withdrawals, cash withdrawals at the bank, and electronic transfers.

Custom categories can be user-defined via config. Feel free to add your own categories into config.yml.

Note: the Machine Learning model was trained on the categories listed here.

Limitations

The categorization process may fail to categorize transactions because of the quality of the regular expressions / ML model. The transaction might be categorized as 'other'.
The tool assumes that an account only use a single currency.
The Machine Learning model is currently English-centric; language-agnostic models are planned for future releases.
No authentication.

Development

The git project contains static files tracked with git-lfs. To checkout the contents follow these steps:

Use your package manager to install git-lfs
Install git lfs extension: git lfs install
Clone the project repository
Change directory to checked out directory: git lfs fetch

The repository comes with a Makefile using 'GNU make' to automatize recurring actions. Here is the usage of the Makefile.

$ make help
Development workflow:
  dev            - Create venv, install pip-tools, sync all requirementsm, install frontend dependencies
  web            - Run Flask development server
  test           - Run tests using tox
  ruff           - Run ruff linter/formatter
  mypy           - Run mypy type checker
  image          - Build Podman image with version tag
  lang           - Extract translatable strings to English .pot file
  docs           - Build Sphinx documentation
  frontend ARG=script - Run any npm script (e.g., 'frontend ARG=dev', 'frontend ARG=build')
  build          - Full stack build (Python + JS)

Dependency management for Python:
  compile-deps   - Compile requirements files from pyproject.toml
  update-deps    - Update requirements to latest versions

Cleanup for Python and JavaScript:
  clean          - Clean up build files
  mrproper       - Clean + remove virtual environment, node_modules

Frontend Development

The project uses a modern frontend build system with npm and Vite, replacing the previous CDN-based approach. The frontend source code is located in src/whatsthedamage/view/frontend/.

JavaScript Architecture

Build System:

npm + Vite: Modern build toolchain with fast Hot Module Replacement (HMR) and optimized production builds
TypeScript: Type-safe JavaScript development with modern ES module syntax
ES Modules: Native ES module system for better code organization and tree-shaking

Frontend Structure:

src/whatsthedamage/view/frontend/
├── src/
│   ├── main.ts              # Main entry point, imports and initializes all modules
│   ├── js/
│   │   ├── index.ts         # Index page functionality (form clearing)
│   │   ├── main.ts          # DataTables initialization and Bootstrap components
│   │   ├── statistical-analysis.ts  # Statistical controls
│   │   ├── utils.ts         # Utility functions
│   │   └── api-docs.ts      # API documentation page
│   └── css/                 # CSS source files (future)
├── package.json             # npm dependencies and scripts
├── vite.config.js           # Vite configuration
└── public/                  # Public assets

Frontend Dependencies:

All frontend dependencies are managed via npm in package.json
Run make dev to install dependencies (includes both Python and JavaScript dependencies)
Run make frontend ARG=build to build production assets

Frontend Build Output:

Build assets are output to src/whatsthedamage/view/static/dist/
Backend static assets (ML models, favicon) remain in src/whatsthedamage/static/ (committed to Git)
The Dockerfile automatically installs npm dependencies and builds assets

Static Assets Management:

Backend static assets: src/whatsthedamage/static/ - ML models, configuration files (committed to Git)
Frontend build artifacts: src/whatsthedamage/view/static/dist/ - Vite build output (NOT committed to Git, excluded via .gitignore)
Flask static files: src/whatsthedamage/view/static/ - favicon.ico and other web assets (committed to Git)

Localization

The application by default uses the English language, however it also supports Hungarian language.

For translation support the tool uses Python's gettext library.

To update the English .pot file with new translatable strings use make lang.

$ make lang

Create or edit the .po file to add translations by a tool like poedit.

$ poedit locale/en/LC_MESSAGES/messages.po

Compile the .po file into a .mo file. (poedit will do this for you):

$ msgfmt locale/en/LC_MESSAGES/messages.po -o locale/en/LC_MESSAGES/messages.mo

Contributing

Contributions are welcome! If you have ideas for improvements, bug fixes, new features, or additional documentation, feel free to open an issue or submit a pull request.

To contribute:

Fork the repository and create your branch from main.
Make your changes with clear commit messages.
Test your changes to ensure nothing is broken.
Open a pull request describing your changes and the motivation behind them.

If you have questions or need help getting started, open an issue and we'll be happy to assist.

Thank you for helping make this project better!

Name		Name	Last commit message	Last commit date
Latest commit History 364 Commits
.github/workflows		.github/workflows
config		config
docs		docs
src/whatsthedamage		src/whatsthedamage
tests		tests
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
API.md		API.md
ARCHITECTURE.md		ARCHITECTURE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
PRODUCT.md		PRODUCT.md
README.md		README.md
README_ML.md		README_ML.md
babel.cfg		babel.cfg
changelog.md		changelog.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements-web.txt		requirements-web.txt
requirements.txt		requirements.txt
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

whatsthedamage

Features

Calculator Pattern for Extensibility

Machine Learning categorization (experimental)

Architecture Overview

Caching Strategy

Interface Comparison

When to Use What?

Install

Manual install

Docker image

Usage:

Configuration File

Troubleshooting

Transaction categories

Limitations

Development

Frontend Development

JavaScript Architecture

Localization

Contributing

About

Uh oh!

Releases 14

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

whatsthedamage

Features

Calculator Pattern for Extensibility

Machine Learning categorization (experimental)

Architecture Overview

Caching Strategy

Interface Comparison

When to Use What?

Install

Manual install

Docker image

Usage:

Configuration File

Troubleshooting

Transaction categories

Limitations

Development

Frontend Development

JavaScript Architecture

Localization

Contributing

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 14

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages