Drop your CSV, Excel, Parquet, or JSON. Let the duck do the rest.
MALLARD is a 100% local, zero-server data tool built on DuckDB + Streamlit. Upload messy datasets, auto-clean them, explore with interactive charts, run custom SQL, and export the refined result — all without touching the cloud.
- Multi-format Ingestion — CSV, Excel (.xlsx/.xls), Parquet, JSON.
- Zero-Pandas Memory Footprint — Uses DuckDB native SQL pushdowns to process larger-than-RAM datasets without Out-Of-Memory (OOM) crashes.
- Auto Deep Clean — Deduplication, empty column removal, and statistical type healing (no hardcoded logic).
- Smart Summary — Automated insights and missing-value detection processed instantly inside the DB engine.
- Analytics Explorer — Uses smart SQL sampling and native aggregation before rendering charts to keep the browser lightning fast.
- Custom SQL — Run any DuckDB query directly against your tables.
- Direct-to-Disk Export — Ultra-fast downloads (CSV, Parquet, JSON) bypassing Python RAM using DuckDB's native
COPYcommand. - 100% Local — Data never leaves your machine, no cloud upload required.
- Standalone .exe — Distributable to Windows users with no Python setup needed.
# 1. Clone repo
git clone https://github.com/KMoex-HZ/mallard.git
cd mallard
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run
streamlit run mallard.py- Download
MALLARD.zipfrom Releases - Extract the zip
- Double-click
MALLARD.exe - Browser opens automatically at
http://localhost:8501
streamlit==1.45.1
duckdb==1.2.2
pandas==2.2.3
plotly==6.0.1
openpyxl==3.1.5
pyarrow==19.0.1
xlrd==2.0.1
psutil==6.1.1
Python 3.10+ recommended.
pip install pyinstaller pyinstaller-hooks-contrib
python setup_exe.pyOutput: dist/MALLARD/MALLARD.exe
Distribute the entire dist/MALLARD/ folder as a zip.
| Step | What it does |
|---|---|
| Deduplication | Removes exact duplicate rows via native SQL DISTINCT |
| Empty column removal | Drops columns with zero non-null values instantly via metadata count |
| Smart Numeric Healing | Uses statistical execution (TRY_CAST) to detect and convert dirty VARCHAR numbers into DOUBLE, bypassing hardcoded column names |
| Wide-format detection | Auto-melts date-header pivot tables into long format |
mallard/
├── mallard.py # Main Streamlit app
├── launcher.py # Entry point for .exe build
├── setup_exe.py # PyInstaller build script
├── requirements.txt
└── data/ # Auto-created, drop files here for auto-ingestion
Most data tools are either too heavy (full BI platforms) or too barebones (just a CSV viewer). MALLARD lives in the middle — a local warehouse that's fast, opinionated, and actually fixes your data before you analyze it.
By leveraging DuckDB's in-process OLAP engine, it allows you to process datasets that would normally crash a standard pandas/Streamlit app.
No Tableau license. No cloud upload. Just a duck and DuckDB.
Khairunnisa Maharani
MIT


