Wikipedia Article Maturity Scoring

FastAPI + CLI for Wikipedia article quality assessment with ML-powered feature extraction

Quickstart

Clone the repo and run with uv (recommended):

git clone https://github.com/bangyen/wikipedia.git
cd wikipedia
uv sync --all-extras     # install all dependencies (API, ML, Dev)
uv run pytest            # optional: run tests
uv run wiki-api          # start the API (or: just dashboard)

Or using standard pip:

git clone https://github.com/bangyen/wikipedia.git
cd wikipedia
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,api,ml]"
pytest
wiki-api                 # start the API

Or use the CLI: wiki-score "Albert Einstein"
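FastAPI publishes the schema of every route at /openapi.json by default (the interactive docs live at /docs), so you can discover the available endpoints without guessing them. Below is a minimal sketch. It assumes the server started by wiki-api listens on http://127.0.0.1:8000 and keeps FastAPI's default schema URL. Both are assumptions, so check the startup log. The helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: the server started by `wiki-api` listens here.
BASE_URL = "http://127.0.0.1:8000"

# Operation keys allowed in an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI schema that FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> list[str]:
    """Return "METHOD /path" strings for every operation in an OpenAPI schema."""
    endpoints = []
    for path, operations in sorted(spec.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold keys such as "parameters"; skip those.
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return endpoints
```

With the API running, print(list_endpoints(fetch_openapi())) lists each route as METHOD /path.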

Results

Validation Type       Coverage        Result
Unit Tests            85 tests        Passing
Temporal Validation   2006-2024       Unbiased
Type Checking         Full codebase   mypy strict
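The README does not describe how the 2006-2024 temporal validation is set up. One common design is a rolling-origin split, and the sketch below assumes that design. Each fold trains only on revisions from earlier years and tests on a single later year, so the model never sees revisions from the period it is evaluated on. The Snapshot record and the helper are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Snapshot:
    """One labelled article revision (hypothetical record layout)."""

    title: str
    year: int
    label: str


def rolling_origin_folds(
    snapshots: list[Snapshot], first_test_year: int, last_year: int
) -> Iterator[tuple[int, list[Snapshot], list[Snapshot]]]:
    """Yield (test_year, train, test) folds, one per year that has test data.

    Each fold trains only on strictly earlier years, so no revision from the
    evaluation year (or later) can leak into training.
    """
    for year in range(first_test_year, last_year + 1):
        train = [s for s in snapshots if s.year < year]
        test = [s for s in snapshots if s.year == year]
        if train and test:
            yield year, train, test
```

For example, rolling_origin_folds(data, 2007, 2024) produces one fold for each year from 2007 to 2024 that has both earlier training data and revisions to evaluate.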

Features

  • Maturity Scoring — Calibrated heuristic model with quality band classification.
  • FastAPI + CLI — RESTful API with automatic docs and color-coded CLI.
  • SHAP Analysis — Explainable AI for feature importance.
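The README does not list the model's features, weights, or band thresholds, so everything below is hypothetical. The sketch shows the general shape of a weighted heuristic score mapped to quality bands. It also computes exact Shapley values by enumerating feature coalitions. The repository's SHAP analysis presumably relies on the shap library and the trained model. Brute-force enumeration is swapped in here only to show what the attributions mean, and it is feasible only for a few features:

```python
from itertools import combinations
from math import factorial

# Hypothetical weights (summing to 1) and band thresholds, for illustration only.
WEIGHTS = {"references": 0.4, "sections": 0.3, "length": 0.3}
BANDS = [(80.0, "Mature"), (50.0, "Developing"), (0.0, "Stub")]


def maturity_score(features: dict[str, float]) -> float:
    """Weighted sum of normalised features (each in [0, 1]), scaled to 0-100."""
    return 100.0 * sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def quality_band(score: float) -> str:
    """Return the first band whose lower threshold the score reaches."""
    for threshold, band in BANDS:
        if score >= threshold:
            return band
    return BANDS[-1][1]


def shapley_values(
    features: dict[str, float],
    baseline: dict[str, float],
    model=maturity_score,
) -> dict[str, float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    names = list(features)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_f = {
                    f: features[f] if (f in coalition or f == name) else baseline[f]
                    for f in names
                }
                without_f = {
                    f: features[f] if f in coalition else baseline[f] for f in names
                }
                total += weight * (model(with_f) - model(without_f))
        values[name] = total
    return values
```

For a linear score with an all-zero baseline, each attribution reduces to 100 × weight × feature value, and the attributions sum to the score. This gives a quick sanity check.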

Repo Structure

wikipedia/
├── examples/demo.ipynb       # Interactive demo
├── scripts/                  # Validation and setup
├── tests/                    # Unit and integration tests
├── src/
│   └── wikipedia/
│       ├── api/              # FastAPI server (api.py) + CLI (wiki_score.py)
│       ├── features/         # Feature extraction
│       ├── models/           # Baseline model + weights
│       └── wiki_client.py    # Wikipedia API client
└── justfile                  # Task runner
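wiki_client.py and features/ are not documented here, so the sketch below is an illustration, not the repository's code. It builds a MediaWiki Action API request for an article's current wikitext (action=query with prop=revisions). It then counts a few structural signals of the kind a maturity model might use. The function names and the feature set are hypothetical:

```python
import re
from urllib.parse import urlencode

# English Wikipedia's MediaWiki Action API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title: str) -> str:
    """Build an Action API URL that returns the current wikitext of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_features(wikitext: str) -> dict[str, int]:
    """Count simple structural signals in raw wikitext (hypothetical feature set)."""
    return {
        # <ref> and <ref name=...> tags, including reused named references;
        # the <references /> list itself is not counted.
        "references": len(re.findall(r"<ref[\s>]", wikitext)),
        # Section headings such as "== History ==" or "=== Early life ===".
        "sections": len(re.findall(r"^(={2,6})[^=].*?\1\s*$", wikitext, re.MULTILINE)),
        # Wiki links, excluding file, image, and category links.
        "internal_links": len(re.findall(r"\[\[(?!(?:File|Image|Category):)", wikitext)),
        # Whitespace-separated tokens as a rough length measure.
        "tokens": len(wikitext.split()),
    }
```

With formatversion=2, the JSON returned by that URL nests the wikitext under query.pages[0].revisions[0].slots.main.content.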

Validation

  • ✅ Full test coverage (pytest)
  • ✅ Reproducible model weights
  • ✅ Type-safe with mypy
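The README does not say how weight reproducibility is checked. One lightweight approach, assumed here, is to retrain with fixed seeds and compare a fingerprint of the resulting weights against a stored one. The weights_fingerprint helper is hypothetical:

```python
import hashlib
import json


def weights_fingerprint(weights: dict[str, float]) -> str:
    """SHA-256 of a canonical JSON encoding of model weights.

    sort_keys and fixed separators make the encoding independent of dict
    insertion order, so identical weights always produce the same digest.
    """
    canonical = json.dumps(weights, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A test can then assert that the fingerprint of freshly trained weights equals a value committed alongside the weights file.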

License

This project is licensed under the MIT License.

About

AI-driven article maturity scoring via robust feature extraction, LightGBM calibration, and deployment-ready FastAPI and CLI interfaces.
