VIEWS Pipeline Core is a comprehensive machine learning pipeline designed to produce monthly predictions of future violent conflict at both country and sub-country levels.


A modular Python framework for end‑to‑end conflict forecasting: data ingestion, transformation, drift monitoring, model and ensemble management, evaluation, reconciliation, mapping, reporting, packaging, and artifact governance.

Acknowledgements

(VIEWS funders logo banner)


Table of Contents

  1. Conceptual Overview
  2. High‑Level Architecture
  3. Core Pipeline Stages
  4. Managers (Orchestration Layer)
  5. Modules (Functional Layer)
  6. Data Layer & Querysets
  7. Evaluation & Metrics
  8. Reconciliation (Hierarchical Consistency)
  9. Reporting & Mapping
  10. CLI & Argument System
  11. Configuration & Partitioning
  12. Package Management
  13. Logging & Monitoring
  14. Development Workflow
  15. Quick Start
  16. FAQ

1. Conceptual Overview

The pipeline transforms raw geo‑temporal data into validated, reconciled, and documented forecasts. Key features include:

  • Deterministic data preparation (queryset + transformation replay)
  • Strict naming & artifact conventions
  • Partition-aware evaluation (calibration/validation/forecasting)
  • Multi-model ensembling & hierarchical reconciliation
  • Automated HTML reporting and spatial visualization
  • Reproducible configuration merging and logging
  • Optional integration with Weights & Biases (WandB) and prediction store

2. High‑Level Architecture

         ┌────────────────────────────────────────┐
         │            ConfigurationManager        │
         │  (deployment + hyperparameters + meta) │
         └───────────────┬────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│                 ViewsDataLoader                  │
│  Queryset → Raw Fetch → Drift Check → Update     │
│  → Transformation Replay → Partition Slice       │
└───────────────┬──────────────────────────────────┘
                │  DataFrame (month_id, entity_id)
                ▼
      ┌─────────────────────────┐
      │     Model / Ensemble    │
      │  Training / Evaluation  │
      │  Forecasting / Reports  │
      └────────────┬────────────┘
                   │ Predictions
                   ▼
         ┌────────────────────────┐
         │ ReconciliationModule   │
         │ (Country ↔ Priogrid)   │
         └────────────┬───────────┘
                      │ Reconciled Predictions
                      ▼
         ┌───────────────────────────┐
         │ Reporting & Mapping       │
         │ HTML, Tables, Choropleths │
         └───────────────────────────┘

3. Core Pipeline Stages

| Stage      | Output                           | Key Component                             |
|------------|----------------------------------|-------------------------------------------|
| Data Fetch | Partitioned feature/target frame | ViewsDataLoader                           |
| Train      | Artifact (model file)            | ForecastingModelManager / EnsembleManager |
| Evaluate   | Metrics + eval predictions       | Evaluation logic                          |
| Forecast   | Future horizon predictions       | ForecastingModelManager                   |
| Reconcile  | Grid ↔ country consistency       | ReconciliationModule                      |
| Report     | HTML summaries                   | ReportModule + MappingModule              |
| Package    | Poetry-compliant project         | PackageManager                            |

4. Managers (Orchestration Layer)

| Manager                  | Purpose                                             |
|--------------------------|-----------------------------------------------------|
| ModelPathManager         | Path + artifact resolution for a model              |
| ModelManager             | Abstract training/evaluation/forecast flow control  |
| ForecastingModelManager  | Concrete forecasting implementation scaffold        |
| EnsemblePathManager      | Paths for multi-model ensembles                     |
| EnsembleManager          | Aggregation + optional reconciliation               |
| ExtractorPathManager     | External raw data ingestion paths                   |
| ExtractorManager         | Download → preprocess → save for external datasets  |
| PostprocessorPathManager | Downstream transformation stage paths               |
| PostprocessorManager     | Read → transform → validate → save                  |
| PackageManager           | Create/validate Poetry packages                     |
| ConfigurationManager     | Merge + validate layered configuration              |

Each manager has accompanying documentation in its module directory.


5. Modules (Functional Layer)

| Module              | Role                                                                   |
|---------------------|------------------------------------------------------------------------|
| dataloaders         | Partition-aware data retrieval + drift detection + incremental update  |
| transformations     | Dataset transformation undo/management                                 |
| reconciliation      | Hierarchical grid ↔ country alignment                                  |
| reports             | Tailwind-styled HTML evaluation/forecast report generation             |
| mapping             | Static + interactive choropleth maps (matplotlib / Plotly)             |
| logging             | Central logging configuration injection                                |
| statistics          | Forecast reconciliation math (proportional scaling)                    |
| wandb               | Alerts, artifact logging, run lifecycle                                |
| model validation    | Structural & logical integrity checks                                  |
| ensemble validation | Structural & logical integrity checks                                  |

5.1 Intermediate Modules

| Module  | Role                                                                     |
|---------|--------------------------------------------------------------------------|
| cli     | CLI parsing and validation                                               |
| dataset | Spatio-temporal dataset handler with country and priogrid level support |

6. Data Layer & Querysets

  • Querysets define feature/target extraction logic + transformation chains.
  • Incremental updates replace raw slices (GED / ACLED) and replay transformations (UpdateViewser).
  • MultiIndex structure: (month_id, entity_id) for time-spatial operations.
  • Data types normalized (float64 for numeric integrity).
  • Partitions defined via month ranges (train/test or forecast horizon).
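
The layout and partition slicing can be sketched as follows; the column name, entity level, and month boundaries below are illustrative assumptions, not the pipeline's actual queryset output.

```python
import numpy as np
import pandas as pd

# Toy frame with the (month_id, entity_id) MultiIndex layout used by the pipeline.
# The feature column and month boundaries are illustrative only.
index = pd.MultiIndex.from_product(
    [range(500, 520), [1, 2, 3]], names=["month_id", "priogrid_gid"]
)
df = pd.DataFrame(
    {"ged_sb": np.random.default_rng(0).poisson(1.5, len(index)).astype("float64")},
    index=index,
)

# A partition expressed as month ranges (hypothetical boundaries).
partition = {"train": (500, 514), "test": (515, 519)}

start, end = partition["train"]
train = df.loc[start:end]        # label slice on the month_id level (inclusive)
start, end = partition["test"]
test = df.loc[start:end]
print(train.shape, test.shape)   # (45, 1) (15, 1)
```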

7. Evaluation & Metrics

Evaluation produces:

  • Step-wise metrics (per forecast horizon)
  • Month-wise metrics (temporal slices)
  • Time-series metrics (sequence performance trajectory)

The conflict type is auto-inferred from target tokens (sb / ns / os), and files are named per ADR conventions (artifact/output naming).
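
As a rough illustration of the step-wise versus month-wise slicing (this is not the pipeline's evaluation code, and the column names are assumptions), predictions can be grouped by forecast step and by month before scoring:

```python
import numpy as np
import pandas as pd

# Hypothetical evaluation frame: one row per prediction, with the forecast step
# (horizon), the month, the observed target, and the predicted value.
rng = np.random.default_rng(1)
eval_df = pd.DataFrame({
    "month_id": np.repeat(range(520, 524), 10),
    "step": np.tile(range(1, 5), 10),
    "observed": rng.poisson(2.0, 40).astype(float),
    "predicted": rng.gamma(2.0, 1.0, 40),
})
eval_df["squared_error"] = (eval_df["observed"] - eval_df["predicted"]) ** 2

# Step-wise metrics: one MSE per forecast horizon.
step_wise = eval_df.groupby("step")["squared_error"].mean()

# Month-wise metrics: one MSE per temporal slice.
month_wise = eval_df.groupby("month_id")["squared_error"].mean()

print(step_wise)
print(month_wise)
```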


8. Reconciliation (Hierarchical Consistency)

Reconciliation ensures that priogrid-level sums align with authoritative country totals while preserving the relative spatial pattern and zero inflation. It is parallelizable across countries × time × targets and can be integrated into ensembles or applied as model forecast postprocessing.
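
The proportional-scaling idea can be sketched in a few lines (an illustration of the approach, not the ReconciliationModule implementation): grid cells within a country are rescaled so they sum to the country total, which preserves the spatial pattern and leaves zeros at zero.

```python
import numpy as np

def proportional_reconcile(grid_values: np.ndarray, country_total: float) -> np.ndarray:
    """Rescale grid-cell forecasts so they sum to the country total.

    Zeros stay zero and the relative spatial pattern is preserved. If every cell
    is zero, the total cannot be distributed and the cells are returned unchanged
    (a policy choice made for this sketch only).
    """
    grid_sum = grid_values.sum()
    if grid_sum == 0:
        return grid_values
    return grid_values * (country_total / grid_sum)

cells = np.array([0.0, 2.0, 6.0, 0.0, 2.0])          # priogrid-level forecasts
reconciled = proportional_reconcile(cells, country_total=20.0)
print(reconciled)        # [ 0.  4. 12.  0.  4.]
print(reconciled.sum())  # 20.0
```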


9. Reporting & Mapping

| Component     | Feature                                                          |
|---------------|------------------------------------------------------------------|
| ReportModule  | Headings, paragraphs, Markdown, tables, images, grids            |
| MappingModule | Country & priogrid choropleths (static + interactive animation)  |
| Templates     | Forecast + evaluation report skeletons                           |
| CSS           | Tailwind subset embedded for portability                         |

Reports embed:

  • Metrics tables
  • Key–value configuration summaries
  • Spatial animations (Plotly)
  • Artifact provenance (timestamps, versions)
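
As one way to produce such an embeddable metrics table (unrelated to the ReportModule API, which is documented in its module directory), pandas can render an HTML fragment directly; the metric values below are made up.

```python
import pandas as pd

# Hypothetical step-wise metrics to embed in an HTML report.
metrics = pd.DataFrame(
    {"MSE": [0.42, 0.51, 0.63], "MAE": [0.31, 0.36, 0.44]},
    index=pd.Index([1, 2, 3], name="step"),
)

# Render an HTML fragment that a report template can include verbatim.
html_fragment = metrics.to_html(float_format="%.3f", border=0)
print(html_fragment)
```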

10. CLI & Argument System

Dataclass-driven (ForecastingModelArgs):

  • Flags: --train, --evaluate, --forecast, --report, --sweep, --prediction_store, --monthly
  • Validation prevents illegal combinations (e.g., evaluate with forecasting run type).
  • Monthly shortcut auto-configures production cycle.
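
A hedged sketch of the dataclass-driven pattern follows; it is not the real ForecastingModelArgs, only an illustration of parsing flags into a dataclass and rejecting an illegal combination.

```python
import argparse
from dataclasses import dataclass

@dataclass
class Args:
    """Illustrative subset of the CLI flags; not the actual ForecastingModelArgs."""
    run_type: str
    train: bool
    evaluate: bool
    forecast: bool
    report: bool

def parse_args(argv=None) -> Args:
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_type", default="calibration")
    parser.add_argument("--train", action="store_true")
    parser.add_argument("--evaluate", action="store_true")
    parser.add_argument("--forecast", action="store_true")
    parser.add_argument("--report", action="store_true")
    ns = parser.parse_args(argv)
    args = Args(ns.run_type, ns.train, ns.evaluate, ns.forecast, ns.report)
    # Reject illegal combinations, e.g. evaluation with a forecasting run type.
    if args.evaluate and args.run_type == "forecasting":
        parser.error("--evaluate is not valid with --run_type forecasting")
    return args

print(parse_args(["--run_type", "calibration", "--train", "--evaluate"]))
```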

11. Configuration & Partitioning

ConfigurationManager merges:

  1. Deployment
  2. Hyperparameters
  3. Meta
  4. Partition dictionary
  5. Runtime overrides (highest priority)

Forecast partitions are dynamically adjusted via override_timestep, and validation enforces structural integrity and target specification.
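
The layering behaves like successive dictionary merges in which later layers win; the sketch below is illustrative (the keys and values are assumptions, not the ConfigurationManager's schema).

```python
# Later layers win on key conflicts; runtime overrides have the highest priority.
def merge_configs(*layers: dict) -> dict:
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

deployment = {"deployment_status": "shadow"}
hyperparameters = {"steps": list(range(1, 37)), "learning_rate": 0.05}
meta = {"name": "example_model", "targets": ["ged_sb"]}
partitions = {"calibration": {"train": (121, 396), "test": (397, 444)}}
runtime_overrides = {"learning_rate": 0.1}

config = merge_configs(deployment, hyperparameters, meta, partitions, runtime_overrides)
print(config["learning_rate"])  # 0.1 -- the runtime override wins
```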


12. Package Management

PackageManager:

  • Validates naming (organization-prefix-*)
  • Creates Poetry skeleton (Python version constraint)
  • Adds dependencies (including views-pipeline-core)
  • Fetches latest release (tags or GitHub API)
  • Runs poetry check
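
A naming check in the spirit of the organization-prefix-* rule could look like the following regex; the exact prefix and allowed characters here are assumptions, not the PackageManager's actual rules.

```python
import re

# Hypothetical check: package names must start with an organization prefix,
# e.g. "views-", followed by lowercase words separated by hyphens.
NAME_PATTERN = re.compile(r"^views-[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_package_name(name: str) -> bool:
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_package_name("views-pipeline-core"))  # True
print(is_valid_package_name("PipelineCore"))         # False
```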

13. Logging & Monitoring

  • YAML-driven configuration (handlers, levels, formatters).
  • Dedicated model/ensemble logging directories.
  • Standard separation: main log, error log.
  • WandB alerts for stage transitions, failures, reconciliation completeness.
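
As a sketch of the YAML-driven approach (handler names, paths, and levels below are assumptions), a configuration can be loaded with PyYAML and applied through logging.config.dictConfig:

```python
import logging.config
import yaml  # PyYAML

# Illustrative config: a main log and a separate error log, mirroring the
# standard separation described above; names and file paths are assumptions.
LOGGING_YAML = """
version: 1
formatters:
  default:
    format: "%(asctime)s %(levelname)s %(name)s %(message)s"
handlers:
  main_file:
    class: logging.FileHandler
    filename: main.log
    level: INFO
    formatter: default
  error_file:
    class: logging.FileHandler
    filename: error.log
    level: ERROR
    formatter: default
root:
  level: INFO
  handlers: [main_file, error_file]
"""

logging.config.dictConfig(yaml.safe_load(LOGGING_YAML))
logging.getLogger(__name__).info("logging configured from YAML")
```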

14. Development Workflow

| Task                 | Command                                                              |
|----------------------|----------------------------------------------------------------------|
| Run model            | ./run.sh --run_type calibration --train --evaluate --report --saved |
| Run ensemble         | ./run.sh --ensemble hybrid_lynx --forecast --report                 |
| Update raw data      | Use --update_viewser                                                 |
| Generate report only | Use --evaluate --report or --forecast --report                      |

Refer to documentation/development_guidelines.md for coding standards and docstring_guidelines.md for formatting.


15. Quick Start

  1. Run build_model_scaffold.py or build_ensemble_scaffold.py found in the views-models repository.

  2. Update config_deployment.py, config_hyperparameters.py, config_queryset.py, config_meta.py.

  3. Run calibration:

    python main.py --run_type calibration --train --evaluate --report

  4. Run forecasting:

    python main.py --run_type forecasting --train --forecast --report

  5. View artifacts: models/<name>/artifacts/


16. FAQ

| Question                                  | Answer                                                                                |
|-------------------------------------------|---------------------------------------------------------------------------------------|
| Do I need WandB?                          | Optional; disable notifications to run offline.                                      |
| Can I reconcile single-model forecasts?   | Yes; apply ReconciliationModule manually after the forecast stage.                   |
| How do I add a new transformation?        | Register the callable in the transformation mapping and ensure replay compatibility. |
| Are forecasts stored transformed or raw?  | Temporarily reversed to raw scale before saving (pending ADR finalization).          |
| Can I aggregate probabilistic outputs?    | Current ensemble aggregation expects scalars or single-element lists.                |
