VIEWS Pipeline Core is a comprehensive machine learning pipeline designed to produce monthly predictions of future violent conflict at both country and sub-country levels.


A modular Python framework for end‑to‑end conflict forecasting: data ingestion, transformation, drift monitoring, model and ensemble management, evaluation, reconciliation, mapping, reporting, packaging, and artifact governance.

Acknowledgements

(VIEWS funders logo banner)


Table of Contents

  1. Conceptual Overview
  2. High‑Level Architecture
  3. Core Pipeline Stages
  4. Managers (Orchestration Layer)
  5. Modules (Functional Layer)
  6. Data Layer & Querysets
  7. Evaluation & Metrics
  8. Reconciliation (Hierarchical Consistency)
  9. Reporting & Mapping
  10. CLI & Argument System
  11. Configuration & Partitioning
  12. Package Management
  13. Logging & Monitoring
  14. Development Workflow
  15. Quick Start
  16. FAQ

1. Conceptual Overview

The pipeline transforms raw geo‑temporal data into validated, reconciled, and documented forecasts. Key features include:

  • Deterministic data preparation (queryset + transformation replay)
  • Strict naming & artifact conventions
  • Partition-aware evaluation (calibration/validation/forecasting)
  • Multi-model ensembling & hierarchical reconciliation
  • Automated HTML reporting and spatial visualization
  • Reproducible configuration merging and logging
  • Optional integration with Weights & Biases (WandB) and prediction store

2. High‑Level Architecture

         ┌────────────────────────────────────────┐
         │            ConfigurationManager        │
         │  (deployment + hyperparameters + meta) │
         └───────────────┬────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│                 ViewsDataLoader                  │
│  Queryset → Raw Fetch → Drift Check → Update     │
│  → Transformation Replay → Partition Slice       │
└───────────────┬──────────────────────────────────┘
                │  DataFrame (month_id, entity_id)
                ▼
      ┌─────────────────────────┐
      │     Model / Ensemble    │
      │  Training / Evaluation  │
      │  Forecasting / Reports  │
      └────────────┬────────────┘
                   │ Predictions
                   ▼
         ┌────────────────────────┐
         │ ReconciliationModule   │
         │ (Country ↔ Priogrid)   │
         └────────────┬───────────┘
                      │ Reconciled Predictions
                      ▼
         ┌───────────────────────────┐
         │ Reporting & Mapping       │
         │ HTML, Tables, Choropleths │
         └───────────────────────────┘

3. Core Pipeline Stages

| Stage      | Output                           | Key Component                             |
|------------|----------------------------------|-------------------------------------------|
| Data Fetch | Partitioned feature/target frame | ViewsDataLoader                           |
| Train      | Artifact (model file)            | ForecastingModelManager / EnsembleManager |
| Evaluate   | Metrics + eval predictions       | Evaluation logic                          |
| Forecast   | Future horizon predictions       | ForecastingModelManager                   |
| Reconcile  | Grid ↔ country consistency       | ReconciliationModule                      |
| Report     | HTML summaries                   | ReportModule + MappingModule              |
| Package    | Poetry-compliant project         | PackageManager                            |

4. Managers (Orchestration Layer)

| Manager                  | Purpose                                             |
|--------------------------|-----------------------------------------------------|
| ModelPathManager         | Path + artifact resolution for a model              |
| ModelManager             | Abstract training/evaluation/forecast flow control  |
| ForecastingModelManager  | Concrete forecasting implementation scaffold        |
| EnsemblePathManager      | Paths for multi-model ensembles                     |
| EnsembleManager          | Aggregation + optional reconciliation               |
| ExtractorPathManager     | External raw data ingestion paths                   |
| ExtractorManager         | Download → preprocess → save for external datasets  |
| PostprocessorPathManager | Downstream transformation stage paths               |
| PostprocessorManager     | Read → transform → validate → save                  |
| PackageManager           | Create/validate Poetry packages                     |
| ConfigurationManager     | Merge + validate layered configuration              |

Each manager has accompanying documentation in its module directory.


5. Modules (Functional Layer)

| Module              | Role                                                                   |
|---------------------|------------------------------------------------------------------------|
| dataloaders         | Partition-aware data retrieval + drift detection + incremental update  |
| transformations     | Dataset transformation undo/management                                 |
| reconciliation      | Hierarchical grid ↔ country alignment                                  |
| reports             | Tailwind-styled HTML evaluation/forecast report generation             |
| mapping             | Static + interactive choropleth maps (matplotlib / Plotly)             |
| logging             | Central logging configuration injection                                |
| statistics          | Forecast reconciliation math (proportional scaling)                    |
| wandb               | Alerts, artifact logging, run lifecycle                                |
| model validation    | Structural & logical integrity checks                                  |
| ensemble validation | Structural & logical integrity checks                                  |

5.1 Intermediate Modules

| Module  | Role                                                                     |
|---------|--------------------------------------------------------------------------|
| cli     | CLI parsing and validation                                               |
| dataset | Spatio-temporal dataset handler with country and priogrid level support |

6. Data Layer & Querysets

  • Querysets define feature/target extraction logic + transformation chains.
  • Incremental updates replace raw slices (GED / ACLED) and replay transformations (UpdateViewser).
  • MultiIndex structure: (month_id, entity_id) for time-spatial operations.
  • Data types normalized (float64 for numeric integrity).
  • Partitions defined via month ranges (train/test or forecast horizon).
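
The layout and partition slicing can be sketched as follows; the column name, entity level, and month boundaries below are illustrative assumptions, not the pipeline's actual queryset output.

```python
import numpy as np
import pandas as pd

# Toy frame with the (month_id, entity_id) MultiIndex layout used by the pipeline.
# The feature column and month boundaries are illustrative only.
index = pd.MultiIndex.from_product(
    [range(500, 520), [1, 2, 3]], names=["month_id", "priogrid_gid"]
)
df = pd.DataFrame(
    {"ged_sb": np.random.default_rng(0).poisson(1.5, len(index)).astype("float64")},
    index=index,
)

# A partition expressed as month ranges (hypothetical boundaries).
partition = {"train": (500, 514), "test": (515, 519)}

start, end = partition["train"]
train = df.loc[start:end]        # label slice on the month_id level (inclusive)
start, end = partition["test"]
test = df.loc[start:end]
print(train.shape, test.shape)   # (45, 1) (15, 1)
```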

7. Evaluation & Metrics

Evaluation produces:

  • Step-wise metrics (per forecast horizon)
  • Month-wise metrics (temporal slices)
  • Time-series metrics (sequence performance trajectory)

The conflict type is auto-inferred from target tokens (sb / ns / os), and files are named per ADR conventions (artifact/output naming).
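
As a rough illustration of the step-wise versus month-wise slicing (this is not the pipeline's evaluation code, and the column names are assumptions), predictions can be grouped by forecast step and by month before scoring:

```python
import numpy as np
import pandas as pd

# Hypothetical evaluation frame: one row per prediction, with the forecast step
# (horizon), the month, the observed target, and the predicted value.
rng = np.random.default_rng(1)
eval_df = pd.DataFrame({
    "month_id": np.repeat(range(520, 524), 10),
    "step": np.tile(range(1, 5), 10),
    "observed": rng.poisson(2.0, 40).astype(float),
    "predicted": rng.gamma(2.0, 1.0, 40),
})
eval_df["squared_error"] = (eval_df["observed"] - eval_df["predicted"]) ** 2

# Step-wise metrics: one MSE per forecast horizon.
step_wise = eval_df.groupby("step")["squared_error"].mean()

# Month-wise metrics: one MSE per temporal slice.
month_wise = eval_df.groupby("month_id")["squared_error"].mean()

print(step_wise)
print(month_wise)
```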


8. Reconciliation (Hierarchical Consistency)

Reconciliation ensures that priogrid-level sums align with authoritative country totals while preserving the relative spatial pattern and zero inflation. It is parallelizable across countries × time × targets and can be integrated into ensembles or applied as model forecast postprocessing.
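
The proportional-scaling idea can be sketched in a few lines (an illustration of the approach, not the ReconciliationModule implementation): grid cells within a country are rescaled so they sum to the country total, which preserves the spatial pattern and leaves zeros at zero.

```python
import numpy as np

def proportional_reconcile(grid_values: np.ndarray, country_total: float) -> np.ndarray:
    """Rescale grid-cell forecasts so they sum to the country total.

    Zeros stay zero and the relative spatial pattern is preserved. If every cell
    is zero, the total cannot be distributed and the cells are returned unchanged
    (a policy choice made for this sketch only).
    """
    grid_sum = grid_values.sum()
    if grid_sum == 0:
        return grid_values
    return grid_values * (country_total / grid_sum)

cells = np.array([0.0, 2.0, 6.0, 0.0, 2.0])          # priogrid-level forecasts
reconciled = proportional_reconcile(cells, country_total=20.0)
print(reconciled)        # [ 0.  4. 12.  0.  4.]
print(reconciled.sum())  # 20.0
```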


9. Reporting & Mapping

| Component     | Feature                                                          |
|---------------|------------------------------------------------------------------|
| ReportModule  | Headings, paragraphs, Markdown, tables, images, grids            |
| MappingModule | Country & priogrid choropleths (static + interactive animation)  |
| Templates     | Forecast + evaluation report skeletons                           |
| CSS           | Tailwind subset embedded for portability                         |

Reports embed:

  • Metrics tables
  • Key–value configuration summaries
  • Spatial animations (Plotly)
  • Artifact provenance (timestamps, versions)
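
As one way to produce such an embeddable metrics table (unrelated to the ReportModule API, which is documented in its module directory), pandas can render an HTML fragment directly; the metric values below are made up.

```python
import pandas as pd

# Hypothetical step-wise metrics to embed in an HTML report.
metrics = pd.DataFrame(
    {"MSE": [0.42, 0.51, 0.63], "MAE": [0.31, 0.36, 0.44]},
    index=pd.Index([1, 2, 3], name="step"),
)

# Render an HTML fragment that a report template can include verbatim.
html_fragment = metrics.to_html(float_format="%.3f", border=0)
print(html_fragment)
```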

10. CLI & Argument System

Dataclass-driven (ForecastingModelArgs):

  • Flags: --train, --evaluate, --forecast, --report, --sweep, --prediction_store, --monthly
  • Validation prevents illegal combinations (e.g., evaluate with forecasting run type).
  • Monthly shortcut auto-configures production cycle.
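
A hedged sketch of the dataclass-driven pattern follows; it is not the real ForecastingModelArgs, only an illustration of parsing flags into a dataclass and rejecting an illegal combination.

```python
import argparse
from dataclasses import dataclass

@dataclass
class Args:
    """Illustrative subset of the CLI flags; not the actual ForecastingModelArgs."""
    run_type: str
    train: bool
    evaluate: bool
    forecast: bool
    report: bool

def parse_args(argv=None) -> Args:
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_type", default="calibration")
    parser.add_argument("--train", action="store_true")
    parser.add_argument("--evaluate", action="store_true")
    parser.add_argument("--forecast", action="store_true")
    parser.add_argument("--report", action="store_true")
    ns = parser.parse_args(argv)
    args = Args(ns.run_type, ns.train, ns.evaluate, ns.forecast, ns.report)
    # Reject illegal combinations, e.g. evaluation with a forecasting run type.
    if args.evaluate and args.run_type == "forecasting":
        parser.error("--evaluate is not valid with --run_type forecasting")
    return args

print(parse_args(["--run_type", "calibration", "--train", "--evaluate"]))
```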

11. Configuration & Partitioning

ConfigurationManager merges:

  1. Deployment
  2. Hyperparameters
  3. Meta
  4. Partition dictionary
  5. Runtime overrides (highest priority)

Forecast partitions are dynamically adjusted via override_timestep, and validation enforces structural integrity and target specification.
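
The layering behaves like successive dictionary merges in which later layers win; the sketch below is illustrative (the keys and values are assumptions, not the ConfigurationManager's schema).

```python
# Later layers win on key conflicts; runtime overrides have the highest priority.
def merge_configs(*layers: dict) -> dict:
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

deployment = {"deployment_status": "shadow"}
hyperparameters = {"steps": list(range(1, 37)), "learning_rate": 0.05}
meta = {"name": "example_model", "targets": ["ged_sb"]}
partitions = {"calibration": {"train": (121, 396), "test": (397, 444)}}
runtime_overrides = {"learning_rate": 0.1}

config = merge_configs(deployment, hyperparameters, meta, partitions, runtime_overrides)
print(config["learning_rate"])  # 0.1 -- the runtime override wins
```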


12. Package Management

PackageManager:

  • Validates naming (organization-prefix-*)
  • Creates Poetry skeleton (Python version constraint)
  • Adds dependencies (including views-pipeline-core)
  • Fetches latest release (tags or GitHub API)
  • Runs poetry check
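
A naming check in the spirit of the organization-prefix-* rule could look like the following regex; the exact prefix and allowed characters here are assumptions, not the PackageManager's actual rules.

```python
import re

# Hypothetical check: package names must start with an organization prefix,
# e.g. "views-", followed by lowercase words separated by hyphens.
NAME_PATTERN = re.compile(r"^views-[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_package_name(name: str) -> bool:
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_package_name("views-pipeline-core"))  # True
print(is_valid_package_name("PipelineCore"))         # False
```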

13. Logging & Monitoring

  • YAML-driven configuration (handlers, levels, formatters).
  • Dedicated model/ensemble logging directories.
  • Standard separation: main log, error log.
  • WandB alerts for stage transitions, failures, reconciliation completeness.
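
As a sketch of the YAML-driven approach (handler names, paths, and levels below are assumptions), a configuration can be loaded with PyYAML and applied through logging.config.dictConfig:

```python
import logging.config
import yaml  # PyYAML

# Illustrative config: a main log and a separate error log, mirroring the
# standard separation described above; names and file paths are assumptions.
LOGGING_YAML = """
version: 1
formatters:
  default:
    format: "%(asctime)s %(levelname)s %(name)s %(message)s"
handlers:
  main_file:
    class: logging.FileHandler
    filename: main.log
    level: INFO
    formatter: default
  error_file:
    class: logging.FileHandler
    filename: error.log
    level: ERROR
    formatter: default
root:
  level: INFO
  handlers: [main_file, error_file]
"""

logging.config.dictConfig(yaml.safe_load(LOGGING_YAML))
logging.getLogger(__name__).info("logging configured from YAML")
```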

14. Development Workflow

| Task                 | Command                                                              |
|----------------------|----------------------------------------------------------------------|
| Run model            | ./run.sh --run_type calibration --train --evaluate --report --saved |
| Run ensemble         | ./run.sh --ensemble hybrid_lynx --forecast --report                 |
| Update raw data      | Use --update_viewser                                                 |
| Generate report only | Use --evaluate --report or --forecast --report                      |

Refer to documentation/development_guidelines.md for coding standards and docstring_guidelines.md for formatting.


15. Quick Start

  1. Run build_model_scaffold.py or build_ensemble_scaffold.py found in the views-models repository.

  2. Update config_deployment.py, config_hyperparameters.py, config_queryset.py, config_meta.py.

  3. Run calibration:

    python main.py --run_type calibration --train --evaluate --report

  4. Run forecasting:

    python main.py --run_type forecasting --train --forecast --report

  5. View artifacts: models/<name>/artifacts/


16. FAQ

| Question                                  | Answer                                                                                |
|-------------------------------------------|---------------------------------------------------------------------------------------|
| Do I need WandB?                          | Optional; disable notifications to run offline.                                      |
| Can I reconcile single-model forecasts?   | Yes; apply ReconciliationModule manually after the forecast stage.                   |
| How do I add a new transformation?        | Register the callable in the transformation mapping and ensure replay compatibility. |
| Are forecasts stored transformed or raw?  | Temporarily reversed to raw scale before saving (pending ADR finalization).          |
| Can I aggregate probabilistic outputs?    | Current ensemble aggregation expects scalars or single-element lists.                |
