AIMM-X: An Explainable Market Integrity Monitoring System with Multi-Source Attention Signals and Transparent Scoring
Repository: https://github.com/chirindaopensource/explainable_market_integrity_monitoring_system
Owner: 2026 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2026 paper entitled "An Explainable Market Integrity Monitoring System with Multi-Source Attention Signals and Transparent Scoring" by:
- Sandeep Neela (Independent Researcher)
The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from the ingestion and rigorous validation of market microstructure and attention data to the detection of suspicious trading windows via hysteresis segmentation, culminating in the generation of interpretable integrity scores and factor attributions.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable: run_aimm_x_pipeline
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the analytical framework presented in Neela (2026). The core of this repository is the IPython Notebook explainable_market_integrity_monitoring_system_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings. The pipeline addresses the challenge of market integrity monitoring by replacing opaque, proprietary "black-box" models with a transparent, auditable "glass-box" approach.
The paper argues that effective surveillance requires explainability—analysts must understand why a window was flagged—and accessibility to public data sources. This codebase operationalizes the proposed solution: AIMM-X, a system that:
- Validates data integrity using strict OHLC consistency checks ($H_t \ge \max(O_t, C_t)$) and precise missingness semantics (NaN vs. 0).
- Fuses multi-source attention signals (Reddit, StockTwits, News, Wikipedia, Google Trends) into a unified metric of public interest.
- Detects anomalies using a robust Hysteresis State Machine (Schmitt Trigger) that prevents alert fragmentation.
- Scores windows using a linear Integrity Score ($M$) decomposed into six interpretable factors ($\phi_1 \dots \phi_6$), enabling clear attribution of alerts to price shocks, volatility anomalies, or attention spikes.
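A linear score of this kind is straightforward to attribute, because each factor's contribution is its weight times its value. The sketch below assumes the form $M = \sum_k \omega_k \phi_k$; the function name, factor keys, weights, and values are hypothetical placeholders for illustration, not the paper's definitions of $\phi_1 \dots \phi_6$.

```python
from typing import Dict, List, Tuple


def integrity_score(
    phi: Dict[str, float], omega: Dict[str, float]
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the linear score M and per-factor contributions, largest first.

    Assumes M = sum_k omega[k] * phi[k]; all names here are illustrative.
    """
    mismatched = set(phi) ^ set(omega)
    if mismatched:
        raise ValueError(f"phi and omega must share keys: {sorted(mismatched)}")
    contributions = {name: omega[name] * phi[name] for name in phi}
    total = sum(contributions.values())
    # Rank factors by absolute contribution so the main driver comes first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return total, ranked


# Hypothetical factor values and weights for one flagged window.
phi = {"phi_1": 2.0, "phi_2": 0.5, "phi_3": 1.0, "phi_4": 0.0, "phi_5": 3.0, "phi_6": 0.5}
omega = {"phi_1": 0.25, "phi_2": 0.25, "phi_3": 0.125, "phi_4": 0.125, "phi_5": 0.125, "phi_6": 0.125}

score, ranked = integrity_score(phi, omega)
print(score)      # 1.1875
print(ranked[0])  # ('phi_1', 0.5)
```

Because the contributions sum exactly to the score, an analyst can read off which factor drove an alert without any post-hoc explanation method.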
The implemented methods combine techniques from Financial Econometrics, Signal Processing, and Explainable AI.
1. Multi-Source Attention Fusion (
2. Statistical Deviation Detection:
The system employs dynamic baselines to adapt to changing market regimes, computing standardized deviations (z-scores) for returns (
3. Hysteresis-Based Segmentation: To avoid "chattering" (rapid on/off switching of alerts due to noise), the system uses dual-threshold hysteresis logic:
- Trigger: A window opens when $s_{i,t} > \theta_{\text{high}}$.
- Sustain: A window remains open while $s_{i,t} > \theta_{\text{low}}$.
- Exit: A window closes only after $s_{i,t} \le \theta_{\text{low}}$ for a specified gap tolerance $g$.
4. Interpretable Integrity Score (
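The dual-threshold logic in item 3 can be sketched as a small state machine. This is a minimal illustration, not the notebook's implementation: the function name is hypothetical, and it assumes one reading of the gap tolerance, namely that dips at or below $\theta_{\text{low}}$ lasting up to $g$ consecutive observations are bridged, and that a closed window ends at the last observation above $\theta_{\text{low}}$.

```python
from typing import List, Optional, Sequence, Tuple


def hysteresis_windows(
    scores: Sequence[float],
    theta_high: float,
    theta_low: float,
    gap_tolerance: int = 0,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of detected windows."""
    if theta_low > theta_high:
        raise ValueError("theta_low must not exceed theta_high")
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the open window began
    last_active = -1             # last index with score > theta_low
    low_run = 0                  # consecutive observations <= theta_low
    for t, s in enumerate(scores):
        if start is None:
            if s > theta_high:   # Trigger: open a new window
                start, last_active, low_run = t, t, 0
        elif s > theta_low:      # Sustain: window stays open
            last_active, low_run = t, 0
        else:                    # Candidate exit: count the dip
            low_run += 1
            if low_run > gap_tolerance:
                windows.append((start, last_active))
                start = None
    if start is not None:        # Window still open at the end of the series
        windows.append((start, last_active))
    return windows


scores = [0.5, 3.2, 2.1, 1.4, 2.4, 0.9, 0.8, 3.5, 0.2]
print(hysteresis_windows(scores, theta_high=3.0, theta_low=2.0, gap_tolerance=1))
# [(1, 4), (7, 7)]
print(hysteresis_windows(scores, theta_high=3.0, theta_low=2.0, gap_tolerance=0))
# [(1, 2), (7, 7)]
```

With a tolerance of one observation, the single dip at index 3 is bridged and the first window runs from 1 to 4; with zero tolerance the same dip closes it at index 2, and the rebound at index 4 does not reopen it because it never exceeds the high threshold.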
Below is a diagram which summarizes the proposed approach:
The provided IPython Notebook (explainable_market_integrity_monitoring_system_draft.ipynb) implements the full research pipeline, including:
- Modular, Multi-Task Architecture: The pipeline is decomposed into 17 distinct, modular tasks, each with its own orchestrator function.
- Configuration-Driven Design: All study parameters (thresholds, weights, lookback windows) are managed in an external config.yaml file.
- Rigorous Data Validation: A multi-stage validation process checks schema integrity, OHLC consistency, and exchange calendar alignment.
- Deterministic Execution: Enforces reproducibility through seed control, deterministic sorting, and rigorous logging of all stochastic outputs.
- Comprehensive Audit Logging: Generates detailed logs of every processing step, including quarantine counts and filter statistics.
- Reproducible Artifacts: Generates structured PipelineResult objects containing raw window lists, filtered top-N tables, and factor summary statistics.
The core analytical steps directly implement the methodology from the paper:
- Configuration & Validation (Task 1): Loads and validates the study configuration, enforcing parameter constraints and determinism requirements.
- Data Ingestion & Cleansing (Tasks 2-3): Validates panel schema, enforces OHLC consistency, and strictly handles missingness semantics (NaN vs 0).
- Calendar Enforcement (Task 4): Aligns data to the canonical NYSE/Nasdaq trading session grid.
- Attention Processing (Tasks 5-7): Aligns, normalizes, and fuses multi-source attention signals into a unified metric.
- Feature Engineering (Tasks 8-9): Computes log returns and rolling realized volatility proxies.
- Deviation Detection (Tasks 10-11): Computes rolling baselines, z-scores, and the composite strength score.
- Window Segmentation (Task 12): Applies the hysteresis state machine to detect suspicious time intervals.
- Scoring & Attribution (Tasks 13-14): Computes $\phi$-factors and the composite Integrity Score $M$ with full decomposition.
- Ranking & Filtering (Task 15): Ranks windows by score and applies warmup/artifact filters.
- Artifact Generation (Task 16): Produces final output tables and summary statistics.
- Orchestration (Task 17): Unifies all components into a single run_aimm_x_pipeline function.
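To make the feature engineering and deviation detection steps (Tasks 8-11) concrete, here is a minimal sketch of log returns and rolling z-scores. It uses only the standard library for portability, whereas the notebook works on pandas panels. The function names are hypothetical, and two details are assumptions rather than the paper's specification: the baseline uses the `window` observations strictly before time $t$, and dispersion is the sample standard deviation.

```python
import math
import statistics
from typing import List, Optional, Sequence


def log_returns(close: Sequence[float]) -> List[float]:
    """r_t = ln(C_t / C_{t-1}); the first price has no return."""
    return [math.log(c1 / c0) for c0, c1 in zip(close, close[1:])]


def rolling_zscores(x: Sequence[float], window: int) -> List[Optional[float]]:
    """z_t = (x_t - mean) / std over the `window` observations before t.

    Returns None during warmup or when the baseline has zero dispersion.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    out: List[Optional[float]] = []
    for t, value in enumerate(x):
        if t < window:
            out.append(None)  # not enough history yet
            continue
        baseline = x[t - window:t]  # excludes the current observation
        mu = statistics.fmean(baseline)
        sd = statistics.stdev(baseline)
        out.append((value - mu) / sd if sd > 0 else None)
    return out


# Hypothetical closing prices: a quiet series followed by a jump.
closes = [100.0, 101.0, 100.0, 101.0, 100.0, 101.0, 120.0]
returns = log_returns(closes)
z = rolling_zscores(returns, window=4)
print(z)
```

Excluding the current observation from its own baseline keeps a large shock from inflating the standard deviation it is measured against; the `None` entries mark the warmup period, which corresponds to the windows the `exclude_warmup` filter is meant to handle.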
The notebook is structured as a logical pipeline with modular orchestrator functions for each of the 17 major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
The project is designed around a single, top-level user-facing interface function:
run_aimm_x_pipeline: This master orchestrator function runs the entire automated research pipeline end to end. A single call reproduces the computational portion of the project, managing data flow between the validation, cleansing, detection, scoring, and reporting modules.
- Python 3.9+
- Core dependencies: pandas, numpy, scipy, pyyaml.
- Optional dependencies: exchange_calendars (for precise trading session generation).
- Clone the repository:
    git clone https://github.com/chirindaopensource/explainable_market_integrity_monitoring_system.git
    cd explainable_market_integrity_monitoring_system
- Create and activate a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install Python dependencies:
    pip install pandas numpy scipy pyyaml exchange_calendars
The pipeline requires a primary DataFrame df_raw_panel with a MultiIndex (date, ticker) and the following columns:
Market Microstructure:
- open_price: Float.
- high_price: Float, $\ge \max(Open, Close)$.
- low_price: Float, $\le \min(Open, Close)$.
- close_price: Float, $>0$.
- volume: Float/Int, $>0$.
Attention Signals (Nullable):
- reddit_posts: Float (count).
- stocktwits_msgs: Float (count).
- wiki_views: Float (count).
- news_articles: Float (count).
- google_trends: Float (index).
Note: NaN in attention columns represents "No Coverage", while 0.0 represents "No Activity".
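The column constraints and the missingness convention above can be checked row by row. The following is a minimal standard-library sketch under stated assumptions: the function names and violation labels are hypothetical, and the notebook's own validators operate on the full pandas panel rather than on single dictionaries.

```python
import math
from typing import Dict, List


def ohlc_violations(bar: Dict[str, float]) -> List[str]:
    """List the consistency rules one bar breaks (empty list means valid)."""
    fields = ("open_price", "high_price", "low_price", "close_price", "volume")
    open_, high, low, close, volume = (bar[name] for name in fields)
    if any(math.isnan(v) for v in (open_, high, low, close, volume)):
        return ["missing_value"]
    problems: List[str] = []
    if high < max(open_, close):
        problems.append("high_below_open_or_close")
    if low > min(open_, close):
        problems.append("low_above_open_or_close")
    if close <= 0:
        problems.append("non_positive_close")
    if volume <= 0:
        problems.append("non_positive_volume")
    return problems


def attention_status(value: float) -> str:
    """Distinguish 'No Coverage' (NaN) from 'No Activity' (0.0)."""
    if math.isnan(value):
        return "no_coverage"
    if value < 0:
        raise ValueError("attention values must be non-negative")
    return "no_activity" if value == 0 else "active"


good = {"open_price": 100.0, "high_price": 102.0, "low_price": 99.0,
        "close_price": 101.0, "volume": 5000.0}
bad = {**good, "high_price": 100.5}  # high is below the close

print(ohlc_violations(good))           # []
print(ohlc_violations(bad))            # ['high_below_open_or_close']
print(attention_status(float("nan")))  # no_coverage
print(attention_status(0.0))           # no_activity
```

Keeping the two missingness states separate matters downstream: filling NaN with 0.0 would turn an uncovered ticker into one with apparently no public interest.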
The notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell, which demonstrates how to use the top-level run_aimm_x_pipeline orchestrator:
# Final cell of the notebook
# This block serves as the main entry point for the entire project.
if __name__ == '__main__':
    # 1. Load the master configuration from the YAML file.
    config = load_study_configuration("config.yaml")
    # 2. Load raw datasets (Example using synthetic generator provided in the notebook)
    # In production, load from CSV/Parquet: pd.read_parquet(...)
    df_raw_panel = generate_synthetic_panel(config)
    # 3. Execute the entire replication study.
    result = run_aimm_x_pipeline(df_raw_panel, config)
    # 4. Access results
    print(result.df_top_n.head())
    print(result.audit_log)
The pipeline returns a PipelineResult object containing:
- config_snapshot: The resolved configuration dictionary used for the run.
- audit_log: A structured log of execution metadata, validation stats, and step completion.
- df_windows_raw: The complete set of detected windows with all scores and factors.
- df_windows_filtered: The subset of windows passing quality filters (warmup, artifacts).
- df_top_n: The top-ranked suspicious windows formatted for reporting.
- df_phi_summary: Summary statistics for factor contributions.
- intermediate_series: Dictionary containing computed time-series ($r$, $\sigma$, $A$, $s$, z-scores) for debugging.
explainable_market_integrity_monitoring_system/
│
├── explainable_market_integrity_monitoring_system_draft.ipynb # Main implementation notebook
├── config.yaml # Master configuration file
├── requirements.txt # Python package dependencies
│
├── LICENSE # MIT Project License File
└── README.md # This file
The pipeline is highly customizable via the config.yaml file. Users can modify study parameters such as:
- Universe: universe_tickers list.
- Detection Logic: baseline_window_B, theta_high, theta_low, gap_tolerance_g.
- Scoring Weights: alpha weights for composite score, omega weights for integrity score.
- Filters: exclude_warmup, max_z_score_cutoff.
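When editing these parameters, it can help to sanity-check them before a run. The sketch below shows one way to do that on an already-loaded configuration dictionary. It is an illustration only: the function name is hypothetical, the flat key layout is an assumption about config.yaml, the specific constraints (for example, that theta_high must strictly exceed theta_low) are assumed rather than taken from the notebook, and the tickers and values are placeholders.

```python
from typing import Any, Dict, List


def check_detection_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors: List[str] = []
    if not cfg.get("universe_tickers"):
        errors.append("universe_tickers must be a non-empty list")
    window = cfg.get("baseline_window_B")
    if not isinstance(window, int) or window < 2:
        errors.append("baseline_window_B must be an integer >= 2")
    gap = cfg.get("gap_tolerance_g")
    if not isinstance(gap, int) or gap < 0:
        errors.append("gap_tolerance_g must be a non-negative integer")
    high, low = cfg.get("theta_high"), cfg.get("theta_low")
    if not all(isinstance(v, (int, float)) for v in (high, low)):
        errors.append("theta_high and theta_low must be numeric")
    elif not high > low:
        # Hysteresis needs a band between the two thresholds.
        errors.append("theta_high must exceed theta_low")
    return errors


# Hypothetical values; the real defaults live in config.yaml.
example_cfg = {
    "universe_tickers": ["AAA", "BBB"],
    "baseline_window_B": 20,
    "theta_high": 3.0,
    "theta_low": 2.0,
    "gap_tolerance_g": 1,
}
print(check_detection_config(example_cfg))
# []
print(check_detection_config({**example_cfg, "theta_low": 3.5}))
# ['theta_high must exceed theta_low']
```

Returning a list of problems rather than raising on the first one lets all misconfigured parameters be reported in a single pass.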
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- High-Frequency Data: Adapting the pipeline for 5-minute or 1-minute bars.
- Real-Time API Integration: Connecting to live feeds for Reddit/Twitter data.
- Advanced Normalization: Implementing robust scalers (e.g., Median Absolute Deviation) for fat-tailed distributions.
- Causal Inference: Integrating Granger causality tests to determine lead-lag relationships between attention and price.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this code or the methodology in your research, please cite the original paper:
@article{neela2026aimmx,
title={AIMM-X: An Explainable Market Integrity Monitoring System Using Multi-Source Attention Signals and Transparent Scoring},
author={Neela, Sandeep},
journal={arXiv preprint arXiv:2601.15304v1},
year={2026}
}
For the implementation itself, you may cite this repository:
Chirinda, C. (2026). Explainable Market Integrity Monitoring System: An Open Source Implementation.
GitHub repository: https://github.com/chirindaopensource/explainable_market_integrity_monitoring_system
- Credit to Sandeep Neela for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Pandas, NumPy, SciPy, and PyYAML.
--
This README was generated based on the structure and content of the explainable_market_integrity_monitoring_system_draft.ipynb notebook and follows best practices for research software documentation.
