Cyber Disinformation Detection Briefing System (CDDBS)

An automated intelligence briefing system that monitors, analyzes, and summarizes media narratives to detect and explain disinformation patterns — combining LLM-driven analysis with professional intelligence community standards.

#disinformation #ai-safety #nlp #media-analysis #intelligence-briefing #osint #information-operations #llm #democratic-resilience #fact-checking #narrative-detection #telegram #twitter #media-monitoring

What is CDDBS?

CDDBS is a research and development project building a system to analyze media outlets and social media accounts for potential disinformation activity. It uses LLM-based analysis (Google Gemini) to produce structured intelligence briefings that assess:

Source credibility — behavioral indicators and outlet history
Narrative alignment — matching against 50+ known disinformation narratives across 8 categories
Cross-outlet coordination — detecting coordinated narrative pushing across outlets on the same topic
Cross-platform amplification — tracking how narratives propagate across news, Twitter/X, and Telegram
Quality scoring — 7-dimension, 70-point rubric for briefing reliability
AI trustworthiness — grounding scores, output validation, confidence calibration
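A 70-point rubric over 7 dimensions works out to 10 points per dimension if the dimensions are equally weighted. The following is a minimal sketch of that aggregation under that assumption. The dimension names are hypothetical placeholders, because this README does not list the rubric's actual dimensions.

```python
# Hypothetical dimension names: the README does not list the rubric's 7 dimensions.
DIMENSIONS = (
    "structure",
    "evidence",
    "confidence",
    "sourcing",
    "narrative_fit",
    "clarity",
    "actionability",
)
MAX_PER_DIMENSION = 10  # assumption: equal weighting, 7 x 10 = 70
MAX_SCORE = len(DIMENSIONS) * MAX_PER_DIMENSION


def total_quality_score(scores: dict) -> int:
    """Sum per-dimension scores into a 0-70 total after validating each one."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} out of range: {value}")
        total += value
    return total


example = {name: 8 for name in DIMENSIONS}
example["evidence"] = 5
print(total_quality_score(example), "/", MAX_SCORE)  # 53 / 70
```

Validating every dimension before summing means a briefing that omits a dimension fails loudly. It is not silently scored as if that dimension were zero.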

The system implements a multi-stage pipeline. Its briefing format follows intelligence community conventions drawn from a study of formats used by EUvsDisinfo, DFRLab, Bellingcat, NATO StratCom COE, and others.

Live Application

Service	URL
Frontend (Cloudflare Workers)	cddbs-frontend.projectsfiae.workers.dev
Frontend (Render)	cddbs-frontend.onrender.com
Backend API	cddbs-api.onrender.com

Wake-up sequence (Render free tier spins down after inactivity):

Wake backend: visit the API URL and wait 30–60 seconds for the status message

Open either frontend URL
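The wake-up step can be scripted. The following is a minimal standard-library sketch. It polls a URL until it answers with HTTP 200 or the attempts run out, and it assumes the API is served over HTTPS. The defaults (18 attempts, 5 seconds apart, roughly 90 seconds in total) are assumptions sized around the 30 to 60 second cold start. `wait_for_backend` and `http_status` are hypothetical helpers, not part of the project. The `fetch` and `sleep` parameters exist only so the loop can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status (e.g. 503)
    except OSError:
        return 0  # connection refused, DNS failure, or timeout


def wait_for_backend(
    url: str,
    attempts: int = 18,
    interval_s: float = 5.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval_s)  # no pointless wait after the final try
    return False


if __name__ == "__main__":
    print(wait_for_backend("https://cddbs-api.onrender.com"))
```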

Architecture & Security Model:

API keys (SerpAPI + Gemini) are stored exclusively in server environment variables
CORS hardened with explicit origin list (no wildcards)
Rate limiting on all mutation endpoints
Input sanitization against prompt injection
EU AI Act Art. 50 AI provenance disclosure on every analysis
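Prompt-injection defenses are usually layered: normalize the text, strip invisible characters, cap the length, and flag instruction-like phrases before the input reaches the LLM. The following is a minimal sketch of that idea. The patterns, the 500-character limit, and the `sanitize_user_input` name are illustrative assumptions, not the project's actual rules. Pattern matching alone cannot catch every injection.

```python
import re
import unicodedata
from dataclasses import dataclass, field

MAX_INPUT_CHARS = 500  # assumption: outlet names and topics are short

# Illustrative patterns only; a production deny-list would be broader and tested.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"\bsystem\s*prompt\b", re.I),
    re.compile(r"\byou\s+are\s+now\b", re.I),
]


@dataclass
class SanitizedInput:
    text: str
    flags: list[str] = field(default_factory=list)  # reasons for caller review


def sanitize_user_input(raw: str) -> SanitizedInput:
    # NFKC folds look-alikes such as full-width letters into plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control/format characters (Unicode category C*), e.g. zero-width spaces.
    text = "".join(ch for ch in text if ch in "\n\t" or not unicodedata.category(ch).startswith("C"))
    text = re.sub(r"\s+", " ", text).strip()
    flags = []
    if len(text) > MAX_INPUT_CHARS:
        text = text[:MAX_INPUT_CHARS]
        flags.append("truncated")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            flags.append(f"pattern:{pattern.pattern}")
    return SanitizedInput(text=text, flags=flags)
```

The sketch flags rather than rejects, leaving the decision (block, warn, or log) to the calling endpoint. Stripping zero-width characters before matching matters: otherwise "ig\u200bnore previous instructions" would slip past the first pattern.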

Try It in Google Colab

No local setup required. Open any notebook directly in Colab:

Notebook	Description
`CDDBS_Main.ipynb`	Full refactored pipeline (v1.0)
`CDDBS_v0.1.0_POC.ipynb`	Original proof of concept
`CDDBS_v0.2.0_enhanced.ipynb`	Enhanced pipeline (v0.2)

Colab Setup (2 steps):

Click the key icon in the left sidebar and add two secrets: GOOGLE_API_KEY and SERPER_API
Run all cells, then execute: run_cddbs_analysis('RT', 'rt.com', 'Russia')

See docs/API_SETUP.md for full API key setup instructions.
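Inside a notebook, secrets added through the key icon can be read with Colab's `google.colab.userdata.get`. The following is a minimal sketch that falls back to environment variables, so the same cell also runs outside Colab. The `load_secret` helper is hypothetical and not part of the notebooks.

```python
import os


def load_secret(name: str) -> str:
    """Read a secret from Colab's secret store, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:  # secret missing, or notebook access not granted
            value = None
        if value:
            return value
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"secret {name!r} not found in Colab secrets or environment")
    return value


# Usage inside the notebook (names as in the setup steps):
# gemini_key = load_secret("GOOGLE_API_KEY")
# search_key = load_secret("SERPER_API")
```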

Project Status & Roadmap

Current Version: v0.9.0 (Sprint 9 complete — 2026-03-28)

Versioning: 0.x.y semver — major version 0 signals pre-release (personal testing + stakeholder demos). 1.0.0 will be cut when authentication exists and external testers are onboarded.
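One practical consequence of 0.x.y numbering is that the next release, v0.10.0, sorts before v0.9.0 when versions are compared as plain strings. Comparing numeric tuples avoids this. The following is a minimal sketch for plain `MAJOR.MINOR.PATCH` tags with an optional leading `v`. It does not handle pre-release or build suffixes.

```python
def parse_version(tag: str) -> tuple:
    """Turn 'v0.9.0' or '0.9.0' into (0, 9, 0)."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    return major, minor, patch


def is_pre_release(tag: str) -> bool:
    """Under this project's scheme, major version 0 means pre-release."""
    return parse_version(tag)[0] == 0


tags = ["v0.10.0", "v0.9.0", "v0.2.0"]
print(sorted(tags))                     # ['v0.10.0', 'v0.2.0', 'v0.9.0']  (string order, wrong)
print(sorted(tags, key=parse_version))  # ['v0.2.0', 'v0.9.0', 'v0.10.0']
```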

Completed Sprints

Sprint	Version	Focus	Key Deliverables
1	v0.1.0	Briefing Format Redesign	7-section briefing template, JSON Schema, system prompt v1.1
2	v0.2.0	Quality & Reliability	70-point quality rubric, 18 narratives, 41 tests
3	v0.3.0	Multi-Platform Support	Telegram analysis, platform adapters, 80 tests
4	v0.4.0	Production Integration	Quality scorer + narrative matcher in pipeline, frontend components, 136 tests
5	v0.5.0	Operational Maturity	JSON export, metrics, DEVELOPER.md, CI pipeline, 132 prod tests
6	v0.6.0	CI Compliance Pipeline	Secret scan, docs drift, branch policy, SECURITY.md
7	v0.7.0	Intelligence Layer	Event clustering, burst detection, narrative risk scoring, events API, 204 tests
8	v0.8.0	Topic Mode Innovations	Coordination signal, key claims/omissions, AI provenance, SBOM, pip-audit, 214 tests
9	v0.9.0	AI Trust & Security	Input sanitization, output validation, grounding score, rate limiting, security headers, dependency scanner, 249 tests

Upcoming

Sprint	Target	Focus
10	v0.10.0	User Authentication + CDDBS-Edge Phase 0
11	v0.11.0	Collaboration (analyst annotations, shared workspaces)
12	v0.12.0	Advanced features (ML fine-tuning, multi-language)

CDDBS-Edge — Parallel Experimental Track

"What happens when the cloud goes down, the API gets blocked, or you're a journalist in a country that restricts internet access?"

A portable, offline-capable version of CDDBS built on a Raspberry Pi 5 running a local quantized LLM (Phi-3 Mini 3.8B via Ollama), replacing all cloud API calls.

See research/cddbs_edge_concept.md for the full concept.
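On the Edge track, a request that would go to Gemini goes instead to a local Ollama server. The following is a minimal sketch that uses Ollama's HTTP `/api/generate` endpoint on its default port 11434. The `phi3:mini` tag, the low temperature, and the generous timeout are assumptions, not settings taken from the concept document. The timeout is set high because a Pi 5 generates slowly.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "phi3:mini"  # assumption: Phi-3 Mini (3.8B) tag in the Ollama library


def build_generate_request(prompt: str, temperature: float = 0.2) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a token stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send `prompt` to the local model and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

The request is split from the network call so that the payload can be inspected or tested without a running model. Nothing here leaves the device, which is the point of the offline track.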

Architecture

Current Stack (v0.9.0)

Component	Technology
Backend	FastAPI + uvicorn + slowapi (Render)
Frontend	React 18 + TypeScript + MUI 6 + Vite (Cloudflare Workers + Render)
Database	PostgreSQL 15 (Neon managed, 12 tables)
LLM	Google Gemini 2.5 Flash via google-genai SDK
Data Sources	SerpAPI (Google News), GDELT (Cloudflare Workers proxy), RSS feeds
CI	GitHub Actions (7 workflows: lint, test, SBOM, pip-audit, dependency scanner, secret scan, docs drift)
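The backend handles rate limiting with slowapi. The following sketch shows the underlying idea without depending on slowapi's API: a minimal sliding-window limiter in plain Python. The `SlidingWindowLimiter` class and any limits you pass it are illustrative, not the production configuration.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client key."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the endpoint would answer HTTP 429 Too Many Requests
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=5, window_s=60)  # e.g. 5 analyses per minute per client
```

Keying by client (typically the remote address) keeps one noisy caller from exhausting the shared Gemini and SerpAPI quotas for everyone else.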

Analysis Pipeline

Input (outlet / topic / account)
        |
        v
   [Fetch]  SerpAPI Google News / GDELT / RSS
        |
        v
   [Sanitize]  Input validation + prompt injection prevention
        |
        v
   [Analyze]  Gemini LLM — narrative evaluation, disinformation markers
        |
        v
   [Validate]  Output schema validation + grounding score computation
        |
        v
   [Score]  7-dimension quality scorer (70-point rubric)
        |
        v
   [Match]  Narrative detection (50+ known disinformation narratives)
        |
        v
   Output: Intelligence briefing + quality scorecard + narrative tags + AI provenance
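The stage order above can be expressed as a list of functions that each take and return a shared state. The following is a minimal sketch that assumes a plain dict for state. Every stage body is a stand-in: retrieval returns canned text, the Gemini call is stubbed, and the grounding score is a simple word-overlap proxy rather than the project's metric. As in the diagram, Sanitize runs after Fetch, so it cleans both the query and the fetched text before anything reaches the LLM.

```python
import re
from typing import Callable, List

Stage = Callable[[dict], dict]


def fetch(state: dict) -> dict:
    # Stand-in for SerpAPI / GDELT / RSS retrieval; returns canned text.
    state["articles"] = [f"{state['input']} reported a claim about energy prices."]
    return state


def sanitize(state: dict) -> dict:
    # Collapse whitespace and cap length on the query and the fetched text.
    state["input"] = " ".join(state["input"].split())[:200]
    state["articles"] = [" ".join(a.split())[:5000] for a in state["articles"]]
    return state


def analyze(state: dict) -> dict:
    # Stand-in for the Gemini call: pretend the model extracted two claims.
    state["claims"] = ["claim about energy prices", "secret military deal"]
    return state


def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def validate(state: dict) -> dict:
    # Proxy grounding score: share of claims whose words all occur in the sources.
    source_words = set().union(*(_words(a) for a in state["articles"]))
    claims = state["claims"]
    grounded = [c for c in claims if _words(c) <= source_words]
    state["grounding_score"] = len(grounded) / len(claims) if claims else 0.0
    return state


# Score and Match would be appended to this list in the same way.
PIPELINE: List[Stage] = [fetch, sanitize, analyze, validate]


def run_pipeline(user_input: str) -> dict:
    state = {"input": user_input}
    for stage in PIPELINE:
        state = stage(state)
    return state


result = run_pipeline("  RT ")
print(result["grounding_score"])  # 0.5: the "secret military deal" claim has no source support
```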

Repository Structure

cddbs-research/
├── notebooks/                           # Original MVP & POC notebooks
├── research/                            # Research notebooks & design docs
│   ├── briefing_format_analysis.ipynb   # 10 professional formats analyzed
│   ├── event_intelligence_pipeline.md   # Sprint 6-7 architecture
│   ├── information_security_analysis.md # Sprint 9 security audit
│   ├── cddbs_edge_concept.md           # Offline CDDBS concept
│   └── ...
├── templates/                           # Briefing templates & system prompts
├── schemas/                             # JSON Schema for structured output
├── data/                                # Narratives DB, RSS feeds, samples
├── tools/                               # Quality scorer, platform adapters
├── tests/                               # 80 research-repo tests
├── docs/                                # Sprint backlogs & plans
│   ├── cddbs_execution_plan.md         # Full project vision & roadmap
│   ├── sprint_8_backlog.md
│   ├── sprint_9_backlog.md
│   └── ...
├── retrospectives/                      # Sprint retrospectives
│   ├── sprint_1.md through sprint_8.md
│   └── ...
├── compliance-practices/                # Compliance documentation
│   └── sprint_compliance_log.md        # Per-sprint compliance measures
├── blog/                                # Public-facing writeups
└── .github/workflows/ci.yml

Research & Writeups

Sprint documentation and research live in docs/ and research/:

Project Vision & Sprint Roadmap
Sprint 9 Backlog — AI Trust & Security
Information Security Analysis
Event Intelligence Pipeline Architecture
Briefing Format Analysis — 10 professional formats benchmarked
Sprint Retrospectives
Compliance Log — 9 sprints of compliance measures

Key Research Findings

Briefing Format Study (Sprint 1):

Only 3/10 organizations use explicit confidence signaling — a major gap
Per-finding confidence levels are a CDDBS innovation (none of the 10 benchmarked do this)
CDDBS occupies a unique niche: database consistency + policy brief depth
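Per-finding confidence means each finding carries its own confidence level instead of the briefing carrying a single rating. The following is a minimal data-structure sketch. The three-level scale, the field names, and the rule that high-confidence findings must cite evidence are assumptions for illustration, not the project's JSON Schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


_ORDER = [Confidence.LOW, Confidence.MODERATE, Confidence.HIGH]


@dataclass
class Finding:
    statement: str
    confidence: Confidence
    evidence: list = field(default_factory=list)  # source URLs or article IDs

    def __post_init__(self) -> None:
        # Assumed rule: a high-confidence finding must cite at least one source.
        if self.confidence is Confidence.HIGH and not self.evidence:
            raise ValueError("high-confidence findings require evidence")


def weakest_link(findings: list) -> Confidence:
    """Lowest confidence across findings, usable as a briefing-level caveat."""
    return min((f.confidence for f in findings), key=_ORDER.index)
```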

Security Audit (Sprint 9):

11 security issues identified across 9 dimensions (4 HIGH, 1 CRITICAL)
OWASP Top 10 for LLM Applications (v1.1) mapping: LLM01, LLM02, LLM04, LLM06, LLM09 applicable to CDDBS
All HIGH findings resolved; CRITICAL (no auth) deferred to Sprint 10

Compliance & Security

Framework	Measures
EU AI Act	10 measures (Art. 9, 12, 14, 50: risk management, record-keeping, human oversight, transparency)
CRA (EU Cyber Resilience Act)	12 measures (SBOM, vulnerability scanning, dependency scanner, SHA-pinned Actions)
DSGVO (GDPR)	6 measures (no PII, data minimization, BYOK, secret protection)
OWASP LLM Top 10	5 risks mitigated (prompt injection, insecure output handling, model DoS, sensitive information disclosure, overreliance)

See compliance-practices/sprint_compliance_log.md for the full per-sprint compliance log.
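An Art. 50-style disclosure is easiest to audit when every analysis carries a machine-readable provenance block. The following is a minimal sketch of such a record. The field names and disclosure wording are hypothetical. The model name comes from the stack table and the prompt version from the Sprint 1 row. This is not a field-by-field reading of what the Act requires.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIProvenance:
    generator: str              # model that produced the text
    system_prompt_version: str
    generated_at: str           # ISO 8601 timestamp in UTC
    disclosure: str = "This briefing was generated with the assistance of an AI system."


def attach_provenance(briefing: dict, model: str, prompt_version: str) -> dict:
    """Return a copy of `briefing` with an `ai_provenance` block added."""
    record = AIProvenance(
        generator=model,
        system_prompt_version=prompt_version,
        generated_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )
    return {**briefing, "ai_provenance": asdict(record)}


out = attach_provenance({"title": "RT outlet assessment"}, "gemini-2.5-flash", "v1.1")
print(json.dumps(out, indent=2))
```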

Key Principles

Evidence over speed — Every claim must be traceable to evidence
Confidence transparency — Always communicate uncertainty honestly
Reproducibility — Analyses should be reproducible with the same inputs
Professional standards — Output should meet intelligence community standards
Security by default — Input validation, output validation, rate limiting from the start
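Reproducibility is easier to check when each run records a fingerprint of everything that determines its output. The following is a minimal sketch. It assumes the relevant inputs are the target, the fetched article texts, the model name, the prompt version, and the sampling temperature, and it hashes a canonical JSON encoding of them with SHA-256. Identical fingerprints show that two runs were set up identically. They do not guarantee identical LLM output.

```python
import hashlib
import json


def run_fingerprint(target: str, articles: list, model: str,
                    prompt_version: str, temperature: float) -> str:
    """SHA-256 over a canonical JSON encoding of the run's inputs."""
    canonical = json.dumps(
        {
            "target": target,
            "articles": articles,  # order kept: it changes the prompt
            "model": model,
            "prompt_version": prompt_version,
            "temperature": temperature,
        },
        sort_keys=True,          # key order never affects the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```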

Related Repositories

cddbs-prod (private) — Production application code (FastAPI backend + React frontend + PostgreSQL)

License

MIT — see LICENSE.

Collaboration

CDDBS is an open research prototype for academic and policy collaboration in disinformation analysis, media monitoring, and intelligence automation.

Researchers, journalists, or institutions interested in collaboration, methodological review, or exploring applications in democratic resilience are welcome to reach out:

Email: angaben@pm.me
