SCOOP

SCOOP is a news intelligence platform (newsint) built for agent-driven news collection.

It is designed to be used by AI agents (e.g. OpenClaw): agents browse the web, build canonical news item JSON, and call SCOOP CLI commands to ingest items.

SCOOP then:

stores raw arrivals safely
normalizes them into canonical documents
generates embeddings
auto-deduplicates documents into canonical stories
serves an API + modern web UI to explore stories and their merged source items

Core Model

news item: one ingested article/document from a source
story: a deduplicated cluster of related items
collection: a hard dedup boundary; items are only deduplicated against other items in the same collection (for example ai_news, world_news, china_news)

In plain terms: many items can collapse into one story, while still keeping every original source item traceable.
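The relationship between items, stories, and collections can be sketched as a small data model. This is only an illustration of the concepts above, not SCOOP's actual schema: the class names, field names (item_id, source, title, story_id), and the add/sources helpers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NewsItem:
    """One ingested article from a source (field names are hypothetical)."""
    item_id: str
    collection: str
    source: str
    title: str


@dataclass
class Story:
    """A deduplicated cluster of items inside a single collection."""
    story_id: str
    collection: str
    items: list[NewsItem] = field(default_factory=list)

    def add(self, item: NewsItem) -> None:
        # The collection is a hard boundary: never merge across collections.
        if item.collection != self.collection:
            raise ValueError("item and story belong to different collections")
        self.items.append(item)

    def sources(self) -> list[str]:
        # Every merged item stays traceable to its original source.
        return [item.source for item in self.items]


story = Story(story_id="s1", collection="ai_news")
story.add(NewsItem("i1", "ai_news", "example.com", "Model X released"))
story.add(NewsItem("i2", "ai_news", "example.org", "X model launches"))
print(len(story.items), story.sources())
```

Two items collapse into one story here, and the story still lists both original sources. Adding an item from a different collection raises an error instead of merging.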

What Works Today

CLI ingestion (scoop ingest) with JSON schema validation
Pipeline stages: normalize, embed, dedup, process
Semantic + lexical + exact-match dedup with audit signals
Echo API (/api/v1/...) with JSend responses
React + TanStack + Tailwind story explorer UI
Story/item deep-link routes (/c/<collection>/s/<story>/i/<item>)
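To make the three dedup signal types above concrete, here is a minimal sketch of how exact-match, lexical, and semantic signals could be computed for a pair of items. It is not SCOOP's implementation (see backend/README.md for that): the normalization, the use of Jaccard similarity for the lexical signal, cosine similarity for the semantic signal, and the thresholds are all assumptions chosen for illustration.

```python
import hashlib
import math
import re


def normalize(text: str) -> str:
    # Lowercase and keep only alphanumeric tokens, joined by single spaces.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def exact_key(text: str) -> str:
    # Hash of the normalized text; equal keys mean an exact match.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def jaccard(a: str, b: str) -> float:
    # Lexical overlap: shared tokens divided by all distinct tokens.
    tokens_a, tokens_b = set(normalize(a).split()), set(normalize(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cosine(u: list[float], v: list[float]) -> float:
    # Semantic similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def dedup_signals(text_a: str, text_b: str,
                  emb_a: list[float], emb_b: list[float]) -> dict:
    # Keep each signal separately so a merge decision can be audited later.
    return {
        "exact": exact_key(text_a) == exact_key(text_b),
        "lexical": jaccard(text_a, text_b),
        "semantic": cosine(emb_a, emb_b),
    }


def is_duplicate(signals: dict, lexical_min: float = 0.6,
                 semantic_min: float = 0.9) -> bool:
    # Hypothetical thresholds; any one signal is enough to merge in this sketch.
    return (signals["exact"]
            or signals["lexical"] >= lexical_min
            or signals["semantic"] >= semantic_min)
```

Returning the individual signals rather than a single boolean mirrors the idea of audit signals: a reviewer can see which signal caused a merge.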

Architecture

backend/ (Go, GORM, Echo): CLI + API + pipeline logic
frontend/ (Vite, TypeScript, React, TanStack): viewer UX
embedding-service/ (Python): embedding HTTP service used by backend pipeline

Quick Start (Local)

Start backend API:

cd backend
go run ./cmd/scoop serve --env .env --host 0.0.0.0 --port 8090

Start frontend dev server:

cd frontend
pnpm install
pnpm dev

Open the UI:

http://127.0.0.1:5173

Vite proxies /api/* to http://127.0.0.1:8090 by default.

OpenClaw -> SCOOP Flow

OpenClaw scrapes pages and builds canonical v1 news item JSON.
OpenClaw calls:

cd backend
go run ./cmd/scoop ingest --env .env --payload-file /path/to/item.json --triggered-by-topic ai_news

SCOOP processes pending items:

cd backend
go run ./cmd/scoop process --env .env

The UI and API then show the deduplicated stories along with all of their merged source items.
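An agent could drive the flow above from a script. The sketch below writes a payload to disk and builds the same ingest and process commands shown above. The CLI flags are taken from this README; the helper names and the payload fields other than source_metadata (title, url) are hypothetical placeholders, so consult backend/NEWS_ITEM_SCHEMA.md for the real schema.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def write_payload(item: dict, directory: str) -> Path:
    # Write one news item as JSON and return the file path.
    path = Path(directory) / "item.json"
    path.write_text(json.dumps(item, indent=2), encoding="utf-8")
    return path


def build_ingest_command(payload_file: Path, topic: str,
                         env_file: str = ".env") -> list[str]:
    # Mirrors: go run ./cmd/scoop ingest --env .env --payload-file ... --triggered-by-topic ...
    return ["go", "run", "./cmd/scoop", "ingest", "--env", env_file,
            "--payload-file", str(payload_file), "--triggered-by-topic", topic]


def build_process_command(env_file: str = ".env") -> list[str]:
    # Mirrors: go run ./cmd/scoop process --env .env
    return ["go", "run", "./cmd/scoop", "process", "--env", env_file]


def run_in_backend(command: list[str], backend_dir: str = "backend"):
    # Runs the command from the backend directory; raises if it exits non-zero.
    return subprocess.run(command, cwd=backend_dir, check=True,
                          capture_output=True, text=True)


# Hypothetical payload: only source_metadata is named in this README.
example_item = {
    "title": "Example headline",
    "url": "https://example.com/article",
    "source_metadata": {"trust": 0.8},
}

with tempfile.TemporaryDirectory() as tmp:
    payload = write_payload(example_item, tmp)
    print(" ".join(build_ingest_command(payload, "ai_news")))
```

The example only prints the ingest command. To actually ingest, pass the command to run_in_backend with a payload path that is absolute, since the command runs from the backend directory.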

Ranking and Weighting Customization

SCOOP is intentionally designed so feed ranking is customizable per use case.

Put source-specific signals (scores, tags, trust hints) into source_metadata.
Keep collection-specific behavior by using collection labels as dedup boundaries.
Customize story ordering/weighting in backend query logic (default is recency).
Frontend feed renders whatever ranked order the API returns.

This lets teams define their own news feed behavior for any source mix without changing the ingestion contract.
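As an illustration of the idea, the sketch below ranks stories by recency plus a trust hint read from source_metadata. It is not SCOOP's code: the real customization point is the backend query logic, and the trust key, the half-life decay, the story dictionary shape (id, latest_at, items), and the weights are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def recency_score(published_at: datetime, now: datetime,
                  half_life_hours: float = 24.0) -> float:
    # Exponential decay: the score halves every half_life_hours.
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def story_score(story: dict, now: datetime, trust_weight: float = 0.5) -> float:
    # Use the highest trust hint among the story's merged items (hypothetical key).
    trust = max(
        (item.get("source_metadata", {}).get("trust", 0.0) for item in story["items"]),
        default=0.0,
    )
    return recency_score(story["latest_at"], now) + trust_weight * trust


def rank(stories: list[dict], now: datetime, trust_weight: float = 0.5) -> list[dict]:
    # Highest score first; the frontend would render this order as-is.
    return sorted(stories, key=lambda s: story_score(s, now, trust_weight), reverse=True)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
stories = [
    {"id": "old_trusted", "latest_at": now - timedelta(hours=48),
     "items": [{"source_metadata": {"trust": 1.0}}]},
    {"id": "fresh", "latest_at": now, "items": [{"source_metadata": {}}]},
]
print([s["id"] for s in rank(stories, now)])
```

With the default weight the fresh story ranks first. Raising trust_weight lets a well-trusted older story overtake it, which is the kind of per-use-case tuning described above.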

Key Docs

Backend details and full pipeline: backend/README.md
Canonical payload schema decision: backend/NEWS_ITEM_SCHEMA.md
Embedding service usage: embedding-service/README.md

License

MIT (LICENSE)
