Public memory for deep research. Stop repeating the same research across AI agent sessions.
Signal Archive is an open archive of sanitized research artifacts. Before your agent runs a deep research task, it checks whether the same question has already been answered. If it has, you can reuse the result. If not, the new research is automatically contributed back.
Run these three commands inside Claude Code:
/plugin marketplace add https://github.com/GenAI-Gurus/signal-archive
/plugin install signal-archive
/reload-plugins
That's it for read-only search. To enable automatic contribution, register as a contributor.
For Codex CLI, run the one-line installer:
bash <(curl -fsSL https://raw.githubusercontent.com/GenAI-Gurus/signal-archive/main/install.sh)
This injects instructions into ~/.codex/instructions.md so Codex searches the archive before research tasks and submits results automatically. The installer also works as a fallback for Claude Code environments without /plugin support.
Codex has no native hook system — the integration works by having Codex follow instructions to call pre_task.py and post_task.py as shell tools.
Before every research task, Signal Archive searches the archive for similar existing research and shows you what's already been answered.
After every research task, if you've registered, the result is automatically sanitized and contributed to the public archive — so the next person with the same question benefits.
You: "What are the tradeoffs between Supabase and PlanetScale?"
│
┌──────────▼───────────┐
│ Search archive │ → Found 2 similar results (83% match)
│ Sanitize prompt │ → Removed implicit company refs
└──────────┬───────────┘
│
Claude researches and responds
│
┌──────────▼───────────┐
│ Submit artifact │ → Contributes to public archive
└──────────────────────┘
All prompts are sanitized before submission — private company names, personal info, and credentials are stripped. Only public-safe research is accepted.
Two ways:
Magic link (recommended) — at the Get Started page, enter your email and click the link. Your api_key is shown once on the callback page.
/signal-archive:login — inside Claude Code, run the slash command. A browser opens, you sign in by email, and the plugin retrieves your api_key automatically via a CLI polling session.
API-only (handle, no email) — for headless setups:
curl -s -X POST https://signal-archive-api.fly.dev/contributors \
-H "Content-Type: application/json" \
-d '{"handle": "your-handle"}' | python3 -m json.tool
Whichever method you use, add the api_key to your shell profile:
echo 'export SIGNAL_ARCHIVE_API_KEY="your-key-here"' >> ~/.zshrc
source ~/.zshrc
Only research artifacts that pass sanitization are accepted:
- ✅ Public figures by name (CEOs, researchers, public companies)
- ✅ Technical comparisons, market research, public API behavior
- ❌ Private individual names, contact info, credentials
- ❌ Implicit company references ("our product", "my team's stack") without substitution
- ❌ Relative time references ("this quarter") without explicit dates
The sanitizer runs locally before any data leaves your machine.
- Canonical question clustering: semantic deduplication via pgvector. New submissions are matched to existing questions; a new canonical is created only when nothing similar exists. The threshold is configurable via the SIMILARITY_THRESHOLD env var (default 0.85).
- Research artifacts: full body (≤100k chars), short answer, citations (≤50), source domains, provenance (worker_type, run_date, model_info), clarifying Q&A (≤20 pairs).
- Synthesized summaries: when a canonical question accumulates multiple artifacts, gpt-4o-mini generates a 2–3 sentence synthesis. Summaries are quality-weighted and community-adjusted: each artifact's effective weight is quality_score + useful*3 − wrong*10 − weakly_sourced*5 − stale*3 (clamped 0–100), so trusted research dominates the synthesis and flagged-wrong research is heavily down-weighted. Shown on the browse page.
- Related questions: vector similarity surfaces the 5 most similar canonical questions on each artifact page.
- Versioning at the data layer: artifacts have supersedes_id (self-FK, two-phase validated: must exist + must be in the same canonical) and a free-form version string. The /canonical/{id}/artifacts endpoint hides superseded artifacts by default (include_superseded=true to opt in). The UI does not yet surface a "supersede" action; see Next best steps.
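The effective-weight formula for synthesized summaries can be written out directly. The sketch below is illustrative only: the function name and keyword arguments are hypothetical, and only the coefficients and the 0–100 clamp come from the description above.

```python
def effective_weight(quality_score: float, useful: int = 0, wrong: int = 0,
                     weakly_sourced: int = 0, stale: int = 0) -> float:
    """Quality score adjusted by community flag counts, clamped to 0-100.

    Hypothetical helper: the name and signature are assumptions; the
    coefficients (+3, -10, -5, -3) are the ones stated in the feature list.
    """
    raw = float(quality_score + useful * 3 - wrong * 10
                - weakly_sourced * 5 - stale * 3)
    return max(0.0, min(100.0, raw))


print(effective_weight(70, useful=2))   # 76.0
print(effective_weight(60, wrong=3))    # 30.0
print(effective_weight(20, wrong=5))    # 0.0 (clamped from -30)
print(effective_weight(95, useful=10))  # 100.0 (clamped from 125)
```

A single Wrong flag costs more than three Useful flags earn, which is what lets a few credible objections pull an artifact out of the synthesis quickly.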
- Automated quality scoring (0–100, computed at submission time):
- Source breadth: up to 40 pts (scaled to 20 unique domains)
- Body depth: up to 30 pts (scaled to 2000 words)
- Faithfulness: up to 30 pts (gpt-4o-mini checks whether the short answer reflects the full body — YES/PARTIAL/NO)
- Score shown as a colored badge (High ≥70 / Medium ≥40 / Low <40) on each artifact card.
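As a rough illustration of how the three scoring components and the badge thresholds fit together, here is a minimal sketch. It assumes linear scaling up to each cap and 15 points for a PARTIAL faithfulness verdict; neither detail is stated above, and the function names are hypothetical.

```python
# Points per faithfulness verdict. YES=30 and NO=0 follow from the 30-point
# cap; PARTIAL=15 is an assumption made for this sketch.
FAITHFULNESS_POINTS = {"YES": 30.0, "PARTIAL": 15.0, "NO": 0.0}


def quality_score(unique_domains: int, word_count: int, faithfulness: str) -> float:
    """Sum of source breadth (<=40), body depth (<=30), and faithfulness (<=30)."""
    breadth = min(unique_domains / 20, 1.0) * 40   # saturates at 20 domains
    depth = min(word_count / 2000, 1.0) * 30       # saturates at 2000 words
    return breadth + depth + FAITHFULNESS_POINTS[faithfulness]


def badge(score: float) -> str:
    """Map a 0-100 score to the badge label shown on artifact cards."""
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


score = quality_score(unique_domains=10, word_count=1000, faithfulness="PARTIAL")
print(score, badge(score))  # 50.0 Medium
```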
- Search sort modes: relevance (default), quality (avg artifact quality), or reuse. Non-relevance sorts apply a 0.5 similarity floor and re-rank a top-50 candidate pool.
- Community flags: signed-in readers flag artifacts as Useful, Stale, Weak sources, or Wrong. Auth is enforced server-side (JWT for web, X-API-Key for agents); a partial unique index (artifact_id, flag_type, contributor_id) deduplicates per contributor (returns 409 on a repeat). Flag counts feed both the synthesis weights and contributor reputation, and the canonical page shows warning banners when wrong_count ≥ 3 (red) or weakly_sourced_count ≥ 3 (orange).
- Contributor reputation: a daily Fly.io scheduled job recomputes a 0–100 score from reuse ratio + community-flag ratio. Surfaced on the leaderboard.
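The non-relevance search sorts described above (a 0.5 similarity floor plus a re-rank of the top-50 pool) might look roughly like this. The Candidate type, its field names, and the order in which the pool cut and the floor are applied are assumptions made for the sketch, not the backend's actual code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical shape of one search hit; field names are assumptions.
    question: str
    similarity: float   # 0.0-1.0 vector similarity to the query
    avg_quality: float  # average artifact quality for the canonical
    reuse_count: int


def rerank(candidates: list[Candidate], sort: str = "relevance",
           pool_size: int = 50, floor: float = 0.5) -> list[Candidate]:
    """Order hits by similarity, or re-rank a similarity-filtered pool."""
    by_similarity = sorted(candidates, key=lambda c: c.similarity, reverse=True)
    if sort == "relevance":
        return by_similarity
    # Take the most similar hits first, then drop anything under the floor.
    pool = [c for c in by_similarity[:pool_size] if c.similarity >= floor]
    sort_keys = {
        "quality": lambda c: c.avg_quality,
        "reuse": lambda c: c.reuse_count,
    }
    return sorted(pool, key=sort_keys[sort], reverse=True)


hits = [
    Candidate("a", 0.9, 40.0, 1),
    Candidate("b", 0.6, 90.0, 0),
    Candidate("c", 0.4, 99.0, 9),  # below the floor for non-relevance sorts
]
print([c.question for c in rerank(hits, "quality")])  # ['b', 'a']
```

The floor keeps a high-quality but off-topic canonical (like "c" here) from outranking relevant results when sorting by quality or reuse.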
- Magic-link sign-up + login: email-based, no password. Magic links expire in 15 minutes. New accounts pick a handle on first verify; existing accounts get their api_key re-issued.
- CLI login session (/auth/cli-session + /auth/cli-session/{id}/poll): /signal-archive:login opens a browser, you sign in by email, the CLI polls for completion (10-minute window), and your api_key lands in the terminal automatically.
- API-key contributors: register with handle only, no email. API key returned once and stored encrypted (Fernet).
- JWT auth: exchange api_key → jwt (HS256, 30 days). Used for /search, write endpoints, and account routes.
- Account API: GET /auth/me, PATCH /auth/me (display name), GET /auth/api-key (reveal decrypted key). The account page on the website wraps these.
- Browse — paginated canonical questions, sortable by Recent / Popular / Active.
- Artifact detail — full body rendered as markdown, provenance card, community flags, related questions.
- Search — semantic search with JWT auth; anonymous callers get up to 5 results with summaries hidden.
- Discovery — emerging topics (recent canonicals with growth signals), researched-this-week, top-reused.
- Leaderboard — top contributors by reputation.
- Account — view stats, update display name, reveal API key.
- Get Started, API Reference, About pages.
- Claude Code plugin (.claude-plugin/marketplace.json, hooks/hooks.json): hooks into UserPromptSubmit (pre-task search + sanitization) and Stop (post-task submission). Slash commands: /signal-archive (manual search), /signal-archive:login (browser-based magic-link login).
- Codex CLI integration: install.sh injects instructions into ~/.codex/instructions.md; Codex calls pre_task.py and post_task.py as shell tools.
- Both integrations share the same worker_sdk and sanitizer packages. Artifacts are tagged with worker_type ("claude_code" or "codex") for provenance.
- Reuse events are recorded (POST /canonical/{id}/reuse) when a pre-task search surfaces a ≥80% match. All three install paths (/plugin install, install.sh as the Claude Code fallback, and Codex) use the same hooks/ source and record reuse consistently.
- LLM-based sanitizer (sanitizer/sanitizer.py) runs locally via subprocess against whichever CLI is available (claude or codex). Returns a structured SanitizationResult (cleaned_prompt, was_modified, removed_categories, safe_to_submit, reason).
- Artifacts with safe_to_submit=False are skipped: research runs locally but is not contributed.
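To make the sanitizer's result and the skip rule concrete, here is a minimal sketch. The five field names come from the description above; the field types, the defaults, and the should_contribute helper are assumptions and may not match sanitizer/sanitizer.py.

```python
from dataclasses import dataclass, field


@dataclass
class SanitizationResult:
    # Field names are from the README; types and defaults are assumed.
    cleaned_prompt: str
    was_modified: bool
    removed_categories: list[str] = field(default_factory=list)
    safe_to_submit: bool = True
    reason: str = ""


def should_contribute(result: SanitizationResult, has_api_key: bool) -> bool:
    """Hypothetical gate: submit only when registered and the prompt is safe."""
    return has_api_key and result.safe_to_submit


blocked = SanitizationResult(
    cleaned_prompt="",
    was_modified=True,
    removed_categories=["credentials"],
    safe_to_submit=False,
    reason="prompt contained an access token",
)
print(should_contribute(blocked, has_api_key=True))  # False
```

Either way the research itself still runs locally; the gate only decides whether the artifact is sent to the public archive.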
- FastAPI backend on Fly.io (rolling deploys), 2 machines, full async (SQLAlchemy async + asyncpg).
- Supabase Postgres + pgvector extension (1536-dim embeddings, text-embedding-3-small).
- Daily reputation batch via Fly.io scheduled machine (reputation/runner.py).
- One-off backfill scripts (batch/backfill.py, batch/quality_backfill.py).
- GitHub Actions → GitHub Pages for website deploys.
- Resend for transactional email (magic links).
- API keys hashed (SHA-256 + per-row salt) for lookup and Fernet-encrypted at rest for re-issue.
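The salted-hash half of the API key storage scheme can be sketched with the standard library alone. Everything here is illustrative: the function names, the record layout, the 16-byte salt, and the salt-then-key concatenation order are assumptions, and the Fernet encryption used for re-issue is left out because it needs the third-party cryptography package.

```python
import hashlib
import hmac
import secrets


def salted_sha256(api_key: str, salt: bytes) -> str:
    """SHA-256 over salt + key, hex-encoded (concatenation order is assumed)."""
    return hashlib.sha256(salt + api_key.encode("utf-8")).hexdigest()


def new_key_record() -> tuple[str, dict]:
    """Generate a key plus the per-row values a database row might store."""
    api_key = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)  # per-row salt; size is an assumption
    return api_key, {"salt": salt, "key_hash": salted_sha256(api_key, salt)}


def verify(api_key: str, record: dict) -> bool:
    """Constant-time comparison of the presented key against the stored hash."""
    candidate = salted_sha256(api_key, record["salt"])
    return hmac.compare_digest(candidate, record["key_hash"])


key, row = new_key_record()
print(verify(key, row))          # True
print(verify(key + "x", row))    # False
```

Storing only the salt and hash means a leaked row cannot be replayed as a key, while the separately encrypted copy is what allows the key to be revealed or re-issued later.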
signal-archive/
├── .claude-plugin/ Plugin manifest + marketplace.json
├── hooks/ Claude Code plugin hooks (used by /plugin install)
│ ├── pre_task.py UserPromptSubmit — search + sanitize
│ ├── post_task.py Stop — submit artifact
│ ├── login.py CLI magic-link login (used by /signal-archive:login)
│ └── hooks.json Plugin manifest pointing to the above
├── commands/ Slash commands shipped by the plugin
│ ├── signal-archive.md /signal-archive — manual search
│ └── login.md /signal-archive:login — browser login
├── claude_code_integration/ Claude Code hooks bundled by install.sh (fallback path)
│ └── setup.py Writes settings.json hooks pointing at top-level hooks/
├── codex_integration/ Codex CLI integration (instruction injection)
│ ├── hooks/ pre_task.py, post_task.py (called as shell tools)
│ ├── instructions_template.md Injected into ~/.codex/instructions.md
│ └── setup.py Idempotent installer for ~/.codex/
├── install.sh One-liner installer (Claude Code + Codex CLI auto-detect)
├── backend/ FastAPI + SQLAlchemy, deployed on Fly.io
│ ├── routes/ artifacts, canonical, flags, search, auth, contributors, discovery
│ ├── auth.py JWT, magic-link email, Fernet, hash_api_key
│ ├── canonical.py Semantic dedup, quality-weighted synthesis prep
│ ├── quality.py Source/depth/faithfulness scorer
│ ├── summarizer.py gpt-4o-mini synthesis with quality weighting
│ ├── embeddings.py text-embedding-3-small wrapper
│ ├── models.py SQLAlchemy tables
│ └── schemas.py Pydantic request/response models
├── batch/ One-off backfill scripts
│ ├── backfill.py Regenerate synthesized summaries
│ └── quality_backfill.py Score artifacts missing quality_score
├── reputation/ Daily scheduled reputation scorer
│ ├── scorer.py Pure function: contributions × reuse × flags → score
│ └── runner.py Fly.io scheduled-machine entrypoint
├── sanitizer/ Local LLM-based prompt sanitizer
├── worker_sdk/ Async Python client (search, submit, record_reuse)
├── tests/ pytest-asyncio test suite
└── website/ Astro static site on GitHub Pages
Tech stack: Python 3.11, FastAPI, SQLAlchemy async, pgvector on Supabase, Fly.io, Astro 4, OpenAI gpt-4o-mini + text-embedding-3-small, Tailwind CSS, Resend (email), Fernet (encryption).
- Wire up versioning in the UI/hooks. supersedes_id validation, the version field, and the default-hide-superseded behavior are all in production at the data layer. No UI lets a contributor mark "this supersedes that," and no hook auto-supersedes when the same contributor reruns the same canonical question. A simple heuristic (same handle, same canonical, prior artifact within N days) would activate it immediately.
- Recency-aware synthesis. Synthesis currently weights by flag-adjusted quality score only, with no recency term. If a canonical has a fresh artifact and an 18-month-old one with the same score, both contribute equally. Adding a recency multiplier (or excluding artifacts older than N months from the synthesis) would prevent stale answers from dominating.
- Staleness detection. The model has no is_stale flag; the only signal is community flags. Auto-marking artifacts stale when the canonical is time-sensitive and run_date is older than a configurable threshold would let the UI dim or hide them.
- Anonymous full search. /search already returns 5 results to anon users (with summaries hidden), but the search page UI gates the experience behind login. Loosen the JWT requirement on /search (with rate limiting) and only redact synthesized_summary for anon; most of the wiring is there.
- Surface contributor profiles in nav. /contributor?handle=x exists; nothing links to it from the leaderboard or artifact cards.
- Quality score in the canonical card. Show the average artifact quality on canonical browse pages so good content surfaces above stub answers.
- Multi-language. Embeddings are language-agnostic, but the sanitizer prompts and synthesizer prompts are English-only.
- Re-research on staleness. When a user lands on a canonical whose artifacts are all flagged stale or older than N months, prompt them to rerun and, on submit, link the new artifact via supersedes_id automatically.
Pull requests welcome. The project uses Linear for issue tracking — open a GitHub issue and it syncs automatically.
Built by GenAI Gurus.