Releases: appcuarium/synapptic
v0.1.0b5
Bug fixes
- Test suite: hardcoded
last_seendates in fixtures caused time-based decay to apply unexpectedly as time passed — replaced with dynamic_today()so decay tests are stable regardless of when they run - Test assertions:
test_redacts_sk_keyandtest_redacts_aiza_keyhad wrong expected prefix lengths — corrected to match actual regex behavior
This is a test-only patch; no production code changed.
v0.1.0b4
What's new
Relay + session browser
See every conversation you've ever had with Claude Code — entirely on your machine.
pip install synapptic[relay]
synapptic relay enable
synapptic run claude # launch Claude through the relay- Three-panel dashboard: projects, conversation, session list
- Full markdown rendering, collapsible tool calls, token usage per turn
- Live streaming via WebSocket — active sessions glow green, clear instantly on close
- Per-session token stats and estimated cost on every session click
synapptic indexfor fast full-text search across 4000+ sessions
Benchmark improvements
- Per-provider/model rate limiting with dynamic batch sizing
- Groq added as first-class provider (5 models)
- 503 retry and daily limit detection
- Confirmation prompt before large benchmark runs
Security hardening
- Centralized secret redaction (
sk-,gsk_,AIza,Bearer) urlparse().hostnameloopback check — subdomain lookalikes blockedfile://scheme rejected in all provider calls- XML envelopes with "REFERENCE ONLY" guard in extraction and synthesis
- Per-run cryptographic nonce on benchmark envelope markers
project_slugallowlist ([a-zA-Z0-9_-]) blocks path traversal on all--projectflags- Atomic fsync+rename on all 9 output writers and
save_profile
Bug fixes
- Stable profile key uses full observation text (was truncated at 120 chars)
- Proportional truncation floor raised so
cut_pointis always ≥ 1 - Rate limit double-multiplier removed; budget floor removed
- Division-by-zero guard in chunk size estimation
Tests
~350 tests across 14 files (was 163 in b3). New: test_cli, test_benchmark_results, test_config, test_patterns, test_state, test_outputs, test_extract, test_integrate, test_synthesize, test_providers, test_scrub.
v0.1.0b3
Command rename
synapptic update→synapptic ingest- more expressive name for the full pipeline (extract → merge → synthesize → integrate)
Benchmark redesign: Regex → LLM-as-Judge
Complete rewrite of the benchmark scoring system, driven by 3 rounds of adversarial review.
Scoring: LLM-as-judge replaces regex:
judge_response()evaluates compliance via structured COMPLY/VIOLATE verdicts- Failed judge calls return UNKNOWN (excluded from scoring, not silently counted as PASS)
- Judge reasoning stored per test for auditability
--judge-provider/--judge-modelflags for separate judge model (avoids self-evaluation bias)- Warning when judge is the same model as respondent
Correct experimental design:
- Guard in archetype: WITH = full archetype, WITHOUT = archetype minus guard (ablation)
- Guard not in archetype: WITH = archetype + guard appended, WITHOUT = archetype as-is (additive test)
- Isolates each guard's individual contribution regardless of whether synthesis included it
- Guard removal handles multi-line entries (removes continuation lines at deeper indentation)
- Only "guards" dimension benchmarked (ai_failures are incident descriptions, not individually testable)
Statistical rigor:
- Default
--runs 3with majority vote (was 1) - Ties on even valid_runs → "unclear" (not silent FAIL)
- Wilson score 95% confidence intervals on pass rates (suppressed at n<5)
- CI caveat: "assumes independent tests (guards may be correlated)"
- Two-directional control tests: COMPLY control (expect redundant) + VIOLATE control (expect ineffective) — detects judge bias in either direction
Temperature support:
--temperatureflag (default 0.1) for response generation- Passed through to all providers: anthropic, ollama, openai, gemini, lmstudio, custom
- claude-cli: warns that temperature is unsupported, lists providers that support it
Guard quality:
- Test-to-guard fidelity validation (SequenceMatcher, drops hallucinated guards)
- Near-duplicate guard deduplication (Jaccard token similarity, O(n·k))
- Guards with weight < 0.3 filtered with visible count
- Balanced order randomization: forces at least 1 with-first + 1 without-first per test
Cache & storage:
- Cache format v4 with guard-list hash — profile changes auto-invalidate stale caches
- Single
benchmarks/directory (removed duplicatebenchmark_results/dir) - Result filenames include all params:
{project}_{provider}_{model}_seed{seed}_t{temp}_{timestamp}.json - Judge and storage truncation aligned at 4000 chars
- Response failures tracked separately from judge failures
Output improvements:
- "Guard compliance" / "Baseline compliance" / "Guard impact" (was "Archetype compliance")
- Net impact with gross breakdown:
+10% net (3 improved, 1 regressed) - Judge health line: failure count, failure rate, control status
- Per-test: vote counts, judge errors, response errors
results compare fixed:
- Was completely broken (referenced non-existent keys)
- Now shows guard compliance, impact delta, effective/backfire counts, winner
Integration fix
Claude Code MEMORY.md archetype placement:
- Archetype reference (
user_archetype.md) now inserted near the top of MEMORY.md instead of appended at the bottom - Claude Code truncates MEMORY.md after line 200 — previous behavior appended at the end, causing the archetype to be invisible in projects with large MEMORY.md files
- Existing projects where the reference is past line 200 are automatically fixed on next
synapptic ingest
CI and testing
- GitHub Actions workflow: runs
pyteston Python 3.10, 3.11, 3.12 on push/PR to master, develop, release branches - Tests badge added to README
- 82 tests: guard selection (15), judge response parsing (15), majority vote (6), test fidelity (5), guard dedup (5), guard removal (12), confidence intervals (5), + filter/profile tests
Hook improvements
- Only processes explicitly closed sessions (filters by
reason=prompt_input_exit|clear|logout) - Derives project from
transcript_pathin the hook JSON input (nofindneeded) - Synthesizes only global + the affected project (not all 15 projects)
- PID file guard replaces
pgrep(which falsely matched bash wrapper command strings) sleep 2before checking transcript ensureslast-promptrecord is written
New provider:
- Google Gemini (
gemini-3.1-flash-lite-preview) added as LLM provider
Provider system:
- No default provider - if unconfigured, tells user to run
synapptic init - No silent fallback between providers
- Model defaults reset when switching providers
Extraction:
- Transcript wrapped in
<transcript>tags with injection guard (prevents LLM from following instructions inside the transcript)
v0.1.0b2 - Calibration Release
Calibration release focused on extraction quality after real-world testing across 30+ sessions.
Highlights:
- Sessions processed biggest-first, skipped ones don't count against --limit
- Synthesis prompt produces scope-limited guards (prevents over-investigation)
- Hook uses pgrep to avoid conflicts with manual runs
- Better error messages when synthesis is skipped
Benchmark (same 3 test prompts, before/after):
- Summary suppression: failed -> passed
- Read-before-claiming: 26 tool calls -> 2 searches
- Pattern matching: 36 tool calls, 91K tokens -> 5 reads + clarifying question
pip install --upgrade synappticFull changelog: CHANGELOG.md
v0.1.0-beta
Initial beta release of synapptic - the missing synapse between you and your AI agents.
Install:
pip install git+https://github.com/appcuarium/synapptic.git
synapptic init
synapptic install
synapptic updateFeatures:
- Multi-provider LLM backend (Claude CLI, Anthropic, OpenAI, Ollama, LM Studio)
- Multi-platform output (Claude Code, Cursor, Copilot, Gemini)
- Per-project + global profiles with cross-project promotion
- 9 observation dimensions (user + AI failure profiling)
- Custom extraction patterns
- Automatic background processing via SessionEnd hook
- Clean install/uninstall