feat: adversarial red team engine, custom persona builder, claim-level extraction, engine sweep #4

Merged
marceloceccon merged 3 commits into main from features/new-enhancements on Apr 25, 2026

@marceloceccon
Member

  • Adversarial Red Team engine: rotating attacker stress-tests positions
    before a parallel post-stress synthesis. Attacker confidence is
    out-of-band for the consensus formula; defenders run in parallel.
  • Custom persona builder: six-axis (risk, optimism, evidence, formality,
    verbosity, contrarian) slider UI. Server composes the system prompt
    from vetted phrase fragments — no user free-text reaches the LLM.
  • Claim-level disagreement extraction: post-final LLM pass emits
    structured contradictions with verbatim quotes per side. Quotes are
    verified against actual response content; fabrications are dropped.
  • Engine sweep mode: one-click sequential run across all three engines
    with a side-by-side comparison panel and explicit Cancel Sweep.
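
The quote-verification step in the claim-level extraction bullet can be illustrated with a minimal sketch. The types, the function name `verifyContradictions`, and the whitespace normalization are assumptions made for illustration; the PR text does not show the actual implementation. The idea is that a quote survives only if it appears verbatim in the response it is attributed to, and a contradiction is kept only if at least two sides survive.

```typescript
// Hypothetical shapes; the PR description does not show the real types.
interface ClaimSide {
  model: string; // key of the response the quote is attributed to
  quote: string; // excerpt the extraction pass claims is verbatim
}

interface Contradiction {
  topic: string;
  sides: ClaimSide[];
}

// Assumption: collapse whitespace so line wrapping alone does not fail a match.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

function verifyContradictions(
  extracted: Contradiction[],
  responses: Record<string, string>,
): Contradiction[] {
  return extracted
    .map((c) => ({
      ...c,
      // Keep only sides whose quote really occurs in the attributed response.
      sides: c.sides.filter((side) => {
        const source = responses[side.model];
        const quote = normalize(side.quote);
        return (
          source !== undefined &&
          quote.length > 0 &&
          normalize(source).includes(quote)
        );
      }),
    }))
    // A contradiction needs at least two surviving sides to be meaningful.
    .filter((c) => c.sides.length >= 2);
}
```

Exact substring matching is the strictest reasonable check: a paraphrased or invented quote fails it, at the cost of also rejecting quotes the model lightly reworded.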

Code-quality bundle: HMR-safe rate-limit cleanup, last-occurrence
CONFIDENCE matching, snapshot usage reconstruction, judge/claim stream
cleanup on cancel, hard-abort cost cap, tighter extractUsage type
guards, fix for mockImplementation pollution across tests.
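
As an illustration of the last-occurrence CONFIDENCE matching item, here is a minimal sketch. It assumes responses carry a marker of the form `CONFIDENCE: <0-100>`; that format, the regular expression, and the function name `parseLastConfidence` are assumptions, since the PR text does not show them. Taking the last match rather than the first means a model that mentions an earlier or quoted confidence value mid-answer is still scored on its final statement.

```typescript
// Hypothetical helper: returns the last in-range CONFIDENCE value, or null.
function parseLastConfidence(text: string): number | null {
  // Assumed marker format: "CONFIDENCE: 85" (case-insensitive, 1-3 digits).
  const pattern = /CONFIDENCE:\s*(\d{1,3})\b/gi;
  let last: number | null = null;
  let match: RegExpExecArray | null;
  // exec() with the g flag advances through every occurrence in order.
  while ((match = pattern.exec(text)) !== null) {
    const value = Number(match[1]);
    // Ignore out-of-range numbers instead of letting them overwrite a valid one.
    if (value >= 0 && value <= 100) {
      last = value;
    }
  }
  return last;
}
```

A caller might treat `null` as "no confidence reported" and keep that response out of any confidence-weighted calculation, though how the engine handles that case is not stated in the PR.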

Docs: README updated for the new engines and affordances; CHANGELOG.md
and SECURITY.md added; newfeatures.md tracks per-feature rationale and
QA notes.

Tests: 207 -> 255 (+48 covering all new features and QA bundle).

Comment thread lib/consensus-engine.ts Fixed
marceloceccon and others added 2 commits April 25, 2026 10:35
…ntrolled format string'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@marceloceccon marceloceccon merged commit 68e596b into main Apr 25, 2026
7 checks passed
2 participants