Responsible AI Agent Design Discussion #481
Replies: 7 comments 2 replies
adding @C-Neisinger & @akzaidi to the discussion.

---
## Responsible AI System Design Proposal

This proposal addresses the five design questions from #481, drawing on patterns established by existing hve-core agents.

### The Core Insight

The two RAI facets differ in inputs, outputs, timing, and interaction model:
These are fundamentally different workflows that share a common knowledge base. That distinction drives the architecture.

## 1. Agent Architecture: Two Agents, One Knowledge Base

Recommendation: Two specialized agents connected by a shared instructions file.

Why two agents instead of one:
Why a shared instructions file:
The feedback loop: The impact assessment agent produces a document that identifies project-specific risks, stakeholder harms, and required mitigations. The code compliance agent reads that assessment (when it exists) and tailors its analysis accordingly. A generic web app and a medical diagnosis system both get WCAG checks, but only the latter gets targeted bias scrutiny on clinical decision paths. This assessment-informed compliance approach makes the system more than the sum of its parts.

## 2. Impact Assessment Workflow

Recommendation: A conversational, phase-based agent modeled on existing hve-core conversational agents.

### The Proposed Phases
### Where the assessment document lives

Recommendation: completed assessments are committed under `docs/responsible-ai/assessments/`, with in-progress drafts in `.copilot-tracking/rai/assessments/`. This follows the established pattern:

### Interactive vs template-driven input

Recommendation: Interactive with a structured template as fallback. The agent asks clarifying questions interactively.

### Downstream connection

The completed assessment document includes a machine-readable section (YAML frontmatter or a structured risks section) that the code compliance agent can parse to configure its checks. For example:

```yaml
# Within the assessment document
identified_risks:
  - category: bias
    components: ["recommendation-engine", "scoring-pipeline"]
    severity: high
    threat_model_ref: RAI-1   # Maps to existing threat, updates likelihood/mitigations
  - category: accessibility
    components: ["patient-dashboard", "report-viewer"]
    severity: high
    threat_model_ref: RAI-1   # Fairness dimension
  - category: privacy
    components: ["data-ingestion", "user-profiles"]
    severity: medium
    threat_model_ref: RAI-3   # Maps to existing privacy threat
  - category: transparency
    components: ["automated-triage-engine"]
    severity: high
    threat_model_ref: null    # New threat, generates RAI-14 delta
```

## 3. Code Compliance Checks

Recommendation: An autonomous agent with configurable compliance dimensions, producing findings and RAI-ADRs.

### Compliance Dimensions

Rather than a fixed framework or fully open configuration, the agent ships with a set of compliance dimensions that activate based on context:
Teams can override activation through a lightweight configuration file (`.github/rai-config.yml`).
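A minimal sketch of what that file might contain (the keys and values here are illustrative assumptions; the actual schema is raised as an open question later in this proposal):

```yaml
# .github/rai-config.yml (illustrative sketch; schema not yet settled)
dimensions:
  accessibility:
    enabled: true
    standard: wcag-2.1-aa           # proposed baseline from this discussion
  bias:
    enabled: true
    paths: ["scoring-pipeline/**"]  # hypothetical per-area scoping
  privacy:
    enabled: false                  # e.g., project handles no user data
```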
## 4. Integration with the Threat Model

Both agents feed updates into the threat model:
| Source | Trigger | What It Produces |
|---|---|---|
| Impact Assessment (Phase 5) | New or revised assessment completed | New threat entries for risks without existing coverage; updated likelihood, impact, or mitigations for existing threats |
| Code Compliance (per-scan) | Compliance scan finds issues or confirms mitigations | Updated Status and Residual Risk fields for existing threats; new threats when novel risk patterns are detected |
### Delta Format
Rather than modifying the threat model directly during agent execution, both agents produce threat model deltas in a staging area. A delta is a structured description of proposed changes that a human reviews before applying.
Deltas live in `.copilot-tracking/rai/threat-model-deltas/` and follow this format:
```markdown
# .copilot-tracking/rai/threat-model-deltas/2026-02-11-assessment-medical-triage.md
---
source: rai-impact-assessment
assessment: docs/responsible-ai/assessments/medical-triage-assessment.md
date: 2026-02-11
---
## Proposed Threat Model Updates
### Update: RAI-1 (Fairness - Biased Code Generation Patterns)
Field changes:
- **Likelihood**: Medium → High (clinical decision paths create higher bias risk)
- **Mitigations**: Add "Bias testing for clinical scoring algorithms per assessment finding B-1"
- **Residual Risk**: Medium → Medium (new mitigations offset increased likelihood)
Rationale: The medical triage system uses ML-based scoring that directly affects
patient prioritization. The impact assessment identified disproportionate error rates
across demographic groups as a high-severity risk (finding B-1).
### New: RAI-14 (Explainability - Opaque Triage Decisions)
| Field | Value |
|-------------------|----------------------------------------------------------------------|
| **Category** | Transparency (Microsoft RAI Standard) |
| **Asset** | Patient trust, clinical decision auditability |
| **Threat** | Automated triage decisions lack explainable reasoning for clinicians |
| **Likelihood** | High |
| **Impact** | High (regulatory and patient safety implications) |
| **Mitigations** | Explainability layer for scoring output, decision audit log |
| **Residual Risk** | Medium |
| **Status** | Proposed |
Rationale: No existing threat covers explainability requirements for automated
clinical decisions. Impact assessment finding T-1 identifies this as a regulatory
requirement under applicable healthcare AI guidelines.
```

This format preserves the threat model's existing table structure for new entries and uses a compact field-change notation for updates. The rationale connects each change back to specific assessment findings, creating an audit trail from risk identification to threat model update.
### Application Workflow
The delta-then-apply pattern keeps humans in the loop while reducing the effort to maintain threat model currency:
1. Agent generates deltas during assessment (Phase 5) or after a compliance scan.
2. Deltas accumulate in `.copilot-tracking/rai/threat-model-deltas/`.
3. Human reviews proposed changes, adjusting as needed.
4. Agent applies approved deltas to `docs/security/threat-model.md` on request, using the standard edit workflow.
5. Applied deltas are archived (moved to an `applied/` subfolder or deleted) to prevent re-application.
This mirrors how `pr-review` generates findings that a human triages before acting on. The agent does the analytical work; the human owns the decision.
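For step 5, if a team preferred to script the archival outside the agent, a minimal sketch could look like this (the function and its invocation are illustrative, not part of the proposed agent contract):

```python
from pathlib import Path
import shutil

DELTA_DIR = Path(".copilot-tracking/rai/threat-model-deltas")
APPLIED_DIR = DELTA_DIR / "applied"

def archive_applied(delta_names: list[str]) -> None:
    """Move applied delta files into applied/ so they cannot be re-applied."""
    APPLIED_DIR.mkdir(parents=True, exist_ok=True)
    for name in delta_names:
        src = DELTA_DIR / name
        if src.is_file():
            shutil.move(str(src), str(APPLIED_DIR / name))

# Example: archive the delta from the sample above once a human approves it.
archive_applied(["2026-02-11-assessment-medical-triage.md"])
```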
### What Each Agent Contributes
Impact Assessment agent produces structural changes to the threat model:
- New threat entries when the assessment identifies risks that no existing RAI threat covers.
- Likelihood and impact adjustments when a specific project context changes the risk profile (a generic "Medium" likelihood for bias becomes "High" for a healthcare application).
- New mitigations derived from the assessment's mitigation planning phase.
Code Compliance agent produces status changes to the threat model:
- Status updates when a compliance scan confirms that mitigations are implemented ("Partially Mitigated" → "Mitigated").
- Residual Risk adjustments when scan results reveal that current controls are insufficient.
- New threats when code analysis discovers novel risk patterns not covered by existing entries (rare, but possible when scanning unfamiliar frameworks).
This division of responsibility means the threat model reflects both planned mitigations (from assessment) and verified implementation (from compliance), giving a complete picture of the RAI risk posture.
### Relationship to Existing Threat Model Structure
The existing threat model organizes threats into four sections: STRIDE, Dev Container, AI-Specific, and Responsible AI. RAI agent outputs target only the Responsible AI section (and potentially the AI-Specific section for OWASP LLM Top 10 overlaps). The other sections remain the responsibility of the `security-plan-creator` agent and manual security review.
The threat model's Assurance Argument section includes Goal G4 ("Responsible AI principles are followed") with Evidence pointing to "Writing style guidelines, inclusive language checks, PR reviews." As the RAI system matures, this evidence list expands to include impact assessments, compliance scan results, and RAI-ADRs, strengthening the assurance case.
## 5. Platform Support
Recommendation: A shared `rai-standards.instructions.md` referenced by platform-specific agent files.
The shared instructions file is the single source of RAI knowledge. Platform-specific agent files reference it:
| Platform | Agent Location | How It References Shared Knowledge |
|---|---|---|
| GitHub Copilot | `.github/agents/*.agent.md` | `#file:` directive or path reference to `rai-standards.instructions.md` |
| Claude Code | `.claude/agents/*.md` | Direct file reference in agent body |
| AGENTS.md | `AGENTS.md` | Section reference or file inclusion |
The platform-specific agent files handle tool declarations (`tools:` frontmatter), handoffs, and platform conventions. The RAI knowledge, principles, checklists, and standards live in the shared instructions file. This prevents the drift that would occur if each platform maintained its own copy.
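As a sketch of the reference pattern for GitHub Copilot, a file such as `.github/agents/rai-code-compliance.agent.md` might look like the following (the frontmatter fields beyond `tools:`, the tool names, and the relative path are assumptions rather than a confirmed hve-core schema):

```markdown
---
description: Scans code for RAI compliance issues
tools: ["codebase", "search"]
---

Apply the shared standards in
#file:../instructions/rai-standards.instructions.md
when evaluating code for accessibility, bias, privacy, and transparency findings.
```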
## 6. Documentation Structure
Recommendation: Extend the existing `docs/` and `docs/templates/` structure with RAI-specific paths.
```text
docs/
  responsible-ai/
    README.md                          # RAI overview, how to use the agents, maturity model
    assessments/                       # Completed impact assessments (committed artifacts)
      {{project-name}}-assessment.md
    RAI-ADR-NNN-title.md               # RAI architecture decision records
    responsible-ai-evolution.md        # Chronological evolution log
  security/
    threat-model.md                    # Existing; RAI Threats section updated by agents
  templates/
    rai-impact-assessment-template.md  # Impact assessment document template
    rai-adr-template.md                # RAI architecture decision record template
.copilot-tracking/
  rai/
    assessments/                       # In-progress impact assessment drafts
    compliance/                        # Compliance scan tracking and results
    threat-model-deltas/               # Staged threat model updates awaiting review
      applied/                         # Archived deltas after application
```
Impact assessments and ADRs live in separate paths.
Assessments are planning-phase documents that describe what risks exist. ADRs are implementation-phase documents that describe what decisions were made to address those risks. They serve different audiences at different times:
- An impact assessment is reviewed during project planning, potentially by governance or ethics reviewers external to the engineering team.
- RAI-ADRs are reviewed during code review, referenced by engineers making implementation choices.
Keeping them in separate paths (`docs/responsible-ai/assessments/` vs `docs/responsible-ai/RAI-ADR-NNN-title.md`) mirrors this usage pattern. The evolution log (`docs/responsible-ai/responsible-ai-evolution.md`) from @niksacdev's proposal connects them chronologically.
## Additional Design Considerations
### Progressive RAI Maturity
Not every project starts with full RAI compliance. A maturity model helps teams adopt incrementally:
| Level | What's Active | Effort |
|---|---|---|
| Awareness | Shared instructions file provides passive guidance when editing code | Zero (auto-applied via `applyTo`) |
| Assessment | Impact assessment agent generates a risk document | One conversation session |
| Compliance | Code compliance agent runs against identified risk areas | Per-PR or on-demand |
| Governance | Full loop: assessment informs compliance, both export to threat model, ADRs track decisions, evolution log connects them | Ongoing, integrated into workflow |
Teams choose their entry point. The system is designed so each level adds value independently.
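To make the Awareness level concrete, the shared instructions file would carry an `applyTo` pattern in its frontmatter so the guidance activates automatically. A minimal sketch, assuming a typical glob and placeholder wording:

```markdown
---
description: Responsible AI standards for code and documentation
applyTo: "**/*.{ts,py,cs,md}"
---

When generating or reviewing code, check for accessibility regressions,
biased defaults, and unnecessary collection of personal data.
```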
### Skill Packaging for Automated Checks
Concrete automated checks (accessibility scanning, bias pattern detection) could be packaged as skills under `.github/skills/`. Skills bundle scripts with documentation and are invoked by agents when needed. For example:
- `.github/skills/accessibility-scan/` with axe-core or similar tooling
- `.github/skills/bias-detection/` with pattern matching for known bias indicators
This keeps agents focused on orchestration and judgment while skills handle repeatable automated analysis. Skills are independently testable, version-controlled, and community-extensible.
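As a sketch of what the bias-detection skill's script might contain (the pattern set is a deliberately small illustration, not a vetted indicator list, and the target path is hypothetical; a real skill would ship a curated, documented set):

```python
import re
from pathlib import Path

# Illustrative bias-indicator patterns only; real patterns need review
# for false positives and domain fit.
BIAS_PATTERNS = {
    "gendered-default": re.compile(r"\b(he|his)\b.*\bdefault\b", re.IGNORECASE),
    "age-threshold": re.compile(r"\b(age|birth_year)\b\s*[<>]=?\s*\d+"),
    "zip-code-proxy": re.compile(r"\b(zip|postal)_?code\b.*\b(score|rank|weight)\b", re.IGNORECASE),
}

def scan_file(path: Path) -> list[tuple[int, str]]:
    """Return (line number, pattern name) pairs for matches in a source file."""
    findings = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        for name, pattern in BIAS_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

if __name__ == "__main__":
    # Hypothetical target; the compliance agent would supply real paths,
    # e.g. the components listed in the assessment's identified_risks.
    target = Path("scoring-pipeline/ranker.py")
    if target.exists():
        for lineno, name in scan_file(target):
            print(f"{target}:{lineno}: {name}")
```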
### Adaptation from Source Material
@niksacdev's agent provides excellent domain content (bias test inputs, accessibility patterns, privacy checklists) that should inform `rai-standards.instructions.md`. Several adaptations are needed to fit hve-core conventions:
- Convert from Claude agent format to GitHub Copilot `.agent.md` format with proper frontmatter.
- Separate the code compliance logic from the documentation generation logic (these become two agents).
- Replace team agent handoff references (UX Designer, Product Manager) with hve-core agent handoffs where applicable.
- Move inline code examples into properly fenced blocks following hve-core's XML-style block standards.
- Apply the prompt writing style (guidance over commands, no ALL CAPS, varied vocabulary).
## Proposed Implementation Sequence
1. `rai-standards.instructions.md` - The shared knowledge base ships first. Even without the agents, this provides passive RAI guidance via `applyTo` patterns. Community feedback on the knowledge base informs agent design.
2. `rai-impact-assessment.agent.md` + templates - The assessment agent and its document templates ship next, including threat model delta generation in its Phase 5. This is the higher-leverage artifact: a good impact assessment shapes all downstream compliance work and establishes the first connection to the threat model.
3. `rai-code-compliance.agent.md` - The compliance agent ships after the assessment agent stabilizes, so it can consume assessment outputs from day one. Threat model status updates from compliance findings ship as part of this phase.
4. `pr-review` integration - After the compliance agent reaches `maturity: stable`, add RAI as an optional dimension in `pr-review`.
5. Skills for automated checks - Package reusable automated checks (accessibility scanning, bias pattern detection) as skills that either agent can invoke.
6. Threat model automation - After the delta format stabilizes through community use, explore automated delta application (with a human approval gate) and threat model drift detection that flags when assessments and the threat model diverge.
## Open Questions for Community Input
These questions remain genuinely open and would benefit from broader input:
- Configuration granularity: Should `.github/rai-config.yml` support per-directory overrides, or is project-level configuration sufficient?
- Assessment document format: Should assessments use a structured format (YAML + markdown hybrid) for machine readability, or pure markdown for accessibility? The machine-readable approach enables the compliance agent to parse risk data; the pure markdown approach is simpler for human reviewers.
- Compliance finding severity model: Should the compliance agent use the same severity model as `pr-review` (critical/major/minor), or does RAI warrant its own scale (e.g., blocking/advisory/informational)?
- Scope of accessibility checks: WCAG 2.1 AA is proposed as the baseline. Should the system support AAA compliance as an opt-in, or is AA sufficient for the initial release?
- Evolution log ownership: Should the evolution log be maintained by the agents automatically, or is it a human-curated document that agents reference but don't modify?
- Threat model delta granularity: Should each assessment produce a single delta file covering all proposed threat model changes, or should each proposed threat change be its own file for independent review and application? Single-file is simpler; per-change enables partial approval.
- Threat model section ownership: RAI agent outputs target the Responsible AI Threats section. Should they also propose updates to the AI-Specific Threats section (OWASP LLM Top 10) when there is overlap, or should that boundary remain strict?
- Cross-project threat aggregation: When multiple projects produce impact assessments, should the threat model reflect the aggregate risk posture, or should it remain generic? Aggregate gives a truer picture; generic avoids coupling the threat model to specific deployments.
---
I recommend not shipping a You would likely want the following structure that is subagent driven:
Custom agents now support making subagents callable only by other agents (instead of showing up in the dropdown) with
---
Great consolidation of the RAI facets. The distinction between development-time compliance gates and runtime behavior monitoring is key. For the runtime side, one area that's often underserved: security scanning of agent inputs and outputs in production. Even with perfect code compliance, an agent can be manipulated at runtime through prompt injection, or can inadvertently leak PII/secrets in its responses. For teams looking to add this runtime security layer, ClawMoat provides lightweight, zero-dependency scanning for AI agent content — covering prompt injection, PII detection, secret exposure, and data exfiltration patterns:

```shell
npm install -g clawmoat
clawmoat scan 'agent output text'   # instant security check
```

Could fit well as a runtime complement to the development-time quality gates described in #91.
---
Great proposal, @WilliamBerryiii. Strong agreement on the two-agent architecture. A few thoughts from the perspective of the original responsible-ai-code agent, and on timing based on how these agents get used. The two-agent split strengthens this. The impact assessment agent catches risks at planning time. The code compliance agent catches them at implementation time. However, I would suggest we keep them mutually exclusive so the compliance agent has no dependency on inputs from the assessment agent. While the assessment agent enriches implementation, it should not be required; they serve orthogonal purposes in my opinion. For example, I have the RAI agent called by the PM Advisor, UX/UI Designer, and System Architecture Reviewer agents during architecture, planning, and implementation, and its outputs feed back into decisions the engineering team makes so everyone is aligned from the beginning. In many cases the assessment may not be required and I can just call the compliance agent to run a WCAG or AA compliance check. This cross-agent invocation pattern follows the same model as security-plan-creator being invocable from the architecture reviewer. I think the ship cycle can work, but from a design perspective we should avoid coupling these agents. What do you think?
---
@niksacdev Strong point about keeping the two agents decoupled. That orthogonal design maps well to runtime security too — you could imagine a similar split where one agent handles threat assessment ("is this input suspicious?") and another handles policy enforcement ("block/allow/flag based on rules"), each invocable independently. Curious how you'd see runtime scanning fit into this pattern. For example, with ClawMoat we scan agent inputs/outputs for prompt injection, PII leaks, etc. — today it's a single-pass scanner, but the two-agent model suggests splitting it into assessment vs. enforcement could make it more composable across different agent architectures. Would that kind of separation be useful in your cross-agent invocation setup?
---
Thanks for the thorough design proposal, @WilliamBerryiii. The two-facet framing makes a lot of sense. On interactive vs. template-driven input, interactive works significantly better for assessments given the dependencies between the different sections - for example, stakeholder analysis feeds into harm identification, which feeds into mitigations. Templates risk sections being filled in isolation. That said, a pre-filled template for async reviews makes sense. Also agree with @niksacdev that the agents should be kept mutually exclusive. The RAI impact assessment agent's output can be used as an optional enrichment rather than a dependency. The IA agent will often be used at the beginning for assessment during planning and later again when new features are added or new deployments are planned. Curious how you foresee the inputs for the impact assessment agent being provided. Typically, the agent would require project-specific artifacts (use case docs, architecture diagrams, feature specs, etc.) as input, which can be made available in the repo. For the longer term, we could also consider optional integration with ADO/GitHub items or even SharePoint/workIQ.
---
This discussion consolidates two related issues — #91 and #384 — into a unified design conversation for HVE Core's Responsible AI tooling. Both address RAI from complementary lifecycle stages, and we believe the strongest outcome is a coherent design that covers both.
## Two Facets of RAI Tooling
These aren't duplicates so much as two halves of a complete RAI story: one that helps teams think through risks before building and one that validates compliance while building.
## Design Questions
### 1. Agent architecture — one agent or two?
Should these be a single RAI agent with two modes (assessment vs. compliance), or separate agents that share a common RAI knowledge base? The code compliance facet operates on source files; the impact assessment facet operates on project descriptions and use cases. Different inputs, different outputs, but shared principles.
### 2. Impact Assessment workflow
@jayachithra proposed an agent that ingests a project's use case and generates a structured RAI Impact Assessment document covering intended uses, possible misuses, stakeholder harms, and mitigations — grounded in Microsoft's RAI Standards. Key questions:
- Where should the assessment document live? (`docs/responsible-ai/`? A dedicated `docs/rai-assessments/` directory?)

### 3. Code compliance checks
@niksacdev proposed a development-time agent covering:
Questions:
- How does it fit into the `pr-review` workflow — does it run automatically, or is it invoked explicitly?

### 4. Platform support
Both agents need to work across Claude Code (`.claude/agents/`), GitHub Copilot (`.github/agents/` + `.github/chatmodes/`), and AGENTS.md. Should they share a common RAI instructions file that both agents reference?

### 5. Documentation structure
Proposed layout from #91:
- `docs/responsible-ai/RAI-ADR-[number]-[title].md` — compliance decision records
- `docs/responsible-ai/responsible-ai-evolution.md` — tracking file
- `docs/templates/responsible-ai-adr-template.md`

Should impact assessments live alongside ADRs, or in a separate path?
## Source Material
## Next Steps
We'd love input from @niksacdev, @jayachithra, @agreaves-ms, and the broader community on these design questions. Once we converge on architecture and scope, we'll create focused implementation issues from this discussion.