Responsible AI Agent Design Discussion #481
Replies: 7 comments 2 replies
adding @C-Neisinger & @akzaidi to the discussion.

---
## Responsible AI System Design Proposal

This proposal addresses the five design questions from #481, drawing on patterns established by existing hve-core agents.

### The Core Insight

The two RAI facets differ in inputs, outputs, timing, and interaction model:
These are fundamentally different workflows that share a common knowledge base. That distinction drives the architecture.

## 1. Agent Architecture: Two Agents, One Knowledge Base

Recommendation: Two specialized agents connected by a shared instructions file.

Why two agents instead of one:
Why a shared instructions file:
The feedback loop: The impact assessment agent produces a document that identifies project-specific risks, stakeholder harms, and required mitigations. The code compliance agent reads that assessment (when it exists) and tailors its analysis accordingly. A generic web app and a medical diagnosis system both get WCAG checks, but only the latter gets targeted bias scrutiny on clinical decision paths. This assessment-informed compliance approach makes the system more than the sum of its parts.

## 2. Impact Assessment Workflow

Recommendation: A conversational, phase-based agent modeled on existing hve-core conversational agents.

### The Proposed Phases
### Where the assessment document lives

Recommendation: completed assessments are committed under `docs/responsible-ai/assessments/`, with in-progress drafts in `.copilot-tracking/rai/assessments/`. This follows the established pattern:

### Interactive vs template-driven input

Recommendation: Interactive with a structured template as fallback. The agent asks clarifying questions interactively.

### Downstream connection

The completed assessment document includes a machine-readable section (YAML frontmatter or a structured risks section) that the code compliance agent can parse to configure its checks. For example:

```yaml
# Within the assessment document
identified_risks:
  - category: bias
    components: ["recommendation-engine", "scoring-pipeline"]
    severity: high
    threat_model_ref: RAI-1   # Maps to existing threat, updates likelihood/mitigations
  - category: accessibility
    components: ["patient-dashboard", "report-viewer"]
    severity: high
    threat_model_ref: RAI-1   # Fairness dimension
  - category: privacy
    components: ["data-ingestion", "user-profiles"]
    severity: medium
    threat_model_ref: RAI-3   # Maps to existing privacy threat
  - category: transparency
    components: ["automated-triage-engine"]
    severity: high
    threat_model_ref: null    # New threat, generates RAI-14 delta
```

## 3. Code Compliance Checks

Recommendation: An autonomous agent with configurable compliance dimensions, producing findings and RAI-ADRs.

### Compliance Dimensions

Rather than a fixed framework or fully open configuration, the agent ships with a set of compliance dimensions that activate based on context:
Teams can override activation through a lightweight configuration file (`.github/rai-config.yml`).
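A minimal sketch of what that file might contain (the keys and values here are illustrative assumptions; the actual schema is raised as an open question later in this proposal):

```yaml
# .github/rai-config.yml (illustrative sketch; schema not yet settled)
dimensions:
  accessibility:
    enabled: true
    standard: wcag-2.1-aa           # proposed baseline from this discussion
  bias:
    enabled: true
    paths: ["scoring-pipeline/**"]  # hypothetical per-area scoping
  privacy:
    enabled: false                  # e.g., project handles no user data
```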
## 4. Integration with the Threat Model

Both agents feed updates into the threat model:
| Source | Trigger | What It Produces |
|---|---|---|
| Impact Assessment (Phase 5) | New or revised assessment completed | New threat entries for risks without existing coverage; updated likelihood, impact, or mitigations for existing threats |
| Code Compliance (per-scan) | Compliance scan finds issues or confirms mitigations | Updated Status and Residual Risk fields for existing threats; new threats when novel risk patterns are detected |
### Delta Format
Rather than modifying the threat model directly during agent execution, both agents produce threat model deltas in a staging area. A delta is a structured description of proposed changes that a human reviews before applying.
Deltas live in `.copilot-tracking/rai/threat-model-deltas/` and follow this format:
```markdown
# .copilot-tracking/rai/threat-model-deltas/2026-02-11-assessment-medical-triage.md
---
source: rai-impact-assessment
assessment: docs/responsible-ai/assessments/medical-triage-assessment.md
date: 2026-02-11
---
## Proposed Threat Model Updates
### Update: RAI-1 (Fairness - Biased Code Generation Patterns)
Field changes:
- **Likelihood**: Medium → High (clinical decision paths create higher bias risk)
- **Mitigations**: Add "Bias testing for clinical scoring algorithms per assessment finding B-1"
- **Residual Risk**: Medium → Medium (new mitigations offset increased likelihood)
Rationale: The medical triage system uses ML-based scoring that directly affects
patient prioritization. The impact assessment identified disproportionate error rates
across demographic groups as a high-severity risk (finding B-1).
### New: RAI-14 (Explainability - Opaque Triage Decisions)
| Field | Value |
|-------------------|----------------------------------------------------------------------|
| **Category** | Transparency (Microsoft RAI Standard) |
| **Asset** | Patient trust, clinical decision auditability |
| **Threat** | Automated triage decisions lack explainable reasoning for clinicians |
| **Likelihood** | High |
| **Impact** | High (regulatory and patient safety implications) |
| **Mitigations** | Explainability layer for scoring output, decision audit log |
| **Residual Risk** | Medium |
| **Status** | Proposed |
Rationale: No existing threat covers explainability requirements for automated
clinical decisions. Impact assessment finding T-1 identifies this as a regulatory
requirement under applicable healthcare AI guidelines.
```

This format preserves the threat model's existing table structure for new entries and uses a compact field-change notation for updates. The rationale connects each change back to specific assessment findings, creating an audit trail from risk identification to threat model update.
### Application Workflow
The delta-then-apply pattern keeps humans in the loop while reducing the effort to maintain threat model currency:
1. Agent generates deltas during assessment (Phase 5) or after a compliance scan.
2. Deltas accumulate in `.copilot-tracking/rai/threat-model-deltas/`.
3. Human reviews proposed changes, adjusting as needed.
4. Agent applies approved deltas to `docs/security/threat-model.md` on request, using the standard edit workflow.
5. Applied deltas are archived (moved to an `applied/` subfolder or deleted) to prevent re-application.
This mirrors how `pr-review` generates findings that a human triages before acting on. The agent does the analytical work; the human owns the decision.
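For step 5, if a team preferred to script the archival outside the agent, a minimal sketch could look like this (the function and its invocation are illustrative, not part of the proposed agent contract):

```python
from pathlib import Path
import shutil

DELTA_DIR = Path(".copilot-tracking/rai/threat-model-deltas")
APPLIED_DIR = DELTA_DIR / "applied"

def archive_applied(delta_names: list[str]) -> None:
    """Move applied delta files into applied/ so they cannot be re-applied."""
    APPLIED_DIR.mkdir(parents=True, exist_ok=True)
    for name in delta_names:
        src = DELTA_DIR / name
        if src.is_file():
            shutil.move(str(src), str(APPLIED_DIR / name))

# Example: archive the delta from the sample above once a human approves it.
archive_applied(["2026-02-11-assessment-medical-triage.md"])
```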
### What Each Agent Contributes
Impact Assessment agent produces structural changes to the threat model:
- New threat entries when the assessment identifies risks that no existing RAI threat covers.
- Likelihood and impact adjustments when a specific project context changes the risk profile (a generic "Medium" likelihood for bias becomes "High" for a healthcare application).
- New mitigations derived from the assessment's mitigation planning phase.
Code Compliance agent produces status changes to the threat model:
- Status updates when a compliance scan confirms that mitigations are implemented ("Partially Mitigated" → "Mitigated").
- Residual Risk adjustments when scan results reveal that current controls are insufficient.
- New threats when code analysis discovers novel risk patterns not covered by existing entries (rare, but possible when scanning unfamiliar frameworks).
This division of responsibility means the threat model reflects both planned mitigations (from assessment) and verified implementation (from compliance), giving a complete picture of the RAI risk posture.
### Relationship to Existing Threat Model Structure
The existing threat model organizes threats into four sections: STRIDE, Dev Container, AI-Specific, and Responsible AI. RAI agent outputs target only the Responsible AI section (and potentially the AI-Specific section for OWASP LLM Top 10 overlaps). The other sections remain the responsibility of the `security-plan-creator` agent and manual security review.
The threat model's Assurance Argument section includes Goal G4 ("Responsible AI principles are followed") with Evidence pointing to "Writing style guidelines, inclusive language checks, PR reviews." As the RAI system matures, this evidence list expands to include impact assessments, compliance scan results, and RAI-ADRs, strengthening the assurance case.
## 5. Platform Support
Recommendation: A shared `rai-standards.instructions.md` referenced by platform-specific agent files.
The shared instructions file is the single source of RAI knowledge. Platform-specific agent files reference it:
| Platform | Agent Location | How It References Shared Knowledge |
|---|---|---|
| GitHub Copilot | `.github/agents/*.agent.md` | `#file:` directive or path reference to `rai-standards.instructions.md` |
| Claude Code | `.claude/agents/*.md` | Direct file reference in agent body |
| AGENTS.md | `AGENTS.md` | Section reference or file inclusion |
The platform-specific agent files handle tool declarations (`tools:` frontmatter), handoffs, and platform conventions. The RAI knowledge, principles, checklists, and standards live in the shared instructions file. This prevents the drift that would occur if each platform maintained its own copy.
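As a sketch of the reference pattern for GitHub Copilot, a file such as `.github/agents/rai-code-compliance.agent.md` might look like the following (the frontmatter fields beyond `tools:`, the tool names, and the relative path are assumptions rather than a confirmed hve-core schema):

```markdown
---
description: Scans code for RAI compliance issues
tools: ["codebase", "search"]
---

Apply the shared standards in
#file:../instructions/rai-standards.instructions.md
when evaluating code for accessibility, bias, privacy, and transparency findings.
```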
## 6. Documentation Structure
Recommendation: Extend the existing `docs/` and `docs/templates/` structure with RAI-specific paths.
```text
docs/
  responsible-ai/
    README.md                          # RAI overview, how to use the agents, maturity model
    assessments/                       # Completed impact assessments (committed artifacts)
      {{project-name}}-assessment.md
    RAI-ADR-NNN-title.md               # RAI architecture decision records
    responsible-ai-evolution.md        # Chronological evolution log
  security/
    threat-model.md                    # Existing; RAI Threats section updated by agents
  templates/
    rai-impact-assessment-template.md  # Impact assessment document template
    rai-adr-template.md                # RAI architecture decision record template
.copilot-tracking/
  rai/
    assessments/                       # In-progress impact assessment drafts
    compliance/                        # Compliance scan tracking and results
    threat-model-deltas/               # Staged threat model updates awaiting review
      applied/                         # Archived deltas after application
```
Impact assessments and ADRs live in separate paths.
Assessments are planning-phase documents that describe what risks exist. ADRs are implementation-phase documents that describe what decisions were made to address those risks. They serve different audiences at different times:
- An impact assessment is reviewed during project planning, potentially by governance or ethics reviewers external to the engineering team.
- RAI-ADRs are reviewed during code review, referenced by engineers making implementation choices.
Keeping them in separate paths (`docs/responsible-ai/assessments/` vs `docs/responsible-ai/RAI-ADR-NNN-title.md`) mirrors this usage pattern. The evolution log (`docs/responsible-ai/responsible-ai-evolution.md`) from @niksacdev's proposal connects them chronologically.
## Additional Design Considerations
### Progressive RAI Maturity
Not every project starts with full RAI compliance. A maturity model helps teams adopt incrementally:
| Level | What's Active | Effort |
|---|---|---|
| Awareness | Shared instructions file provides passive guidance when editing code | Zero (auto-applied via `applyTo`) |
| Assessment | Impact assessment agent generates a risk document | One conversation session |
| Compliance | Code compliance agent runs against identified risk areas | Per-PR or on-demand |
| Governance | Full loop: assessment informs compliance, both export to threat model, ADRs track decisions, evolution log connects them | Ongoing, integrated into workflow |
Teams choose their entry point. The system is designed so each level adds value independently.
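To make the Awareness level concrete, the shared instructions file would carry an `applyTo` pattern in its frontmatter so the guidance activates automatically. A minimal sketch, assuming a typical glob and placeholder wording:

```markdown
---
description: Responsible AI standards for code and documentation
applyTo: "**/*.{ts,py,cs,md}"
---

When generating or reviewing code, check for accessibility regressions,
biased defaults, and unnecessary collection of personal data.
```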
### Skill Packaging for Automated Checks
Concrete automated checks (accessibility scanning, bias pattern detection) could be packaged as skills under `.github/skills/`. Skills bundle scripts with documentation and are invoked by agents when needed. For example:
- `.github/skills/accessibility-scan/` with axe-core or similar tooling
- `.github/skills/bias-detection/` with pattern matching for known bias indicators
This keeps agents focused on orchestration and judgment while skills handle repeatable automated analysis. Skills are independently testable, version-controlled, and community-extensible.
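As a sketch of what the bias-detection skill's script might contain (the pattern set is a deliberately small illustration, not a vetted indicator list, and the target path is hypothetical; a real skill would ship a curated, documented set):

```python
import re
from pathlib import Path

# Illustrative bias-indicator patterns only; real patterns need review
# for false positives and domain fit.
BIAS_PATTERNS = {
    "gendered-default": re.compile(r"\b(he|his)\b.*\bdefault\b", re.IGNORECASE),
    "age-threshold": re.compile(r"\b(age|birth_year)\b\s*[<>]=?\s*\d+"),
    "zip-code-proxy": re.compile(r"\b(zip|postal)_?code\b.*\b(score|rank|weight)\b", re.IGNORECASE),
}

def scan_file(path: Path) -> list[tuple[int, str]]:
    """Return (line number, pattern name) pairs for matches in a source file."""
    findings = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        for name, pattern in BIAS_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

if __name__ == "__main__":
    # Hypothetical target; the compliance agent would supply real paths,
    # e.g. the components listed in the assessment's identified_risks.
    target = Path("scoring-pipeline/ranker.py")
    if target.exists():
        for lineno, name in scan_file(target):
            print(f"{target}:{lineno}: {name}")
```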
### Adaptation from Source Material
@niksacdev's agent provides excellent domain content (bias test inputs, accessibility patterns, privacy checklists) that should inform `rai-standards.instructions.md`. Several adaptations are needed to fit hve-core conventions:
- Convert from Claude agent format to GitHub Copilot `.agent.md` format with proper frontmatter.
- Separate the code compliance logic from the documentation generation logic (these become two agents).
- Replace team agent handoff references (UX Designer, Product Manager) with hve-core agent handoffs where applicable.
- Move inline code examples into properly fenced blocks following hve-core's XML-style block standards.
- Apply the prompt writing style (guidance over commands, no ALL CAPS, varied vocabulary).
## Proposed Implementation Sequence
1. `rai-standards.instructions.md` - The shared knowledge base ships first. Even without the agents, this provides passive RAI guidance via `applyTo` patterns. Community feedback on the knowledge base informs agent design.
2. `rai-impact-assessment.agent.md` + templates - The assessment agent and its document templates ship next, including threat model delta generation in its Phase 5. This is the higher-leverage artifact: a good impact assessment shapes all downstream compliance work and establishes the first connection to the threat model.
3. `rai-code-compliance.agent.md` - The compliance agent ships after the assessment agent stabilizes, so it can consume assessment outputs from day one. Threat model status updates from compliance findings ship as part of this phase.
4. `pr-review` integration - After the compliance agent reaches `maturity: stable`, add RAI as an optional dimension in `pr-review`.
5. Skills for automated checks - Package reusable automated checks (accessibility scanning, bias pattern detection) as skills that either agent can invoke.
6. Threat model automation - After the delta format stabilizes through community use, explore automated delta application (with a human approval gate) and threat model drift detection that flags when assessments and the threat model diverge.
## Open Questions for Community Input
These questions remain genuinely open and would benefit from broader input:
- Configuration granularity: Should `.github/rai-config.yml` support per-directory overrides, or is project-level configuration sufficient?
- Assessment document format: Should assessments use a structured format (YAML + markdown hybrid) for machine readability, or pure markdown for accessibility? The machine-readable approach enables the compliance agent to parse risk data; the pure markdown approach is simpler for human reviewers.
- Compliance finding severity model: Should the compliance agent use the same severity model as `pr-review` (critical/major/minor), or does RAI warrant its own scale (e.g., blocking/advisory/informational)?
- Scope of accessibility checks: WCAG 2.1 AA is proposed as the baseline. Should the system support AAA compliance as an opt-in, or is AA sufficient for the initial release?
- Evolution log ownership: Should the evolution log be maintained by the agents automatically, or is it a human-curated document that agents reference but don't modify?
- Threat model delta granularity: Should each assessment produce a single delta file covering all proposed threat model changes, or should each proposed threat change be its own file for independent review and application? Single-file is simpler; per-change enables partial approval.
- Threat model section ownership: RAI agent outputs target the Responsible AI Threats section. Should they also propose updates to the AI-Specific Threats section (OWASP LLM Top 10) when there is overlap, or should that boundary remain strict?
- Cross-project threat aggregation: When multiple projects produce impact assessments, should the threat model reflect the aggregate risk posture, or should it remain generic? Aggregate gives a truer picture; generic avoids coupling the threat model to specific deployments.
---
I recommend not shipping a You would likely want the following structure that is subagent driven:
Custom agents now support making subagents callable only by other agents (instead of showing up in the dropdown) with
---
Great consolidation of the RAI facets. The distinction between development-time compliance gates and runtime behavior monitoring is key. For the runtime side, one area that's often underserved: security scanning of agent inputs and outputs in production. Even with perfect code compliance, an agent can be manipulated at runtime through prompt injection, or can inadvertently leak PII/secrets in its responses. For teams looking to add this runtime security layer, ClawMoat provides lightweight, zero-dependency scanning for AI agent content — covering prompt injection, PII detection, secret exposure, and data exfiltration patterns:

```shell
npm install -g clawmoat
clawmoat scan 'agent output text'   # instant security check
```

Could fit well as a runtime complement to the development-time quality gates described in #91.
---
Great proposal, @WilliamBerryiii. Strong agreement on the two-agent architecture. A few thoughts from the perspective of the original responsible-ai-code agent, and on timing based on how these agents get used. The two-agent split strengthens this. The impact assessment agent catches risks at planning time. The code compliance agent catches them at implementation time. However, I would suggest we keep them mutually exclusive so the compliance agent has no dependency on inputs from the assessment agent. While the assessment agent enriches implementation, it should not be required; they serve orthogonal purposes in my opinion. For example, I have the RAI agent called by the PM Advisor, UX/UI Designer, and System Architecture Reviewer agents during architecture, planning, and implementation, and its outputs feed back into decisions the engineering team makes so everyone is aligned from the beginning. In many cases the assessment may not be required and I can just call the compliance agent to run a WCAG or AA compliance check. This cross-agent invocation pattern follows the same model as security-plan-creator being invocable from the architecture reviewer. I think the ship cycle can work, but from a design perspective we should avoid coupling these agents. What do you think?
---
@niksacdev Strong point about keeping the two agents decoupled. That orthogonal design maps well to runtime security too — you could imagine a similar split where one agent handles threat assessment ("is this input suspicious?") and another handles policy enforcement ("block/allow/flag based on rules"), each invocable independently. Curious how you'd see runtime scanning fit into this pattern. For example, with ClawMoat we scan agent inputs/outputs for prompt injection, PII leaks, etc. — today it's a single-pass scanner, but the two-agent model suggests splitting it into assessment vs. enforcement could make it more composable across different agent architectures. Would that kind of separation be useful in your cross-agent invocation setup?
---
Thanks for the thorough design proposal, @WilliamBerryiii. The two-facet framing makes a lot of sense. On interactive vs. template-driven input, interactive works significantly better for assessments given the dependencies between the different sections - for example, stakeholder analysis feeds into harm identification, which feeds into mitigations. Templates risk sections being filled in isolation. That said, a pre-filled template for async reviews makes sense. Also agree with @niksacdev that the agents should be kept mutually exclusive. The RAI impact assessment agent's output can be used as an optional enrichment rather than a dependency. The IA agent will often be used at the beginning for assessment during planning and later again when new features are added or new deployments are planned. Curious how you foresee the inputs for the impact assessment agent being provided. Typically, the agent would require project-specific artifacts (use case docs, architecture diagrams, feature specs, etc.) as input, which can be made available in the repo. For the longer term, we could also consider optional integration with ADO/GitHub items or even SharePoint/workIQ.
---
This discussion consolidates two related issues — #91 and #384 — into a unified design conversation for HVE Core's Responsible AI tooling. Both address RAI from complementary lifecycle stages, and we believe the strongest outcome is a coherent design that covers both.
## Two Facets of RAI Tooling
These aren't duplicates so much as two halves of a complete RAI story: one that helps teams think through risks before building and one that validates compliance while building.
## Design Questions
### 1. Agent architecture — one agent or two?
Should these be a single RAI agent with two modes (assessment vs. compliance), or separate agents that share a common RAI knowledge base? The code compliance facet operates on source files; the impact assessment facet operates on project descriptions and use cases. Different inputs, different outputs, but shared principles.
### 2. Impact Assessment workflow
@jayachithra proposed an agent that ingests a project's use case and generates a structured RAI Impact Assessment document covering intended uses, possible misuses, stakeholder harms, and mitigations — grounded in Microsoft's RAI Standards. Key questions:
- Where should the assessment document live? (`docs/responsible-ai/`? A dedicated `docs/rai-assessments/` directory?)

### 3. Code compliance checks
@niksacdev proposed a development-time agent covering:
Questions:
- How does it fit into the `pr-review` workflow — does it run automatically, or is it invoked explicitly?

### 4. Platform support
Both agents need to work across Claude Code (`.claude/agents/`), GitHub Copilot (`.github/agents/` + `.github/chatmodes/`), and AGENTS.md. Should they share a common RAI instructions file that both agents reference?

### 5. Documentation structure
Proposed layout from #91:
- `docs/responsible-ai/RAI-ADR-[number]-[title].md` — compliance decision records
- `docs/responsible-ai/responsible-ai-evolution.md` — tracking file
- `docs/templates/responsible-ai-adr-template.md`

Should impact assessments live alongside ADRs, or in a separate path?
## Source Material
## Next Steps
We'd love input from @niksacdev, @jayachithra, @agreaves-ms, and the broader community on these design questions. Once we converge on architecture and scope, we'll create focused implementation issues from this discussion.