diff --git a/.copilot-tracking/IMPLEMENTATION_ARCHIVE.md b/.copilot-tracking/IMPLEMENTATION_ARCHIVE.md deleted file mode 100644 index 4b68ab6..0000000 --- a/.copilot-tracking/IMPLEMENTATION_ARCHIVE.md +++ /dev/null @@ -1,58 +0,0 @@ -# Implementation Archive - -This file contains completed implementation history for reference. See IMPLEMENTATION_PLAN.md for current work. - -**Archive Date**: 2026-01-23 - -## All Priority 0-2 Features: ✅ COMPLETE - -All critical security, data integrity, and user experience features have been successfully implemented and tested. - -### Completed Implementation (2026-01-23) - -**Security (Priority 0):** -- DoS Prevention with rate limiting -- PII Detection with warnings -- XSS Sanitization with shared validation utilities - -**Data Integrity (Priority 1):** -- Batch Validation with structured error codes -- Duplicate Detection with normalized text comparison -- Assignment Error Feedback with conflict details - -**User Experience (Priority 2):** -- Explorer State Preservation (URL-based filters) -- Keyword Search (full-text search) -- Tag Filtering (tri-state: include/exclude/neutral) -- Assignment Takeover (admin force-assignment) -- Explorer Sorting (including tag count) -- Modal Keyboard Handling -- Inspection Performance (session cache) - -**Technical Debt (Priority 3):** -- Frontend Code Quality (removed skipped tests, fixed tag glossary isolation) -- Backend Code Cleanup (removed print statements, rate limiter test isolation) -- CI Code Quality Gates (type checker clean, 0 errors) -- Pre-commit hooks for frontend - -**Documentation (Priority 4):** -- Documentation Infrastructure (MkDocs with Material theme) -- Documentation Content (guides, API docs, architecture docs) -- Tag Glossary (tooltips, full view, inline editing for custom tags) - -**Performance & Optimization:** -- Cosmos Indexing Policy optimization (ready for deployment) -- Partial Updates optimization (patch operations) -- Query Performance Monitoring infrastructure 
- -### Test Status (Archive Date: 2026-01-23) - -- **Backend**: 267 unit tests passing, 138 integration tests passing -- **Frontend**: 237 tests passing -- **Type Checking**: All checks passed (backend ty, frontend tsc) - -### Architecture Notes - -- **Architecture**: Well-structured with 8 specialized services (Assignment, Curation, Search, TagRegistry, Chat, Snapshot, Validation, Inference) -- **Dependency Injection**: Pragmatic hybrid approach (FastAPI Depends, Container singleton, Pydantic Settings) -- **Code Quality**: No print statements, no skipped tests, type-safe with zero type checker errors diff --git a/.copilot-tracking/changes/20260116-export-pipeline-design-changes.md b/.copilot-tracking/changes/20260116-export-pipeline-design-changes.md deleted file mode 100644 index 636a3c6..0000000 --- a/.copilot-tracking/changes/20260116-export-pipeline-design-changes.md +++ /dev/null @@ -1,45 +0,0 @@ - -# Release Changes: Export pipeline design - -**Related Plan**: 20260116-export-pipeline-design-plan.instructions.md -**Implementation Date**: 2026-01-16 - -## Summary - -Planned updates for the export pipeline design implementation. - -## Changes - -### Added - -### Modified - -* docs/computed-tags-design.md - Documented the snapshot export baseline contract for pipeline compatibility. -* docs/computed-tags-design.md - Defined the v1 export pipeline API surface and defaults. -* docs/computed-tags-design.md - Updated the pipeline entry point to reuse the snapshot POST route. -* docs/computed-tags-design.md - Added processor and formatter interface rules with determinism guidance. -* docs/computed-tags-design.md - Documented registries, config env vars, and container wiring. -* docs/computed-tags-design.md - Added execution flow, delivery modes, and initial formatter output shapes. -* docs/computed-tags-design.md - Documented export storage interface and Blob configuration strategy. 
-* docs/computed-tags-design.md - Selected backend streaming delivery for Blob-hosted artifacts. -* docs/computed-tags-design.md - Updated Blob authentication to managed identity only with local export warning. -* docs/computed-tags-design.md - Added test strategy and rollout guidance for the export pipeline. - -### Removed - -## Release Summary - -**Total Files Affected**: 3 - -### Files Modified (3) - -* docs/computed-tags-design.md - Added export pipeline design details across baseline, interfaces, execution, storage, and testing. -* .copilot-tracking/changes/20260116-export-pipeline-design-changes.md - Recorded implementation progress and summaries. -* .copilot-tracking/plans/20260116-export-pipeline-design-plan.instructions.md - Marked all phases and tasks complete. - -### Dependencies & Infrastructure - -* **New Dependencies**: None -* **Updated Dependencies**: None -* **Infrastructure Changes**: None -* **Configuration Updates**: None diff --git a/.copilot-tracking/changes/20260116-export-pipeline-implementation-changes.md b/.copilot-tracking/changes/20260116-export-pipeline-implementation-changes.md deleted file mode 100644 index eeb01f3..0000000 --- a/.copilot-tracking/changes/20260116-export-pipeline-implementation-changes.md +++ /dev/null @@ -1,110 +0,0 @@ ---- -title: Export pipeline implementation changes -description: Tracking updates for the export pipeline implementation work. -ms.date: 2026-01-16 ---- - - -# Release Changes: Export pipeline implementation - -**Related Plan**: 20260116-export-pipeline-implementation-plan.instructions.md -**Implementation Date**: 2026-01-16 - -## Summary - -Tracking updates for the export pipeline implementation tasks. - -## Changes - -### Added - -* backend/app/exports/__init__.py - Introduced the export pipeline package marker. -* backend/app/exports/models.py - Added request models for snapshot export defaults. 
-* backend/app/exports/registry.py - Added processor and formatter registries with name resolution helpers. -* backend/app/exports/processors/__init__.py - Added export processor package marker. -* backend/app/exports/processors/merge_tags.py - Added merge tags export processor. -* backend/app/exports/formatters/__init__.py - Added export formatter package marker. -* backend/app/exports/formatters/json_items.py - Added JSON items export formatter. -* backend/app/exports/formatters/json_snapshot_payload.py - Added JSON snapshot payload formatter. -* backend/tests/unit/test_export_registry.py - Added unit tests for export registry behavior. -* backend/tests/unit/test_export_formatters.py - Added unit tests for export formatter outputs. -* backend/tests/unit/test_export_processors.py - Added unit tests for export processor behavior. -* backend/app/exports/storage/__init__.py - Added export storage package marker. -* backend/app/exports/storage/base.py - Added export storage interface protocol. -* backend/app/exports/storage/local.py - Added local filesystem export storage backend. -* backend/app/exports/storage/blob.py - Added Azure Blob export storage backend. -* backend/app/exports/pipeline.py - Added pipeline delivery helpers for attachments, streams, and artifacts. -* backend/tests/unit/test_export_pipeline.py - Added unit tests for pipeline delivery behaviors. - -### Modified - -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Task 1.1 complete after verifying snapshot endpoint contracts. -* backend/app/api/v1/ground_truths.py - Allowed optional snapshot request bodies while preserving legacy behavior. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Task 1.2 and Phase 1 as complete. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Task 2.1 complete after adding request models. 
-* backend/app/core/config.py - Added export processor order setting for pipeline configuration. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Task 2.2 and Phase 2 as complete. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Tasks 3.1-3.2 and Phase 3 as complete. -* backend/app/core/config.py - Added export storage settings and blob configuration validation. -* backend/pyproject.toml - Added Azure Blob SDK dependency. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Tasks 4.1-4.3 and Phase 4 as complete. -* backend/app/container.py - Wired export registries, storage, and pipeline into the container. -* backend/app/exports/registry.py - Added formatter factory support for contextual formatting. -* backend/app/exports/formatters/json_snapshot_payload.py - Preserved legacy filters by avoiding injected dataset names. -* backend/app/services/snapshot_service.py - Delegated snapshot payloads and artifacts to the export pipeline. -* backend/app/api/v1/ground_truths.py - Routed snapshot POST requests through the pipeline with validation. -* backend/tests/unit/test_snapshot_service.py - Updated snapshot service tests for pipeline wiring. -* backend/tests/unit/test_export_registry.py - Updated registry tests for formatter creation. -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Marked Tasks 5.1-5.3 and Phase 5 as complete. - -### Removed - -* .copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md - Removed implementation prompt after completing tasks. - -## Release Summary - -Total files affected: 26. 
- -### Files Created (17) - -* backend/app/exports/__init__.py - Export pipeline package marker -* backend/app/exports/models.py - Snapshot export request models -* backend/app/exports/registry.py - Export processor and formatter registries -* backend/app/exports/processors/__init__.py - Export processors package marker -* backend/app/exports/processors/merge_tags.py - Merge tags export processor -* backend/app/exports/formatters/__init__.py - Export formatters package marker -* backend/app/exports/formatters/json_items.py - JSON items formatter -* backend/app/exports/formatters/json_snapshot_payload.py - Snapshot payload formatter -* backend/app/exports/storage/__init__.py - Export storage package marker -* backend/app/exports/storage/base.py - Export storage protocol -* backend/app/exports/storage/local.py - Local export storage backend -* backend/app/exports/storage/blob.py - Azure Blob export storage backend -* backend/app/exports/pipeline.py - Export pipeline delivery helpers -* backend/tests/unit/test_export_registry.py - Export registry unit tests -* backend/tests/unit/test_export_formatters.py - Export formatter unit tests -* backend/tests/unit/test_export_processors.py - Export processor unit tests -* backend/tests/unit/test_export_pipeline.py - Export pipeline delivery unit tests - -### Files Modified (8) - -* .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md - Task progress updates -* .copilot-tracking/changes/20260116-export-pipeline-implementation-changes.md - Change tracking updates -* backend/app/api/v1/ground_truths.py - Snapshot pipeline routing and validation -* backend/app/core/config.py - Export settings and blob validation -* backend/app/container.py - Export pipeline wiring -* backend/app/services/snapshot_service.py - Pipeline-backed snapshot logic -* backend/pyproject.toml - Azure Blob SDK dependency -* backend/tests/unit/test_snapshot_service.py - Pipeline-aware snapshot tests - -### Files Removed (1) - -* 
.copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md - Cleanup prompt file - -### Dependencies & Infrastructure - -* New dependency azure-storage-blob -* Export storage settings and validation in backend/app/core/config.py - -### Deployment Notes - -Ensure the blob backend settings are configured before switching `GTC_EXPORT_STORAGE_BACKEND` to `blob`. diff --git a/.copilot-tracking/changes/20260123-assignment-error-feedback-changes.md b/.copilot-tracking/changes/20260123-assignment-error-feedback-changes.md deleted file mode 100644 index fddbc03..0000000 --- a/.copilot-tracking/changes/20260123-assignment-error-feedback-changes.md +++ /dev/null @@ -1,71 +0,0 @@ - -# Release Changes: Assignment Error Feedback - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 1 - Data Integrity) -**Implementation Date**: 2026-01-23 - -## Summary - -Enhanced assignment conflict responses with structured payload that includes current assignee information (`assignedTo`, `assignedAt`). When attempting to assign an item already assigned to another user, the 409 response now provides structured JSON with assignment details instead of just a plain error message, enabling better UI feedback and conflict resolution workflows. 
- -## Changes - -### Added - -* `backend/app/core/errors.py` - Added `AssignmentConflictError` exception class with `assigned_to` and `assigned_at` attributes -* `backend/tests/integration/test_assignments_assign_single_cosmos.py` - Added test assertions to verify structured 409 payload (lines 87-105) - -### Modified - -* `backend/app/services/assignment_service.py` - Import `AssignmentConflictError` from core.errors module -* `backend/app/services/assignment_service.py` - Changed line 207 to raise `AssignmentConflictError` instead of `ValueError` with assignment details (assigned_to, assigned_at) -* `backend/app/api/v1/assignments.py` - Import `AssignmentConflictError` and `JSONResponse` -* `backend/app/api/v1/assignments.py` - Updated `assign_item` endpoint return type to `GroundTruthItem | JSONResponse` with `response_model=None` -* `backend/app/api/v1/assignments.py` - Added `except AssignmentConflictError` handler (lines 280-293) that returns structured JSON response with `detail`, `assignedTo`, and `assignedAt` fields -* `backend/tests/integration/test_assignments_assign_single_cosmos.py` - Updated test docstring and added assertions for structured response verification - -### Removed - -* None - -## Release Summary - -**Total files affected**: 4 files modified - -**API Changes**: -- `POST /v1/assignments/{dataset}/{bucket}/{item_id}/assign` now returns structured JSON on 409 conflict: - ```json - { - "detail": "Item is already assigned to another user", - "assignedTo": "user@example.com", - "assignedAt": "2026-01-23T12:34:56.789012+00:00" - } - ``` - -**Error Response Structure**: -- `detail` (str): Human-readable error message -- `assignedTo` (str): Email/ID of user who currently has the assignment -- `assignedAt` (str | undefined): ISO 8601 timestamp of when assignment was made (if available) - -**Exception Hierarchy**: -- New `AssignmentConflictError` exception carries structured data through service → API layer -- Replaces generic `ValueError("Item is 
already assigned to another user")` -- Enables consistent structured responses across all assignment conflict scenarios - -**Testing**: -- All 225 unit tests passing -- Integration test updated to verify structured response fields -- Type checking passes with `ty check` - -**Backward Compatibility**: -- HTTP status code remains 409 (no change) -- Response structure changed from simple `detail` string to structured JSON object -- Clients expecting only `detail` field will still work but won't utilize new assignment info -- Frontend should update to display `assignedTo` information in conflict dialogs - -**Deployment Notes**: -- No database migrations required -- No configuration changes required -- Backend-only changes -- Recommended: Update frontend to display assignee info when assignment conflicts occur -- Enables future "Assignment Takeover" feature (force parameter) with better UX diff --git a/.copilot-tracking/changes/20260123-assignment-takeover-changes.md b/.copilot-tracking/changes/20260123-assignment-takeover-changes.md deleted file mode 100644 index 32ed22f..0000000 --- a/.copilot-tracking/changes/20260123-assignment-takeover-changes.md +++ /dev/null @@ -1,85 +0,0 @@ -# Release Changes: Assignment Takeover - -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented assignment takeover functionality allowing users with `admin` or `team-lead` roles to forcefully reassign ground truth items that are currently assigned to another user in draft status. This addresses the operational need to redistribute work when team members are unavailable. 
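The takeover rules summarized here can be shown as a short in-memory sketch. A forced assignment requires the `admin` or `team-lead` role, a non-forced assignment over another user's claim raises a conflict, and a successful takeover reports the displaced assignee so that user's assignment document can be cleaned up. The `Item` dataclass and the in-place update are assumptions. The real service clears the fields via `upsert_gt`, calls `assign_to`, and also checks draft status, none of which is modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Roles allowed to force-reassign, per the change notes.
TAKEOVER_ROLES = frozenset({"admin", "team-lead"})


class AssignmentConflictError(Exception):
    """Item is assigned to someone else and force was not requested."""


@dataclass
class Item:
    id: str
    assigned_to: str | None = None
    assigned_at: datetime | None = None


def has_takeover_permission(roles: list[str] | None) -> bool:
    """True when any of the caller's roles permits a forced takeover."""
    return any(role in TAKEOVER_ROLES for role in (roles or []))


def assign_single_item(
    item: Item, user: str, *, force: bool = False, user_roles: list[str] | None = None
) -> str | None:
    """Assign item to user; return the previous assignee displaced by a takeover."""
    if force and not has_takeover_permission(user_roles):
        raise PermissionError("force assignment requires admin or team-lead role")
    displaced = item.assigned_to if item.assigned_to not in (None, user) else None
    if displaced is not None and not force:
        raise AssignmentConflictError("Item is already assigned to another user")
    # Overwriting in place stands in for "clear previous assignment, then assign".
    item.assigned_to = user
    item.assigned_at = datetime.now(timezone.utc)
    return displaced
```

The caller would use a non-`None` return value as the signal to delete the previous user's assignment document and log the takeover. A cleanup failure would be logged without failing the request.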
- -## Changes - -### Added - -* `backend/tests/integration/test_assignments_assign_single_cosmos.py` - Added 5 new integration tests for force assignment scenarios: - - `test_force_assign_without_role_returns_403` - Validates permission denial for regular users - - `test_force_assign_with_admin_role_succeeds` - Validates successful force assignment with admin role - - `test_force_assign_with_team_lead_role_succeeds` - Validates successful force assignment with team-lead role - - `test_force_assign_unassigned_item_succeeds` - Validates force assignment on unassigned items (no-op) -* `backend/tests/integration/conftest.py` - Added `admin_headers` and `team_lead_headers` fixtures with proper role claims in X-MS-CLIENT-PRINCIPAL header -* `backend/app/api/v1/assignments.py` - Added `AssignmentItemRequest` Pydantic model with `force: bool` field for request body - -### Modified - -* `backend/app/api/v1/assignments.py` - Updated `/v1/assignments/{dataset}/{bucket}/{item_id}/assign` endpoint: - - Accepts optional request body with `force` parameter - - Passes `user.roles` to service layer for permission checking - - Handles `PermissionError` and returns HTTP 403 Forbidden - - Added comprehensive docstring explaining force assignment behavior -* `backend/app/services/assignment_service.py` - Enhanced `assign_single_item()` method: - - Added `force: bool = False` and `user_roles: list[str] | None = None` parameters - - Added `_has_takeover_permission(roles: list[str]) -> bool` helper method - - Implemented force assignment logic that clears previous assignment before reassigning - - Added cleanup of previous assignment document after successful force takeover - - Enhanced logging to record force-assign events with previous assignee information - - Added proper error handling for assignment document cleanup failures -* `backend/app/api/v1/ground_truths.py` - Fixed bug in duplicate detection: - - Changed `page_size` parameter to `limit` (correct parameter name) - - Changed 
`sort_field` to `sort_by` (correct parameter name) - - Fixed tuple unpacking for `list_gt_paginated` return value - - Added try-except wrapper to gracefully handle NotImplementedError in unit tests - -## Technical Details - -### Authorization Model - -- Uses existing `UserContext.roles` from Azure AD claims -- Checks for `admin` or `team-lead` role in the roles list -- Returns HTTP 403 if force assignment attempted without proper role - -### Force Assignment Flow - -1. Service layer validates user has required role -2. Stores previous assignee for cleanup -3. Clears `assignedTo` and `assigned_at` fields from the item via `upsert_gt` -4. Calls standard `assign_to` method to assign to new user -5. Cleans up previous user's assignment document -6. Logs force takeover event with previous and new assignee details - -### Error Handling - -- `PermissionError` raised if force=True without admin/team-lead role -- `AssignmentConflictError` raised if force=False and item already assigned -- Assignment document cleanup errors are logged but don't fail the request - -## Test Results - -- All 10 assignment integration tests pass -- All 253 backend unit tests pass -- New tests validate: - - Permission denial for non-privileged users (403) - - Successful force assignment with admin role - - Successful force assignment with team-lead role - - Force assignment on unassigned items (no-op) - - Assignment document cleanup - -## Deployment Notes - -- Backend changes only; frontend confirmation dialog deferred to separate implementation -- No database migrations required -- Compatible with existing assignment workflow -- Azure AD app registration must define `admin` and `team-lead` roles in manifest - -## Related Files - -- Specification: `specs/assignment-takeover.md` -- Implementation Plan: `IMPLEMENTATION_PLAN.md` (Priority 2 - User Experience) diff --git a/.copilot-tracking/changes/20260123-backend-code-cleanup-changes.md 
b/.copilot-tracking/changes/20260123-backend-code-cleanup-changes.md deleted file mode 100644 index d53be48..0000000 --- a/.copilot-tracking/changes/20260123-backend-code-cleanup-changes.md +++ /dev/null @@ -1,54 +0,0 @@ - -# Release Changes: Backend Code Cleanup - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 3 - Technical Debt & Code Quality) -**Implementation Date**: 2026-01-23 - -## Summary - -Replaced the last remaining print statement in the backend codebase with proper structured logging. Converted `_to_doc` from a static method to an instance method to enable access to the class logger, improving error tracking and debugging capabilities. This completes the backend code cleanup initiative to eliminate debugging artifacts. - -## Changes - -### Added - -* None - -### Modified - -* `backend/app/adapters/repos/cosmos_repo.py` - Converted `_to_doc` method from `@staticmethod` to instance method (removed decorator, added `self` parameter) -* `backend/app/adapters/repos/cosmos_repo.py` - Replaced `print(item.__repr__())` on line 401 with `self._logger.error(f"Document missing datasetName: {item!r}")` - -### Removed - -* None - -## Release Summary - -**Total files affected**: 1 file modified - -**Code Quality Improvements**: -- Eliminated last remaining `print()` statement in `backend/app/` directory -- Improved error tracking with structured logging using class logger -- Better debugging capabilities with `logger.error()` instead of console output - -**Technical Details**: -- `_to_doc` method signature changed from `_to_doc(item: GroundTruthItem)` to `_to_doc(self, item: GroundTruthItem)` -- Method still called as `self._to_doc(item)` from lines 479 and 1138, so no call-site changes required -- Error message now properly logged at ERROR level with formatted item representation - -**Testing**: -- All 226 unit tests passing -- No test changes required (method signature compatible with existing usage) -- Verified no remaining print statements with `grep -rn 
"print(" backend/app/` - -**Backward Compatibility**: -- Internal refactoring only, no API changes -- No behavior changes from external perspective -- Better logging output for debugging production issues - -**Deployment Notes**: -- No database migrations required -- No configuration changes required -- No frontend changes required -- Improved observability for datasetName validation errors diff --git a/.copilot-tracking/changes/20260123-batch-validation-changes.md b/.copilot-tracking/changes/20260123-batch-validation-changes.md deleted file mode 100644 index 6d33809..0000000 --- a/.copilot-tracking/changes/20260123-batch-validation-changes.md +++ /dev/null @@ -1,66 +0,0 @@ - -# Release Changes: Batch Validation Improvements - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 1 - Data Integrity) -**Implementation Date**: 2026-01-23 - -## Summary - -Enhanced bulk import validation with structured error objects that provide programmatic error handling, row-level context, and field-specific feedback. The new error format includes error codes (INVALID_TAG, DUPLICATE_ID, CREATE_FAILED), 0-based index tracking, field names, and validation summaries with total/succeeded/failed counts. 
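The structured error objects described in this change log can be sketched with dataclasses. Each error records a 0-based row index, an error code, and optionally the item ID and field, and a summary is derived from the error list. `validate_tags` is a hypothetical validator that covers only the `INVALID_TAG` case. Counting distinct failing rows, so that one item with several errors counts once, is an assumption and is not stated in the notes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BulkImportError:
    index: int               # 0-based position in the request array
    code: str                # e.g. INVALID_TAG, DUPLICATE_ID, CREATE_FAILED
    message: str
    item_id: str | None = None
    field: str | None = None


@dataclass
class ValidationSummary:
    total: int
    succeeded: int
    failed: int


def validate_tags(items: list[dict], known_tags: set[str]) -> list[BulkImportError]:
    """Hypothetical per-row check that flags tags missing from the registry."""
    errors: list[BulkImportError] = []
    for i, item in enumerate(items):
        for tag in item.get("tags", []):
            if tag not in known_tags:
                errors.append(
                    BulkImportError(
                        index=i,
                        code="INVALID_TAG",
                        message=f"Unknown tag: {tag}",
                        item_id=item.get("id"),
                        field="tags",
                    )
                )
    return errors


def summarize(total: int, errors: list[BulkImportError]) -> ValidationSummary:
    # Count distinct failing rows so multiple errors on one item count once.
    failed = len({e.index for e in errors})
    return ValidationSummary(total=total, succeeded=total - failed, failed=failed)
```

Stable codes let clients branch on failure types directly, and the index maps each error back to the submitted row without the client having to match on message text.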
- -## Changes - -### Added - -* `backend/app/domain/models.py` - Added `BulkImportError` model with structured fields (index, item_id, field, code, message) -* `backend/app/domain/models.py` - Added `ValidationSummary` model with total/succeeded/failed statistics - -### Modified - -* `backend/app/api/v1/ground_truths.py` - Updated `ImportBulkResponse` to use structured `BulkImportError` objects instead of plain strings; added `failed` count and `validation_summary` fields -* `backend/app/services/validation_service.py` - Modified `validate_ground_truth_item` to return `BulkImportError` objects with index tracking; updated function signature to accept `item_index` parameter -* `backend/app/services/validation_service.py` - Modified `validate_bulk_items` to pass item index to validator and return structured errors -* `backend/app/api/v1/ground_truths.py` - Updated `import_bulk` endpoint to convert repository errors to structured format and build validation summary -* `backend/tests/unit/test_bulk_import_tag_validation.py` - Updated test assertions to validate structured error objects (code, field, item_id, index, message) - -### Removed - -* None - -## Release Summary - -**Total files affected**: 4 files modified - -**API Changes**: -- `ImportBulkResponse` now includes: - - `failed` (int): count of failed items - - `errors` (list[BulkImportError]): structured error objects instead of strings - - `validationSummary`: statistics with total/succeeded/failed counts -- `BulkImportError` structure: - - `index` (int): 0-based position in request array - - `itemId` (str | null): ID of failed item - - `field` (str | null): field that caused error - - `code` (str): error code (INVALID_TAG, DUPLICATE_ID, CREATE_FAILED) - - `message` (str): human-readable description - -**Error Codes**: -- `INVALID_TAG`: Tag doesn't exist in registry or violates format -- `DUPLICATE_ID`: Item with this ID already exists (Cosmos 409) -- `CREATE_FAILED`: Generic persistence failure - -**Testing**: 
-- All 11 bulk-related unit tests passing -- Tag validation tests updated for structured error format -- DoS prevention tests passing -- Type checking passes with acceptable warnings - -**Backward Compatibility**: -- Response structure changed but maintains same HTTP status codes -- Clients expecting string errors will need to update to use structured objects -- All other fields (imported, uuids, piiWarnings) remain unchanged - -**Deployment Notes**: -- No database migrations required -- No configuration changes required -- Backend-only changes -- Clients consuming bulk import API should update error handling logic diff --git a/.copilot-tracking/changes/20260123-explorer-state-preservation-changes.md b/.copilot-tracking/changes/20260123-explorer-state-preservation-changes.md deleted file mode 100644 index cafc42b..0000000 --- a/.copilot-tracking/changes/20260123-explorer-state-preservation-changes.md +++ /dev/null @@ -1,102 +0,0 @@ - -# Release Changes: Explorer State Preservation - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 2 - User Experience, [FOUNDATION]) -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented URL-based filter state persistence for the QuestionsExplorer component. Users can now bookmark filtered views and filters persist across page reloads. This is a foundational feature that enables future enhancements like keyword search and tag filtering to be URL-addressable. 
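The URL encoding scheme described in this change log works as follows. Only non-default filters are written to the URL, tags become a single comma-separated value, and invalid values fall back to defaults. The real utilities are TypeScript (`filterUrlParams.ts`), so this Python version only illustrates the encoding rules and is not a port of that file. The allowed `status` values and the subset of fields shown are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlencode

# Hypothetical allowed statuses; the frontend defines its own in filters.ts.
STATUSES = {"all", "draft", "approved"}


@dataclass
class FilterState:
    status: str = "all"
    dataset: str = "all"
    tags: list[str] = field(default_factory=list)
    item_id: str = ""
    ref_url: str = ""
    sort_column: str = ""


def filter_state_to_query(state: FilterState) -> str:
    """Serialize only non-default fields; tags become one comma-separated value."""
    defaults = FilterState()
    candidates = [
        ("status", state.status, defaults.status),
        ("dataset", state.dataset, defaults.dataset),
        ("tags", ",".join(state.tags), ""),
        ("itemId", state.item_id, defaults.item_id),
        ("refUrl", state.ref_url, defaults.ref_url),
        ("sortColumn", state.sort_column, defaults.sort_column),
    ]
    pairs = [(key, value) for key, value, default in candidates if value != default]
    # safe="," keeps tag lists readable (tags=a,b); refUrl is still percent-encoded.
    return urlencode(pairs, safe=",")


def parse_filter_state(query: str) -> FilterState:
    """Parse a query string, falling back to defaults for missing or invalid values."""
    params = parse_qs(query.lstrip("?"))

    def first(key: str) -> str:
        return params.get(key, [""])[0]

    status = first("status")
    return FilterState(
        status=status if status in STATUSES else "all",
        dataset=first("dataset") or "all",
        tags=[t for t in first("tags").split(",") if t],
        item_id=first("itemId"),
        ref_url=first("refUrl"),
        sort_column=first("sortColumn"),
    )
```

Omitting defaults keeps an unfiltered view at a bare `/`. One consequence of the comma encoding is that a tag containing a comma would not round-trip.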
- -## Changes - -### Added - -* `frontend/src/types/filters.ts` - Centralized filter type definitions (FilterState, FilterType, SortColumn, SortDirection) -* `frontend/src/utils/filterUrlParams.ts` - URL parameter management utilities: - - `parseFilterStateFromUrl()` - Parse URL search params → FilterState - - `filterStateToUrlParams()` - Convert FilterState → URLSearchParams - - `updateUrlWithoutReload()` - Update browser URL via History API without reload - - `getCurrentSearch()` - Get current search parameters -* `frontend/tests/unit/utils/filterUrlParams.test.ts` - 31 comprehensive tests covering: - - URL parsing (default values, valid parameters, invalid parameters) - - Filter state to URL conversion - - URL updates without page reload - - Edge cases (empty tags, special characters, missing params) - -### Modified - -* `frontend/src/components/app/QuestionsExplorer.tsx` - Integrated URL state persistence: - - Added imports for filter utilities and types - - Initialize filter state from URL on component mount (useEffect with empty deps) - - Sync URL when appliedFilter changes (useEffect with appliedFilter dependency) - - Preserved all existing functionality and component behavior - - No breaking changes to component API - -### Removed - -* None - -## Release Summary - -**Total files affected**: 4 files (3 added, 1 modified) - -**User Experience Improvements**: -- **Bookmarkable Views**: Users can save and share specific filter combinations via URL -- **Filter Persistence**: Page reload maintains filter state (no more lost work) -- **Clean URLs**: Only non-default parameters included in URL -- **Type-Safe**: Full validation of all URL parameters with fallback to defaults - -**Technical Details**: -- Uses browser History API (`pushState`) to update URL without page reload -- URL parameters: `status`, `dataset`, `tags`, `itemId`, `refUrl`, `sortColumn`, `sortDirection` -- Tag array encoding: comma-separated list (e.g., `tags=important,validated`) -- Special 
character handling: proper URL encoding/decoding for refUrl and other params -- Default values: `status=all`, `dataset=all`, `tags=[]`, etc. - -**Example URLs**: -``` -Simple: /?status=approved -Complex: /?status=approved&dataset=prod&tags=important,validated&sortColumn=refs -Item ID: /?itemId=item-123 -Reference: /?refUrl=https%3A%2F%2Fexample.com%2Fpage -Default: / (no params shown) -``` - -**Testing**: -- All 226/232 frontend tests passing (6 pre-existing skipped tests unrelated) -- 31 new tests for URL filter persistence utilities (all passing) -- TypeScript build: ✅ SUCCESS -- Vite production build: ✅ SUCCESS -- No performance regression detected - -**Backward Compatibility**: -- Fully backward compatible - URLs without parameters work as before -- Component API unchanged - no breaking changes for consumers -- Graceful fallback to default values for invalid URL parameters -- Existing filter behavior preserved exactly - -**Architecture Notes**: -- **Foundation Feature**: Enables future keyword search and tri-state tag filtering to be URL-addressable -- Clean separation of concerns: filter types → utils → component integration -- Reusable utilities for future URL state management needs -- Comprehensive test coverage (31 tests) ensures reliability - -**Deployment Notes**: -- No database migrations required -- No configuration changes required -- No backend changes required -- Frontend-only enhancement -- Deploy with standard frontend build pipeline -- Recommend testing bookmarked URLs after deployment - -**Known Limitations**: -- URL does not include pagination state (currentPage, itemsPerPage) - by design, filters are more important to preserve -- Very long tag lists may make URLs unwieldy (mitigated by comma encoding) -- Browser history will contain filter changes (user can use back/forward to navigate filter history) - -**Future Enhancements Enabled**: -- Keyword search can now be added to URL (unlocked by this foundation) -- Tri-state tag filtering can be 
URL-encoded (unlocked by this foundation) -- Analytics tracking of popular filter combinations via URL analysis -- Deep linking into specific views from external tools/dashboards diff --git a/.copilot-tracking/changes/20260123-has-answer-sort-documentation-changes.md b/.copilot-tracking/changes/20260123-has-answer-sort-documentation-changes.md deleted file mode 100644 index c4e3a1e..0000000 --- a/.copilot-tracking/changes/20260123-has-answer-sort-documentation-changes.md +++ /dev/null @@ -1,60 +0,0 @@ - -# Release Changes: has_answer Sort Field Documentation - -**Related Plan**: IMPLEMENTATION_PLAN.md (Code Quality) -**Implementation Date**: 2026-01-23 - -## Summary - -Resolved TODO comment in `cosmos_repo.py` by documenting the design rationale for the `has_answer` sort field mapping. The TODO suggested revisiting why `has_answer` maps to `c.reviewedAt` in Cosmos DB queries. After investigation, this is the correct implementation given Cosmos DB limitations. - -The changes add comprehensive documentation explaining: -1. Why `has_answer` uses `c.reviewedAt` as a placeholder in the ORDER BY clause -2. That actual sorting happens in-memory where `has_answer` is computed as a boolean -3. This design works around Cosmos DB's inability to sort by computed/derived fields - -## Changes - -### Modified - -* `backend/app/adapters/repos/cosmos_repo.py` - Replaced TODO with detailed documentation - * Line ~760: Added multi-line comment explaining the `has_answer` mapping rationale in `_build_secure_sort_clause` - * Line ~700: Added cross-reference comment in `_sort_key` method explaining in-memory sort implementation - * Both comments clarify that this is a deliberate design decision, not a bug or incomplete implementation - -## Technical Details - -**The Design Pattern:** - -Cosmos DB SQL doesn't support sorting by computed expressions like: -```sql -ORDER BY (c.answer IS NOT NULL AND LENGTH(c.answer) > 0) -``` - -**The Solution:** - -1. 
**Cosmos Query Level**: Use `c.reviewedAt` as a syntactically valid placeholder in ORDER BY -2. **Python Level**: Perform actual sorting in `_sort_key` method using: - - Primary sort key: `has_answer` (1 if answer exists and non-empty, else 0) - - Secondary sort key: `reviewed_at` (or `updated_at` fallback) - - Tertiary sort key: `id` (for stable sorting) - -**Why This Works:** - -- Cosmos DB requires a valid ORDER BY clause for pagination/consistency -- The placeholder doesn't affect correctness because Python re-sorts the results -- This pattern is consistent with how `tag_count` sorting works (also in-memory) - -## Testing - -* All 26 cosmos_repo unit tests pass -* Type checking clean with `ty check` -* No functional changes, only documentation improvements - -## Release Summary - -Resolved code clarity issue by documenting existing correct implementation. No behavior changes. - -**Files affected**: 1 file modified -**Tests**: All 267 backend unit tests passing -**Type checking**: Zero errors diff --git a/.copilot-tracking/changes/20260123-implementation-plan-summary-update.md b/.copilot-tracking/changes/20260123-implementation-plan-summary-update.md deleted file mode 100644 index f45d6c4..0000000 --- a/.copilot-tracking/changes/20260123-implementation-plan-summary-update.md +++ /dev/null @@ -1,33 +0,0 @@ - -# Release Changes: Implementation Plan Summary Update - -**Related Plan**: IMPLEMENTATION_PLAN.md -**Implementation Date**: 2026-01-23 - -## Summary - -Updated the IMPLEMENTATION_PLAN.md summary section to accurately reflect the completion status of all priority 0-2 features. Replaced the outdated "Suggested Implementation Sequence" section with a comprehensive "Implementation Status Summary" that clearly shows all completed features and the few remaining optional items. 
- -## Changes - -### Modified - -* `IMPLEMENTATION_PLAN.md` - Replaced lines 520-568 with new "Implementation Status Summary" section: - - Listed all completed features by priority (Security, Data Integrity, UX, Technical Debt, Documentation, Performance) - - Updated test counts: 267 backend unit tests, 138 integration tests, 237 frontend tests - - Clarified that only 3 optional items remain (pre-commit hooks, CI enhancements, production profiling) - - Removed outdated notes about incomplete features - - Added "Ready for Production" status indicator - -## Release Summary - -**Files Modified**: 1 -**Documentation Status**: Implementation plan now accurately reflects 100% completion of critical features - -## Deployment Notes - -This is a documentation-only change with no code or functional changes. The implementation plan now serves as an accurate record of completed work rather than a todo list. - -## Learnings - -When an implementation plan grows large and most items are complete, the summary section becomes stale quickly. Periodic cleanup keeps the plan useful and accurate for future reference. diff --git a/.copilot-tracking/changes/20260123-pre-commit-hooks-changes.md b/.copilot-tracking/changes/20260123-pre-commit-hooks-changes.md deleted file mode 100644 index fcb3de2..0000000 --- a/.copilot-tracking/changes/20260123-pre-commit-hooks-changes.md +++ /dev/null @@ -1,88 +0,0 @@ - -# Release Changes: Pre-Commit Hooks Implementation - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 3) -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented pre-commit quality checks for the frontend codebase using npm scripts. Since this repository uses Jujutsu (jj) for version control, which lacks native hook support, the solution uses npm scripts that can be run manually or integrated into CI pipelines. - -The implementation adds automated linting and type checking to catch issues before commits, improving code quality and consistency. 
Fixed 36 existing code formatting issues and 6 React hooks/accessibility issues during implementation. - -## Changes - -### Added - -* `frontend/docs/PRE_COMMIT_HOOKS.md` - Comprehensive documentation for pre-commit checks, including manual setup, optional git hooks, and usage examples - -### Modified - -* `frontend/package.json` - Added `pre-commit` and `lint:check` scripts - * `pre-commit`: Combines `lint:check` and `typecheck` for comprehensive validation - * `lint:check`: Non-writing lint check for CI/automation (runs `biome check` without `--write`) - -* `frontend/src/components/app/editor/TurnReferencesModal.tsx` - Fixed React hooks order violation - * Moved `useMemo` hooks before early return statement - * Complies with React Rules of Hooks - -* `frontend/src/components/modals/TagGlossaryModal.tsx` - Fixed accessibility issues - * Added proper `htmlFor` attributes to form labels - * Added keyboard event handler for modal backdrop - * Added `aria-hidden` attribute for non-interactive backdrop - -* 36 files auto-fixed by Biome: - * Import statement organization - * Code formatting (indentation, line breaks) - * Template literal usage - * Fragment simplification - -## Testing - -* All 237 frontend tests passing -* TypeScript build succeeds with no errors -* Pre-commit script successfully validates code quality - -## Usage - -Run pre-commit checks manually: - -```bash -cd frontend -npm run pre-commit -``` - -Integrate into CI pipeline (already supported): - -```bash -npm run lint:check # Non-writing check -npm run typecheck # Type validation -``` - -## Technical Details - -**Why npm scripts instead of git hooks:** -- Jujutsu (jj) version control system lacks native hook support as of 2026-01 -- npm scripts work consistently across all VCS systems -- Easier to maintain and understand than custom hook scripts -- Can be integrated into CI/CD pipelines - -**Biome Configuration:** -- Uses `@biomejs/biome` 2.1.4 for linting -- Auto-fixes safe formatting issues -- 
Enforces React hooks rules and accessibility standards - -## Release Summary - -Completed optional Priority 3 enhancement for frontend code quality. Implementation adds automated pre-commit validation while maintaining compatibility with the project's Jujutsu-based version control workflow. - -**Files affected**: 40 files (1 added, 39 modified) -**Tests**: All 237 frontend tests passing -**Type checking**: Zero errors - -## Future Enhancements - -Documented in `frontend/docs/PRE_COMMIT_HOOKS.md`: -1. Optional git hook installation for git command users -2. Optional Husky integration for robust hook management -3. Potential CI/CD integration for automated quality gates diff --git a/.copilot-tracking/changes/20260123-pydantic-alias-type-fix-changes.md b/.copilot-tracking/changes/20260123-pydantic-alias-type-fix-changes.md deleted file mode 100644 index b23b1c7..0000000 --- a/.copilot-tracking/changes/20260123-pydantic-alias-type-fix-changes.md +++ /dev/null @@ -1,26 +0,0 @@ - -# Release Changes: Pydantic Alias Type Checker Fix - -**Related Plan**: IMPLEMENTATION_PLAN.md (CI Code Quality Gates) -**Implementation Date**: 2026-01-23 - -## Summary - -Fixed type checker warnings for Pydantic v2 models using field aliases. When using `alias` parameter with `populate_by_name=True`, the type checker requires using the alias names (camelCase) when instantiating models, not the field names (snake_case). 
- -## Changes - -### Modified - -* `backend/app/services/duplicate_detection_service.py` - Changed DuplicateWarning instantiation to use camelCase alias names (itemId, duplicateId, duplicateQuestion, duplicateStatus, matchReason) instead of snake_case field names -* `backend/app/api/v1/ground_truths.py` - Changed ImportBulkResponse instantiation to use camelCase alias names (piiWarnings, duplicateWarnings, validationSummary) instead of snake_case field names - -## Release Summary - -**Type Checker Status**: All checks passed (0 diagnostics) -**Test Results**: All 267 backend unit tests pass -**Files Modified**: 2 - -## Deployment Notes - -This change fixes type checker warnings without changing runtime behavior or API contracts. The models continue to accept both snake_case and camelCase field names during validation due to `populate_by_name=True`, but the type checker requires using the alias names for proper static analysis. diff --git a/.copilot-tracking/changes/20260123-tag-definitions-storage-changes.md b/.copilot-tracking/changes/20260123-tag-definitions-storage-changes.md deleted file mode 100644 index 2732233..0000000 --- a/.copilot-tracking/changes/20260123-tag-definitions-storage-changes.md +++ /dev/null @@ -1,67 +0,0 @@ - -# Release Changes: Custom Tag Definitions Storage (TG-04) - -**Related Plan**: IMPLEMENTATION_PLAN.md (Tag Glossary - TG-04) -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented database storage for SME-created custom tag definitions, enabling users to define and persist custom tags with descriptions that appear in the tag glossary alongside system-defined manual and computed tags. This completes the backend foundation for TG-04, with frontend UI (TG-06) deferred to a future increment. 
- -## Changes - -### Added - -* `backend/app/adapters/repos/tag_definitions_repo.py` - Repository adapter for Cosmos DB tag definitions storage with CRUD operations (get_definition, list_all, upsert, delete) -* `backend/tests/unit/test_tag_definitions_repo.py` - Unit tests for TagDefinitionsRepo (7 tests covering CRUD operations and error cases) -* `COSMOS_CONTAINER_TAG_DEFINITIONS` config in `backend/app/core/config.py` - Container name constant (default: "tag_definitions") -* `TagDefinition` domain model in `backend/app/domain/models.py` - Fields: id, tag_key (partition key), description, created_by, created_at, updated_at, doc_type -* API endpoint `POST /v1/tags/definitions` in `backend/app/api/v1/tags.py` - Create or update custom tag definition -* API endpoint `DELETE /v1/tags/definitions/{tag_key}` in `backend/app/api/v1/tags.py` - Delete custom tag definition -* Request/response models `TagDefinitionRequest` and `TagDefinitionResponse` in `backend/app/api/v1/tags.py` - -### Modified - -* `backend/app/container.py` - Wire tag_definitions_repo in container initialization and validation -* `backend/app/api/v1/tags.py` - Extended glossary endpoint to query custom definitions and merge as "custom" type group -* `backend/scripts/cosmos_container_manager.py` - Added --tag-definitions-container flag for container creation (partition key: /tag_key, Hash) -* `backend/tests/unit/test_tags_glossary.py` - Added mock for tag_definitions_repo and test for custom definitions in glossary response (4 tests total) -* `IMPLEMENTATION_PLAN.md` - Marked TG-04 complete, updated with implementation details - -### Removed - -* None - -## Release Summary - -**Total Files Affected**: 9 files (639 lines added) -- 2 new files (repository adapter + tests) -- 7 modified files (config, domain model, container, API, script, test, plan) -- 0 removed files - -**Test Coverage**: -- 267 backend unit tests pass (8 new tests: 7 for TagDefinitionsRepo, 1 for glossary endpoint) -- All type checks 
pass (ty check) - -**Deployment Notes**: -- New Cosmos DB container `tag_definitions` must be created using: - ```bash - uv run python scripts/cosmos_container_manager.py \ - --endpoint \ - --key \ - --db \ - --tag-definitions-container - ``` -- Container uses partition key `/tag_key` with Hash partitioning -- No frontend changes required (glossary will display empty custom group if no definitions exist) -- TG-06 (inline editing UI) deferred to future increment - -**API Changes**: -- `GET /v1/tags/glossary` now includes "custom" group with custom tag definitions from database -- New endpoints: `POST /v1/tags/definitions`, `DELETE /v1/tags/definitions/{tag_key}` -- Authentication for management endpoints uses default "system" user_id (full auth deferred to TG-06) - -**Backward Compatibility**: -- Fully backward compatible -- Glossary endpoint returns empty custom group if tag_definitions container doesn't exist -- No breaking changes to existing API contracts diff --git a/.copilot-tracking/changes/20260123-tag-glossary-changes.md b/.copilot-tracking/changes/20260123-tag-glossary-changes.md deleted file mode 100644 index 44b9720..0000000 --- a/.copilot-tracking/changes/20260123-tag-glossary-changes.md +++ /dev/null @@ -1,72 +0,0 @@ - -# Release Changes: Tag Glossary Implementation - -**Related Plan**: N/A (standalone feature) -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented tag glossary system to provide human-readable descriptions for all tags via tooltip UI. Users can now hover over any TagChip to see a definition, improving tag understanding and consistency. 
- -## Changes - -### Added - -* `backend/app/api/v1/tags.py` - Added `/v1/tags/glossary` endpoint returning comprehensive tag definitions from manual and computed sources -* `backend/tests/unit/test_tags_glossary.py` - Unit tests for glossary endpoint (3 tests covering manual tags, computed tags, and response structure) -* `frontend/src/hooks/useTagGlossary.ts` - React hook to fetch and cache tag glossary data, providing lookup map of tag key -> description -* Tag definitions to `backend/app/domain/manual_tags.json` - Extended schema to include group descriptions and per-tag descriptions for all 6 tag groups (source, answerability, topic, intent, expertise, difficulty) - -### Modified - -* `backend/app/domain/manual_tags_provider.py` - Extended `ManualTagGroup` dataclass to support optional `description` and `tag_definitions` fields; updated parser to handle both old (string list) and new (object list) tag formats -* `backend/tests/unit/test_manual_tags_provider.py` - Updated test assertions to match new data structure with `tag_definitions` field -* `frontend/src/components/common/TagChip.tsx` - Enhanced component to fetch and display tag descriptions via CSS tooltip on hover (using `useTagDescription` hook) -* `frontend/src/api/openapi.json` - Regenerated OpenAPI spec to include glossary endpoint schema -* `frontend/src/api/generated.ts` - Regenerated TypeScript types for glossary response models - -### Removed - -* None - -## Technical Details - -### Backend Implementation - -- **Backward compatibility**: Manual tags JSON parser accepts both old format (`tags: ["value"]`) and new format (`tags: [{"value": "...", "description": "..."}]`) -- **Data model**: Extended `ManualTagGroup` with optional `description` (group-level) and `tag_definitions` (tag-level) fields -- **API response**: Glossary endpoint merges manual tags from config file and computed tags from plugin registry into unified response -- **Computed tags**: Phase 1 includes computed tags in glossary 
without descriptions (descriptions deferred to future phase) - -### Frontend Implementation - -- **No dependencies added**: Used native CSS tooltips instead of Radix UI to avoid adding dependencies -- **Plain React state**: Implemented `useTagGlossary` hook with useState/useEffect instead of React Query (not installed) -- **Tooltip UX**: Tooltips appear on hover with 200ms transition, positioned above tag with arrow pointer -- **Fallback behavior**: Tags without definitions show no tooltip (graceful degradation) - -## Test Coverage - -- **Backend**: 3 new unit tests for glossary endpoint, all 256 backend unit tests passing -- **Frontend**: All 226 frontend tests passing, build succeeds - -## Manual Testing Required - -1. Start dev server: `cd backend && uv run uvicorn app.main:app --reload` -2. Start frontend: `cd frontend && npm run dev` -3. Navigate to Questions Explorer -4. Hover over tags to verify tooltips appear with descriptions -5. Verify tooltips for manual tags (e.g., "source:sme") show descriptions -6. 
Verify computed tags show generic "no description" behavior or empty tooltip - -## Release Summary - -**Files Added**: 2 -**Files Modified**: 7 -**Files Removed**: 0 - -**Deployment Notes**: -- No database migrations required -- No environment variable changes needed -- Backend API is backward compatible (glossary endpoint is additive) -- Frontend gracefully handles missing glossary data diff --git a/.copilot-tracking/changes/20260123-tag-glossary-inline-editing-changes.md b/.copilot-tracking/changes/20260123-tag-glossary-inline-editing-changes.md deleted file mode 100644 index c69483a..0000000 --- a/.copilot-tracking/changes/20260123-tag-glossary-inline-editing-changes.md +++ /dev/null @@ -1,85 +0,0 @@ - -# Release Changes: Tag Glossary Inline Editing (TG-06) - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 4: Documentation > Tag Glossary) -**Implementation Date**: 2026-01-23 - -## Summary - -Added inline editing capabilities for custom tag definitions in the TagGlossaryModal. SMEs can now create, edit, and delete custom tag definitions directly from the glossary UI without needing to interact with the backend API manually. 
- -## Changes - -### Added - -* `frontend/src/services/tags.ts` - Added `createTagDefinition()` and `deleteTagDefinition()` API client functions -* `.copilot-tracking/changes/20260123-tag-glossary-inline-editing-changes.md` - This change log file - -### Modified - -* `frontend/src/hooks/useTagGlossary.ts` - Added `refresh()` method to GlossaryStore and exposed it in the hook return value -* `frontend/src/components/modals/TagGlossaryModal.tsx` - Implemented complete inline editing UI: - - Added "New Custom Tag" button to controls section - - Implemented create form with tag key and description inputs - - Added Edit (pencil) and Delete (trash) buttons for custom tags - - Implemented inline editing mode for tag descriptions - - Added state management for editing/creating operations with loading states - - Integrated with refresh() to update glossary after mutations -* `frontend/src/api/generated.ts` - Regenerated TypeScript types from updated OpenAPI spec -* `frontend/src/api/openapi.json` - Regenerated from backend API (includes tag definitions endpoints) -* `backend/pyproject.toml` - Updated (via export_openapi.py formatting) - -### Removed - -None - -## Implementation Details - -### UI Features - -1. **New Custom Tag Creation**: - - Button in controls section opens inline form - - Form validates both tag key and description required - - Cancel button discards changes - - Success refreshes glossary and closes form - -2. **Inline Editing**: - - Edit button appears only on custom tag entries - - Switches to inline textarea for description editing - - Save/Cancel buttons for inline editing mode - - Disabled state during async operations - -3. **Tag Deletion**: - - Delete button appears only on custom tag entries - - Confirmation dialog prevents accidental deletion - - Success refreshes glossary to reflect removal - -4. 
**Error Handling**: - - Alert dialogs for API errors with descriptive messages - - Disabled UI during async operations (submitting state) - - Form validation before submission - -### API Integration - -- `POST /v1/tags/definitions` - Create or update custom tag definition -- `DELETE /v1/tags/definitions/{tag_key}` - Delete custom tag definition -- Both endpoints use the existing backend infrastructure (TG-04) - -### Testing Results - -- All 237 frontend tests pass -- All 267 backend unit tests pass -- TypeScript build succeeds with no errors -- Frontend production build succeeds - -## Release Summary - -**Files Modified**: 5 -**Files Added**: 1 -**Files Removed**: 0 - -**Deployment Notes**: -- Frontend requires rebuild to include new inline editing UI -- Backend API endpoints already exist from TG-04 implementation -- No database migrations required -- Feature is backward compatible with existing glossary functionality diff --git a/.copilot-tracking/changes/20260123-tristate-tag-filtering-changes.md b/.copilot-tracking/changes/20260123-tristate-tag-filtering-changes.md deleted file mode 100644 index bbc2235..0000000 --- a/.copilot-tracking/changes/20260123-tristate-tag-filtering-changes.md +++ /dev/null @@ -1,61 +0,0 @@ - -# Release Changes: Tri-State Tag Filtering - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 2: Tag Filtering Enhancement) -**Implementation Date**: 2026-01-23 - -## Summary - -Implemented tri-state tag filtering with include/exclude/neutral states, enabling users to filter items by tags they want to include AND tags they want to exclude. This replaces the binary include-only filtering with a more powerful tri-state system. 
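The include/exclude semantics of the in-memory filter path can be sketched as follows. Tags in the neutral state appear in neither list, so they do not affect matching. The sketch assumes an item's manual and computed tags are checked together, as in the SQL clause over `c.manualTags` and `c.computedTags`. It also assumes that include tags combine with AND; the change log does not say whether includes are all-of or any-of. `matches_tag_filter` is a hypothetical name.

```python
from typing import Iterable


def matches_tag_filter(
    manual_tags: Iterable[str],
    computed_tags: Iterable[str],
    include: Iterable[str],
    exclude: Iterable[str],
) -> bool:
    """Return True if an item passes a tri-state tag filter.

    Neutral tags appear in neither `include` nor `exclude` and are ignored.
    """
    # Manual and computed tags are treated as one combined tag set.
    item_tags = set(manual_tags) | set(computed_tags)
    # Reject items sharing ANY tag with the exclude set (set intersection).
    if item_tags & set(exclude):
        return False
    # Assumption: every include tag must be present (AND semantics).
    return set(include) <= item_tags
```

When the exclude check runs first, an item carrying an excluded tag is rejected even if it also satisfies every include.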
- -## Changes - -### Added - -* `backend/app/adapters/repos/base.py` - Added `exclude_tags` parameter to repository base interface -* `backend/app/api/v1/ground_truths.py` - Added `exclude_tags` query parameter support to GET /api/v1/ground-truths endpoint - -### Modified - -* `backend/app/adapters/repos/cosmos_repo.py` - Implemented exclude tag filtering in both SQL and in-memory paths -* `backend/tests/unit/test_cosmos_repo.py` - Added tests for exclude tag filtering -* `frontend/src/types/filters.ts` - Changed `TagFilter` type from `string[]` to `{ include: string[], exclude: string[] }` -* `frontend/src/utils/filterUrlParams.ts` - Updated URL parsing/serialization to support tri-state tag structure with `excludeTags` parameter -* `frontend/src/components/app/QuestionsExplorer.tsx` - Implemented tri-state toggle UI (Include/Exclude/Neutral) for tag filtering -* `frontend/src/services/groundTruths.ts` - Updated API service to pass exclude_tags parameter -* `frontend/tests/unit/utils/filterUrlParams.test.ts` - Updated tests to reflect new tri-state tag structure - -### Removed - -* None - -## Technical Implementation - -### Backend Changes - -1. **Repository Layer**: Added `exclude_tags` parameter to base repository interface and Cosmos implementation -2. **Query Logic**: - - SQL path: Uses `NOT (ARRAY_CONTAINS(c.manualTags, @excludeTag) OR ARRAY_CONTAINS(c.computedTags, @excludeTag))` clauses - - In-memory path: Filters out items with ANY excluded tag using set intersection -3. **API Layer**: Added `exclude_tags` query parameter (comma-separated list) to ground truths list endpoint - -### Frontend Changes - -1. **Type System**: Changed `TagFilter` from simple string array to object with `include` and `exclude` arrays -2. **URL State**: Added `excludeTags` URL parameter for bookmarkable exclude filters -3. **UI**: Implemented tri-state toggle buttons showing Include (green) / Exclude (red) / Neutral (gray) states -4. 
**Behavior**: Clicking cycles through states: Neutral → Include → Exclude → Neutral - -## Testing - -- **Backend**: All 236 unit tests passing, including new exclude tag filter tests -- **Frontend**: All 226 tests passing, updated filter URL parsing tests for tri-state structure -- **Integration**: Verified exclude tags work with keyword search, status filters, and URL persistence - -## Release Summary - -**Files Changed**: 9 files (8 modified, 1 test file) -**Lines Changed**: ~200 lines added/modified -**Test Coverage**: Comprehensive unit tests for both backend and frontend - -This feature enables more precise filtering in the explorer view, allowing users to find items that have certain tags while excluding items with other tags. The tri-state UI provides clear visual feedback and the URL persistence makes filtered views bookmarkable. diff --git a/.copilot-tracking/changes/20260123-type-checker-fixes-changes.md b/.copilot-tracking/changes/20260123-type-checker-fixes-changes.md deleted file mode 100644 index b5db47e..0000000 --- a/.copilot-tracking/changes/20260123-type-checker-fixes-changes.md +++ /dev/null @@ -1,70 +0,0 @@ - -# Release Changes: Type Checker Error Fixes - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 3: Technical Debt & Code Quality) -**Implementation Date**: 2026-01-23 - -## Summary - -Resolved type checker errors discovered when running `uv run ty check app/` on the backend codebase. Fixed incorrect method call signatures and added proper type ignore annotations for Pydantic v2 aliasing patterns. 
- -## Changes - -### Modified - -* `backend/app/adapters/repos/cosmos_repo.py` - Fixed `_build_query_filter` method calls: - - Line 1083-1089: Added missing `exclude_tags` parameter (None) in emulator path - - Line 1115-1121: Added missing `exclude_tags` parameter (None) in stats count path - - Line 1150-1156: Added missing `exclude_tags` parameter (None) in base count path -* `backend/app/api/v1/ground_truths.py` - Fixed duplicate detection query: - - Line 202: Changed `status=[GroundTruthStatus.approved]` to `status=GroundTruthStatus.approved` (method expects single value, not list) - - Line 231-239: Added `# type: ignore[call-arg,misc]` for Pydantic v2 aliasing (populate_by_name pattern) -* `backend/app/core/rate_limiter.py` - Added type ignore for FastAPI exception handler: - - Line 69: Added `# type: ignore[arg-type]` for exception handler signature (type checker limitation with FastAPI's ExceptionHandler type) -* `backend/app/services/duplicate_detection_service.py` - Added type ignore for Pydantic aliasing: - - Line 114: Added `# type: ignore[call-arg]` for DuplicateWarning constructor (populate_by_name pattern) -* `.copilot-tracking/changes/20260123-type-checker-fixes-changes.md` - This change log file - -### Removed - -None - -### Added - -None - -## Implementation Details - -### Type Checker Errors Fixed - -1. **Missing `exclude_tags` parameter**: The `_build_query_filter` method signature was updated in a previous commit to include `exclude_tags` parameter, but three call sites were not updated. All three paths (emulator, stats count, base count) pass `None` for this parameter as tag exclusion is not needed in those contexts. - -2. **List vs single value for status**: The `list_gt_paginated` method accepts a single `GroundTruthStatus` value, not a list. Fixed the duplicate detection query to pass the enum directly. - -3. 
**Pydantic v2 aliasing**: Pydantic v2's `populate_by_name=True` configuration allows using both Python field names and aliases in constructors. However, the type checker doesn't understand this pattern, so we added targeted `type: ignore` comments. This is a known limitation and the runtime behavior is correct. - -4. **FastAPI exception handler signature**: FastAPI's `add_exception_handler` has complex union types that the type checker interprets strictly. Added `type: ignore[arg-type]` as the runtime signature is correct but doesn't match the type checker's expectations. - -### Testing Results - -- All 267 backend unit tests pass -- All 237 frontend tests pass -- `uv run ty check app/` shows 0 errors (only warnings about Pydantic aliasing) - -## Release Summary - -**Files Modified**: 4 -**Files Added**: 1 -**Files Removed**: 0 - -**Deployment Notes**: -- No functional changes, only type annotations -- No database migrations required -- No API changes -- Changes improve type safety and catch potential bugs during development - -## Learnings - -- Always run `uv run ty check app/` after making changes to catch type errors early -- Pydantic v2's `populate_by_name=True` requires `type: ignore` comments for the type checker -- The `_build_query_filter` method signature should be checked at all call sites when modifying parameters diff --git a/.copilot-tracking/changes/20260123-xss-sanitization-changes.md b/.copilot-tracking/changes/20260123-xss-sanitization-changes.md deleted file mode 100644 index 1865d0e..0000000 --- a/.copilot-tracking/changes/20260123-xss-sanitization-changes.md +++ /dev/null @@ -1,49 +0,0 @@ - -# Release Changes: XSS Sanitization - -**Related Plan**: IMPLEMENTATION_PLAN.md (Priority 0 - Security) -**Implementation Date**: 2026-01-23 - -## Summary - -Extracted URL validation logic to a shared utility module and applied it consistently across all reference URL handlers in the frontend. 
This ensures that malicious URL schemes (javascript:, data:, vbscript:, etc.) are blocked before opening, protecting users from XSS attacks even if backend data is compromised. Also updated all external link rel attributes to use "noopener noreferrer" for complete protection against tabnapping attacks. - -## Changes - -### Added - -* `frontend/src/utils/urlValidation.ts` - New shared utility module containing `validateReferenceUrl` function that blocks unsafe URL protocols and malicious patterns - -### Modified - -* `frontend/src/components/modals/InspectItemModal.tsx` - Replaced inline `validateReferenceUrl` function with import from shared utility -* `frontend/src/demo.tsx` - Added URL validation to `onOpenRef` function before opening references; added error toast for invalid URLs -* `frontend/src/components/app/editor/TurnReferencesModal.tsx` - Updated 2 anchor tags to use `rel="noopener noreferrer"` instead of just `rel="noreferrer"` -* `frontend/src/components/app/ReferencesPanel/SelectedTab.tsx` - Updated anchor tag to use `rel="noopener noreferrer"` -* `frontend/src/components/app/InstructionsPane.tsx` - Updated anchor tag to use `rel="noopener noreferrer"` -* `frontend/src/components/common/MarkdownRenderer.tsx` - Updated anchor tags in both `mdComponents` and `compactComponents` to use `rel="noopener noreferrer"` -* `IMPLEMENTATION_PLAN.md` - Marked XSS Sanitization task as ✅ IMPLEMENTED with implementation details - -### Removed - -* None - -## Release Summary - -**Total files affected**: 8 files (1 added, 7 modified) - -**Security Impact**: -- All reference URL handlers now validate URLs before opening -- Blocked schemes: javascript:, data:, vbscript:, about:, blob: -- All external links now protected against tabnapping with "noopener noreferrer" -- User-facing error message when attempting to open invalid/unsafe URLs - -**Testing**: -- All 195 frontend unit tests passing -- TypeScript compilation successful -- No breaking changes to existing 
functionality - -**Deployment Notes**: -- No database migrations required -- No configuration changes required -- Frontend changes only - no backend impact diff --git a/.copilot-tracking/details/20260116-export-pipeline-design-details.md b/.copilot-tracking/details/20260116-export-pipeline-design-details.md deleted file mode 100644 index e9ae0b4..0000000 --- a/.copilot-tracking/details/20260116-export-pipeline-design-details.md +++ /dev/null @@ -1,353 +0,0 @@ ---- -description: Implementation details for export pipeline design in Ground Truth Curator -ms.date: 2026-01-16 ---- - - -# Task Details: Export Pipeline Design - -## Research Reference - -Source research: `.copilot-tracking/research/20260116-export-pipeline-design-research.md` - -## Phase 1: Confirm export requirements and compatibility targets - -### Task 1.1: Document the current export behavior baseline - -Capture the observable behaviors that must remain stable: - -- `POST /v1/ground-truths/snapshot` writes per-item JSON artifacts plus `manifest.json` under `exports/snapshots/{ts}/` -- `GET /v1/ground-truths/snapshot` returns a downloadable attachment with `Content-Disposition` -- Frontend download behavior depends on `Content-Disposition` parsing - -Files: - -- `backend/app/services/snapshot_service.py` -- `backend/app/api/v1/ground_truths.py` -- `frontend/src/services/groundTruths.ts` - -Success: - -- A short “baseline contract” section exists in the design notes (what stays stable, what can change) -- The plan identifies what the new pipeline must not break - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 31-70) - Verified baseline behaviors for snapshot write/download and frontend expectations - -Dependencies: - -- None - -### Task 1.2: Decide the v1 export pipeline API surface - -Choose a minimal, forward-compatible API for pipeline-based exports. 
- -Recommended options: - -- Option A: `GET /v1/exports/ground-truths` with query params for filters and format selection -- Option B: `POST /v1/exports/ground-truths` with a request body describing filters and options - -Decide and document: - -- Supported formats for the initial milestone (at least JSON) -- Supported filters for the initial milestone (dataset, status; tags optional) -- Whether exports always operate on approved items or can be generalized - -Files: - -- New router file (proposed): `backend/app/api/v1/exports.py` -- Existing snapshot routes: `backend/app/api/v1/ground_truths.py` - -Success: - -- The plan includes a clear route definition and request/response shape -- Backward compatibility for snapshot endpoints is explicitly preserved - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 154-160) - API surface recommendations for pipeline-based exports - -Dependencies: - -- Task 1.1 completion - -## Phase 2: Define the export pipeline abstractions (processors, formatters, registry) - -### Task 2.1: Specify processor and formatter interfaces - -Define interfaces aligned to the existing design in `docs/computed-tags-design.md`: - -- Export processors: `List[dict] -> List[dict]` -- Export formatters: `List[dict] -> bytes | str` - -Document required properties: - -- Stable `name`/`format_name` identifiers -- Deterministic behavior requirements for tests -- Error handling conventions (raise vs collect errors) - -Files: - -- Proposed new module(s): - - `backend/app/exports/processors/base.py` - - `backend/app/exports/formatters/base.py` - -Success: - -- Interfaces are defined in the plan with method signatures and naming rules -- The registry approach (discover/register) is specified - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 138-153) - Proposed pipeline architecture and core concept definitions -- `docs/computed-tags-design.md` 
(Export pipeline architecture section) - -Dependencies: - -- Task 1.2 completion - -### Task 2.2: Specify registries and configuration strategy - -Define: - -- `ExportProcessorRegistry` to register processors and prevent name collisions -- `ExportFormatterRegistry` to register formatters and resolve requested formats -- Configuration via environment variable(s), e.g. `GTC_EXPORT_PROCESSOR_ORDER="merge_tags,anonymize"` - -Decide: - -- How unknown processor/formatter names fail (400 with clear error) -- Defaults when env vars are empty or unset - -Files: - -- Proposed new module(s): - - `backend/app/exports/registry.py` - - `backend/app/core/config.py` (new settings fields) - -Success: - -- Plan documents exact env var names, defaults, and validation rules -- Plan specifies how registries are wired in `backend/app/container.py` - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 97-105) - Existing docs mention processor ordering via env var -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 183-191) - Naming conventions and determinism guidance for registry/config behavior - -Dependencies: - -- Task 2.1 completion - -## Phase 3: Define the execution flow (load, process, format, deliver) - -### Task 3.1: Specify export execution orchestration - -Design an `ExportService` (or extend `SnapshotService`) that: - -1. Loads items from `GroundTruthRepo` using the selected filters -2. Converts items to export records (`model_dump(..., by_alias=True)`) -3. Applies the configured processor chain -4. Formats output using the selected formatter -5. 
Delivers output either as: - - A generated artifact (FileResponse), or - - An in-memory attachment (JSONResponse), or - - A streaming response for large payloads - -Files: - -- Proposed new service: `backend/app/services/export_service.py` -- Existing snapshot service: `backend/app/services/snapshot_service.py` - -Success: - -- The plan contains a step-by-step flow diagram (text is sufficient) -- The plan specifies where `Content-Disposition` filename is generated - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 161-176) - Streaming and file download guidance (FileResponse/StreamingResponse) - -Dependencies: - -- Phase 2 completion - -### Task 3.2: Define initial processors and formatters - -Initial candidates (minimum viable set): - -- Processor: `merge_tags` (construct `tags = unique(manualTags + computedTags)` and/or enforce export union contract) -- Formatter: `json_snapshot_payload` (preserve current `GET /snapshot` payload shape) -- Formatter: `json_items` (export list of items only) - -Document: - -- Exact JSON shapes -- How schemaVersion is set -- Whether manifest is included and what fields it contains - -Files: - -- Proposed new modules under `backend/app/exports/` - -Success: - -- The plan has JSON examples for each formatter output -- The plan explicitly preserves current snapshot payload keys used by the frontend - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 118-124) - Computed tags/export compatibility considerations -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 146-152) - Export record/processor/formatter contract - -Dependencies: - -- Task 3.1 completion - -## Phase 4: Storage targets (multi-backend) and artifact strategy - -### Task 4.1: Define a multi-backend export storage interface - -Define an `ExportStorage` (or similarly named) abstraction that the pipeline-based export endpoint will 
use to write artifacts. - -Design goals: - -- Support multiple backends behind a stable interface. -- Make Azure Blob the initial concrete implementation. -- Optionally keep a local filesystem implementation for dev/test. - -Decide whether: - -- `SnapshotStorage` is generalized to `ExportStorage` and snapshot uses it, or -- Snapshot remains its own service, while the new export pipeline uses a separate storage abstraction - -If generalized, define: - -- Methods required beyond `write_json` (e.g., `write_bytes`, `open_read`, `list_prefix`) -- A minimal “artifact key” strategy (prefix + timestamp + logical filename) - -Files: - -- `backend/app/adapters/storage/base.py` -- `backend/app/adapters/storage/local_fs.py` - -Success: - -- The plan includes a clear abstraction boundary and migration steps -- The plan identifies the minimal method set required for Blob and local FS - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 72-81) - Existing storage adapter building blocks and current bypass -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 177-181) - Evolution plan for integrating generalized storage (Blob-first) - -Dependencies: - -- Phase 3 completion - -### Task 4.2: Specify Azure Blob configuration and authentication strategy - -Document settings required for Blob-first storage: - -- Container name -- Storage account URL (or connection string, if preferred) -- Authentication approach: - - Recommended: Managed Identity / `DefaultAzureCredential` via `azure-identity` - - Alternative: connection string via environment variable - -Also document required dependency changes: - -- Add `azure-storage-blob` to backend runtime dependencies - -Files: - -- `backend/app/core/config.py` (new settings fields) -- `backend/pyproject.toml` (dependency addition) - -Success: - -- Plan lists exact env var names and the auth priority order -- Plan notes settings strictness (`extra="forbid"`) and 
the need to add fields explicitly - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 83-96) - Verified Blob readiness gaps + existing `azure-identity` dependency -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 132-136) - Additional gaps for Blob-first implementation - -Dependencies: - -- Task 4.1 completion - -### Task 4.3: Define delivery strategy for Blob-hosted artifacts - -Decide what the export endpoint returns when using Blob storage: - -- Option A: Backend streams content (downloads from Blob and proxies to client) while preserving `Content-Disposition` -- Option B: Backend returns a short-lived SAS URL (client downloads directly) -- Option C: Backend returns an export “job id” and a separate download endpoint - -Document: - -- Security expectations (who can access the artifact, TTL, auditing) -- Frontend changes required (if any) based on chosen option - -Files: - -- Proposed router file: `backend/app/api/v1/exports.py` - -Success: - -- Plan selects one option for the initial milestone and documents the rationale -- Plan preserves existing snapshot download behavior - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 62-70) - Frontend depends on `Content-Disposition` filename parsing -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 161-176) - FileResponse/StreamingResponse guidance - -Dependencies: - -- Task 4.2 completion - -## Phase 5: Testing, observability, and rollout - -### Task 5.1: Add test strategy for pipeline configuration and outputs - -Plan tests covering: - -- Registry duplicate name protections -- Processor order configuration parsing -- JSON output shape compatibility with existing snapshot download tests -- Large payload path choice (artifact vs streaming) is at least unit-tested via a small fake dataset - -Files: - -- `backend/tests/unit/` (new tests) -- 
Existing snapshot tests for reference: - - `backend/tests/unit/test_snapshot_service.py` - - `backend/tests/integration/ground_truths/test_snapshot_download_endpoint.py` - -Success: - -- Tests are identified by file and target behavior -- The plan includes a rollout step that does not break existing endpoints - -Research references: - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` (Lines 188-191) - Determinism guidance for stable tests - -Dependencies: - -- Phase 4 completion - -## Dependencies - -- Backend: FastAPI + Pydantic v2 (already in repo) -- Storage: multi-backend interface with Azure Blob as initial concrete implementation (local filesystem optional for dev/test) - -## Success Criteria - -- A clear design exists for processors/formatters/registries, aligned to existing snapshot behavior -- Backward compatibility for snapshot endpoints is preserved -- The plan includes a minimal initial implementation slice (JSON export) and a growth path (additional formats and storage targets) diff --git a/.copilot-tracking/details/20260116-export-pipeline-implementation-details.md b/.copilot-tracking/details/20260116-export-pipeline-implementation-details.md deleted file mode 100644 index 879a42c..0000000 --- a/.copilot-tracking/details/20260116-export-pipeline-implementation-details.md +++ /dev/null @@ -1,304 +0,0 @@ ---- -description: Implementation details for building the export pipeline into the backend codebase -ms.date: 2026-01-16 ---- - - -# Task Details: Export Pipeline Implementation - -## Research Reference - -**Source Research**: .copilot-tracking/research/20260116-export-pipeline-implementation-research.md - -## Phase 1: Lock down compatibility contract - -### Task 1.1: Confirm snapshot endpoint contracts (write + download) - -Ensure the implementation preserves the behaviors that are already tested and used by the frontend: - -- `POST /v1/ground-truths/snapshot` continues to write per-item JSON artifacts plus `manifest.json` 
under `exports/snapshots/{ts}/` and returns JSON with `snapshotDir`, `count`, and `manifestPath`. -- `GET /v1/ground-truths/snapshot` continues to return an `application/json` attachment with `Content-Disposition` containing a filename and stable payload keys. - -* **Files**: - - backend/app/api/v1/ground_truths.py - - backend/app/services/snapshot_service.py - - backend/tests/integration/test_snapshot_artifacts_cosmos.py - - backend/tests/integration/ground_truths/test_snapshot_download_endpoint.py - - frontend/src/services/groundTruths.ts - -* **Success**: - - Existing snapshot unit + integration tests remain the baseline acceptance gate. - - Frontend download behavior remains unchanged (filename derived from header). - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 27-52) - Verified snapshot behaviors and frontend coupling - -* **Dependencies**: - - None - -### Task 1.2: Define compatibility-safe defaults for pipeline adoption - -Decide how the pipeline will be introduced without changing existing behavior: - -- Treat an omitted/empty request body for `POST /v1/ground-truths/snapshot` as the legacy behavior (artifact write + manifest). -- Use the new pipeline request model only when request fields are provided. -- Keep the `GET /v1/ground-truths/snapshot` behavior stable, but allow its internal implementation to be pipeline-driven. - -* **Files**: - - docs/computed-tags-design.md (Section 4.4) - - backend/app/api/v1/ground_truths.py - -* **Success**: - - A clear decision is written into the implementation as code-level defaults. - - Existing callers do not need to change.
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 85-108) - Design requirements and compatibility rule - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 158-160) - Compatibility traps - -* **Dependencies**: - - Task 1.1 completion - -## Phase 2: Build pipeline core (registries + request models) - -### Task 2.1: Add export pipeline request/option models - -Implement Pydantic models for the pipeline request body (v1) and internal options, aligned to the design: - -- `format` (initial: `json_snapshot_payload`, `json_items`) -- `filters` (initial: `datasetNames`, `status` with default `approved`) -- `processors` (optional override list) -- `delivery.mode` (initial support: `attachment`, `artifact`, `stream`) - -* **Files**: - - backend/app/exports/models.py (new) - -* **Success**: - - Request validation errors map to 400 with clear messages for unknown formats/processors. - - Defaults preserve legacy snapshot behavior when request is missing. - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 85-108) - Interfaces, config, and compatibility requirements - -* **Dependencies**: - - Phase 1 completion - -### Task 2.2: Implement processor and formatter registries - -Create registries consistent with repo patterns (like computed-tags registry), supporting: - -- register-by-name with duplicate rejection -- resolve-by-name with clear error messages -- resolve ordered processor chain from: - - request override - - or `GTC_EXPORT_PROCESSOR_ORDER` default - -* **Files**: - - backend/app/exports/registry.py (new) - - backend/app/core/config.py (add `EXPORT_PROCESSOR_ORDER` setting) - -* **Success**: - - Registry unit tests cover duplicate registration and missing name resolution. - - Environment parsing is deterministic and whitespace-tolerant. 
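A minimal Python sketch of the Task 2.2 registry behavior: duplicate rejection, resolve-by-name errors, and whitespace-tolerant parsing of `GTC_EXPORT_PROCESSOR_ORDER`. `NamedRegistry`, `parse_processor_order`, and `resolve_processor_chain` are hypothetical names chosen for illustration, not the planned contents of `backend/app/exports/registry.py`. Raising `ValueError` for unknown names (for the API layer to map to a 400) is also an assumption.

```python
import os
from typing import Callable, Generic, Mapping, Optional, Sequence, TypeVar

# An export processor maps a list of record dicts to a new list of record dicts.
Processor = Callable[[list[dict]], list[dict]]
T = TypeVar("T")


class NamedRegistry(Generic[T]):
    """Hypothetical name -> component registry (processors or formatters)."""

    def __init__(self, kind: str) -> None:
        self._kind = kind
        self._items: dict[str, T] = {}

    def register(self, name: str, item: T) -> None:
        # Duplicate names are rejected instead of silently overwriting.
        if name in self._items:
            raise ValueError(f"{self._kind} '{name}' is already registered")
        self._items[name] = item

    def resolve(self, name: str) -> T:
        if name not in self._items:
            known = ", ".join(sorted(self._items)) or "<none>"
            raise ValueError(f"unknown {self._kind} '{name}' (known: {known})")
        return self._items[name]


def parse_processor_order(raw: Optional[str]) -> list[str]:
    """Split 'a, b ,,c' into ['a', 'b', 'c']; unset or empty means no processors."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def resolve_processor_chain(
    registry: NamedRegistry[Processor],
    override: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> list[Processor]:
    # A request-level override wins; otherwise fall back to the env default.
    if override is not None:
        names = list(override)
    else:
        source = os.environ if env is None else env
        names = parse_processor_order(source.get("GTC_EXPORT_PROCESSOR_ORDER"))
    return [registry.resolve(name) for name in names]
```

Treating an unset or empty variable as an empty chain is an assumption here; the design notes leave that default open.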
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 71-83) - Existing registry patterns - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 96-98) - `GTC_EXPORT_PROCESSOR_ORDER` - -* **Dependencies**: - - Task 2.1 completion - -## Phase 3: Implement initial processors and formatters - -### Task 3.1: Implement processor `merge_tags` - -Add a processor that derives a `tags` field as the unique union of `manualTags` and `computedTags`. - -- Input/output records must remain JSON-serializable dictionaries. -- Preserve `manualTags` and `computedTags` as-is. - -* **Files**: - - backend/app/exports/processors/merge_tags.py (new) - -* **Success**: - - Unit tests verify order stability (e.g., sorted output) and correct union behavior. - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 98-101) - Initial pipeline features - -* **Dependencies**: - - Phase 2 completion - -### Task 3.2: Implement formatters `json_items` and `json_snapshot_payload` - -Add formatters: - -- `json_items`: returns a JSON array of export records -- `json_snapshot_payload`: returns the stable snapshot payload envelope (`schemaVersion`, `snapshotAt`, `datasetNames`, `count`, `filters`, `items`) - -* **Files**: - - backend/app/exports/formatters/json_items.py (new) - - backend/app/exports/formatters/json_snapshot_payload.py (new) - - backend/app/services/snapshot_service.py (delegate payload assembly as needed) - -* **Success**: - - Formatter outputs match existing snapshot payload expectations. - - Tests compare parsed JSON objects (not raw strings) for stability. 
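The Task 3.1 and 3.2 behavior can be sketched as plain functions over record dicts. The envelope keys (`schemaVersion`, `snapshotAt`, `datasetNames`, `count`, `filters`, `items`) come from the plan. The function names, the sorted ordering of `tags`, and the placeholder `schema_version` default are assumptions for illustration, not the repository's implementation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

Record = dict[str, Any]


def merge_tags(records: list[Record]) -> list[Record]:
    """Add `tags` = sorted unique union of manualTags and computedTags."""
    merged: list[Record] = []
    for record in records:
        union = set(record.get("manualTags") or []) | set(record.get("computedTags") or [])
        # Shallow-copy so manualTags/computedTags and the caller's dict stay untouched.
        merged.append({**record, "tags": sorted(union)})
    return merged


def format_json_items(records: list[Record]) -> str:
    """`json_items`: a bare JSON array of export records."""
    return json.dumps(records, ensure_ascii=False)


def format_json_snapshot_payload(
    records: list[Record],
    *,
    dataset_names: list[str],
    filters: dict[str, Any],
    snapshot_at: Optional[str] = None,
    schema_version: str = "v1",  # placeholder; the real value comes from the snapshot contract
) -> str:
    """`json_snapshot_payload`: the snapshot envelope around the records."""
    payload = {
        "schemaVersion": schema_version,
        "snapshotAt": snapshot_at or datetime.now(timezone.utc).isoformat(),
        "datasetNames": dataset_names,
        "count": len(records),
        "filters": filters,
        "items": records,
    }
    return json.dumps(payload, ensure_ascii=False)
```

Comparing `json.loads(...)` results rather than raw strings, as the plan suggests, keeps such tests independent of key order and whitespace.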
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 31-45) - Current payload keys and tests - -* **Dependencies**: - - Task 3.1 completion - -## Phase 4: Storage backends and delivery modes - -### Task 4.1: Define and implement an export storage interface - -Create an export storage protocol matching the design: - -- `write_json(key: str, obj: dict) -> None` -- `write_bytes(key: str, data: bytes, content_type: str) -> None` -- `open_read(key: str)` (for streaming reads) -- `list_prefix(prefix: str)` (optional, for artifact discovery) - -* **Files**: - - backend/app/exports/storage/base.py (new) - - backend/app/exports/storage/local.py (new) - -* **Success**: - - Local storage implementation supports the existing snapshot directory layout. - - Storage key layout follows `exports/snapshots/{timestamp}/{filename}`. - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 53-63) - Existing storage abstraction and current bypass - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 101-105) - Storage interface and delivery modes - -* **Dependencies**: - - Phase 3 completion - -### Task 4.2: Add Azure Blob storage backend - -Implement Blob storage backend using managed identity (`DefaultAzureCredential`) and the Azure Blob SDK. - -- Add dependency: `azure-storage-blob` -- Add explicit settings to `Settings` (due to `extra="forbid"`): - - `EXPORT_STORAGE_BACKEND` (`local|blob`) - - `EXPORT_BLOB_ACCOUNT_URL` - - `EXPORT_BLOB_CONTAINER` - -* **Files**: - - backend/pyproject.toml - - backend/app/core/config.py - - backend/app/exports/storage/blob.py (new) - -* **Success**: - - Blob backend can write and read artifacts using async client (`azure.storage.blob.aio`). - - Settings validation fails fast with clear errors when backend is `blob` but configuration is missing. 
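A minimal sketch of the Task 4.1 storage protocol with a local-filesystem backend, using the method names listed in the plan. It is synchronous for brevity; the Blob backend is planned around the async `azure.storage.blob.aio` client, which is not shown. `LocalExportStorage`, `snapshot_key`, and the path-escape check are illustrative assumptions.

```python
import json
from pathlib import Path
from typing import BinaryIO, Iterator, Protocol


class ExportStorage(Protocol):
    def write_json(self, key: str, obj: dict) -> None: ...
    def write_bytes(self, key: str, data: bytes, content_type: str) -> None: ...
    def open_read(self, key: str) -> BinaryIO: ...
    def list_prefix(self, prefix: str) -> Iterator[str]: ...


def snapshot_key(timestamp: str, filename: str) -> str:
    """Key layout from the plan: exports/snapshots/{timestamp}/{filename}."""
    return f"exports/snapshots/{timestamp}/{filename}"


class LocalExportStorage:
    """Local-filesystem backend; keys are POSIX-style paths under `root`."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self._root / key).resolve()
        # Refuse keys such as "../x" that would escape the storage root.
        if not path.is_relative_to(self._root):
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def write_json(self, key: str, obj: dict) -> None:
        self.write_bytes(key, json.dumps(obj, indent=2).encode("utf-8"), "application/json")

    def write_bytes(self, key: str, data: bytes, content_type: str) -> None:
        # Plain files carry no content-type metadata; the parameter keeps the
        # signature compatible with an object-store backend.
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def open_read(self, key: str) -> BinaryIO:
        return self._path(key).open("rb")

    def list_prefix(self, prefix: str) -> Iterator[str]:
        for path in sorted(self._root.rglob("*")):
            if path.is_file():
                key = path.relative_to(self._root).as_posix()
                if key.startswith(prefix):
                    yield key
```

A Blob-backed class would satisfy the same `ExportStorage` shape, so callers depend only on the protocol rather than on a concrete backend.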
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 64-70) - Dependency/config constraints - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 142-155) - SDK and local dev operational notes - -* **Dependencies**: - - Task 4.1 completion - -### Task 4.3: Implement delivery modes (attachment/artifact/stream) - -Implement delivery behavior in the pipeline service: - -- `attachment`: return JSON payload bytes with `Content-Disposition` filename -- `artifact`: write artifacts + `manifest.json` and return legacy `snapshotDir`/`manifestPath` response -- `stream`: return a `StreamingResponse` over bytes (for large payloads or Blob reads) - -* **Files**: - - backend/app/exports/pipeline.py (new) - - backend/app/api/v1/ground_truths.py - -* **Success**: - - `GET /v1/ground-truths/snapshot` preserves current attachment semantics. - - `POST /v1/ground-truths/snapshot` preserves current artifact-write behavior by default. - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 133-140) - FastAPI response patterns - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 158-160) - Compatibility traps - -* **Dependencies**: - - Task 4.2 completion - -## Phase 5: Wire into container, API, and tests - -### Task 5.1: Wire registries, storage, and pipeline via container - -Add pipeline service wiring in the singleton container so routers and services can depend on it. - -* **Files**: - - backend/app/container.py - -* **Success**: - - Pipeline dependencies are constructed once per app lifecycle (consistent with other services). 
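The three delivery modes from Task 4.3 can be sketched as a framework-neutral decision step that a route would then turn into a JSON response, a `StreamingResponse`, or the legacy artifact summary. `DeliveryPlan`, `plan_delivery`, the 64 KiB chunk size, and the `manifestPath` layout (`{snapshotDir}/manifest.json`) are assumptions; only the mode names and the `snapshotDir`/`count`/`manifestPath` keys come from the plan.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, Optional


class DeliveryMode(str, Enum):
    ATTACHMENT = "attachment"
    ARTIFACT = "artifact"
    STREAM = "stream"


@dataclass
class DeliveryPlan:
    """What the route should return; building the actual Response is left to the router."""
    mode: DeliveryMode
    headers: dict[str, str] = field(default_factory=dict)
    body: Optional[bytes] = None
    chunks: Optional[Iterator[bytes]] = None
    artifact_summary: Optional[dict] = None


def content_disposition(filename: str) -> str:
    # Drop characters that would break the quoted filename the frontend parses.
    safe = "".join(ch for ch in filename if ch not in '"\\\r\n')
    return f'attachment; filename="{safe}"'


def iter_chunks(data: bytes, size: int = 64 * 1024) -> Iterator[bytes]:
    for start in range(0, len(data), size):
        yield data[start:start + size]


def plan_delivery(
    mode: str,
    payload: bytes,
    filename: str,
    *,
    snapshot_dir: Optional[str] = None,
    count: int = 0,
) -> DeliveryPlan:
    selected = DeliveryMode(mode)  # unknown modes raise ValueError
    headers = {"Content-Disposition": content_disposition(filename)}
    if selected is DeliveryMode.ATTACHMENT:
        return DeliveryPlan(selected, headers, body=payload)
    if selected is DeliveryMode.STREAM:
        return DeliveryPlan(selected, headers, chunks=iter_chunks(payload))
    # ARTIFACT: files are assumed to be written already; return the legacy keys.
    return DeliveryPlan(
        selected,
        artifact_summary={
            "snapshotDir": snapshot_dir,
            "count": count,
            "manifestPath": f"{snapshot_dir}/manifest.json",
        },
    )
```

Non-ASCII filenames would need the RFC 6266 `filename*` parameter, which this sketch does not emit.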
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 123-129) - Router/service integration expectations - -* **Dependencies**: - - Phase 4 completion - -### Task 5.2: Update snapshot service and routes to use pipeline internally - -Implement delegation so: - -- `SnapshotService.build_snapshot_payload()` and `export_json()` route through the pipeline logic. -- API routes remain compatible but gain pipeline support when request parameters are provided. - -* **Files**: - - backend/app/services/snapshot_service.py - - backend/app/api/v1/ground_truths.py - -* **Success**: - - Existing snapshot tests pass without modification. - - Pipeline unit tests pass. - -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 31-45) - Existing endpoint contracts - -* **Dependencies**: - - Task 5.1 completion - -### Task 5.3: Add new unit tests for pipeline components - -Add tests for: - -- registry behaviors (duplicate and missing names) -- `merge_tags` processor correctness -- formatter outputs -- delivery mode selection (unit-test level) - -* **Files**: - - backend/tests/unit/test_export_registry.py (new) - - backend/tests/unit/test_export_pipeline.py (new) - -* **Success**: - - New unit tests provide fast validation of pipeline semantics. - - Existing integration snapshot tests continue to pass. 
- -* **Research References**: - - .copilot-tracking/research/20260116-export-pipeline-implementation-research.md (Lines 162-166) - Suggested verification approach - -* **Dependencies**: - - Task 5.2 completion - -## Dependencies - -- Python 3.11 (repo requirement) -- FastAPI + Starlette responses already in use -- Azure SDK dependencies: - - `azure-identity` (already present) - - `azure-storage-blob` (to be added) - -## Success Criteria - -- Snapshot endpoints remain backward compatible (tests and frontend behavior) -- Pipeline components exist (registries, processor, formatter, delivery) -- Local storage works as today; Blob storage works behind a feature flag -- Tests cover pipeline logic and existing snapshot tests continue passing diff --git a/.copilot-tracking/details/20260116-manual-tags-design-details.md b/.copilot-tracking/details/20260116-manual-tags-design-details.md deleted file mode 100644 index 2246817..0000000 --- a/.copilot-tracking/details/20260116-manual-tags-design-details.md +++ /dev/null @@ -1,196 +0,0 @@ ---- -title: Manual Tags Design Details -description: Detailed specifications and execution notes for manual tags design work -ms.date: 2026-01-16 ---- - - -# Task Details: Manual Tags Design - -## Research Reference - -* Source research: `.copilot-tracking/research/20260116-manual-tags-design-research.md` -* Design context: `docs/computed-tags-design.md` -* Existing tagging constraints: `backend/docs/tagging_plan.md` - -## Phase 1: Confirm requirements and align policy - -### Task 1.1: Decide manual tag validation mode(s) (MVP) - -MVP requirement (confirmed): enforce **mutual exclusivity within a tag group**, and make that enforcement **configurable (true/false)**. 
- -Define the desired validation policy for manual tags across these write paths: - -* Interactive edits (`PUT /v1/ground-truths/...`) -* Assignment updates (`PUT /v1/assignments/...`) -* Bulk import validation - -Current state is already exclusivity-aware for **known** groups via `TAG_SCHEMA` + `ExclusiveGroupRule`, but enforcement is not currently configurable. - -* Model validation is relaxed (unknown groups/values allowed). -* Bulk import uses a strict allow-set from the global tag registry. - -Specify the exclusivity enforcement semantics: - -* Scope of the toggle: - * Global toggle (recommended MVP): enable/disable enforcement of `exclusive=True` groups - * Optional later: per-group overrides (in `TAG_SCHEMA` or config) -* Default: - * Recommended: `true` (keep current correctness; allow disabling in dev/experiments) -* Contract with the frontend: - * Decide whether `/v1/tags/schema` should continue to report per-group `exclusive` even when enforcement is disabled server-side. - * Recommended: keep reporting `exclusive` so the UI can still guide the user, while backend can be relaxed if needed. - -* Files: - * `backend/app/domain/validators.py` (model-level coercion + validation) - * `backend/app/services/validation_service.py` (bulk import strict validation) - * `backend/app/api/v1/ground_truths.py` and `backend/app/api/v1/assignments.py` (write paths) -* Success: - * A single documented policy for exclusivity enforcement (enabled/disabled) - * A clear statement whether the backend or frontend is authoritative when the toggle is off -* Research references: - * `.copilot-tracking/research/20260116-manual-tags-design-research.md` (Current behavior summary + decision points) - -### Task 1.2: Decide the source of truth for “allowed manual tags” (optional / follow-up) - -This is useful, but not required for the exclusivity MVP. Capture the intended direction so future work is scoped. 
- -Choose the authoritative source for the tag picker (manual tags list): - -* Static config (`GTC_ALLOWED_MANUAL_TAGS`) -* Global registry (Cosmos-backed list) -* A combined approach (seed from schema, allow extensions via registry) - -Also decide whether the source must vary by dataset/tenant in the future. - -* Files: - * `backend/app/api/v1/tags.py` (current: allowlist overrides registry) - * `backend/app/core/config.py` (existing settings surface) - * `backend/app/adapters/repos/tags_repo.py` (Cosmos shape for global registry) -* Success: - * A single selection mechanism that the backend implements and the frontend can rely on - * Clear behavior when `GTC_ALLOWED_MANUAL_TAGS` is set - -## Phase 2: Implement configurable exclusivity (backend) - -### Task 2.1: Add a config flag to enable/disable exclusivity enforcement - -Add a settings flag (e.g., `GTC_TAGS_ENFORCE_EXCLUSIVITY: bool`) that controls whether the backend enforces mutual exclusivity for groups marked `exclusive=True`. - -Implementation approach options: - -* Toggle rule execution: - * Build `RULES` dynamically at runtime based on settings - * Or: keep `RULES` constant but gate `ExclusiveGroupRule.check()` behind the flag - -Ensure the flag is applied consistently anywhere `validate_tags()` is used for manual tags. 
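A hypothetical sketch of gating `ExclusiveGroupRule.check()` behind the flag (the second option above) while keeping the canonical `group:value` check always on. `EXAMPLE_SCHEMA` stands in for `TAG_SCHEMA`, and this `ExclusiveGroupRule`/`validate_tags` pair is a simplified stand-in for the real code in `backend/app/domain/tags.py` and `backend/app/services/tagging_service.py`, whose signatures are not shown here. In the app, the flag would come from the proposed `GTC_TAGS_ENFORCE_EXCLUSIVITY` setting.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

# Stand-in for TAG_SCHEMA: tag group -> whether the group is exclusive.
EXAMPLE_SCHEMA: Mapping[str, bool] = {"difficulty": True, "topic": False}


def is_canonical(tag: str) -> bool:
    """Canonical form: lowercase `group:value` with both parts non-empty."""
    group, sep, value = tag.partition(":")
    return bool(sep and group and value) and tag == tag.lower()


@dataclass(frozen=True)
class ExclusiveGroupRule:
    schema: Mapping[str, bool]
    enabled: bool = True  # would be read from GTC_TAGS_ENFORCE_EXCLUSIVITY

    def check(self, tags: Iterable[str]) -> list[str]:
        if not self.enabled:
            return []  # enforcement off: multiple values per group are accepted
        by_group: dict[str, list[str]] = {}
        for tag in dict.fromkeys(tags):  # de-duplicate, keep first-seen order
            group = tag.partition(":")[0]
            if self.schema.get(group, False):
                by_group.setdefault(group, []).append(tag)
        return [
            f"group '{group}' is exclusive but got {sorted(values)}"
            for group, values in sorted(by_group.items())
            if len(values) > 1
        ]


def validate_tags(
    tags: Iterable[str],
    *,
    enforce_exclusivity: bool,
    schema: Mapping[str, bool] = EXAMPLE_SCHEMA,
) -> list[str]:
    tags = list(tags)
    # Canonical-format errors are reported regardless of the exclusivity flag.
    errors = [f"tag '{t}' is not canonical group:value" for t in tags if not is_canonical(t)]
    errors += ExclusiveGroupRule(schema, enforce_exclusivity).check(tags)
    return errors
```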
- -* Files: - * `backend/app/core/config.py` (new setting) - * `backend/app/domain/tags.py` (rule wiring / schema) - * `backend/app/services/tagging_service.py` (validation flow) - * `backend/app/domain/validators.py` (model validation uses `validate_tags()`) - * `backend/app/api/v1/ground_truths.py` and `backend/app/api/v1/assignments.py` (write path validation behavior) -* Success: - * When enabled: multiple tags from an exclusive group are rejected everywhere manual tags are accepted - * When disabled: multiple tags from an exclusive group are accepted (still requiring canonical `group:value` format) - -### Task 2.2: Document and expose the exclusivity flag (if needed) - -Decide whether the flag should be: - -* Backend-only (env setting), or -* Also exposed to the frontend via `/v1/config` so the UI can mirror server policy. - -* Files: - * `backend/app/core/config.py` - * `backend/app/api/v1/config.py` (if exposing to frontend) -* Success: - * The configured behavior is discoverable and documented - -## Phase 3: Validation and normalization improvements - -### Task 3.1: Ensure exclusivity toggle does not affect computed-tags invariants - -Ensure the exclusivity flag only controls exclusivity checks, and does not regress: - -* Canonicalization (`group:value`, lowercase, etc.) -* Computed-tags stripping from manual tags (write path) - -* Files: - * `backend/app/services/tagging_service.py` - * `backend/app/services/validation_service.py` (bulk import) -* Success: - * Exclusivity can be toggled independently without breaking other tag guarantees - -### Task 3.2: Confirm computed tag stripping remains authoritative - -Ensure that manual tags cannot be used to persist computed tags: - -* Manual tags are cleaned via computed-tag registry matching before save -* Reject client writes to `computedTags` - -This should remain true for all write paths. 
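One way to keep computed-tag stripping authoritative on every write path is a shared helper that each route calls before saving. This is a sketch under assumptions: matching computed tags by exact key against a set of registry keys, the helper names, and raising `ValueError` on client-supplied `computedTags` are illustrative, not the current behavior of `tagging_service.py`.

```python
from typing import Any, Iterable


def strip_computed_tags(manual_tags: Iterable[str], computed_keys: set[str]) -> list[str]:
    """Drop any manual tag that matches a computed-tag key, keeping order."""
    return [tag for tag in manual_tags if tag not in computed_keys]


def prepare_manual_tag_write(payload: dict[str, Any], computed_keys: set[str]) -> dict[str, Any]:
    # Clients may never write computedTags directly.
    if "computedTags" in payload:
        raise ValueError("computedTags is server-managed and cannot be written by clients")
    cleaned = dict(payload)  # do not mutate the caller's payload
    cleaned["manualTags"] = strip_computed_tags(payload.get("manualTags", []), computed_keys)
    return cleaned
```

Routing both `PUT /v1/ground-truths/...` and `PUT /v1/assignments/...` through one helper like this makes the "all write paths" guarantee a single point to test.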
- -* Files: - * `backend/app/services/tagging_service.py` - * `backend/app/api/v1/ground_truths.py` - * `backend/app/api/v1/assignments.py` -* Success: - * Tests cover attempts to submit computed tags in `manualTags` - -## Phase 4: API contracts and frontend expectations - -### Task 4.1: Confirm `/v1/tags/schema` and frontend behavior under the toggle - -When exclusivity enforcement is disabled server-side, decide whether the frontend should: - -* Still enforce exclusivity based on `/v1/tags/schema` (recommended default), or -* Also disable client-side exclusivity when the backend flag is off (requires exposing flag to frontend) - -* Files: - * `backend/app/api/v1/tags.py` (schema endpoint) - * `frontend/src/services/tags.ts` (exclusive-group validation) - * `backend/app/api/v1/config.py` (if exposing flag) -* Success: - * Frontend and backend behave consistently (or intentionally diverge, but documented) - -### Task 4.2: Confirm `/v1/tags` response contract (optional / follow-up) - -This is not required for the exclusivity MVP, but keep this as a follow-up if API payload cleanup is desired. 
- -* Files: - * `backend/app/api/v1/tags.py` - * `backend/app/domain/tags.py` - * `frontend/src/services/tags.ts` -* Success: - * No frontend breaking changes when new tags are introduced - -## Phase 5: Verification and documentation - -### Task 5.1: Add/adjust tests for exclusivity toggle - -Cover the chosen policy with tests: - -* Unit tests for exclusivity enabled vs disabled -* Integration tests for write paths (ground truths + assignments) when exclusivity is disabled -* Frontend test/behavior note: ensure UX does not unexpectedly diverge from backend - -* Files: - * `backend/tests/unit/` - * `backend/tests/integration/` -* Success: - * Tests clearly assert which flows allow unknown tags vs require registry membership - -### Task 5.2: Document configuration and operations - -Add docs describing: - -* How to configure allowed manual tags -* How tag registry should be managed in dev/test/prod -* How strict validation affects bulk import and interactive edits - -* Files: - * `backend/README.md` (or a new doc under `backend/docs/`) -* Success: - * A new team member can configure tags without reading code diff --git a/.copilot-tracking/plans/20260116-export-pipeline-design-plan.instructions.md b/.copilot-tracking/plans/20260116-export-pipeline-design-plan.instructions.md deleted file mode 100644 index 80c8c89..0000000 --- a/.copilot-tracking/plans/20260116-export-pipeline-design-plan.instructions.md +++ /dev/null @@ -1,97 +0,0 @@ ---- -applyTo: '.copilot-tracking/changes/20260116-export-pipeline-design-changes.md' -description: Task checklist for designing an export pipeline (processors/formatters) for Ground Truth Curator -ms.date: 2026-01-16 ---- - - -# Task Checklist: Export Pipeline Design - -## Overview - -Design a pluggable export pipeline (processors + formatters) that preserves current snapshot export behaviors while enabling additional formats and a multi-backend storage interface (Azure Blob as the initial concrete backend). 
- -Follow the repository workflow guidance in `AGENTS.md` (Jujutsu commit workflow) and keep a running record of work in `.copilot-tracking/changes/20260116-export-pipeline-design-changes.md` during implementation. - -## Objectives - -- Preserve the existing snapshot write and download behaviors while introducing an extensible pipeline for export transforms and formats -- Define processor/formatter/registry abstractions, configuration, and a minimal initial export slice (JSON) -- Define an export storage interface that supports multiple backends, with Azure Blob Storage as the first implementation target - -## Research Summary - -### Project files - -- `backend/app/services/snapshot_service.py` - current snapshot artifact writer and in-memory snapshot payload builder -- `backend/app/api/v1/ground_truths.py` - snapshot routes (write + downloadable attachment) -- `backend/app/adapters/storage/base.py` and `backend/app/adapters/storage/local_fs.py` - existing (currently underused) storage abstraction -- `frontend/src/services/groundTruths.ts` - frontend download expectations for `Content-Disposition` -- `docs/computed-tags-design.md` - proposes processor/formatter export pipeline architecture -- `docs/json-export-migration-plan.md` - documents JSON (not JSONL) export expectations - -### External references - -- `.copilot-tracking/research/20260116-export-pipeline-design-research.md` - verified repo findings and proposed pipeline shape -- FastAPI custom responses (StreamingResponse/FileResponse): https://fastapi.tiangolo.com/advanced/custom-response/ -- Azure Storage Blobs client library for Python (auth patterns, async clients): https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme -- Azure Blob Storage Python quickstart (managed identity / DefaultAzureCredential): https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python - -### Standards references - -- `AGENTS.md` - repo workflow and expectations -- 
`backend/CODEBASE.md` - backend layering and conventions - -## Implementation Checklist - -### [ ] Phase 1: Requirements and compatibility - -- [ ] Task 1.1: Document the current export behavior baseline - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 15-40) - -- [ ] Task 1.2: Decide the v1 export pipeline API surface - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 42-73) - -### [ ] Phase 2: Pipeline abstractions - -- [ ] Task 2.1: Specify processor and formatter interfaces - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 77-108) - -- [ ] Task 2.2: Specify registries and configuration strategy - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 110-141) - -### [ ] Phase 3: Execution flow - -- [ ] Task 3.1: Specify export execution orchestration - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 145-174) - -- [ ] Task 3.2: Define initial processors and formatters - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 176-206) - -### [ ] Phase 4: Storage targets (multi-backend) - -- [ ] Task 4.1: Define a multi-backend export storage interface - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 210-247) - -- [ ] Task 4.2: Specify Azure Blob configuration and authentication strategy - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 249-280) - -- [ ] Task 4.3: Define delivery strategy for Blob-hosted artifacts - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 282-311) - -### [ ] Phase 5: Tests and rollout - -- [ ] Task 5.1: Add test strategy for pipeline configuration and outputs - - Details: `.copilot-tracking/details/20260116-export-pipeline-design-details.md` (Lines 315-342) - -## Dependencies - -- Python 
3.11 + FastAPI + Pydantic v2 (already present) -- Existing GroundTruthRepo data access patterns and snapshot tests -- Azure Blob dependencies and configuration when implementing the Blob backend (e.g., `azure-storage-blob` + `azure-identity`) - -## Success Criteria - -- The export pipeline design is documented with clear interfaces, configuration, and a minimal initial format set (JSON) -- Snapshot endpoints remain backward compatible -- The design includes a clear path to add processors, new formats, and new storage targets without rewriting the core flow diff --git a/.copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md b/.copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md deleted file mode 100644 index 7c9b7ce..0000000 --- a/.copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md +++ /dev/null @@ -1,94 +0,0 @@ ---- -applyTo: '.copilot-tracking/changes/20260116-export-pipeline-implementation-changes.md' ---- - -# Task Checklist: Export Pipeline Implementation - -## Overview - -Implement the export pipeline architecture from the designs and wire it into the existing snapshot endpoints without breaking backward compatibility. - -Follow repository workflow guidance from #file:../../AGENTS.md - -## Objectives - -* Preserve existing snapshot endpoint behavior (routes, payload keys, and `Content-Disposition` semantics). -* Introduce export pipeline components (models, registries, processors, formatters, delivery modes, and storage backends) with unit test coverage. -* Add Blob storage support behind explicit settings and a backend selector, without impacting local defaults. - -## Research Summary - -### Project Files - -* .copilot-tracking/research/20260116-export-pipeline-implementation-research.md - Verified current snapshot contracts, coupling to frontend download behavior, and concrete code touchpoints. 
-* backend/app/api/v1/ground_truths.py - Snapshot endpoints that must remain backward compatible. -* backend/app/services/snapshot_service.py - Current snapshot artifact write and payload build behavior. -* backend/app/core/config.py - Settings strictness (`extra="forbid"`) that requires explicit new env vars. -* frontend/src/services/groundTruths.ts - Parses `Content-Disposition` to derive download filename. -* docs/computed-tags-design.md - Export pipeline architecture requirements. - -### External References - -* .copilot-tracking/research/20260116-export-pipeline-implementation-research.md - Captures SDK usage patterns and response behavior references. - -## Implementation Checklist - -### [x] Phase 1: Lock down compatibility contract - -* [x] Task 1.1: Confirm snapshot endpoint contracts (write + download) - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 15-37) - -* [x] Task 1.2: Define compatibility-safe defaults for pipeline adoption - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 39-60) - -### [x] Phase 2: Build pipeline core (registries + request models) - -* [x] Task 2.1: Add export pipeline request/option models - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 64-84) - -* [x] Task 2.2: Implement processor and formatter registries - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 86-109) - -### [x] Phase 3: Implement initial processors and formatters - -* [x] Task 3.1: Implement processor `merge_tags` - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 113-130) - -* [x] Task 3.2: Implement formatters `json_items` and `json_snapshot_payload` - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 132-152) - -### [x] Phase 4: Storage backends and delivery modes - -* [x] Task 4.1: 
Define and implement an export storage interface - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 156-178) - -* [x] Task 4.2: Add Azure Blob storage backend - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 180-204) - -* [x] Task 4.3: Implement delivery modes (attachment/artifact/stream) - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 206-227) - -### [x] Phase 5: Wire into container, API, and tests - -* [x] Task 5.1: Wire registries, storage, and pipeline via container - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 231-245) - -* [x] Task 5.2: Update snapshot service and routes to use pipeline internally - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 247-266) - -* [x] Task 5.3: Add new unit tests for pipeline components - * Details: .copilot-tracking/details/20260116-export-pipeline-implementation-details.md (Lines 268-289) - -## Dependencies - -* Backend Python environment (per backend/pyproject.toml) -* Azure SDK packages: - * `azure-identity` (already present) - * `azure-storage-blob` (to add for Blob backend) -* Local dev: Cosmos Emulator and existing integration test environment (as already used by repo) - -## Success Criteria - -* Existing snapshot unit/integration tests pass unchanged. -* Export pipeline core exists and is covered by new unit tests. -* Blob backend is selectable via explicit settings and does not impact default local behavior. 
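The registry tasks in the plan above (resolve processors and formatters by stable names, reject duplicates, surface unknown names as a clear client error, order processors via `GTC_EXPORT_PROCESSOR_ORDER`) can be illustrated with a minimal Python sketch.

It rests on a few assumptions:

- Components expose a `name` attribute plus a `process` or `format` method.
- The env var holds a comma-separated list of names.
- `NamedRegistry`, `UnknownComponentError`, and `parse_processor_order` are hypothetical names, not the repository's actual code.

```python
from __future__ import annotations

from typing import Any, Generic, Protocol, TypeVar

# Assumed record shape: a dict from model_dump(mode="json", by_alias=True).
ExportRecord = dict[str, Any]


class ExportProcessor(Protocol):
    """List-in/list-out deterministic transform."""

    name: str

    def process(self, records: list[ExportRecord]) -> list[ExportRecord]: ...


class ExportFormatter(Protocol):
    """Final serialization of processed records."""

    name: str

    def format(self, records: list[ExportRecord]) -> bytes | str: ...


class UnknownComponentError(LookupError):
    """Raised for an unknown name; an API layer could translate it to HTTP 400."""


T = TypeVar("T")


class NamedRegistry(Generic[T]):
    """Resolves components by stable names and rejects duplicate registrations."""

    def __init__(self, kind: str) -> None:
        self._kind = kind
        self._items: dict[str, T] = {}

    def register(self, name: str, item: T) -> None:
        if name in self._items:
            raise ValueError(f"duplicate {self._kind} name: {name!r}")
        self._items[name] = item

    def get(self, name: str) -> T:
        try:
            return self._items[name]
        except KeyError:
            raise UnknownComponentError(f"unknown {self._kind}: {name!r}") from None

    def resolve(self, names: list[str]) -> list[T]:
        # Preserves caller order, so the configured order is the execution order.
        return [self.get(name) for name in names]


def parse_processor_order(raw: str) -> list[str]:
    """Assumes GTC_EXPORT_PROCESSOR_ORDER is a comma-separated list of names."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

Under these assumptions, wiring might look like `processors: NamedRegistry[ExportProcessor] = NamedRegistry("processor")`. The container would then call `processors.resolve(parse_processor_order(settings_value))`. Keeping the error type separate from `ValueError` lets the route map only unknown-name lookups to a 400 response.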
diff --git a/.copilot-tracking/plans/20260116-manual-tags-design-plan.instructions.md b/.copilot-tracking/plans/20260116-manual-tags-design-plan.instructions.md deleted file mode 100644 index b0b1069..0000000 --- a/.copilot-tracking/plans/20260116-manual-tags-design-plan.instructions.md +++ /dev/null @@ -1,91 +0,0 @@ ---- -applyTo: '.copilot-tracking/changes/20260116-manual-tags-design-changes.md' ---- - -# Task Checklist: Manual Tags Design - -## Overview - -Define and implement a consistent, testable manual-tags design focused on configurable mutual exclusivity within tag groups, while keeping API contracts stable for the frontend. - -Follow all instructions from #file:../../.github/instructions/task-implementation.instructions.md -If that file is not present in this repository, follow `AGENTS.md` for the repo workflow and the workspace-wide instructions configured in VS Code. - -## Objectives - -* Define the manual tag validation policy for interactive writes vs bulk import -* Implement a backend configuration flag to enable/disable exclusivity enforcement for `exclusive=True` groups -* Ensure manual tags cannot collide with computed tags and that API contracts remain stable for the frontend - -## Research Summary - -### Project Files - -* `.copilot-tracking/research/20260116-manual-tags-design-research.md` - Verified current behavior, key files, and decision points -* `backend/app/api/v1/tags.py` - `/v1/tags` and `/v1/tags/schema` behavior and allowlist override -* `backend/app/services/tagging_service.py` - Tag normalization and validation helpers -* `backend/app/services/validation_service.py` - Bulk import strict validation with a registry-derived allow-set -* `frontend/src/services/tags.ts` - Frontend expectations for schema and tag list payloads -* `docs/computed-tags-design.md` - Overall tags split model and computed tag constraints -* `backend/docs/tagging_plan.md` - Original intent and constraints for tag schema/rules - -### External References - -* 
`.copilot-tracking/research/20260116-manual-tags-design-research.md` (Lines 160-169) - Reference links -* - Pydantic v2 field validator patterns -* - FastAPI response model behavior -* - Cosmos DB partitioning constraints relevant to the global tags container - -## Implementation Checklist - -### [ ] Phase 1: Confirm requirements and align policy - -* [ ] Task 1.1: Decide manual tag validation mode(s) (MVP) - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 18-53) - -* [ ] Task 1.2: Decide the source of truth for “allowed manual tags” (optional / follow-up) - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 54-73) - -### [ ] Phase 2: Implement configurable exclusivity (backend) - -* [ ] Task 2.1: Add a config flag to enable/disable exclusivity enforcement - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 76-97) - -* [ ] Task 2.2: Document and expose the exclusivity flag (if needed) - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 98-110) - -### [ ] Phase 3: Validation and normalization improvements - -* [ ] Task 3.1: Ensure exclusivity toggle does not affect computed-tags invariants - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 113-125) - -* [ ] Task 3.2: Confirm computed tag stripping remains authoritative - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 126-141) - -### [ ] Phase 4: API contracts and frontend expectations - -* [ ] Task 4.1: Confirm `/v1/tags/schema` and frontend behavior under the toggle - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 144-157) - -* [ ] Task 4.2: Confirm `/v1/tags` response contract (optional / follow-up) - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 158-168) - -### [ ] Phase 5: Verification and documentation - -* [ ] Task 5.1: 
Add/adjust tests for exclusivity toggle - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 171-184) - -* [ ] Task 5.2: Document configuration and operations - * Details: `.copilot-tracking/details/20260116-manual-tags-design-details.md` (Lines 185-196) - -## Dependencies - -* Python 3.11, `uv`, FastAPI, Pydantic v2 -* Azure Cosmos DB (or emulator) when validating tags registry persistence -* Frontend build toolchain (Vite) to verify tag-picker behavior end-to-end - -## Success Criteria - -* Manual tag policy is explicitly defined and implemented consistently across all write paths -* The `/v1/tags` and `/v1/tags/schema` contracts remain stable and match frontend expectations -* Provider selection is test-covered and computed-tag collisions are prevented at startup and on write diff --git a/.copilot-tracking/prompts/implement-export-pipeline-design.prompt.md b/.copilot-tracking/prompts/implement-export-pipeline-design.prompt.md deleted file mode 100644 index cb7bb1b..0000000 --- a/.copilot-tracking/prompts/implement-export-pipeline-design.prompt.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -description: Implementation prompt for executing the export pipeline design plan -ms.date: 2026-01-16 ---- - - -# Implementation Prompt: Export Pipeline Design - -## Implementation Instructions - -### Step 1: Create changes tracking file - -You WILL create `20260116-export-pipeline-design-changes.md` in `.copilot-tracking/changes/` if it does not exist. - -### Step 2: Execute implementation - -You WILL follow the repository workflow guidance in `AGENTS.md` (Jujutsu commit workflow). - -You WILL systematically implement `../plans/20260116-export-pipeline-design-plan.instructions.md` task-by-task. - -CRITICAL: If ${input:phaseStop:true} is true, you WILL stop after each Phase for user review. - -CRITICAL: If ${input:taskStop:false} is true, you WILL stop after each Task for user review. 
- -### Step 3: Cleanup - -When ALL Phases are checked off (`[x]`) and completed you WILL do the following: - -1. You WILL provide a markdown style link and a summary of all changes from #file:../changes/20260116-export-pipeline-design-changes.md to the user: - - You WILL keep the overall summary brief - - You WILL add spacing around any lists - - You MUST wrap any reference to a file in a markdown style link - -2. You WILL provide markdown style links to: - - `.copilot-tracking/plans/20260116-export-pipeline-design-plan.instructions.md` - - `.copilot-tracking/details/20260116-export-pipeline-design-details.md` - - `.copilot-tracking/research/20260116-export-pipeline-design-research.md` - -3. MANDATORY: You WILL attempt to delete `.copilot-tracking/prompts/implement-export-pipeline-design.prompt.md` - -## Success Criteria - -- [ ] Changes tracking file created -- [ ] All plan items implemented with working code -- [ ] All detailed specifications satisfied -- [ ] Snapshot endpoints remain backward compatible -- [ ] Changes file updated continuously diff --git a/.copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md b/.copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md deleted file mode 100644 index d6bc129..0000000 --- a/.copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md +++ /dev/null @@ -1,38 +0,0 @@ - -# Implementation Prompt: Export Pipeline Implementation - -## Implementation Instructions - -### Step 1: Create Changes Tracking File - -You WILL create `20260116-export-pipeline-implementation-changes.md` in `.copilot-tracking/changes/` if it does not exist. 
- -### Step 2: Execute Implementation - -You WILL follow repository workflow guidance from #file:../../AGENTS.md -You WILL systematically implement #file:../plans/20260116-export-pipeline-implementation-plan.instructions.md task-by-task -You WILL follow ALL project standards and conventions - -**CRITICAL**: If ${input:phaseStop:true} is true, you WILL stop after each Phase for user review. -**CRITICAL**: If ${input:taskStop:false} is true, you WILL stop after each Task for user review. - -### Step 3: Cleanup - -When ALL Phases are checked off (`[x]`) and completed you WILL do the following: - -1. You WILL provide a markdown style link and a summary of all changes from #file:../changes/20260116-export-pipeline-implementation-changes.md to the user: - * You WILL keep the overall summary brief - * You WILL add spacing around any lists - * You MUST wrap any reference to a file in a markdown style link - -2. You WILL provide markdown style links to .copilot-tracking/plans/20260116-export-pipeline-implementation-plan.instructions.md, .copilot-tracking/details/20260116-export-pipeline-implementation-details.md, and .copilot-tracking/research/20260116-export-pipeline-implementation-research.md documents. - -3. 
**MANDATORY**: You WILL attempt to delete .copilot-tracking/prompts/implement-export-pipeline-implementation.prompt.md - -## Success Criteria - -* [ ] Changes tracking file created -* [ ] All plan items implemented with working code -* [ ] All detailed specifications satisfied -* [ ] Project conventions followed -* [ ] Changes file updated continuously diff --git a/.copilot-tracking/prompts/implement-manual-tags-design.prompt.md b/.copilot-tracking/prompts/implement-manual-tags-design.prompt.md deleted file mode 100644 index b6e2902..0000000 --- a/.copilot-tracking/prompts/implement-manual-tags-design.prompt.md +++ /dev/null @@ -1,44 +0,0 @@ ---- -title: Implementation Prompt - Manual Tags Design -description: Execution prompt for implementing the manual tags design plan -ms.date: 2026-01-16 ---- - -# Implementation Prompt: Manual Tags Design - -## Implementation Instructions - -### Step 1: Create Changes Tracking File - -You WILL create `20260116-manual-tags-design-changes.md` in `.copilot-tracking/changes/` if it does not exist. - -### Step 2: Execute Implementation - -You WILL follow #file:../../.github/instructions/task-implementation.instructions.md -If that file is not present in this repository, you WILL follow `AGENTS.md` for the repo workflow and the workspace-wide instructions configured in VS Code. -You WILL systematically implement #file:../plans/20260116-manual-tags-design-plan.instructions.md task-by-task -You WILL follow ALL project standards and conventions - -CRITICAL: If ${input:phaseStop:true} is true, you WILL stop after each Phase for user review. -CRITICAL: If ${input:taskStop:false} is true, you WILL stop after each Task for user review. - -### Step 3: Cleanup - -When ALL Phases are checked off (`[x]`) and completed you WILL do the following: - -1. 
You WILL provide a markdown style link and a summary of all changes from #file:../changes/20260116-manual-tags-design-changes.md to the user: - * You WILL keep the overall summary brief - * You WILL add spacing around any lists - * You MUST wrap any reference to a file in a markdown style link - -2. You WILL provide markdown style links to `.copilot-tracking/plans/20260116-manual-tags-design-plan.instructions.md`, `.copilot-tracking/details/20260116-manual-tags-design-details.md`, and `.copilot-tracking/research/20260116-manual-tags-design-research.md` documents. You WILL recommend cleaning these files up as well. - -3. MANDATORY: You WILL attempt to delete `.copilot-tracking/prompts/implement-manual-tags-design.prompt.md` - -## Success Criteria - -* [ ] Changes tracking file created -* [ ] All plan items implemented with working code -* [ ] All detailed specifications satisfied -* [ ] Project conventions followed -* [ ] Changes file updated continuously diff --git a/.copilot-tracking/research/20260116-export-pipeline-design-research.md b/.copilot-tracking/research/20260116-export-pipeline-design-research.md deleted file mode 100644 index 02e8339..0000000 --- a/.copilot-tracking/research/20260116-export-pipeline-design-research.md +++ /dev/null @@ -1,221 +0,0 @@ ---- -description: Research findings to support an export pipeline design plan for Ground Truth Curator -ms.date: 2026-01-16 ---- - - -# Research: Export Pipeline Design - -## Tooling notes (how findings were verified) - -- Workspace search: `file_search` and `grep_search` were used to locate existing snapshot/export routes, services, and tests. -- File inspection: `read_file` was used to review the current implementations and confirm actual behavior. -- External references: `fetch_webpage` was used to pull verified FastAPI documentation for streaming/file download responses. 
- -## Scope - -Define an export pipeline architecture that supports: - -- The existing snapshot export behaviors (write artifacts + download as attachment) -- Multiple output formats (at least JSON; optionally CSV/JSONL later) -- Pluggable transformations (processors) and final serialization (formatters) -- Future storage targets (local filesystem today; Blob later) - -Additional requirement (user-provided): - -- The export endpoint (pipeline-based export) should support multiple backends via an interface/adapter layer. -- The initial concrete storage backend should point to Azure Blob Storage. - -## Verified repo findings (current state) - -### Existing export behaviors - -- There is a snapshot export write path: - - Endpoint: `POST /v1/ground-truths/snapshot` - - Implementation: `SnapshotService.export_json()` writes per-item JSON files and a `manifest.json` under `./exports/snapshots/{ts}/`. - - Source: `backend/app/services/snapshot_service.py` - -Minimal excerpt (write artifacts): - -```python -async def export_json(self) -> dict[str, str | int]: - ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") - out_dir = self.base_dir / ts - out_dir.mkdir(parents=True, exist_ok=True) -``` - -- There is a snapshot download/read path: - - Endpoint: `GET /v1/ground-truths/snapshot` - - Implementation: builds an in-memory payload containing `{ schemaVersion, snapshotAt, datasetNames, count, filters, items }` and returns it as a JSON attachment (`Content-Disposition`). - - Source: `backend/app/api/v1/ground_truths.py` - -Minimal excerpt (attachment header): - -```python -return JSONResponse( - content=payload, - media_type="application/json", - headers={"Content-Disposition": f'attachment; filename="{filename}"'}, -) -``` - -- The frontend expects `Content-Disposition` for snapshot downloads and derives a filename from it. 
- - Source: `frontend/src/services/groundTruths.ts` - -Minimal excerpt (derives filename from header): - -```ts -const cd = res.headers.get("Content-Disposition") || res.headers.get("content-disposition") || ""; -const match = cd.match(/filename\*?=(?:UTF-8''|")?([^";]+)"?/i); -``` - -### Existing “storage adapter” building blocks - -- A `SnapshotStorage` protocol exists with `write_json(path, obj)`. - - Source: `backend/app/adapters/storage/base.py` - -- A local filesystem implementation exists. - - Source: `backend/app/adapters/storage/local_fs.py` - -- The current `SnapshotService` bypasses the `SnapshotStorage` abstraction and writes directly via `pathlib.Path`. - - Source: `backend/app/services/snapshot_service.py` - -### Azure Blob readiness (verified) - -- The backend configuration currently does not define any Blob-related settings (no `BLOB_*` fields). - - Source: `backend/app/core/config.py` - -- The backend dependency set currently does not include `azure-storage-blob` in `pyproject.toml`. - - Source: `backend/pyproject.toml` - -- The backend already includes `azure-identity` as a dependency, which can be used to authenticate to Azure Blob via `DefaultAzureCredential`. - - Source: `backend/pyproject.toml` - -- Project documentation anticipates Azure Blob support (account URL + container) and a future adapter module. - - Source: `backend/docs/fastapi-implementation-plan.md` - -### Existing docs influencing export - -- `docs/computed-tags-design.md` proposes an export processor / formatter pipeline: - - Processors: list-in/list-out transforms (merge tags, anonymize, split/explode, etc.) - - Formatters: final conversion to bytes/string (CSV, JSON) - - Configuration: env var ordering (e.g., `EXPORT_PROCESSOR_ORDER`) - -- `docs/json-export-migration-plan.md` documents a prior decision to move away from JSONL assumptions and keep snapshot artifacts as JSON. 
- -## Current code patterns that the export pipeline should align with - -### Serialization conventions - -- The backend consistently uses Pydantic v2 with `model_dump(mode="json", by_alias=True, exclude_none=True)`. - - Example usage in `SnapshotService.build_snapshot_payload()` and `SnapshotService.export_json()`. - -### Container wiring - -- Services are constructed in `backend/app/container.py` and injected through a singleton `container` referenced by routers. - - Snapshot route calls `container.snapshot_service.*`. - -### Computed tags and export - -- Computed tags are applied on write paths via `apply_computed_tags()`. - - This suggests exports should be explicit about whether they export: - - Raw stored fields (`manualTags`, `computedTags`), or - - A merged/derived field (`tags`) for downstream consumer compatibility. - -## Gaps / constraints - -- There is no generalized export endpoint or service beyond “snapshot”; exports are coupled to approved items only. -- There is no generic way to chain transformations (processors) or select output formats. -- The storage abstraction exists but is not currently used by `SnapshotService`. -- Download snapshot currently builds the full payload in memory; for large exports, streaming (or generating an artifact and returning it) may be preferable. - -Additional gaps (for Blob-first implementation): - -- No Azure Blob adapter implementation exists in `backend/app/` today. -- Settings are strict (`extra="forbid"`), so Blob env vars must be explicitly added to `Settings` before they can be used. -- `azure-storage-blob` must be added as a runtime dependency before implementing the adapter. 
- -## Proposed export pipeline architecture (evidence-based) - -This design combines: - -- The plugin-based processor/formatter approach from `docs/computed-tags-design.md` -- The concrete snapshot behaviors already implemented (`SnapshotService`) -- Standard FastAPI patterns for file downloads and streaming - -### Core concepts - -- **ExportJob input**: filters (dataset/status/tags), format selection, and processor list. -- **ExportRecord**: a dict-like representation (or a strongly typed DTO) produced from `GroundTruthItem.model_dump(..., by_alias=True)`. -- **ExportProcessor**: `List[dict] -> List[dict]` transformations. -- **ExportFormatter**: `List[dict] -> bytes | str` final serialization. -- **ExportTarget/Storage**: writes artifacts (local fs today; Blob later). - -### API surface recommendations - -- Keep the existing snapshot routes stable for backward compatibility. -- Add a new export endpoint that makes the pipeline explicit, e.g.: - - `GET /v1/exports/ground-truths?format=json&dataset=...&status=approved&processors=merge_tags,anonymize` - - or `POST /v1/exports/ground-truths` with a request body defining filters and options. - -### Streaming / large payload guidance - -FastAPI supports returning file-like responses without buffering whole payloads. - -- `FileResponse` can stream a generated artifact and sets `Content-Disposition` when `filename=` is provided. -- `StreamingResponse` can stream bytes from a generator if you want to avoid writing to disk first. - -External reference: -- FastAPI “Custom Response - HTML, Stream, File, others” (`StreamingResponse`, `FileResponse`): - - https://fastapi.tiangolo.com/advanced/custom-response/ - -Verified examples from the FastAPI docs (high level): - -- `StreamingResponse(generator(), media_type=...)` for streaming bytes from an iterator/generator. -- `FileResponse(path, filename=...)` for sending a file with `Content-Disposition`. 
- -## Compatibility and evolution plan - -- Phase 1: implement processors/formatters and keep output JSON-compatible with current snapshot payload and/or current per-item JSON artifacts. -- Phase 2: integrate a generalized export storage interface with Azure Blob as the initial concrete implementation (optionally keep local filesystem for dev/test). -- Phase 3: optional support for asynchronous/batched exports for very large datasets (queue + polling). - -## Concrete implementation guidance (what we should standardize) - -- Naming conventions: - - Processor names and formatter names should be lowercase and stable (e.g., `merge_tags`, `anonymize`, `json_items`, `json_snapshot_payload`). - -- Deterministic output (testability): - - Prefer stable key ordering for manifests where it matters; otherwise rely on JSON comparison via parsed objects. - - Avoid non-deterministic timestamps in unit tests by injecting a clock or allowing `snapshotAt` override. - -- Cosmos query considerations (future): - - Filter by dataset/bucket when possible to avoid cross-partition scans. - - If exports become “all datasets”, make it explicit and guarded. - -## Files most relevant to this task - -- `backend/app/services/snapshot_service.py` -- `backend/app/api/v1/ground_truths.py` -- `backend/app/adapters/storage/base.py` -- `backend/app/adapters/storage/local_fs.py` -- `docs/computed-tags-design.md` -- `docs/json-export-migration-plan.md` -- `frontend/src/services/groundTruths.ts` - -## External references: Azure Blob Storage (Python SDK) - -These references support the Blob-first storage backend plan and provide verified SDK/auth patterns. 
- -- Azure Storage Blobs client library for Python (overview + credential options + async notes): - - https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme - -- Quickstart (managed identity / `DefaultAzureCredential` example): - - https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Cazure-portal - -Key takeaways for this repo’s planned adapter: - -- Client creation supports AAD token credentials (e.g., `DefaultAzureCredential`) with `account_url`, which aligns with using Managed Identity in production. -- The SDK provides async clients under `azure.storage.blob.aio`, but requires an async transport (commonly `aiohttp`) to be installed. -- A Blob-first delivery option can be implemented either by proxying downloads via the backend (preserve `Content-Disposition`) or by returning a SAS URL (client downloads directly). - diff --git a/.copilot-tracking/research/20260116-export-pipeline-implementation-research.md b/.copilot-tracking/research/20260116-export-pipeline-implementation-research.md deleted file mode 100644 index 706e3e3..0000000 --- a/.copilot-tracking/research/20260116-export-pipeline-implementation-research.md +++ /dev/null @@ -1,166 +0,0 @@ ---- -description: Research findings to support implementing the export pipeline design in code (backend-first) -ms.date: 2026-01-16 ---- - - -# Research: Export Pipeline Implementation - -## Tooling notes (how findings were verified) - -- Workspace search: `file_search`, `grep_search`, and `semantic_search` were used to locate snapshot/export routes, services, storage adapters, and tests. -- File inspection: `read_file` was used to confirm current endpoint behavior and service implementations. -- External references: `fetch_webpage` was used to pull verified guidance for FastAPI `FileResponse`/`StreamingResponse` and Azure Blob Storage Python SDK usage patterns. 
- -## Scope - -Implement the export pipeline architecture described in `docs/computed-tags-design.md` (Section 4.4) into the backend codebase while preserving existing snapshot endpoint behaviors. - -Out of scope for the first milestone (unless needed for compatibility): - -- New export endpoints beyond the existing snapshot routes -- Export job orchestration (async background jobs, polling endpoints) -- Additional formats (CSV/JSONL/ZIP) beyond the initial JSON formats described in the design - -## Verified repo findings (current state) - -### Snapshot routes and stable behaviors - -Backend snapshot endpoints exist and are currently relied upon by tests and the frontend: - -- `POST /v1/ground-truths/snapshot` - - Implementation: calls `SnapshotService.export_json()` - - Writes per-item JSON artifacts and a `manifest.json` under `exports/snapshots/{ts}/` - - Source: `backend/app/api/v1/ground_truths.py`, `backend/app/services/snapshot_service.py` - -- `GET /v1/ground-truths/snapshot` - - Implementation: returns an `application/json` payload with `Content-Disposition: attachment; filename="ground-truth-snapshot-.json"` - - Payload shape includes `schemaVersion`, `snapshotAt`, `datasetNames`, `count`, `filters`, `items` - - Source: `backend/app/api/v1/ground_truths.py`, `backend/app/services/snapshot_service.py` - -These behaviors are verified by tests: - -- Artifact write verification: `backend/tests/integration/test_snapshot_artifacts_cosmos.py` -- Download endpoint behavior: `backend/tests/integration/ground_truths/test_snapshot_download_endpoint.py` -- Payload shape/unit behavior: `backend/tests/unit/test_snapshot_service.py` - -### Frontend coupling - -Frontend snapshot download depends on the backend providing `Content-Disposition` for a filename. 
- -- Source: `frontend/src/services/groundTruths.ts` (parses `Content-Disposition` to derive filename) - -### Existing storage abstraction (partial) - -There is a small storage protocol already: - -- `SnapshotStorage` protocol with `write_json(path, obj)` - - Source: `backend/app/adapters/storage/base.py` -- `LocalFilesystemStorage` implementation - - Source: `backend/app/adapters/storage/local_fs.py` - -However, current snapshot code writes directly to disk via `pathlib.Path` and does not use the storage protocol. - -### Dependency and configuration constraints - -- Backend settings enforce `extra="forbid"`, so any new env vars must be explicitly added. - - Source: `backend/app/core/config.py` -- Backend dependencies include `azure-identity` but do not include `azure-storage-blob`. - - Source: `backend/pyproject.toml` - -### Existing "registry" patterns to follow - -Computed tags are implemented with: - -- Interface + registry (`ComputedTagPlugin`, `TagPluginRegistry`) -- Auto-discovery of plugin implementations via module scanning - -Sources: - -- `backend/app/plugins/base.py` -- `backend/app/plugins/registry.py` - -This is a good local precedent for the export processor/formatter registries. 
- -## Design requirements to implement (source of truth) - -The implementation should follow `docs/computed-tags-design.md` Section 4.4, including: - -- Processor and formatter interfaces: - - `ExportProcessor`: list-in/list-out deterministic transforms - - `ExportFormatter`: list-in -> `bytes|str` serialization -- Registries: - - Resolve processors and formatters by stable names - - Reject duplicates - - Unknown names produce a clear 400 error at the API -- Configuration: - - `GTC_EXPORT_PROCESSOR_ORDER` controls default processor order -- Initial pipeline features: - - Processor: `merge_tags` (derive `tags = union(manualTags, computedTags)`) - - Formatters: `json_snapshot_payload`, `json_items` -- Storage interface: - - Multi-backend export storage with `local` default and `blob` as initial cloud backend - - Stable artifact key layout: `exports/snapshots/{timestamp}/{filename}` -- Delivery modes: - - `attachment`, `artifact`, `stream` (backend sets `Content-Disposition`) -- Compatibility rule: - - Snapshot endpoints must remain backward compatible (payload keys and behavior expectations) - -## Implementation mapping (repo-aligned) - -### Recommended package layout (new) - -Create a backend package (example): - -- `backend/app/exports/` - - `models.py` (request DTOs, export record type aliases) - - `processors/` (merge_tags) - - `formatters/` (json_snapshot_payload, json_items) - - `registry.py` (processor/formatter registries) - - `storage/` (local + blob backends) - - `pipeline.py` (execution flow: load -> process -> format -> deliver) - -### Router/service integration - -- Keep router thin in `backend/app/api/v1/ground_truths.py`. -- Wire pipeline services through the singleton `container` in `backend/app/container.py`, similar to other services. 
-- Update `SnapshotService` to delegate to the pipeline for: - - building the snapshot payload - - writing artifacts - -## External references (verified) - -### FastAPI response types - -FastAPI (Starlette) supports streaming and file responses for download behavior. - -- `StreamingResponse` can stream from an iterator/generator or async generator. -- `FileResponse` can stream a local file and can set `Content-Disposition` using `filename=...`. - -Source: https://fastapi.tiangolo.com/advanced/custom-response/ - -### Azure Blob Storage SDK (Python) - -- `azure-storage-blob` is required for Blob operations; `azure-identity` provides `DefaultAzureCredential`. -- Async clients exist under `azure.storage.blob.aio` and are intended for use with `asyncio`. - -Sources: - -- https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python -- https://learn.microsoft.com/en-us/azure/developer/python/sdk/azure-sdk-library-usage-patterns#async - -Operational note for local dev: - -- Developers typically need the Storage Blob Data Contributor role for their identity to read/write blobs in a dev container. - -## Risks and compatibility traps - -- Changing the default behavior of `POST /v1/ground-truths/snapshot` could break integration tests (and any external automation). -- Changing `GET /v1/ground-truths/snapshot` headers or payload keys will break frontend download logic and snapshot tests. -- Introducing new env vars without updating `Settings` will fail app startup due to `extra="forbid"`. - -## Suggested verification approach - -- Run unit tests for pipeline components (registries, processors, formatters). -- Re-run existing snapshot integration tests to ensure the snapshot endpoints remain compatible. -- Ensure OpenAPI generation remains consistent if request models change (frontend uses generated client types). 
diff --git a/.copilot-tracking/research/20260116-manual-tags-design-research.md b/.copilot-tracking/research/20260116-manual-tags-design-research.md deleted file mode 100644 index 2ec4c17..0000000 --- a/.copilot-tracking/research/20260116-manual-tags-design-research.md +++ /dev/null @@ -1,169 +0,0 @@ ---- -title: Manual Tags Design Research -description: Verified findings and references for implementing manual-tags design in GroundTruthCurator -ms.date: 2026-01-16 ---- - - -## Scope - -This research covers the current and intended design for **manual tags** in Ground Truth Curator, including: - -* Storage shape and validation rules -* Manual tag discovery (schema + registry + optional allowlist) -* API surface consumed by the frontend -* Cosmos DB persistence model for global tag registry -* Known interaction points with computed tags - -## Workspace reconnaissance (verified) - -### Tool usage (evidence collection) - -The findings below were collected using repository-wide searches and direct file inspection: - -* `grep_search` for `manualTags`, `computedTags`, `ALLOWED_MANUAL_TAGS`, `TagValidator`, and related symbols -* `read_file` of the concrete implementations and tests listed below -* `fetch_webpage` for the external FastAPI/Pydantic/Cosmos DB references - -### Key backend files - -* `backend/app/domain/models.py` - * `GroundTruthItem.manual_tags` stored as `manualTags`. - * `GroundTruthItem.computed_tags` stored as `computedTags`. - * `GroundTruthItem.tags` is a computed union for reads. - -* `backend/app/domain/validators.py` - * Pydantic v2 field validators coerce `manual_tags` and validate via `validate_tags()`. - * `computed_tags` are coerced only (no user validation). - -* `backend/app/services/tagging_service.py` - * Canonicalization rules (`normalize_tag`) enforce `group:value` format. - * `validate_tags()` enforces exclusivity/dependency rules for **known** groups. - * Unknown groups/values are allowed (format still required).
- * `validate_tags_with_cache()` provides a stricter mode: manual tags must exist in a provided allow-set. - -* `backend/app/domain/tags.py` - * Defines `TAG_SCHEMA` for known groups and value sets. - * Defines rule plugins (`ExclusiveGroupRule`, `DependencyRule`) applied by `validate_tags()`. - -* `backend/app/api/v1/tags.py` - * `GET /v1/tags/schema` returns `TAG_SCHEMA` for frontend rendering and client-side validation. - * `GET /v1/tags` returns manual tags in `tags` plus computed tag keys in `computedTags`. - * When `GTC_ALLOWED_MANUAL_TAGS` is set, `GET /v1/tags` uses it as the manual-tag source-of-truth. - -* `backend/app/services/tag_registry_service.py` - * Implements add/remove/list over a single global tag list. - -* `backend/app/adapters/repos/tags_repo.py` - * Cosmos implementation stores a single document `id="tags|global"` in the tags container. - * Partition key `/pk` uses constant value `"global"`. - -* `backend/app/main.py` - * Startup fails fast if `GTC_ALLOWED_MANUAL_TAGS` overlaps static computed tag keys. - -### Key frontend files - -* `frontend/src/services/tags.ts` - * Fetches `GET /v1/tags/schema` and validates exclusive groups client-side. - * Fetches `GET /v1/tags` and uses `tags` as manual tags and `computedTags` as computed tags. - -### Tests demonstrating current behavior - -* `backend/tests/unit/test_groundtruthitem_tags_validation.py` - * Confirms unknown groups are allowed for `manualTags`. - * Confirms exclusive groups (e.g., `source:*`) reject multiple values. - -* `backend/app/services/validation_service.py` - * Bulk import validation uses `validate_tags_with_cache()` and the tag registry as the allow-set. 
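To make the global-registry shape concrete, a minimal in-memory sketch that mirrors the single Cosmos document (`id="tags|global"`, `pk="global"`) and the fail-fast startup overlap check. `InMemoryTagRegistry` and `check_allowlist_overlap` are hypothetical names, not the classes in `tag_registry_service.py` or `main.py`, and the real service methods are likely async.

```python
from typing import Any


class InMemoryTagRegistry:
    """Keeps every manual tag in one document, like the Cosmos tags container."""

    def __init__(self) -> None:
        self.doc: dict[str, Any] = {"id": "tags|global", "pk": "global", "tags": []}

    def list_tags(self) -> list[str]:
        return list(self.doc["tags"])

    def add_tags(self, tags: list[str]) -> list[str]:
        # Union + sort keeps the stored list duplicate-free and deterministic.
        self.doc["tags"] = sorted(set(self.doc["tags"]) | set(tags))
        return self.list_tags()

    def remove_tags(self, tags: list[str]) -> list[str]:
        self.doc["tags"] = sorted(set(self.doc["tags"]) - set(tags))
        return self.list_tags()


def check_allowlist_overlap(allowed_csv: str, computed_keys: set[str]) -> None:
    """Fail fast if the manual-tag allowlist collides with computed tag keys."""
    allowed = {t.strip() for t in allowed_csv.split(",") if t.strip()}
    overlap = allowed & computed_keys
    if overlap:
        raise RuntimeError(f"GTC_ALLOWED_MANUAL_TAGS overlaps computed tags: {sorted(overlap)}")
```

A single document keeps reads to one point lookup, at the cost of every tag write contending on the same document and logical partition.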
- -## Current behavior summary (evidence-based) - -### Code excerpts (current patterns) - -Pydantic v2 validators on `manual_tags` enforce normalization + rule checks: - -```python -@field_validator("manual_tags", mode="before") -@classmethod -def _coerce_manual_tags(_cls, v: Any) -> list[str]: - return coerce_tags(v) - -@field_validator("manual_tags", mode="after") -@classmethod -def _validate_manual_tags(_cls, v: list[str]) -> list[str]: - return validate_tags(v) -``` - -The tags API returns manual tags and computed tag keys separately, with an env override for manual tags: - -```python -if settings.ALLOWED_MANUAL_TAGS: - manual_tags = [t.strip() for t in settings.ALLOWED_MANUAL_TAGS.split(",") if t and t.strip()] -else: - manual_tags = await container.tag_registry_service.list_tags() - -computed_tag_keys = sorted(get_default_registry().get_static_keys()) -return TagListResponse(tags=sorted(manual_tags), computedTags=computed_tag_keys) -``` - -### Canonical format - -* Tags must be `group:value`. -* Canonicalization lowercases, trims whitespace, normalizes `group : value` to `group:value`, and removes empty group/value. - -### Validation policy (two-tier) - -* **Default API/model validation (relaxed):** - * Accepts unknown groups and unknown values. - * Enforces exclusivity/dependency rules only for known groups in `TAG_SCHEMA`. - -* **Bulk import validation (strict allow-set):** - * Requires all manual tags to exist in the global tag registry set. - * Still enforces exclusivity/dependency rules. - -### Manual tag discovery sources - -Manual tags shown to the UI come from one of: - -* `GTC_ALLOWED_MANUAL_TAGS` (CSV) when set. -* Otherwise, the global tag registry (`TagRegistryService` backed by memory or Cosmos). - -Known schema groups/values are also provided independently via `GET /v1/tags/schema`. 
- -### Global tag registry storage - -* Cosmos tags container stores a single global doc: - * `id = "tags|global"` - * `pk = "global"` - * `tags = ["group:value", ...]` - -This is intentionally simple and matches current API semantics (global tags, not per-dataset). - -## Gaps / decision points to resolve in the manual-tags “design” - -These are the key choices that affect implementation work: - -1. **Should runtime writes (PUT ground truths / assignments) be strict allow-set, or remain relaxed?** - * Current behavior is relaxed for normal writes, strict for bulk import. - -2. **What is the long-term source of truth for “allowed manual tags”?** - * Current options: env allowlist or global registry. - * A provider abstraction is partially implemented via `GTC_ALLOWED_MANUAL_TAGS` override, but not expressed as a formal interface. - -3. **Do we need per-dataset or per-tenant tag registries?** - * Current registry is global. - -4. **How should manual tags interact with computed tags?** - * Startup checks prevent allowlist collisions with computed tags. - * Write path strips computed tags from manual tags during `apply_computed_tags()`. - -## External references (for implementation correctness) - -* Pydantic v2 validators (`field_validator`, before/after modes): - * - -* FastAPI `response_model` behavior and filtering: - * - -* Cosmos DB partitioning and logical partition limits (relevant for global tags container design): - * diff --git a/.copilot-tracking/research/20260121-cosmos-repo-refactor-research.md b/.copilot-tracking/research/20260121-cosmos-repo-refactor-research.md deleted file mode 100644 index 782256e..0000000 --- a/.copilot-tracking/research/20260121-cosmos-repo-refactor-research.md +++ /dev/null @@ -1,180 +0,0 @@ - -# Task Research: Cosmos Repo / Service Layer Refactor - -Build refactoring research for: - -* Logic currently in `cosmos_repo.py` that should live in the service layer instead. 
-* Logic currently in API routes/handlers that should live in the service layer instead. -* A new `cosmos_emulator.py` that inherits from (or wraps) `cosmos_repo.py` and overrides emulator-specific behavior, instead of intermixing emulator conditionals inside `cosmos_repo.py`. - -## Task Implementation Requests - -* Identify and classify responsibilities currently in `cosmos_repo.py` (pure persistence vs domain/service logic vs emulator quirks). -* Identify API-layer business logic candidates to move into services. -* Propose a repo/service/emulator class/module structure, including the specific seams to override for the emulator. -* Provide actionable refactor steps with exact file references (paths and line ranges). - -## Scope and Success Criteria - -* Scope: - * Backend Python code only. - * Focus on Cosmos DB repository + emulator behavior + API handlers. -* Out of scope: - * Frontend changes. - * Large behavioral changes; this is a refactor plan. -* Assumptions: - * There is an existing Cosmos repository abstraction used by services and API. - * Emulator-specific behavior is currently mixed into production Cosmos codepaths. -* Success Criteria: - * A concrete, evidence-backed map of what to move and where. - * One recommended design for `cosmos_repo.py` + `cosmos_emulator.py` and service boundaries. - * Refactor steps that minimize risk and avoid breaking dependency injection. - -## Outline - -1. Convention discovery (repo-specific guidelines, layering conventions) -2. Current-state inventory - * `cosmos_repo.py` responsibilities - * Emulator-specific branching points - * API endpoints containing business logic - * Service layer responsibilities today -3. Target architecture - * Repository interface vs implementation - * Emulator-specific subclass/adapter - * Service boundaries and orchestration -4. 
Migration plan - * Mechanical steps - * High-risk areas - * Suggested tests/verification steps - -## Research Executed - -### Project Conventions - -* Layering is documented as API → Services → Repos/Adapters, composed via a singleton container ([backend/CODEBASE.md](backend/CODEBASE.md#L20-L29)). -* DI wiring follows a global `container` with an async `startup_cosmos()` initialization path ([backend/app/container.py](backend/app/container.py#L83-L161)). -* There is existing, explicitly documented emulator/conditional behavior inside the Cosmos repo (e.g., the conditional patch implementation for `assign_to`) ([backend/CONDITIONAL_PATCH_IMPLEMENTATION.md](backend/CONDITIONAL_PATCH_IMPLEMENTATION.md#L11-L22), [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609)). -* Emulator limitations are already recognized and sometimes require alternate query behavior (notably `ARRAY_CONTAINS`) ([backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L5-L36)). -* Emulator Unicode/backslash issues are handled via a feature flag (base64 encoding of `refs[*].content`) ([backend/docs/cosmos-emulator-unicode-workaround.md](backend/docs/cosmos-emulator-unicode-workaround.md#L35-L39)). 
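To illustrate the query-shape difference behind the `ARRAY_CONTAINS` limitation, here is a hedged sketch. One helper builds a parameterized Cosmos SQL tag filter for the real service. The other applies the same filter client-side for the emulator path. `build_tag_query` and `filter_by_tags_client_side` are hypothetical helpers, and filtering on `manualTags` is an assumption. The parameter list uses the `{"name": ..., "value": ...}` shape that `azure-cosmos` accepts for `query_items(parameters=...)`.

```python
from typing import Any


def build_tag_query(tags: list[str]) -> tuple[str, list[dict[str, Any]]]:
    """Production query: every requested tag must be present (AND semantics)."""
    clauses: list[str] = []
    params: list[dict[str, Any]] = []
    for i, tag in enumerate(tags):
        clauses.append(f"ARRAY_CONTAINS(c.manualTags, @tag{i})")
        params.append({"name": f"@tag{i}", "value": tag})
    if not clauses:
        return "SELECT * FROM c", params
    return "SELECT * FROM c WHERE " + " AND ".join(clauses), params


def filter_by_tags_client_side(docs: list[dict[str, Any]], tags: list[str]) -> list[dict[str, Any]]:
    """Emulator path: query without the predicate, then filter in Python."""
    wanted = set(tags)
    return [d for d in docs if wanted.issubset(d.get("manualTags", []))]
```

The client-side fallback reads more documents than it returns, so page sizes and continuation tokens no longer line up with filtered results. This is one reason to isolate it behind an emulator-only seam.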
- -### File Analysis - -* Repository implementation: - * Cosmos repo: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L389-L443) - * Repo interface/base: [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py#L1-L55) -* API endpoints with notable workflow logic: - * Assignments: [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L78-L232) - * Ground truths: [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L105-L154), [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L232-L369) -* Existing service boundary: - * Assignment service: [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L44-L146) -* DI wiring: - * Container composition: [backend/app/container.py](backend/app/container.py#L83-L161) - -### Code Search Results - -* Emulator/compat toggles and fallbacks exist in the repo and influence query shape and/or write behavior: - * Pagination/query logic and limitations: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L660-L911) - * Unicode/emulator workarounds and retry behavior are present in the repo write-paths (see transform/retry regions): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L500-L590) -* Conditional patch vs read-modify-replace assignment semantics are implemented in the repo today: - * Assignment patch implementation: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609) - * Assignment fallback path: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1649-L1680) - -## Key Discoveries - -## Research Inputs - -* Conventions: [.copilot-tracking/subagent/20260121/conventions-research.md](.copilot-tracking/subagent/20260121/conventions-research.md) -* API hotspots: 
[.copilot-tracking/subagent/20260121/api-logic-research.md](.copilot-tracking/subagent/20260121/api-logic-research.md) -* Cosmos repo deep dive: [.copilot-tracking/subagent/20260121/cosmos-repo-research.md](.copilot-tracking/subagent/20260121/cosmos-repo-research.md) -* Consolidated synthesis: [.copilot-tracking/subagent/20260121/synthesis-notes.md](.copilot-tracking/subagent/20260121/synthesis-notes.md) - -### Project Structure - -* The backend already has an explicit `services/` layer, but some orchestration/workflow logic remains in routers and in the Cosmos repo. -* The Cosmos repo currently contains both production Cosmos behavior and emulator compatibility behavior. - -### Implementation Patterns - -* API handlers perform multi-step update workflows (parse → read existing → compute changes → write → post-processing) that are better owned by services to keep business rules testable and reusable. -* The repo includes conditional patch logic for assignments (optimized for Cosmos) that is known to be incompatible with emulator behavior; this is the clearest subclass override seam. - -### Emulator Split Findings - -The currently mixed emulator-specific behavior clusters into three themes: - -* Query limitations (emulator does not support some predicates/constructs): [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L5-L36) -* Write-path transforms to avoid Unicode/backslash issues: [backend/docs/cosmos-emulator-unicode-workaround.md](backend/docs/cosmos-emulator-unicode-workaround.md#L35-L39) -* Assignment update semantics (patch vs read-modify-replace): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609) - -## Technical Scenarios - -### Scenario: Split persistence vs service logic - -**Requirements:** - -* Keep persistence code (query building, paging, RU/diagnostics, container interactions) in repo. -* Move domain decisions, validation, and orchestration to services. 
- -**Preferred Approach:** - -* Keep `cosmos_repo.py` as the production implementation of the existing repo interface. -* Move workflow/domain decisions into services (thin repo; services orchestrate). -* Add `cosmos_emulator.py` that subclasses the production repo and overrides only emulator-specific seams. - -Recommended override seams for `cosmos_emulator.py` (inheritance-based): - -* `is_cosmos_emulator_in_use()` -* `list_gt_paginated(...)` (force emulator-safe filtering strategy) ([backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L660-L911)) -* `assign_to(...)` (force read-modify-replace; avoid patch predicates) ([backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609), [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1649-L1680)) -* `upsert_gt(...)` / `delete_dataset(...)` (centralize emulator-specific retry policy) -* `_transform_doc_for_write(...)` and `_transform_doc_for_read(...)` (unicode/base64 workaround seam) - -Target file tree (conceptual): - -```text -backend/app/adapters/repos/ - base.py - cosmos_repo.py # production implementation - cosmos_emulator.py # emulator implementation (subclass) -``` - -_TBD once we see the actual code structure._ - -#### Considered Alternatives - -* Keep emulator conditionals in `cosmos_repo.py` with flags: - * Pros: fewer new files/classes. - * Cons: continued intermixing; harder to reason about production behavior and to test. -* Strategy object injected into repo (instead of subclass): - * Pros: explicit seam without inheritance. - * Cons: more plumbing and indirection than needed if only a handful of methods differ. - -### Scenario: Move API logic to services - -**Requirements:** - -* API handlers should do: auth/identity extraction, request parsing/validation, response shaping. -* Services should do: cross-entity workflows, domain decisions, idempotency semantics, event-ish side effects. 
- -Current hotspots (examples) where routers exceed orchestration: - -* Assignments workflow logic in the router: [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L78-L232) -* Ground truth update workflow logic in the router: [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L232-L369) -* Ground truth list/import validation and workflow logic: [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L105-L154) - -Proposed service extraction: - -* Introduce a `GroundTruthUpdateService` responsible for the end-to-end update workflow used by multiple endpoints (read, validate, normalize, write, post-process). -* Move assignment selection/sampling rules fully into the assignment service layer (building on [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L44-L146)). - -## Recommended Migration Plan (Low-Risk) - -1) Introduce typed domain exceptions (stable API-level mapping). -2) Add `GroundTruthUpdateService` with a single “update workflow” entrypoint. -3) Switch routers to call the service (handlers become thin orchestration). -4) Extract request parsing helpers into a shared module (router/service reuse). -5) Move assignment sampling/selection logic out of the repo into services. -6) Move derived-field computation (e.g., `totalReferences`) out of the repo into services/domain normalization. -7) Add `cosmos_emulator.py` (subclass) and select it in the container wiring ([backend/app/container.py](backend/app/container.py#L83-L161)). -8) Centralize document transforms behind `_transform_doc_for_write/_transform_doc_for_read` seam. -9) Update tests to target seams (behavior-preserving refactor first). 
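Migration steps 7 and 8 could take roughly the following shape. This is a hypothetical, synchronous sketch: the real repo methods are async and talk to a Cosmos container, and `make_repo` with its `use_emulator` flag is illustrative. It shows only the override seams (`is_cosmos_emulator_in_use`, `_transform_doc_for_write`, `_transform_doc_for_read`), with the emulator subclass applying the base64 workaround to `refs[*].content`.

```python
import base64
import copy
from typing import Any


class CosmosRepo:
    """Production behavior: documents are stored as-is."""

    def is_cosmos_emulator_in_use(self) -> bool:
        return False

    def _transform_doc_for_write(self, doc: dict[str, Any]) -> dict[str, Any]:
        return doc

    def _transform_doc_for_read(self, doc: dict[str, Any]) -> dict[str, Any]:
        return doc


class CosmosEmulatorRepo(CosmosRepo):
    """Emulator-only overrides: base64-encode refs[*].content around writes."""

    def is_cosmos_emulator_in_use(self) -> bool:
        return True

    def _transform_doc_for_write(self, doc: dict[str, Any]) -> dict[str, Any]:
        out = copy.deepcopy(doc)  # never mutate the caller's document
        for ref in out.get("refs", []):
            if isinstance(ref.get("content"), str):
                ref["content"] = base64.b64encode(ref["content"].encode("utf-8")).decode("ascii")
        return out

    def _transform_doc_for_read(self, doc: dict[str, Any]) -> dict[str, Any]:
        out = copy.deepcopy(doc)
        for ref in out.get("refs", []):
            if isinstance(ref.get("content"), str):
                ref["content"] = base64.b64decode(ref["content"]).decode("utf-8")
        return out


def make_repo(use_emulator: bool) -> CosmosRepo:
    # Hypothetical container wiring: choose the subclass once, at startup.
    return CosmosEmulatorRepo() if use_emulator else CosmosRepo()
```

This sketch assumes every stored ref went through the emulator write path. A real implementation would need a way to tell encoded content from legacy plain text, such as a marker field.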
diff --git a/.copilot-tracking/research/20260121-high-level-requirements-research.md b/.copilot-tracking/research/20260121-high-level-requirements-research.md deleted file mode 100644 index b1b5fc8..0000000 --- a/.copilot-tracking/research/20260121-high-level-requirements-research.md +++ /dev/null @@ -1,141 +0,0 @@ - -# Task Research: High-Level Requirements Extraction (Frontend + Backend) - -Extract product/system requirements that match the *existing system* and keep them high-level (behavioral), avoiding implementation details. Cover both frontend and backend. - -## Task Implementation Requests - -* Extract high-level requirements already present in the repo (docs + PRD artifacts) -* Ensure requirements reflect current frontend and backend capabilities -* Avoid implementation-specific constraints (frameworks, file structure, concrete endpoints) unless required for behavior - -## Scope and Success Criteria - -* Scope: Requirements derived from existing repo artifacts (PRD JSON/TXT, README/CODEBASE docs, backend docs, frontend docs). -* Exclusions: New feature ideation not supported by evidence in repo; low-level implementation steps. -* Success Criteria: - * Requirements are grouped (Product, Frontend UX, Backend/API, Data/Storage, Export, Observability, Testing/Quality) - * Each requirement is backed by at least one repo source reference (file + line range) - * Requirements are written in “shall/should/may” language and are implementation-agnostic - -## Outline - -1. Evidence log (what was read) -2. Consolidated requirement set -3. Gaps/ambiguities where docs conflict -4. 
Recommended next validation questions - -## Supporting Research - -Detailed extractions and audits used to build this document: - -- PRD extraction + match-to-system flags: [.copilot-tracking/subagent/20260121/prd-requirements-research.md](.copilot-tracking/subagent/20260121/prd-requirements-research.md) -- Frontend capability extraction: [.copilot-tracking/subagent/20260121/frontend-requirements-research.md](.copilot-tracking/subagent/20260121/frontend-requirements-research.md) -- Backend capability extraction: [.copilot-tracking/subagent/20260121/backend-requirements-research.md](.copilot-tracking/subagent/20260121/backend-requirements-research.md) -- Repo conventions + sources-of-truth: [.copilot-tracking/subagent/20260121/conventions-and-sources-research.md](.copilot-tracking/subagent/20260121/conventions-and-sources-research.md) -- Requirements synthesis working doc: [.copilot-tracking/subagent/20260121/consolidated-requirements-synthesis.md](.copilot-tracking/subagent/20260121/consolidated-requirements-synthesis.md) -- Citation validation for this note: [.copilot-tracking/subagent/20260121/citation-validation.md](.copilot-tracking/subagent/20260121/citation-validation.md) -- Reference audit (present vs linked): [.copilot-tracking/subagent/20260121/subagent-reference-audit.md](.copilot-tracking/subagent/20260121/subagent-reference-audit.md) - -### Potential Next Research - -* Identify which PRD items are intentionally deferred vs removed - * Reasoning: PRD contains capabilities not currently reflected in frontend/backend docs - * Reference: [prd.json](prd.json) - -## Research Executed - -### Evidence log (sources reviewed) -- Primary requirements sources: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md), [prd.json](prd.json), [prd-genericize.json](prd-genericize.json), [ralph/ralph-prd.txt](ralph/ralph-prd.txt), [BUSINESS_VALUE.md](BUSINESS_VALUE.md) -- Frontend behavior and UX invariants: 
[frontend/CODEBASE.md](frontend/CODEBASE.md#L70-L180), [frontend/README.md](frontend/README.md#L25-L92), [frontend/IMPLEMENTATION_SUMMARY.md](frontend/IMPLEMENTATION_SUMMARY.md#L84-L165) -- Backend behavior and API semantics: [backend/CODEBASE.md](backend/CODEBASE.md#L14-L35), [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L7-L113), [backend/docs/api-write-consolidation-plan.v2.md](backend/docs/api-write-consolidation-plan.v2.md#L62-L67) -- Assignment workflow (single-item + materialized assignment doc): [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md#L20-L95), [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py#L47-L65) -- Multi-turn backend compatibility: [backend/docs/multi-turn-refs.md](backend/docs/multi-turn-refs.md#L5-L75) -- Export/snapshot behavior: [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L24-L40), [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L72-L93), [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L117-L127) -- Tag rules and normalization: [backend/docs/tagging_plan.md](backend/docs/tagging_plan.md#L5-L13), [backend/docs/tagging_plan.md](backend/docs/tagging_plan.md#L54-L61) -- Cosmos emulator operational constraints + workarounds: [backend/app/main.py](backend/app/main.py#L60-L82), [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L1-L25), [backend/docs/cosmos-emulator-unicode-workaround.md](backend/docs/cosmos-emulator-unicode-workaround.md#L35-L38) -- Observability/telemetry expectations: [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L17), [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L79-L86) -- Dev user simulation header: [backend/README.md](backend/README.md#L336-L338), [frontend/README.md](frontend/README.md#L27-L32) - -### Research 
executed summary -- Extracted behavioral requirements from primary requirement sources and current codebase docs (frontend + backend) and selected “contract” docs in backend `docs/`. -- Validated concurrency, assignment, and emulator constraints against code-level sources where available (repo protocol + app startup). -- Identified doc conflicts where frontend requirements docs diverge from current implemented flows. - -## Consolidated Requirements - -### Product / User Goals -- The system shall support an assignment-based curation workflow where users work from a queue of assigned items and can request more assignments (“self-serve”). [frontend/CODEBASE.md](frontend/CODEBASE.md#L124-L149) -- The system should support explicitly assigning a specific item to oneself, including conflict protection when another user already holds a draft assignment. [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md#L20-L39) -- The system shall support both single-turn (Q/A) and multi-turn (conversation history) ground-truth editing while preserving backward compatibility for existing item shapes. [frontend/IMPLEMENTATION_SUMMARY.md](frontend/IMPLEMENTATION_SUMMARY.md#L104-L165), [backend/docs/multi-turn-refs.md](backend/docs/multi-turn-refs.md#L5-L75) - -### Frontend UX Requirements -- The UI shall provide a single-page curation workspace with distinct queue, editor/actions, and references areas. [frontend/CODEBASE.md](frontend/CODEBASE.md#L79-L80) -- The UI shall gate approval on reference completeness: at least one selected reference; all references visited; selected references include a key paragraph with minimum length (≥40 chars); deleted items cannot be approved. 
[frontend/CODEBASE.md](frontend/CODEBASE.md#L79-L79), [frontend/CODEBASE.md](frontend/CODEBASE.md#L119-L122), [frontend/src/components/app/defaultCurateInstructions.md](frontend/src/components/app/defaultCurateInstructions.md#L1-L4) -- The UI shall support reference workflows including search, adding selected references, URL de-duplication, visited tracking (open-in-new-tab), and key-paragraph editing with a counter. [frontend/CODEBASE.md](frontend/CODEBASE.md#L141-L143), [frontend/CODEBASE.md](frontend/CODEBASE.md#L152-L165) -- The UI should support removing a reference with an undo window and provide toast-based feedback for key actions and failures. [frontend/CODEBASE.md](frontend/CODEBASE.md#L136-L136), [frontend/CODEBASE.md](frontend/CODEBASE.md#L164-L165) -- The UI shall support soft delete + restore semantics and prevent approval of deleted items. [frontend/CODEBASE.md](frontend/CODEBASE.md#L147-L147), [frontend/CODEBASE.md](frontend/CODEBASE.md#L79-L79) -- The UI should detect no-op saves and report “No changes” rather than issuing an update that changes nothing. [frontend/CODEBASE.md](frontend/CODEBASE.md#L145-L145) -- The UI shall support snapshot export by downloading a backend-provided JSON snapshot. [frontend/CODEBASE.md](frontend/CODEBASE.md#L146-L146) -- The UI shall support multi-turn editing features (timeline, turn add/delete/edit, mode toggle), plus multi-turn approval constraints requiring reference relevance marking and key-paragraph constraints for “relevant” references. [frontend/IMPLEMENTATION_SUMMARY.md](frontend/IMPLEMENTATION_SUMMARY.md#L86-L151) -- The UI should support a demo mode that disables or safely no-ops telemetry and can use mock providers. [frontend/README.md](frontend/README.md#L73-L92), [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L17) -- The UI should support dataset-level curation instructions fetch/update (including concurrency via ETag on update). 
[frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L15-L18) - -### Backend / API Requirements -- The backend shall expose a health endpoint at `GET /healthz`. [backend/CODEBASE.md](backend/CODEBASE.md#L14-L15), [backend/app/main.py](backend/app/main.py#L147-L149) -- The backend shall accept both snake_case and camelCase inputs and always emit camelCase outputs. [backend/CODEBASE.md](backend/CODEBASE.md#L31-L32) -- The backend shall enforce optimistic concurrency on write paths using ETags: updates require `If-Match` (or equivalent request ETag) and return HTTP 412 on missing/mismatch with stable error semantics. [backend/CODEBASE.md](backend/CODEBASE.md#L33-L33), [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L75-L113) -- Assignment mutation endpoints shall enforce assignment ownership and return a stable ownership error when violated. [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L82-L86) -- Assignment state transitions (approve/skip/delete) shall clear assignment fields atomically with the status change, and assignment timestamps shall be timezone-aware UTC. [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L7-L14), [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L154-L156) -- Assignment list responses shall include `etag` in the JSON body (even if per-item `ETag` headers are optional). [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md#L31-L35) -- The backend shall provide a single-item self-assign flow where assignment sets status to draft (even from approved/deleted/skipped) and rejects assignment of items draft-assigned to a different user. 
[backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md#L29-L39) -- The backend should maintain a secondary assignment document (materialized view) keyed for fast per-user assignment queries. [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md#L88-L95), [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py#L55-L65) -- Ground-truth item writes should be consolidated into SME PUT and Curator PUT flows (with import remaining create-only). [backend/docs/api-write-consolidation-plan.v2.md](backend/docs/api-write-consolidation-plan.v2.md#L62-L65) - -### Data & Storage Requirements -- The backend shall abstract persistence behind a repository protocol to support multiple backends (Cosmos as production backend). [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py#L17-L45), [backend/CODEBASE.md](backend/CODEBASE.md#L24-L30) -- The backend shall support local development using the Cosmos DB Emulator and should not block startup if Cosmos initialization fails (e.g., emulator not ready). [backend/app/main.py](backend/app/main.py#L60-L82) -- The system shall account for Cosmos DB Emulator query limitations (e.g., lack of `ARRAY_CONTAINS`) by adjusting behavior and/or skipping incompatible tests. [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L5-L25) -- The system may support a Cosmos emulator-specific Unicode escape workaround when configured (to avoid emulator-only invalid escape failures). [backend/docs/cosmos-emulator-unicode-workaround.md](backend/docs/cosmos-emulator-unicode-workaround.md#L35-L38) - -### Export / Snapshot Requirements -- The backend shall support snapshot export in `attachment` (single JSON) and `artifact` (per-item JSON + manifest) modes with defined defaults when no request body is provided. 
[backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L24-L33) -- The snapshot download endpoint shall return a JSON document payload (not artifacts). [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L34-L40) -- Artifact exports shall include a manifest with a stable `schemaVersion` and related snapshot metadata. [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L81-L93) -- Export processors shall run before formatting and may merge tag fields into a single exported `tags` array. [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md#L117-L127) - -### Observability & Operations Requirements -- Client telemetry shall be opt-in, disabled by default, and safe-by-default (no-op in demo mode or when configuration is missing). [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L17), [frontend/README.md](frontend/README.md#L82-L92) -- The UI shall provide an error boundary that catches rendering errors and renders a user-friendly fallback (and may integrate with telemetry when enabled). [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L79-L86) - -### Security & Privacy Requirements -- In development, the system should support user simulation via an `X-User-Id` header to drive per-user assignment behavior and testing. [backend/README.md](backend/README.md#L336-L338), [frontend/README.md](frontend/README.md#L27-L32) - -### Quality / Testing Requirements -- Tag normalization should be deterministic (normalize + deduplicate + sort) to ensure stable storage and comparisons. [backend/docs/tagging_plan.md](backend/docs/tagging_plan.md#L54-L57) -- Emulator-incompatible tests (or behaviors) should be gated or skipped to avoid false failures in local/emulator workflows. 
[backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L9-L12) - -## Gaps and Conflicts - -- Reference search capability conflicts across frontend docs: MVP doc claims no backend search API endpoint, while the codebase guide describes a backend `searchReferences` flow used by the UI. [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L27-L31), [frontend/CODEBASE.md](frontend/CODEBASE.md#L141-L142) -- Tag semantics/validation scope is ambiguous between “canonical `group:value` tags” and per-history optional tags (unclear whether the same normalization/validation rules apply). [backend/docs/tagging_plan.md](backend/docs/tagging_plan.md#L5-L13), [backend/docs/history-tags-feature.md](backend/docs/history-tags-feature.md#L3-L6) -- Tag registry write expectations conflict: MVP doc states “allow the user to create new tags” while also stating “no write endpoints for tags.” [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L22-L24) -- Cosmos emulator Unicode workaround coverage may be inconsistent: workaround doc claims it is applied to tag storage, but the tags repo upsert path shown does not indicate any special encoding/sanitization. [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L111-L123), [backend/app/adapters/repos/tags_repo.py](backend/app/adapters/repos/tags_repo.py#L131-L141) - -## Next Validation Questions - -- Should reference search be treated as a required capability (backend API exists/should exist), or is it optional/stubbed for now? [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L27-L31), [frontend/CODEBASE.md](frontend/CODEBASE.md#L141-L142) -- For tags: are users allowed to create new tags end-to-end, and if so, what is the intended write path (if “no write endpoints” remains true)? 
[frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L22-L24), [backend/app/adapters/repos/tags_repo.py](backend/app/adapters/repos/tags_repo.py#L131-L154) -- For multi-turn: is backend persistence expected to include reference relevance fields (relevant/neutral/irrelevant), or is that currently frontend-only state? [frontend/IMPLEMENTATION_SUMMARY.md](frontend/IMPLEMENTATION_SUMMARY.md#L93-L151) -- For assignments: confirm intended semantics for listing “my assignments” (draft-only vs broader statuses) and how single-item assignment should interact with those semantics. [backend/CODEBASE.md](backend/CODEBASE.md#L154-L156), [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md#L29-L39) -- For Cosmos emulator Unicode handling: should tag registry writes also apply the configured workaround (as docs imply), or should the docs be updated to reflect current behavior? [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L111-L123), [backend/app/adapters/repos/tags_repo.py](backend/app/adapters/repos/tags_repo.py#L131-L141) - -## PRD Items Not Yet Supported (Tracked Separately) - -> These appear in PRD artifacts but are not clearly supported by the existing frontend/backend system behaviors today. - -- AI-powered reference retrieval (attach/detach, query orchestration) and LLM-generated artifacts. Source PRD artifacts: [prd.json](prd.json), [ralph/ralph-prd.txt](ralph/ralph-prd.txt) -- Dedicated tag administration endpoints/UI beyond current normalization + selection behaviors. Source PRD artifacts: [prd.json](prd.json) -- Full auth/RBAC integration (e.g., Entra) beyond the dev `X-User-Id` simulation mechanism. 
Source PRD artifacts: [prd.json](prd.json) - -For a fuller breakdown (with evidence + “matches existing system” flags), see: [.copilot-tracking/subagent/20260121/prd-requirements-research.md](.copilot-tracking/subagent/20260121/prd-requirements-research.md) diff --git a/.copilot-tracking/spec-sessions/jtbd-001-current-state.state.json b/.copilot-tracking/spec-sessions/jtbd-001-current-state.state.json deleted file mode 100644 index 15ae312..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-001-current-state.state.json +++ /dev/null @@ -1,72 +0,0 @@ -{ - "jtbdId": "JTBD-001", - "jtbdStatement": "Help curators review and approve ground-truth data items through an assignment-based workflow", - "lastAccessed": "2026-01-22T00:00:00Z", - "currentPhase": "handoff", - "completedPhases": [ - "jtbd-discovery", - "topic-decomposition", - "topic-research", - "spec-generation" - ], - "topics": [ - { - "name": "assignment-workflow", - "description": "The assignment workflow manages how curators receive, claim, and complete work items.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/assignment-workflow-research.md", - "specFile": "specs/assignment-workflow.md", - "status": "complete" - }, - { - "name": "explorer-view", - "description": "The explorer view allows curators to browse and filter ground-truth items outside the assigned queue.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/explorer-view-research.md", - "specFile": "specs/explorer-view.md", - "status": "complete" - }, - { - "name": "curation-editor", - "description": "The curation editor enables viewing and editing ground-truth content, including tags.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/curation-editor-research.md", - "specFile": "specs/curation-editor.md", - "status": "complete" - }, - { - "name": "reference-management", - "description": "The reference management system supports adding, visiting, and 
annotating supporting sources.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/reference-management-research.md", - "specFile": "specs/reference-management.md", - "status": "complete" - }, - { - "name": "export-snapshots", - "description": "The export system generates downloadable JSON snapshots of curated data.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/export-snapshots-research.md", - "specFile": "specs/export-snapshots.md", - "status": "complete" - }, - { - "name": "data-persistence", - "description": "The persistence layer abstracts storage behind repositories with Cosmos DB as the primary backend.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/data-persistence-research.md", - "specFile": "specs/data-persistence.md", - "status": "complete" - }, - { - "name": "observability-operations", - "description": "The observability and operations system provides opt-in telemetry, error handling, and health status.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/observability-operations-research.md", - "specFile": "specs/observability-operations.md", - "status": "complete" - } - ], - "openQuestions": [], - "nextActions": [] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-002-sme-curation.state.json b/.copilot-tracking/spec-sessions/jtbd-002-sme-curation.state.json deleted file mode 100644 index bfc1b0f..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-002-sme-curation.state.json +++ /dev/null @@ -1,82 +0,0 @@ -{ - "jtbdId": "JTBD-002", - "jtbdStatement": "Help SMEs curate ground truth items effectively (enhancements)", - "lastAccessed": "2026-01-22T00:00:00Z", - "currentPhase": "spec-generation", - "completedPhases": [ - "jtbd-discovery", - "topic-decomposition", - "topic-research" - ], - "topics": [ - { - "name": "assignment-error-feedback", - "description": "The assignment error feedback system displays specific, 
actionable messages when assignment operations fail due to conflicts.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/assignment-error-feedback-research.md", - "specFile": "specs/assignment-error-feedback.md", - "status": "draft" - }, - { - "name": "assignment-takeover", - "description": "The assignment takeover system allows SMEs to reassign items currently assigned to others with appropriate confirmation.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/assignment-takeover-research.md", - "specFile": "specs/assignment-takeover.md", - "status": "draft" - }, - { - "name": "explorer-state-preservation", - "description": "The explorer state preservation system maintains filter and view state when users perform actions that navigate away.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/explorer-state-preservation-research.md", - "specFile": "specs/explorer-state-preservation.md", - "status": "draft" - }, - { - "name": "draft-duplicate-detection", - "description": "The draft duplicate detection system warns SMEs when draft items appear to duplicate approved items.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/draft-duplicate-detection-research.md", - "specFile": "specs/draft-duplicate-detection.md", - "status": "draft" - }, - { - "name": "modal-keyboard-handling", - "description": "The modal keyboard handling system ensures keyboard shortcuts do not unexpectedly close or interfere with modal interactions.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/modal-keyboard-handling-research.md", - "specFile": "specs/modal-keyboard-handling.md", - "status": "draft" - }, - { - "name": "validation-error-clarity", - "description": "The validation error clarity system translates backend validation errors into user-friendly messages with remediation guidance.", - "scopeValidated": true, - "researchFile": 
".copilot-tracking/subagent/20260122/validation-error-clarity-research.md", - "specFile": "specs/validation-error-clarity.md", - "status": "draft" - }, - { - "name": "inspection-performance", - "description": "The inspection performance system caches and memoizes data to improve responsiveness when viewing ground truth items.", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/inspection-performance-research.md", - "specFile": "specs/inspection-performance.md", - "status": "draft" - } - ], - "openQuestions": [ - "What UI surface should the View assignee action open by default?", - "Which Explorer filters and UI controls should be included in URL state for v1?", - "What is the first duplicate matching heuristic for v1?", - "Which non-Escape keys should be stopped from propagating for all modals?", - "Which field name is canonical for the key paragraph in the API?", - "What TTL is appropriate for the inspect cache if we cannot reliably invalidate on all edits?" 
- ], - "nextActions": [ - "Review specs for accuracy against current codebase", - "Resolve open questions and update specs", - "Hand off to Planning mode to generate an implementation plan" - ] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-003-find-filter.state.json b/.copilot-tracking/spec-sessions/jtbd-003-find-filter.state.json deleted file mode 100644 index ddf0dfc..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-003-find-filter.state.json +++ /dev/null @@ -1,52 +0,0 @@ -{ - "jtbdId": "JTBD-003", - "jtbdStatement": "Help users find and filter ground truth items (enhancements)", - "lastAccessed": "2026-01-22T12:00:00Z", - "currentPhase": "handoff", - "completedPhases": [ - "jtbd-discovery", - "topic-decomposition", - "topic-research", - "spec-generation" - ], - "topics": [ - { - "name": "keyword-search", - "description": "The keyword search system enables users to find ground truth items by searching text across all multi-turn history", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/keyword-search-research.md", - "specFile": "specs/keyword-search.md", - "status": "complete", - "stories": ["SA-828"] - }, - { - "name": "tag-filtering", - "description": "The tag filtering system allows users to include, exclude, or apply boolean logic to filter items by tags", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/tag-filtering-research.md", - "specFile": "specs/tag-filtering.md", - "status": "complete", - "stories": ["SA-363"] - }, - { - "name": "explorer-sorting", - "description": "The Explorer sorting system handles column sort order, sort direction indicators, and tag-count sorting", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/explorer-sorting-research.md", - "specFile": "specs/explorer-sorting.md", - "status": "complete", - "stories": ["SA-684", "SA-361"] - } - ], - "excludedStories": [ - { - "story": "SA-463", - "reason": "Bug fix - Explorer layout 
overflow. Too small for full spec, treat as quick fix." - } - ], - "openQuestions": [], - "nextActions": [ - "Hand off to Planning Mode to generate IMPLEMENTATION_PLAN.md", - "Or continue with another JTBD specification" - ] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-004-data-integrity-security.state.json b/.copilot-tracking/spec-sessions/jtbd-004-data-integrity-security.state.json deleted file mode 100644 index db79c44..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-004-data-integrity-security.state.json +++ /dev/null @@ -1,47 +0,0 @@ -{ - "jtbdId": "JTBD-004", - "jtbdStatement": "Help administrators ensure data integrity and security", - "lastAccessed": "2026-01-22T12:00:00Z", - "currentPhase": "handoff", - "completedPhases": ["jtbd-discovery", "topic-decomposition", "topic-research", "spec-generation"], - "topics": [ - { - "name": "pii-detection", - "description": "The PII detection system scans ground truth content during import to identify and flag personally identifiable information", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/pii-detection-research.md", - "specFile": "specs/pii-detection.md", - "status": "complete", - "stories": ["SA-669"] - }, - { - "name": "dos-prevention", - "description": "The DoS prevention system enforces batch size limits and rate limiting on bulk import endpoints", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/dos-prevention-research.md", - "specFile": "specs/dos-prevention.md", - "status": "complete", - "stories": ["SA-409"] - }, - { - "name": "xss-sanitization", - "description": "The XSS sanitization system cleanses user-generated content to prevent script injection attacks", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/xss-sanitization-research.md", - "specFile": "specs/xss-sanitization.md", - "status": "complete", - "stories": ["SA-565"] - }, - { - "name": "batch-validation", - "description": "The batch 
validation system provides detailed error feedback and proper data integrity checks during bulk imports", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/batch-validation-research.md", - "specFile": "specs/batch-validation.md", - "status": "complete", - "stories": ["SA-241"] - } - ], - "openQuestions": ["Should SA-565 be updated to reflect URL validation gap instead of textarea XSS?"], - "nextActions": ["Ready for handoff to Planning Mode"] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-005-code-quality.state.json b/.copilot-tracking/spec-sessions/jtbd-005-code-quality.state.json deleted file mode 100644 index db6c523..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-005-code-quality.state.json +++ /dev/null @@ -1,64 +0,0 @@ -{ - "jtbdId": "JTBD-005", - "jtbdStatement": "Help developers maintain GTC code quality", - "lastAccessed": "2026-01-22T12:00:00Z", - "currentPhase": "handoff", - "completedPhases": [ - "jtbd-discovery", - "topic-decomposition", - "topic-research", - "spec-generation" - ], - "stories": [ - "SA-746", - "SA-424", - "SA-745", - "SA-238", - "SA-249", - "SA-250", - "SA-245" - ], - "topics": [ - { - "name": "architecture-refactoring", - "description": "The architecture refactoring extracts duplicate API logic into services and splits the repository layer into focused modules", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/architecture-refactoring-research.md", - "specFile": "specs/architecture-refactoring.md", - "status": "complete", - "stories": ["SA-746", "SA-424"] - }, - { - "name": "dependency-injection", - "description": "The dependency injection refactoring adopts FastAPI's DI patterns for configuration and services", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/dependency-injection-research.md", - "specFile": "specs/dependency-injection.md", - "status": "complete", - "stories": ["SA-238"] - }, - { - "name": "ci-code-quality", - 
"description": "The CI code quality enforcement adds linting, formatting, and pre-push hooks with drift reconciliation", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/ci-code-quality-research.md", - "specFile": "specs/ci-code-quality.md", - "status": "complete", - "stories": ["SA-745"] - }, - { - "name": "code-conventions", - "description": "The code conventions standardize Pydantic model usage, exception handling, and logging patterns", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/code-conventions-research.md", - "specFile": "specs/code-conventions.md", - "status": "complete", - "stories": ["SA-249", "SA-250", "SA-245"] - } - ], - "openQuestions": [], - "nextActions": [ - "Hand off to Planning Mode to generate IMPLEMENTATION_PLAN.md", - "Run task-planner for gap analysis against existing code" - ] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-006-documentation.state.json b/.copilot-tracking/spec-sessions/jtbd-006-documentation.state.json deleted file mode 100644 index b3d8757..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-006-documentation.state.json +++ /dev/null @@ -1,57 +0,0 @@ -{ - "jtbdId": "JTBD-006", - "jtbdStatement": "Help teams understand GTC through documentation", - "stories": ["SA-835", "SA-422", "SA-205"], - "lastAccessed": "2026-01-22T12:00:00Z", - "currentPhase": "handoff", - "completedPhases": [ - "jtbd-discovery", - "topic-decomposition", - "topic-research", - "spec-generation" - ], - "topics": [ - { - "name": "docs-infrastructure", - "description": "The docs infrastructure provides MkDocs setup with build/serve commands and navigation structure", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/docs-infrastructure-research.md", - "specFile": "specs/docs-infrastructure.md", - "status": "complete" - }, - { - "name": "docs-content-strategy", - "description": "The content strategy defines audience-specific documentation organization, 
migration paths, and drift reconciliation workflows", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/docs-content-strategy-research.md", - "specFile": "specs/docs-content-strategy.md", - "status": "complete" - }, - { - "name": "tag-glossary", - "description": "The tag glossary surfaces tag definitions to users through the UI and allows definitions to be managed", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/tag-glossary-research.md", - "specFile": "specs/tag-glossary.md", - "status": "complete" - } - ], - "decisions": [ - { - "question": "Where should MkDocs be set up?", - "answer": "Backend Python environment", - "date": "2026-01-22" - }, - { - "question": "How should tag glossary definitions be stored?", - "answer": "Hybrid: config for system tags, database for SME-created tags", - "date": "2026-01-22" - } - ], - "openQuestions": [], - "nextActions": [ - "Hand off to Planning Mode to generate IMPLEMENTATION_PLAN.md", - "Planning Mode performs gap analysis against existing code", - "Building Mode implements tasks from plan" - ] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-007-chunked-references.state.json b/.copilot-tracking/spec-sessions/jtbd-007-chunked-references.state.json deleted file mode 100644 index 01fa35f..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-007-chunked-references.state.json +++ /dev/null @@ -1,27 +0,0 @@ -{ - "jtbdId": "JTBD-007", - "jtbdStatement": "Help GTC handle chunked document references correctly", - "lastAccessed": "2026-01-22T12:00:00Z", - "currentPhase": "spec-generation", - "completedPhases": ["jtbd-discovery", "topic-decomposition", "topic-research", "spec-generation"], - "topics": [ - { - "name": "reference-identity", - "description": "The reference identity system uses chunk ID from the search index as the primary uniqueness key instead of URL", - "scopeValidated": true, - "researchFile": 
".copilot-tracking/subagent/20260122/reference-identity-research.md", - "specFile": "specs/reference-identity.md", - "status": "specified" - } - ], - "openQuestions": [], - "resolvedQuestions": [ - "UI should display chunk ID and allow users to view chunk text content", - "Identity key is chunk ID alone; references are stored per-turn in history[].refs[]" - ], - "stories": ["SA-821", "SA-257"], - "excludedStories": { - "SA-447": "Moved to separate JTBD for export/split-tags" - }, - "nextActions": ["Hand off to Planning Mode to generate IMPLEMENTATION_PLAN.md"] -} diff --git a/.copilot-tracking/spec-sessions/jtbd-008-cosmos-performance.state.json b/.copilot-tracking/spec-sessions/jtbd-008-cosmos-performance.state.json deleted file mode 100644 index 15f72d8..0000000 --- a/.copilot-tracking/spec-sessions/jtbd-008-cosmos-performance.state.json +++ /dev/null @@ -1,49 +0,0 @@ -{ - "jtbdId": "JTBD-008", - "jtbdStatement": "Help optimize GTC performance and Cosmos usage", - "lastAccessed": "2026-01-22T12:30:00Z", - "currentPhase": "spec-generation", - "completedPhases": ["jtbd-discovery", "topic-decomposition", "topic-research", "spec-generation"], - "topics": [ - { - "name": "cosmos-indexing", - "description": "The indexing strategy limits indexed fields to reduce write RU costs", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/cosmos-indexing-research.md", - "specFile": "specs/cosmos-indexing.md", - "status": "specified", - "stories": ["SA-242"] - }, - { - "name": "partial-updates", - "description": "The partial update system patches only changed fields instead of replacing entire documents", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/partial-updates-research.md", - "specFile": "specs/partial-updates.md", - "status": "specified", - "stories": ["SA-244"] - }, - { - "name": "query-optimization", - "description": "The query optimization effort replaces expensive cross-partition queries with efficient 
patterns", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/query-optimization-research.md", - "specFile": "specs/query-optimization.md", - "status": "specified", - "stories": ["SA-247", "SA-248"] - }, - { - "name": "concurrency-control", - "description": "The concurrency control mechanism prevents race conditions during simultaneous updates", - "scopeValidated": true, - "researchFile": ".copilot-tracking/subagent/20260122/concurrency-control-research.md", - "specFile": "specs/concurrency-control.md", - "status": "specified", - "stories": ["SA-246"] - } - ], - "openQuestions": [], - "nextActions": [ - "Hand off to Planning Mode for implementation plan generation" - ] -} diff --git a/.copilot-tracking/subagent/20260121/api-logic-research.md b/.copilot-tracking/subagent/20260121/api-logic-research.md deleted file mode 100644 index 8fd05fc..0000000 --- a/.copilot-tracking/subagent/20260121/api-logic-research.md +++ /dev/null @@ -1,224 +0,0 @@ ---- -title: API logic research -description: Candidates for moving business logic out of FastAPI handlers into service-layer modules -author: GitHub Copilot -ms.date: 2026-01-21 -ms.topic: reference -keywords: - - fastapi - - service layer - - refactor - - concurrency - - etag - - tags -estimated_reading_time: 8 ---- - -## Goal - -Identify backend API endpoints that contain business logic beyond orchestration, and map that logic to service-layer boundaries. - -## Summary of findings - -* Several handlers in `backend/app/api/v1/*` perform domain workflows directly against `container.repo`. 
-* The heaviest duplication centers on: - * Partial-update semantics across multiple fields - * ETag enforcement and error mapping - * Tag constraints and computed-tag recomputation - * History parsing (including references embedded in history) -* Services already exist for several chunks of this logic (`AssignmentService`, `TaggingService`, `ValidationService`, `TagRegistryService`, `SnapshotService`, `ChatService`), but handlers still own cross-cutting workflow steps. - -## Service boundary guidance - -* API layer responsibilities - * Authenticate and authorize - * Parse inputs, perform lightweight request-shape validation - * Translate service errors to HTTP status codes -* Service layer responsibilities - * Domain workflows and state transitions - * Concurrency rules (ETag requirements) and retryable failures - * Tag normalization, manual tag constraints, computed-tag recomputation - * Shared parsing/normalization of payload fields that appear across endpoints - -## Endpoint candidates - -### 1) SME assignment update workflow - -File: `backend/app/api/v1/assignments.py` - -Excerpt: [backend/app/api/v1/assignments.py#L72-L255](backend/app/api/v1/assignments.py#L72-L255) - -What is happening in the handler: - -* Joins multiple concerns: - * Ownership enforcement (`assignedTo` must match caller) - * Partial-update semantics driven by `model_fields_set` - * Approval/status transitions that clear assignment and set review metadata - * Parsing and validating `history` with embedded `refs` - * ETag enforcement via `If-Match` or body `etag` and mapping mismatch to HTTP 412 - * Computed tag application before persisting - * Best-effort deletion of the assignment document after completion - -Service extraction candidates: - -* Move domain workflow into `AssignmentService` or a new `GroundTruthUpdateService` - * `update_assigned_item(dataset: str, bucket: UUID, item_id: str, user_id: str, update: AssignmentUpdateRequest, if_match: str | None) -> GroundTruthItem` - * Keep 
the API handler responsible for request parsing only -* Extract shared helpers for use by both assignments and ground-truth CRUD: - * `parse_history(payload_history: list[dict[str, Any]] | None) -> list[HistoryItem] | None` - * `require_etag(if_match: str | None, body_etag: str | None) -> str` - -Notes on existing services: - -* `apply_computed_tags` already exists in `app/services/tagging_service.py` and is called here from the handler -* The handler still owns the workflow steps and error mapping that are likely to be repeated elsewhere - -### 2) Single-item assignment orchestration - -File: `backend/app/api/v1/assignments.py` - -Excerpt: [backend/app/api/v1/assignments.py#L257-L323](backend/app/api/v1/assignments.py#L257-L323) - -What is happening in the handler: - -* Delegates assignment to `container.assignment_service.assign_single_item` -* Contains business-ish translation logic that likely belongs in a consistent error mapper: - * Converts different `ValueError` message substrings to 404 vs 409 vs 400 - -Service extraction candidates: - -* Keep `AssignmentService.assign_single_item` as-is, but standardize errors: - * Prefer typed exceptions (e.g., `NotFoundError`, `AlreadyAssignedError`, `InvalidStateError`) so HTTP mapping is stable and not substring-based - -### 3) Bulk import workflow - -File: `backend/app/api/v1/ground_truths.py` - -Excerpt: [backend/app/api/v1/ground_truths.py#L54-L127](backend/app/api/v1/ground_truths.py#L54-L127) - -What is happening in the handler: - -* Implements a full workflow, not just orchestration: - * Generates IDs for missing items using `randomname` - * Validates all items via `validate_bulk_items` and filters invalid items - * Optionally applies approval metadata for all surviving items - * Applies computed tags for each item (fetches registry once) - * Persists through `container.repo.import_bulk_gt` - -Service extraction candidates: - -* Move into a dedicated import service (or a `GroundTruthService`): - * 
`import_bulk(items: list[GroundTruthItem], *, buckets: int | None, approve: bool, user_id: str | None) -> ImportBulkResponse` -* Explicitly separate concerns: - * ID generation and order preservation - * Validation and error aggregation - * Approval metadata policy - * Tag recomputation policy - -Notes on existing services: - -* `validate_bulk_items` is in `app/services/validation_service.py` -* Computed-tag logic is in `app/services/tagging_service.py` -* The handler currently coordinates all these pieces and should become a thin wrapper - -### 4) Ground-truth list query validation - -File: `backend/app/api/v1/ground_truths.py` - -Excerpt: [backend/app/api/v1/ground_truths.py#L160-L252](backend/app/api/v1/ground_truths.py#L160-L252) - -What is happening in the handler: - -* Implements query normalization and validation rules: - * Coerces `status` string into `GroundTruthStatus` - * Validates `limit` and `page` - * Trims `itemId` and `refUrl`, enforces max lengths - * Parses comma-separated `tags` with max tag count and max length - -Service extraction candidates: - -* Keep low-level validation here if it stays purely request-level, but consider extracting for reuse: - * `normalize_list_query(status: str | None, item_id: str | None, ref_url: str | None, tags: str | None, page: int, limit: int) -> NormalizedQuery` - -### 5) Ground-truth update workflow - -File: `backend/app/api/v1/ground_truths.py` - -Excerpt: [backend/app/api/v1/ground_truths.py#L283-L394](backend/app/api/v1/ground_truths.py#L283-L394) - -What is happening in the handler: - -* Repeats many of the same concerns as the assignments update endpoint: - * Partial updates across multiple fields (including status coercion) - * Reference parsing from list payloads - * Explicit rejection of `computedTags` and legacy `tags` (business rule) - * Manual tag update, mapped through domain validation - * History parsing, including parsing `refs` and `expectedBehavior` - * ETag requirement and mismatch mapping to 
HTTP 412 - * Computed tag application before persisting - * Re-fetching to return latest ETag - -Service extraction candidates: - -* Create a shared update service used by both SME and admin-like updates: - * `update_item(dataset: str, bucket: UUID, item_id: str, payload: dict[str, Any], *, if_match: str | None, user_id: str | None) -> GroundTruthItem` -* Consolidate shared parsing/validation helpers with the SME update handler: - * History parsing and reference parsing - * ETag policy enforcement and mismatch translation - * Tag-field acceptance policy (manual-only) - -### 6) Bulk recompute computed tags - -File: `backend/app/api/v1/ground_truths.py` - -Excerpt: [backend/app/api/v1/ground_truths.py#L408-L504](backend/app/api/v1/ground_truths.py#L408-L504) - -What is happening in the handler: - -* Implements a batch domain workflow: - * Fetches items based on filter criteria - * Applies computed tags for each item and diffs tag sets - * On changes and `dry_run=false`, bypasses ETag and upserts - * Aggregates errors and logs a summary - -Service extraction candidates: - -* Move into `TaggingService` or a dedicated maintenance service: - * `recompute_computed_tags(*, dataset: str | None, status: GroundTruthStatus | None, dry_run: bool) -> RecomputeTagsResponse` -* Centralize the “bypass ETag for maintenance” rule in one place - -## Additional candidates - -### Chat endpoint input and error policy - -File: `backend/app/api/v1/chat.py` - -Excerpt: [backend/app/api/v1/chat.py#L29-L158](backend/app/api/v1/chat.py#L29-L158) - -Notes: - -* Message sanitation and validation are largely request-layer concerns. -* The handler owns error-to-status mapping and correlation ID propagation. If this pattern repeats, it could be centralized (for example, a shared exception-to-response utility), but it is not urgent compared to GT/assignment workflows. 
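The research notes recommend replacing substring-based `ValueError` matching with typed domain exceptions, and they mention a possible shared exception-to-response utility. The sketch below illustrates that idea under some assumptions. The exception names `NotFoundError`, `AlreadyAssignedError` and `InvalidStateError` come from the notes. The error codes, `to_error_response`, and the toy `assign_single_item` stand-in are hypothetical and are not the project's actual code. In a FastAPI app, a mapper like this would typically be registered once with `@app.exception_handler(DomainError)`, so that individual handlers no longer translate errors themselves.

```python
from dataclasses import dataclass, field
from typing import Optional


class DomainError(Exception):
    """Hypothetical base class for service-layer errors."""

    status_code = 400  # fallback for unclassified domain errors
    code = "INVALID_REQUEST"


class NotFoundError(DomainError):
    status_code = 404
    code = "NOT_FOUND"


class AlreadyAssignedError(DomainError):
    status_code = 409
    code = "ALREADY_ASSIGNED"


class InvalidStateError(DomainError):
    status_code = 400
    code = "INVALID_STATE"


@dataclass(frozen=True)
class ErrorResponse:
    status_code: int
    body: dict = field(default_factory=dict)


def to_error_response(exc: DomainError) -> ErrorResponse:
    """Map a typed error to an HTTP-style response.

    The status is taken from the exception class, never from parsing str(exc),
    so rewording a message cannot change the HTTP status code.
    """
    return ErrorResponse(
        status_code=exc.status_code,
        body={"code": exc.code, "detail": str(exc)},
    )


def assign_single_item(item_id: str, assigned_to: Optional[str], user_id: str) -> str:
    """Toy stand-in for AssignmentService.assign_single_item (hypothetical)."""
    if not item_id:
        raise NotFoundError("item not found")
    if assigned_to is not None and assigned_to != user_id:
        raise AlreadyAssignedError(f"item {item_id} is assigned to another user")
    return user_id
```

For example, calling `assign_single_item("gt-1", assigned_to="alice", user_id="bob")` raises `AlreadyAssignedError`. Passing that error to `to_error_response` yields status 409 with code `ALREADY_ASSIGNED`, no matter how the message text is worded.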
- -### Tags endpoint config precedence - -File: `backend/app/api/v1/tags.py` - -Excerpt: [backend/app/api/v1/tags.py#L66-L106](backend/app/api/v1/tags.py#L66-L106) - -Notes: - -* The handler determines the source of truth for manual tags based on `settings.ALLOWED_MANUAL_TAGS` vs persisted registry. -* This is a domain/config decision and is a good candidate for `TagRegistryService`: - * `list_manual_tags_with_computed_filtered() -> tuple[list[str], list[str]]` - -## Container and DI observations - -* API handlers frequently depend on the global `container` singleton and call `container.repo.*` directly. -* When extracting services, prefer constructor-injected dependencies (repo protocols, registry providers) to reduce implicit coupling and make unit testing easier. - -## Suggested next steps - -* Extract shared “ground truth update” workflow into a single service method used by both assignments and ground-truth CRUD. -* Replace substring-based error mapping with typed domain exceptions to stabilize HTTP status codes. -* Keep handler functions thin: authentication, request parsing, and response formatting only. diff --git a/.copilot-tracking/subagent/20260121/backend-requirements-research.md b/.copilot-tracking/subagent/20260121/backend-requirements-research.md deleted file mode 100644 index 972a191..0000000 --- a/.copilot-tracking/subagent/20260121/backend-requirements-research.md +++ /dev/null @@ -1,210 +0,0 @@ -# Backend Behavioral Requirements (Doc-Inferred) - -Date: 2026-01-21 - -This document captures stable, high-level backend behavioral requirements inferred from backend documentation. It is intended to describe “what the backend must do” in a testable way, not propose new features. 
- -## Scope and sources - -Primary sources reviewed: - -- backend/README.md -- backend/CODEBASE.md -- backend/docs/api-change-checklist-assignments.md -- backend/docs/assign-single-item-endpoint.md -- backend/docs/api-write-consolidation-plan.v2.md -- backend/docs/drift_cleanup.md -- backend/docs/tagging_plan.md -- backend/docs/export-pipeline.md -- backend/docs/multi-turn-refs.md -- backend/docs/history-tags-feature.md -- backend/docs/cosmos-emulator-limitations.md - -## Requirements (by area) - -### API wire conventions - -- The API accepts both snake_case and camelCase inputs, but responses are always camelCase (aliases) via Pydantic. - - Evidence (backend/CODEBASE.md#L32-L34): - > - Pydantic v2 with aliases: accept snake_case or camelCase on input; always output camelCase via model_dump(..., by_alias=True). - -### Health check - -- The service exposes a health endpoint at `GET /healthz`. - - Evidence (backend/CODEBASE.md#L14-L15, backend/CODEBASE.md#L159-L161): - > - GET /healthz returns repo/backend info (Cosmos details when active). - > - Health check: GET /healthz - -### Optimistic concurrency (ETag / If-Match) - -- Updates must enforce optimistic concurrency using Cosmos ETags. - - Clients may supply ETag via `If-Match` header or an `etag` field in request body (depending on endpoint); missing/mismatch maps to HTTP 412. - - Evidence (backend/CODEBASE.md#L32-L34): - > - Concurrency uses ETag: updates require If-Match header or etag in body; 412 on missing/mismatch. - -- Assignment write paths must require `If-Match` and return/echo the updated ETag. - - Evidence (backend/docs/api-change-checklist-assignments.md#L7-L12, backend/docs/api-change-checklist-assignments.md#L74-L90): - > - Require `If-Match` on all write paths (approve/skip/delete) and return updated ETag. - > - Request headers (required): - > - `If-Match: ` (all write paths) - > - 412 Precondition Failed: Missing/invalid ETag. Error code `IF_MATCH_REQUIRED` or `ETAG_MISMATCH`. 
Include current ETag in `ETag` header. - - Evidence (backend/docs/api-change-checklist-assignments.md#L168-L178): - > - Concurrency: All writes require `If-Match` with the current ETag. On mismatch, return 412 and provide the current ETag in the `ETag` header. - > - ETag: Return the new ETag in the 200 response body and `ETag` header. - -### Delete semantics (soft delete) - -- “Delete” is represented as `status=deleted` (soft delete), and list APIs filter deleted items unless status is explicitly requested. - - Evidence (backend/CODEBASE.md#L33-L34): - > - Soft-delete via status=deleted; list APIs filter unless status is specified. - -### Write surface area (consolidation) - -- Ground Truth item writes are consolidated to two update endpoints: SME PUT and Curator PUT. - - Evidence (backend/docs/api-write-consolidation-plan.v2.md#L28-L36, backend/docs/api-write-consolidation-plan.v2.md#L60-L67): - > - SME PUT `/v1/assignments/{item_id}` - > - Add: optional `etag` in body, and accept `If-Match` header - > - Curator PUT `/v1/ground-truths/{datasetName}/{item_id}` - > - Add: optional `etag` in body, and accept `If-Match` header - > - Only two endpoints perform writes to GT items: SME PUT and Curator PUT. - -- Curator import remains a separate POST and is create-only (no updates). - - Evidence (backend/docs/api-write-consolidation-plan.v2.md#L38-L45, backend/docs/api-write-consolidation-plan.v2.md#L62-L64): - > - Curator POST `/v1/ground-truths` (import) - > - Unchanged in path/method; clarify it’s for create/import only (no updates) - > - Curator POST import remains for create-only flows. - -- Reference subroutes are removed; references are handled via the PUTs. 
- - Evidence (backend/docs/api-write-consolidation-plan.v2.md#L21-L25, backend/docs/api-write-consolidation-plan.v2.md#L64-L66): - > | POST | /v1/ground-truths/{datasetName}/{item_id}/references | Curator | Add references to item | Remove | Fold into Curator PUT with references | - > | DELETE | /v1/ground-truths/{datasetName}/{item_id}/references/{ref_id} | Curator | Remove a specific reference | Remove | Fold into Curator/SME PUT via references | - > - Reference-specific endpoints are removed and covered by references in PUTs. - -### Assignments - -- Ownership must be enforced on SME mutation routes; non-owner attempts return 403 with stable error code. - - Evidence (backend/docs/api-change-checklist-assignments.md#L7-L12, backend/docs/api-change-checklist-assignments.md#L82-L86): - > - Enforce ownership on SME update route with 403 and stable error code. - > - 403 Forbidden: Ownership violation. Error code `ASSIGNMENT_OWNERSHIP`. - - Evidence (backend/docs/api-change-checklist-assignments.md#L168-L176): - > - Ownership: Only the currently assigned user may mutate. If unassigned or assigned to a different user, return 403/`ASSIGNMENT_OWNERSHIP`. - -- On transitions to skipped/approved/deleted, assignment fields must be cleared atomically (same write). - - Evidence (backend/docs/api-change-checklist-assignments.md#L10-L12, backend/docs/api-change-checklist-assignments.md#L175-L178): - > - Clear assignment fields atomically on transitions (skipped/approved/deleted). - > - Assignment clearing: On transitions to skipped/approved/deleted, clear `assignedTo` and `assignedAt` atomically with the status change. - -- Assignment timestamps should be timezone-aware UTC (RFC3339), set via `datetime.now(timezone.utc)`. 
- - Evidence (backend/docs/api-change-checklist-assignments.md#L13-L14, backend/docs/api-change-checklist-assignments.md#L182-L184): - > - Use timezone-aware UTC timestamps via `datetime.now(timezone.utc)` when setting or updating `assignedAt` or other timestamps. - > - `assignedAt` (nullable, RFC3339 UTC). Set with `datetime.now(timezone.utc)`. - -- `/v1/assignments/my` response must include `etag` in the JSON body (headers optional). - - Evidence (backend/docs/api-change-checklist-assignments.md#L19-L24, backend/docs/api-change-checklist-assignments.md#L35-L38): - > - `etag` (string) - > - `assignedAt` (string, RFC3339 UTC) - > - `ETag` header is optional per-item, but the item’s `etag` MUST be included in the JSON body. - -- Single-item assignment endpoint (`POST /v1/assignments/{dataset}/{bucket}/{item_id}/assign`) must enforce 409 conflict when draft-assigned to another user, and on success always sets status to draft. - - Evidence (backend/docs/assign-single-item-endpoint.md#L22-L29, backend/docs/assign-single-item-endpoint.md#L33-L43): - > - **409 Conflict**: Item is already assigned to another user in draft state - > 2. **Items assigned to another user (draft status)**: Cannot be assigned (409 Conflict) ❌ - > **Important**: When an item is assigned, its status is **always set to draft**, regardless of previous state (approved, deleted, skipped, etc.). - -- Successful assignment must create/upsert a secondary “assignment document” in the assignments container (materialized view) for fast per-user queries. - - Evidence (backend/docs/assign-single-item-endpoint.md#L85-L95): - > When an item is successfully assigned, an assignment document is created in the assignments container with: - > ... - > This materialized view allows fast retrieval of all items assigned to a user via `/v1/assignments/my`. 
- -### Tagging - -- Tags must be stored and returned in canonical `group:value` format (lowercase), with normalization (trim/lowercase/dedupe/sort) for deterministic output. - - Evidence (backend/docs/tagging_plan.md#L3-L10): - > - Canonical form is `group:value` (all lowercase). Inputs are normalized (trimmed, lowercased, deduplicated, sorted for determinism). - - Evidence (backend/docs/tagging_plan.md#L54-L60): - > - Lowercase group and value; trim whitespace; collapse inner whitespace; accept and normalize `group : value` to `group:value`. - > - Deduplicate after normalization; sort ascending for deterministic storage. - -- Unknown groups/values are allowed, but known-group behavioral rules (e.g., exclusivity) must be enforced. - - Evidence (backend/docs/tagging_plan.md#L5-L10, backend/docs/tagging_plan.md#L60-L66): - > - Unknown groups and values are allowed. We do not enforce membership in a hardcoded set. - > - For known groups defined in our schema, we still enforce behavioral rules like mutual exclusivity. - > - Unknown groups or values are allowed. We only enforce format and known-group rules. - > - Exclusive groups (in the known schema) may contain at most one value. - -### Snapshot export pipeline - -- Snapshot export supports `attachment` and `artifact` delivery modes; missing/empty request body uses defaults equivalent to `attachment`. - - Evidence (backend/docs/export-pipeline.md#L26-L34): - > Supports `attachment` or `artifact` delivery. - > If the request body is omitted or `{}`, the server uses defaults (equivalent to `delivery.mode=attachment`). - -- `GET /v1/ground-truths/snapshot` always returns a JSON document payload (not artifacts). - - Evidence (backend/docs/export-pipeline.md#L33-L38): - > * Always returns a JSON document payload (not storage artifacts) - -- Artifact exports must write one JSON file per item plus a manifest under a deterministic path, and the manifest includes `schemaVersion` currently `v2`. 
- - Evidence (backend/docs/export-pipeline.md#L76-L90): - > Artifacts are written under: - > * `exports/snapshots/{snapshotAt}/ground-truth-{id}.json` - > * `exports/snapshots/{snapshotAt}/manifest.json` - > ... - > * `schemaVersion` (currently `v2`) - -- Export processors run before formatting; `merge_tags` merges manual/computed tags into a single sorted union `tags` array. - - Evidence (backend/docs/export-pipeline.md#L116-L128): - > ### `merge_tags` - > Merges tag fields into a single `tags` array on each exported document: - > * Reads `manualTags`/`manual_tags` and `computedTags`/`computed_tags` - > * Writes `tags` as a sorted union of the two - -### Multi-turn history: refs + per-turn tags - -- The backend must remain backward compatible with top-level `refs` while supporting optional per-history-item `refs` for assistant messages. - - Evidence (backend/docs/multi-turn-refs.md#L5-L8, backend/docs/multi-turn-refs.md#L67-L73): - > This change maintains backward compatibility with the existing top-level `refs` field. - > 1. **Top-level `refs` field preserved**: The `GroundTruthItem.refs` field at the top level remains unchanged and continues to work as before. - > 2. **Optional refs in history**: The `refs` field in `HistoryItem` is optional (defaults to `None`), so existing history items without refs continue to work. - -- History item `tags` is optional and defaults to an empty list; when parsing, accept both `msg` and `content` field names. - - Evidence (backend/docs/multi-turn-refs.md#L71-L76): - > 3. **Optional tags in history**: The `tags` field in `HistoryItem` is optional (defaults to an empty list), so existing history items without tags continue to work. - > 4. **Flexible field names**: The parser supports both `msg` and `content` field names for the message text, accommodating different client implementations. - -- Tags validation for history items is intentionally permissive: list-of-strings, no value-format restrictions, duplicates allowed. 
- - Evidence (backend/docs/history-tags-feature.md#L140-L150): - > - Tags must be a list of strings (enforced by Pydantic) - > - No format restrictions on individual tag values - > - Empty lists are allowed - > - Duplicate tags are allowed (no automatic deduplication at model level) - -### Observability and user identity - -- Logs must include a `user=` field derived per request; in dev mode it comes from `X-User-Id` header (else `anonymous`). - - Evidence (backend/README.md#L334-L341): - > Every log line now includes a `user=` field derived per request: - > - Dev mode (Easy Auth disabled): uses the `X-User-Id` header if provided, otherwise `anonymous`. - > - Tests can set `X-User-Id` to simulate multiple users. - -### Local dev and emulator constraints - -- When using Cosmos Emulator and multiturn data containing Unicode, the backend must support disabling unicode escaping to avoid emulator parsing bugs. - - Evidence (backend/README.md#L104-L121): - > - `GTC_COSMOS_DISABLE_UNICODE_ESCAPE=true` (workaround for emulator Unicode bug with multiturn data) - > **Solution:** Set `GTC_COSMOS_DISABLE_UNICODE_ESCAPE=true` ... ensures that the backend sends real UTF-8 characters instead of escape sequences... - -- Emulator does not support `ARRAY_CONTAINS`, so tag-filtering queries against the emulator cannot rely on server-side `ARRAY_CONTAINS` behavior. - - Evidence (backend/docs/cosmos-emulator-limitations.md#L5-L18, backend/README.md#L248-L259): - > **Issue:** The Cosmos DB Emulator does not support the `ARRAY_CONTAINS` SQL function... - > ... - > Integration tests that test tag filtering functionality must be skipped when using the emulator - > ... - > **Note:** Some tests are skipped when using the Cosmos DB Emulator due to unsupported features (e.g., `ARRAY_CONTAINS` for tag filtering). 
- -## Notes / interpretation boundaries - -- Some docs (e.g., drift cleanup) describe the intended “design compliance” direction rather than a fully enforced current behavior; items listed under “Goal/Acceptance criteria” are treated here as target requirements. - - Evidence (backend/docs/drift_cleanup.md#L7-L18): - > Goal: align current FastAPI endpoints so all Ground Truth writes happen only via SME PUT and Curator PUT... - > ... ETag-based concurrency enforced, and reference-specific routes removed. \ No newline at end of file diff --git a/.copilot-tracking/subagent/20260121/citation-validation.md b/.copilot-tracking/subagent/20260121/citation-validation.md deleted file mode 100644 index 2463083..0000000 --- a/.copilot-tracking/subagent/20260121/citation-validation.md +++ /dev/null @@ -1,41 +0,0 @@ - -# Citation Validation Report (2026-01-21) - -## Status - -Complete. - -- Target document: [.copilot-tracking/research/20260121-high-level-requirements-research.md](../research/20260121-high-level-requirements-research.md) -- Target document: [.copilot-tracking/research/20260121-high-level-requirements-research.md](../../research/20260121-high-level-requirements-research.md) -- Validation scope: all Markdown links in the form `[label](path#Lx)` or `[label](path#Lx-Ly)` -- Run date: 2026-01-21 - -## Key Findings - -- Total citations found: 86 -- Unique citations (deduplicated by `(path, startLine, endLine)`): 69 -- Broken citations: 0 - - Missing file: 0 - - Line range beyond EOF: 0 - - Invalid line numbers (e.g., < 1 or reversed): 0 - -Notes: - -- The document contains repeated citations (expected, because the same source supports multiple requirements). -- Several citations intentionally use single-line anchors (e.g., `#L79-L79`). These are valid and within file bounds. - -## Fix List (Corrected Line Ranges) - -No corrections were required. 
- -- Count: 0 -- Document changes applied: none - -## Validation Method - -- Parsed the target Markdown and extracted all citations matching `...](#L(-L)?)`. -- For each unique citation: - - Verified the target file exists at the repo-relative path. - - Counted file lines and validated `1 <= start <= end <= lineCount`. - -If you want a stronger “semantic” validation pass (confirming the referenced lines actually contain the claimed behavior), tell me which sections are highest priority and I’ll spot-check them and tighten ranges where appropriate. diff --git a/.copilot-tracking/subagent/20260121/consolidated-requirements-synthesis.md b/.copilot-tracking/subagent/20260121/consolidated-requirements-synthesis.md deleted file mode 100644 index 09d8b48..0000000 --- a/.copilot-tracking/subagent/20260121/consolidated-requirements-synthesis.md +++ /dev/null @@ -1,298 +0,0 @@ ---- -title: Consolidated requirements synthesis -description: Consolidated high-level requirements derived from subagent research reports, with primary-source evidence and identified ambiguities. -author: GitHub Copilot (subagent) -ms.date: 2026-01-21 -ms.topic: reference -keywords: - - requirements - - synthesis - - ground truth curation - - backend - - frontend -estimated_reading_time: 15 ---- - -## Scope - -This document consolidates high-level requirements from prior subagent research reports into a single, testable requirements set. 
- -Notes on inputs: - -* Two requested inputs were not found at the expected paths: - * `.copilot-tracking/subagent/20260121/conventions-and-sources-research.md` - * `.copilot-tracking/subagent/20260121/prd-requirements-research.md` -* Closest available subagent sources used instead: - * `.copilot-tracking/subagent/20260121/conventions-research.md` - * `.copilot-tracking/subagent/20260121/backend-requirements-research.md` - * `.copilot-tracking/subagent/20260121/frontend-requirements-research.md` - * `.copilot-tracking/subagent/20260121/cosmos-repo-research.md` (constraints only) - * `.copilot-tracking/subagent/20260121/api-logic-research.md` and `.copilot-tracking/subagent/20260121/synthesis-notes.md` (constraints only) - -All requirements below include at least one primary source reference (repo file path plus line range) as cited by the subagent reports. - -## Top 10 requirements - -1. The system must support an assignment-based curation workflow where users work primarily from an assigned-items queue. - Evidence: frontend/CODEBASE.md#L128-L148; backend/docs/assign-single-item-endpoint.md#L85-L95. -1. The backend must enforce optimistic concurrency on writes using Cosmos DB ETags, requiring `If-Match` and returning updated ETags. - Evidence: backend/docs/api-change-checklist-assignments.md#L74-L90; backend/docs/api-change-checklist-assignments.md#L168-L178. -1. The UI must gate approval based on reference completeness (selection, visited state, and minimum key-paragraph length). - Evidence: frontend/CODEBASE.md#L75-L79; frontend/src/components/app/defaultCurateInstructions.md#L1-L4. -1. The backend must implement soft delete via status transitions and exclude deleted items from lists unless explicitly requested. - Evidence: backend/CODEBASE.md#L33-L34; frontend/CODEBASE.md#L145-L147. -1. References must support search-and-add and selected-reference management, including de-duplication by URL and visited tracking. - Evidence: frontend/CODEBASE.md#L136-L144. -1. 
The system must support multi-turn conversation editing with per-turn metadata and additional approval constraints. - Evidence: frontend/IMPLEMENTATION_SUMMARY.md#L88-L112; backend/docs/multi-turn-refs.md#L67-L76. -1. Snapshot export must support attachment delivery and artifact delivery with a manifest and stable schema versioning. - Evidence: backend/docs/export-pipeline.md#L26-L34; backend/docs/export-pipeline.md#L76-L90. -1. Tag storage and behavior must normalize tags into a canonical `group:value` format and enforce known-group behavioral rules. - Evidence: backend/docs/tagging_plan.md#L3-L10; backend/docs/tagging_plan.md#L60-L66. -1. The backend must be usable with the Cosmos DB Emulator for local development, with documented emulator limitations handled safely. - Evidence: backend/docs/cosmos-emulator-limitations.md#L5-L27; backend/app/main.py#L56-L85. -1. Telemetry must be opt-in and safe-by-default, and the UI must present a user-friendly error boundary. - Evidence: frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L18; frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L79-L86. - -## Product goals - -* Enable curators and SMEs to curate ground-truth items efficiently using an assignment-based workflow with a focused curation workspace. - * Evidence: - * frontend/CODEBASE.md#L70-L80. - * frontend/CODEBASE.md#L128-L148. -* Support both single-turn (Q/A) and multi-turn (conversation history) ground-truth formats. - * Evidence: - * frontend/IMPLEMENTATION_SUMMARY.md#L88-L112. - * backend/docs/multi-turn-refs.md#L67-L76. -* Preserve backward compatibility for existing stored item shapes while introducing multi-turn enhancements. - * Evidence: - * backend/docs/multi-turn-refs.md#L67-L73. - -## Frontend UX - -* Provide a single-page curation workspace with a multi-pane layout (queue, editor/actions, references, stats/modals). - * Evidence: - * frontend/CODEBASE.md#L70-L80. 
-* Provide an assigned-items queue that supports selection, refresh, and visibility of key item attributes. - * Evidence: - * frontend/CODEBASE.md#L128-L148. -* Provide a self-serve assignment action with a configurable default limit. - * Evidence: - * frontend/README.md#L20-L44. - * frontend/CODEBASE.md#L128-L148. -* Enable editing of item content, including that saving is not blocked by a “change category” requirement. - * Evidence: - * frontend/CODEBASE.md#L86-L95. -* Provide references UX with two experiences: search candidates and manage selected/attached references. - * Evidence: - * frontend/CODEBASE.md#L136-L144. -* Prevent duplicate reference additions by URL, including disabling add when URL is already present. - * Evidence: - * frontend/CODEBASE.md#L136-L144. -* Support opening references in a new tab and marking visited state, and provide user feedback when popups are blocked. - * Evidence: - * frontend/CODEBASE.md#L136-L144. - * frontend/CODEBASE.md#L215-L233. -* Allow capturing a key paragraph per selected reference and display a length/counter affordance. - * Evidence: - * frontend/CODEBASE.md#L136-L144. -* Support removing a reference with an undo window. - * Evidence: - * frontend/CODEBASE.md#L136-L144. - * frontend/CODEBASE.md#L215-L233. -* Gate approval based on reference completeness and item state (deleted items cannot be approved). - * Evidence: - * frontend/CODEBASE.md#L75-L79. - * frontend/CODEBASE.md#L145-L147. -* Provide save semantics that detect no-op updates and communicate “No changes”. - * Evidence: - * frontend/CODEBASE.md#L140-L146. -* Support soft-delete and restore workflows with clear UI indicators and approval gating. - * Evidence: - * frontend/CODEBASE.md#L145-L147. -* Support export as a backend-driven snapshot download. - * Evidence: - * frontend/CODEBASE.md#L145-L146. -* Support applying tags to an item using a known tag schema. - * Evidence: - * frontend/docs/MVP_REQUIREMENTS.md#L22-L27. 
-* Surface curation instructions as user-consumable markdown and support fetch/write per dataset with concurrency control. - * Evidence: - * frontend/docs/MVP_REQUIREMENTS.md#L15-L18. - * frontend/CODEBASE.md#L165-L168. -* Support multi-turn conversation editing with a timeline view, turn operations, and optional context. - * Evidence: - * frontend/IMPLEMENTATION_SUMMARY.md#L88-L112. -* Enforce multi-turn approval constraints beyond single-turn, including relevance marking and key-paragraph constraints for relevant references. - * Evidence: - * frontend/IMPLEMENTATION_SUMMARY.md#L147-L158. -* Provide keyboard shortcuts for save and approve attempts. - * Evidence: - * frontend/CODEBASE.md#L184-L184. -* Provide toast-based feedback for network failures and undo interactions. - * Evidence: - * frontend/CODEBASE.md#L215-L233. -* Provide a demo mode that disables telemetry and may use mock providers. - * Evidence: - * frontend/README.md#L74-L92. - -## Backend and API - -* Expose a health endpoint at `GET /healthz`. - * Evidence: - * backend/CODEBASE.md#L14-L15. - * backend/CODEBASE.md#L159-L161. -* Accept snake_case and camelCase inputs but always emit camelCase responses. - * Evidence: - * backend/CODEBASE.md#L32-L34. -* Enforce optimistic concurrency using Cosmos ETags on write paths. - * Evidence: - * backend/CODEBASE.md#L32-L34. - * backend/docs/api-change-checklist-assignments.md#L74-L90. -* Require `If-Match` on assignment write paths and return the updated ETag in both response headers and body. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L74-L90. - * backend/docs/api-change-checklist-assignments.md#L168-L178. -* Return 412 on missing or mismatched ETag with stable error codes and include the current ETag in the response. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L74-L90. - * backend/docs/api-change-checklist-assignments.md#L168-L178. -* Represent delete via soft delete semantics (`status=deleted`). 
- * Evidence: - * backend/CODEBASE.md#L33-L34. -* Consolidate ground-truth item writes into the SME PUT and Curator PUT endpoints, with reference changes folded into these updates. - * Evidence: - * backend/docs/api-write-consolidation-plan.v2.md#L28-L36. - * backend/docs/api-write-consolidation-plan.v2.md#L64-L66. -* Keep curator import as a create-only flow. - * Evidence: - * backend/docs/api-write-consolidation-plan.v2.md#L38-L45. - * backend/docs/api-write-consolidation-plan.v2.md#L62-L64. -* Enforce assignment ownership on SME mutation routes and return a stable ownership error. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L82-L86. - * backend/docs/api-change-checklist-assignments.md#L168-L176. -* Clear assignment fields atomically when transitioning to skipped, approved, or deleted. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L10-L12. - * backend/docs/api-change-checklist-assignments.md#L175-L178. -* Use timezone-aware UTC timestamps for assignment time fields. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L13-L14. - * backend/docs/api-change-checklist-assignments.md#L182-L184. -* Include `etag` in JSON bodies for assignment responses. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L19-L24. - * backend/docs/api-change-checklist-assignments.md#L35-L38. -* Provide a single-item assign endpoint that rejects items already draft-assigned to another user and sets status to draft upon successful assignment. - * Evidence: - * backend/docs/assign-single-item-endpoint.md#L22-L29. - * backend/docs/assign-single-item-endpoint.md#L33-L43. -* Create or upsert a secondary assignment document (materialized view) to enable fast per-user assigned-item queries. - * Evidence: - * backend/docs/assign-single-item-endpoint.md#L85-L95. - -## Data and storage - -* Support Cosmos DB as a persistence backend with a storage-layer abstraction. - * Evidence: - * backend/CODEBASE.md#L24-L30. 
- * backend/app/adapters/repos/base.py#L1-L55. -* Support a Cosmos emulator mode for local development, without blocking app startup if the emulator is not ready. - * Evidence: - * backend/app/main.py#L56-L85. - * backend/CODEBASE.md#L11-L14. -* Handle Cosmos emulator query limitations, including lack of `ARRAY_CONTAINS`, by adjusting behavior and skipping tests where appropriate. - * Evidence: - * backend/docs/cosmos-emulator-limitations.md#L5-L27. -* Support a safe workaround for Cosmos emulator Unicode/backslash parsing bugs when configured. - * Evidence: - * backend/README.md#L104-L121. - * backend/docs/cosmos-emulator-unicode-workaround.md#L35-L39. -* Preserve backward compatibility for stored ground-truth fields while extending the multi-turn model (optional refs and tags in history). - * Evidence: - * backend/docs/multi-turn-refs.md#L67-L76. - -## Export - -* Support snapshot export with `attachment` and `artifact` delivery modes, with stable defaults when the request is empty. - * Evidence: - * backend/docs/export-pipeline.md#L26-L34. -* Return JSON document payloads for snapshot download endpoints. - * Evidence: - * backend/docs/export-pipeline.md#L33-L38. -* For artifact delivery, write a deterministic set of per-item files plus a manifest that includes a stable `schemaVersion`. - * Evidence: - * backend/docs/export-pipeline.md#L76-L90. -* Run export processors before formatting and support merging tag fields into a single exported `tags` array. - * Evidence: - * backend/docs/export-pipeline.md#L116-L128. - -## Observability and operations - -* Include a per-request user identifier in logs. - * Evidence: - * backend/README.md#L334-L341. -* Provide opt-in telemetry that is disabled by default and safely no-ops when disabled or in demo mode. - * Evidence: - * frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L18. - * frontend/README.md#L74-L92. 
-* Provide a UI error boundary that catches render failures and optionally reports exceptions when telemetry is enabled. - * Evidence: - * frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L79-L86. - -## Security and privacy - -* In dev mode, support user simulation via `X-User-Id` to drive per-user behaviors. - * Evidence: - * backend/README.md#L334-L341. - * frontend/README.md#L20-L44. -* Enforce ownership for assignment mutation endpoints to prevent unauthorized changes. - * Evidence: - * backend/docs/api-change-checklist-assignments.md#L82-L86. - * backend/docs/api-change-checklist-assignments.md#L168-L176. -* Keep telemetry safe-by-default and opt-in. - * Evidence: - * frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L18. - -## Quality and testing - -* Maintain deterministic tag normalization behavior to support stable comparisons, exports, and tests. - * Evidence: - * backend/docs/tagging_plan.md#L54-L60. -* Skip or adjust tests in emulator mode when the emulator does not support required query capabilities. - * Evidence: - * backend/docs/cosmos-emulator-limitations.md#L5-L27. - * backend/README.md#L248-L259. - -## Cross-cutting constraints and notes - -These items are implementation-adjacent but reflect constraints or invariants documented in sources. - -* Prefer a layered architecture where API routes remain thin and workflow/state validation occurs in services rather than repository implementations. - * Evidence: - * backend/CODEBASE.md#L24-L30. - * backend/docs/assign-single-item-endpoint.md#L78-L87. -* Treat emulator compatibility as a first-class constraint for local development. - * Evidence: - * backend/docs/cosmos-emulator-limitations.md#L5-L27. - * backend/README.md#L104-L121. - -## Conflicts and ambiguities to resolve - -* Reference search and LLM endpoints appear inconsistent between frontend docs. - * Evidence: - * frontend/docs/MVP_REQUIREMENTS.md#L28-L36. - * frontend/CODEBASE.md#L136-L145. 
-* Tag semantics differ between “canonical group:value tags” and permissive per-history tags, and it is unclear which UI validations apply to which fields. - * Evidence: - * backend/docs/tagging_plan.md#L3-L10. - * backend/docs/history-tags-feature.md#L140-L150. -* Tag registry write support is unclear from the frontend requirements: it mentions “create new tags” while also stating no write endpoints for tags. - * Evidence: - * frontend/docs/MVP_REQUIREMENTS.md#L22-L27. -* Cosmos emulator unicode workaround coverage has potential drift for non-ground-truth containers. - * Evidence: - * backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L111-L128. - * backend/app/adapters/repos/tags_repo.py#L93-L124. - diff --git a/.copilot-tracking/subagent/20260121/conventions-and-sources-research.md b/.copilot-tracking/subagent/20260121/conventions-and-sources-research.md deleted file mode 100644 index 4b94a45..0000000 --- a/.copilot-tracking/subagent/20260121/conventions-and-sources-research.md +++ /dev/null @@ -1,170 +0,0 @@ ---- -title: Conventions and sources research -description: Repo guidance on documentation + identification of sources of truth for requirements vs implementation -author: GitHub Copilot -ms.date: 2026-01-21 -ms.topic: reference -keywords: - - conventions - - requirements - - documentation - - source of truth -estimated_reading_time: 6 ---- - -# Conventions and Sources Research — Requirements vs Implementation - -## Scope - -This note answers: - -- Where this repo documents **product requirements** (sources of truth) -- Where this repo documents **implementation plans/behavior** (derived from requirements) -- What **doc-writing conventions** exist (including markdown style constraints, if any) - -This is “research only” and does not propose changes. 
- -## Primary instruction sources (repo) - -### Copilot instruction files - -- Backend Copilot conventions: [backend/.github/copilot-instructions.md](../../../backend/.github/copilot-instructions.md#L1-L4) - - Timestamp rule: use `datetime.now(timezone.utc)` for timestamp updates. - - Typing rule: prefer built-in generics (`dict`, `list`) over `typing.Dict`/`typing.List`. - - Workflow hint: use notify MCP `show-notification` when done. -- Frontend Copilot conventions: [frontend/.github/copilot-instructions.md](../../../frontend/.github/copilot-instructions.md#L1) - - Workflow hint: use notify MCP `show-notification` when done. - -Note: There are duplicated copies under `workspace-1/` and `workspace-2/` mirroring the same instruction patterns. - -### Repo prompt templates (doc/plan authoring) - -- “Discussion prep” prompt: [backend/.github/prompts/build_context.prompt.md](../../../backend/.github/prompts/build_context.prompt.md#L1-L6) - - Explicitly instructs creating a markdown file in `/docs` for discussion preparation. -- “Planning” prompt: [backend/.github/prompts/plan.prompt.md](../../../backend/.github/prompts/plan.prompt.md#L1-L16) - - Explicitly instructs writing plans to `/plans/*-plan.md`. -- Frontend planning prompt: [frontend/.github/prompts/plan.prompt.md](../../../frontend/.github/prompts/plan.prompt.md#L1-L14) - - Similar planning guidance, but **does not** prescribe a plan output folder. - -## Sources of truth (product requirements) - -### 1) Canonical requirements doc - -- Requirements: [docs/ground-truth-curation-reqs.md](../../../docs/ground-truth-curation-reqs.md) - - Explicitly labeled “MVP Requirements”. 
- - Used as the declared requirements source of truth for backend implementation planning: [backend/docs/fastapi-implementation-plan.md](../../../backend/docs/fastapi-implementation-plan.md#L1-L7) - -Interpretation: - -- Treat `docs/ground-truth-curation-reqs.md` as the top-level contract for “what the system must do” (personas, scope, flows, and open questions). - -### 2) Business value framing - -- Product value narrative: [BUSINESS_VALUE.md](../../../BUSINESS_VALUE.md) - - Declares ground truth as “source of truth for model and agent evaluation”. - -Interpretation: - -- This doc is not a detailed functional spec, but it is a “why we’re building this” source and can anchor prioritization. - -### 3) Backlog / work-item source inputs - -- Jira-derived backlog lists: - - [prd.json](../../../prd.json) - - [prd-refined-1.json](../../../prd-refined-1.json) - - [prd-refined-2.json](../../../prd-refined-2.json) - - [prd-genericize.json](../../../prd-genericize.json) - - [Jira.csv](../../../Jira.csv) - -Interpretation: - -- These files look like work-item exports (issue IDs, titles, descriptions, status). They are useful for scope tracking and prioritization, but they are not written as a normative requirements spec. - -## Secondary “spec/design” docs (normative by area) - -These docs behave like “design specs” for specific subsystems and often use explicit language like “authoritative”, “canonical”, or “source of truth”. They appear intended to guide implementation behavior. 
- -### Tagging: manual vs computed - -- Manual tags design: [docs/manual-tags-design.md](../../../docs/manual-tags-design.md) - - Manual tags “remain authoritative” and are “source of truth” in `manualTags`: [docs/manual-tags-design.md](../../../docs/manual-tags-design.md#L16-L22) - - A merged `tags` view may exist, but is “not authoritative”: [docs/manual-tags-design.md](../../../docs/manual-tags-design.md#L43-L44) -- Computed tags design: [docs/computed-tags-design.md](../../../docs/computed-tags-design.md) - - Includes explicit “authoritative manual tags” examples. - -Related requirement gap: - -- The MVP requirements doc explicitly lists “Authoritative source of truth for tags” as an open question: [docs/ground-truth-curation-reqs.md](../../../docs/ground-truth-curation-reqs.md#L382) - -Interpretation: - -- Tag “source of truth” is partly specified (manualTags authoritative) but still called out as an open requirements question at the MVP level. - -### Frontend runtime configuration - -- Runtime config precedence: [docs/frontend-runtime-configuration.md](../../../docs/frontend-runtime-configuration.md) - - Declares backend env vars as “authoritative” and frontend `.env` as fallback-only: [docs/frontend-runtime-configuration.md](../../../docs/frontend-runtime-configuration.md#L23-L33) - -### Export schema and migration - -- Canonical export schema and migration: [docs/json-export-migration-plan.md](../../../docs/json-export-migration-plan.md) - - Uses “canonical schema” language for the JSON wire format. 
- -## Implementation sources (how the repo should be built/extended) - -### Backend implementation guides - -- Backend “authoritative” implementation guide: [backend/CODEBASE.md](../../../backend/CODEBASE.md) - - Explicitly says to add clarifications there so it “stays authoritative”: [backend/CODEBASE.md](../../../backend/CODEBASE.md#L222) -- Backend staged implementation plan: [backend/docs/fastapi-implementation-plan.md](../../../backend/docs/fastapi-implementation-plan.md) - - Explicitly derived from the canonical requirements doc. -- Backend feature/workflow specs (implementation-facing): [backend/docs/](../../../backend/docs/) - - Examples: API consolidation plans, export pipeline, tagging plan, emulator limitations/workarounds. - -Interpretation: - -- `backend/CODEBASE.md` is the “how to work in this codebase” source. -- `backend/docs/*` appears to be the system’s implementation-oriented spec set. - -### Frontend implementation guides - -- Frontend codebase guide: [frontend/CODEBASE.md](../../../frontend/CODEBASE.md) - - Documents architecture, conventions, and safe extension points. -- Frontend MVP checklist: [frontend/docs/MVP_REQUIREMENTS.md](../../../frontend/docs/MVP_REQUIREMENTS.md#L1) - - Appears to be a status-tracking checklist (items marked `[x]/[ ]`), mixing frontend needs with backend status notes. - -Interpretation: - -- `frontend/CODEBASE.md` is the best “implementation guide” for frontend structure. -- `frontend/docs/MVP_REQUIREMENTS.md` is useful operationally, but it reads more like a progress checklist than a normative product requirements doc. 
- -## Markdown / doc-writing style constraints (repo-observable) - -### 1) Frontmatter convention is common - -Many markdown documents include Microsoft Docs-style YAML frontmatter: - -- Example: `ms.date` / `ms.topic` in [docs/manual-tags-design.md](../../../docs/manual-tags-design.md#L1-L12) -- Example: `ms.date` / `ms.topic` in [frontend/CODEBASE.md](../../../frontend/CODEBASE.md#L1-L12) - -Interpretation: - -- For “real” documentation/spec files (especially in `docs/` and major `CODEBASE.md` guides), using YAML frontmatter appears to be the convention. - -### 2) Markdownlint appears in some artifacts, but no repo config was found - -- Multiple `.copilot-tracking/*` documents start with `` (evidence via grep), suggesting markdownlint is used somewhere in the authoring workflow. -- No `.markdownlint*` config file was found in this repo (search across common config names returned none). - -Interpretation: - -- There is no repo-visible markdownlint ruleset to follow, but some generated/tracking artifacts proactively disable markdownlint. - -### 3) Formatting/tooling constraints are primarily code-focused - -- Frontend uses Biome for lint/format via `biome check --write`: [frontend/package.json](../../../frontend/package.json#L7-L18) and config in [frontend/biome.json](../../../frontend/biome.json) - - This is primarily relevant to code (TS/JS/JSON). No repo evidence that markdown is formatted/linted by Biome here. - -## Notes on repo layout duplicates - -This repo contains `workspace-1/` and `workspace-2/` directories with mirrored docs and `.github` conventions. For “source of truth” purposes, the top-level `docs/`, `backend/`, and `frontend/` folders appear to be the canonical set; the workspace copies look like snapshots or sandboxes. 
diff --git a/.copilot-tracking/subagent/20260121/conventions-research.md b/.copilot-tracking/subagent/20260121/conventions-research.md deleted file mode 100644 index a8302be..0000000 --- a/.copilot-tracking/subagent/20260121/conventions-research.md +++ /dev/null @@ -1,185 +0,0 @@ -# Conventions Research — Backend Refactor (Repo/Service/API layering + Cosmos emulator) - -## Scope - -This note summarizes repository conventions and layering rules relevant to: - -- Moving workflow logic out of API/routes and repo implementations into services -- Handling Cosmos DB emulator differences (including a potential “emulator subclass” strategy) - -## Primary Sources - -- Architecture overview: [backend/CODEBASE.md](../../../../backend/CODEBASE.md#L8-L36) -- DI/composition container: [backend/app/container.py](../../../../backend/app/container.py#L1-L322) -- App startup wiring: [backend/app/main.py](../../../../backend/app/main.py#L1-L85) -- Repo protocol: [backend/app/adapters/repos/base.py](../../../../backend/app/adapters/repos/base.py#L1-L120) -- Cosmos repo implementation: [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L630-L1890) -- Emulator docs: - - Conditional patch pattern: [backend/CONDITIONAL_PATCH_IMPLEMENTATION.md](../../../../backend/CONDITIONAL_PATCH_IMPLEMENTATION.md#L1-L88) - - Emulator limitations: [backend/docs/cosmos-emulator-limitations.md](../../../../backend/docs/cosmos-emulator-limitations.md#L1-L90) - - Unicode/backslash workarounds: [backend/docs/cosmos-emulator-unicode-workaround.md](../../../../backend/docs/cosmos-emulator-unicode-workaround.md#L1-L219), [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](../../../../backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L1-L230) - -## Conventions & Layering Rules - -### 1) Explicit layered architecture (API → Services → Repos/Adapters) - -The backend explicitly documents a layered architecture with composition in a central container: - -- API layer: routers in 
[backend/CODEBASE.md](../../../../backend/CODEBASE.md#L24-L30) -- Services layer: workflow logic in [backend/CODEBASE.md](../../../../backend/CODEBASE.md#L24-L30) -- Repositories/adapters layer: Cosmos repo implements a protocol in [backend/CODEBASE.md](../../../../backend/CODEBASE.md#L24-L30) -- Composition via singleton container: [backend/CODEBASE.md](../../../../backend/CODEBASE.md#L24-L30) - -Practical implication for refactor: - -- Route handlers should remain “thin” (HTTP parsing/validation + calling services). -- Services should own workflow/state validation and call repos. -- Repos should be storage-focused (querying/persistence, ETag enforcement), not business policy. - -### 2) Service layer owns state validation; repo methods can be intentionally state-agnostic - -The “assign single item” backend design doc explicitly states that `assign_to()` is state-agnostic and that state validation belongs in the service layer: - -- “State validation is the responsibility of the service layer” in [backend/docs/assign-single-item-endpoint.md](../../../../backend/docs/assign-single-item-endpoint.md#L78-L87) - -This is a strong precedent for moving validations/decision logic out of repo implementations and into services. 
- -### 3) DI/composition pattern: singleton `container` wires repos and services - -The DI approach is a simple global singleton container object used by routers and services: - -- Container class and global instance: [backend/app/container.py](../../../../backend/app/container.py#L34-L71), [backend/app/container.py](../../../../backend/app/container.py#L321-L322) -- Container lazily initializes repo/services; tests and lifespan call `init_cosmos_repo()` to bind to the current event loop: [backend/app/container.py](../../../../backend/app/container.py#L50-L56) -- Cosmos startup is centralized in `startup_cosmos()` and explicitly: - - creates repo instances - - initializes async clients - - validates containers - in [backend/app/container.py](../../../../backend/app/container.py#L190-L223) - -Practical implication for refactor: - -- New services should be registered on `Container` (as attributes) and wired in `init_cosmos_repo()` (or in `__init__` if repo-independent). -- Route handlers should call `container.` rather than `container.repo` when a workflow exists. - -### 4) Current state: routers sometimes call repos directly (mixed style) - -There is evidence of both patterns: - -- Direct repo calls from API routes: e.g. [backend/app/api/v1/ground_truths.py](../../../../backend/app/api/v1/ground_truths.py#L241-L246) and [backend/app/api/v1/ground_truths.py](../../../../backend/app/api/v1/ground_truths.py#L277-L293) -- But also service usage from routes: snapshot endpoints call `container.snapshot_service`: [backend/app/api/v1/ground_truths.py](../../../../backend/app/api/v1/ground_truths.py#L135-L151) - -Interpretation: - -- The repo supports both direct usage and service-orchestrated usage today. -- The documented architecture (and newer design docs) push toward service-owned workflows. 
- -## Emulator Handling Conventions - -### 1) Emulator is expected to be flaky/unready at startup; startup should be fail-soft - -Startup intentionally does not block if Cosmos init fails (emulator might not be ready): - -- “Don’t block startup; emulator may not be ready yet” in [backend/app/main.py](../../../../backend/app/main.py#L56-L85) -- Same idea documented in [backend/CODEBASE.md](../../../../backend/CODEBASE.md#L11-L14) - -Practical implication: - -- Emulator-specific subclasses/branches should preserve fail-soft behavior (don’t crash the app on emulator-only issues where possible). - -### 2) Conditional behavior for emulator compatibility is a standard pattern here - -The repo already uses “if emulator then alternate implementation” in multiple places: - -- Emulator detection via endpoint string: [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L639-L641) - -**Conditional patching example (`assign_to`)** - -- Documented split into main + prod patch path + emulator read-modify-replace path: [backend/CONDITIONAL_PATCH_IMPLEMENTATION.md](../../../../backend/CONDITIONAL_PATCH_IMPLEMENTATION.md#L11-L22) -- Implemented selection logic in code: [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L1719-L1737) - -This establishes a repo convention: - -- Prefer a single public method that routes internally based on emulator detection. -- Keep emulator compatibility paths available when Cosmos emulator lacks features. 
- -### 3) Emulator limitations drive in-memory fallbacks and test skips - -The emulator limitation on `ARRAY_CONTAINS` is explicitly documented: - -- Emulator does not support `ARRAY_CONTAINS`; tag filtering queries fail; tests are skipped: [backend/docs/cosmos-emulator-limitations.md](../../../../backend/docs/cosmos-emulator-limitations.md#L5-L27) -- Workaround: in-memory tag filtering fallback described in [backend/docs/cosmos-emulator-limitations.md](../../../../backend/docs/cosmos-emulator-limitations.md#L29-L36) - -And the Cosmos repo uses emulator-specific fallback for pagination with tags/ref_url: - -- “For queries with tags… filter in-memory… use in-memory filtering for ref_url if Cosmos emulator is used…” in [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L694-L709) - -### 4) Emulator Unicode/backslash issues are handled via flag-driven transforms - -There are two related docs here: - -1) “Unicode character” normalization doc (smart quotes/dashes etc) - -- Workaround is activated by `GTC_COSMOS_DISABLE_UNICODE_ESCAPE=true` and should not be enabled in production: [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](../../../../backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L27-L33), [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](../../../../backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L148-L161) - -2) “Unicode escape sequence / backslash” bug doc (Base64 encode `refs.content`) - -- Final solution is Base64 encoding `refs[*].content` when `GTC_COSMOS_DISABLE_UNICODE_ESCAPE=true`: [backend/docs/cosmos-emulator-unicode-workaround.md](../../../../backend/docs/cosmos-emulator-unicode-workaround.md#L35-L39) -- Encoding/decoding helpers and `_contentEncoded` marker: [backend/docs/cosmos-emulator-unicode-workaround.md](../../../../backend/docs/cosmos-emulator-unicode-workaround.md#L41-L88) -- Explicit scope is only `refs[*].content` and only when the flag is true: 
[backend/docs/cosmos-emulator-unicode-workaround.md](../../../../backend/docs/cosmos-emulator-unicode-workaround.md#L105-L120) - -Practical implication: - -- Emulator-specific behavior is controlled through settings flags and is intentionally scoped to the minimum necessary. -- Any emulator subclass approach should respect and reuse these flags rather than introducing a second, parallel flag. - -## Settings / Flag Conventions Relevant to Emulator and Backend Selection - -- Settings use `GTC_` prefix and load env defaults from `environments/sample.env`: [backend/app/core/config.py](../../../../backend/app/core/config.py#L11-L21) -- Backend selection is via `REPO_BACKEND` (memory|cosmos): [backend/app/core/config.py](../../../../backend/app/core/config.py#L31-L34) -- Emulator-related flags: - - `USE_COSMOS_EMULATOR`: [backend/app/core/config.py](../../../../backend/app/core/config.py#L41-L46) - - `COSMOS_CONNECTION_VERIFY` (self-signed cert): [backend/app/core/config.py](../../../../backend/app/core/config.py#L44-L49) - - `COSMOS_DISABLE_UNICODE_ESCAPE`: [backend/app/core/config.py](../../../../backend/app/core/config.py#L47-L52) - - `COSMOS_TEST_MODE` (don’t init cosmos in lifespan): [backend/app/core/config.py](../../../../backend/app/core/config.py#L49-L53), [backend/app/main.py](../../../../backend/app/main.py#L58-L69) - -## Style / Misc. 
Engineering Conventions - -- Timestamp updates should use UTC: [backend/.github/copilot-instructions.md](../../../../backend/.github/copilot-instructions.md#L1) -- Prefer built-in generics (`dict`, `list`) over `typing.Dict`/`typing.List`: [backend/.github/copilot-instructions.md](../../../../backend/.github/copilot-instructions.md#L2) - -## Guidance for the Planned Refactor - -### Moving logic from repos/API into services - -Repository conventions support: - -- Keeping repo operations storage-centric and state-agnostic when appropriate, with state validation in services: [backend/docs/assign-single-item-endpoint.md](../../../../backend/docs/assign-single-item-endpoint.md#L78-L87) -- Using the singleton container to expose services (as already done for snapshots): [backend/app/api/v1/ground_truths.py](../../../../backend/app/api/v1/ground_truths.py#L135-L151) - -Suggested “shape” aligned with conventions: - -- Add/extend a service in `backend/app/services/*_service.py` -- Wire it on the container in `init_cosmos_repo()` so it gets the active repo -- Update routers to call the service - -### Introducing a Cosmos emulator subclass (interpretation) - -No doc explicitly mandates “subclassing,” but the repo has a clear convention of environment-conditional paths: - -- Internal switching inside the Cosmos repo based on `is_cosmos_emulator_in_use()`: [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L639-L641) -- `assign_to()` explicitly uses two implementations (patch vs read-modify-replace) selected at runtime: [backend/app/adapters/repos/cosmos_repo.py](../../../../backend/app/adapters/repos/cosmos_repo.py#L1719-L1737) - -If you introduce a subclass, it should fit the existing composition pattern: - -- The selection should happen in container wiring (e.g., `init_cosmos_repo()`), not in routers. -- The emulator-specific class should still implement the same `GroundTruthRepo` protocol. 
-- It should retain the fail-soft startup posture (emulator might not be ready). - -A minimal-risk alternative consistent with existing code: - -- Keep a single `CosmosGroundTruthRepo` and add conditional internal branches for emulator-only incompatibilities (the existing pattern). - -## Notes / Gaps - -- There is no explicit “ports/adapters hexagonal architecture” guidance beyond the documented folder layout and the `GroundTruthRepo` protocol. -- Observability docs are extensive but not directly prescriptive for repo/service refactors, except indirectly (fail-soft + structured logging patterns). diff --git a/.copilot-tracking/subagent/20260121/cosmos-repo-research.md b/.copilot-tracking/subagent/20260121/cosmos-repo-research.md deleted file mode 100644 index c02fe3a..0000000 --- a/.copilot-tracking/subagent/20260121/cosmos-repo-research.md +++ /dev/null @@ -1,311 +0,0 @@ -# Cosmos repo + emulator mixing research (2026-01-21) - -## Scope - -Primary file: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py) - -Related emulator/config wiring: -- [backend/app/container.py](backend/app/container.py) -- [backend/app/core/config.py](backend/app/core/config.py) -- [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md) -- [backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md) -- [backend/app/adapters/repos/tags_repo.py](backend/app/adapters/repos/tags_repo.py) -- [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py) - -Goal: produce an inventory of code blocks inside cosmos_repo.py, classify into A/B/C, and propose concrete override seams for a new emulator-specific repo module (`cosmos_emulator.py`) that subclasses (or wraps) the production repo. 
- ---- - -## High-level finding - -`CosmosGroundTruthRepo` currently mixes: - -- Production Cosmos persistence concerns (SDK client creation, query construction, container calls, concurrency via ETags) -- Assignment/business workflow logic (sampling allocation, quota math, selection + de-biasing, user id validation) -- Emulator compatibility hacks (unicode sanitization, backslash sentinel, base64 refs encoding, EXISTS/ARRAY_CONTAINS workarounds, intermittent delete/upsert retries, conditional assignment via read-modify-replace) - -This makes it hard to reason about “production correctness” separately from “emulator survivability”, and it forces emulator constraints (like no `EXISTS` in SQL) into the default repo surface. - ---- - -## Inventory by category (line-cited) - -### A) Pure persistence concerns - -These blocks are “Cosmos adapter” responsibilities (query construction, paging/container calls, error mapping), and should remain in the repo layer. - -1) Cosmos client/connection policy setup and async loop binding -- Connection policy + retry options built from settings: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L260-L300) -- Async client initialization and container client acquisition: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L302-L356) - -2) Container existence validation with actionable error messages -- DB/container validation flow: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L358-L428) - -3) Document serialization/deserialization and schema compatibility -- `_to_doc()` converts model to JSON-safe dict, sets UUID bucket string, ensures updatedAt, persists computed `totalReferences`: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L386-L454) -- `_from_doc()` normalizes fetched doc, handles legacy `history=None`, validates to model: 
[backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L456-L474) - -4) Query construction primitives and safe sort clause construction -- Filter builder (status/dataset/item_id/tags/ref_url): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L526-L605) -- Sort resolution + stable in-memory sort key: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L607-L671) -- ORDER BY clause constructed via fixed mapping (no raw user input): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L679-L724) - -5) Paginated read path (production Cosmos) -- Direct query path with ORDER BY + OFFSET/LIMIT, then a second query for total count: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L726-L814) - -6) Counting logic -- Tag-aware count (SQL count for prod, in-memory tag check for emulator): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L913-L1043) -- Non-tag count uses `SELECT VALUE COUNT(1)` to avoid the “NonValueAggregate” plan issue: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1045-L1107) - -7) Basic CRUD paths -- List-by-dataset query (includes docType exclusion for curation docs): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1109-L1135) -- `get_gt()` read-item by hierarchical partition key: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1137-L1154) -- Curation instruction upsert with conditional replace by ETag: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1188-L1275) -- Assignment document CRUD in secondary container: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1841-L1989) - - -### B) Business/service logic (should be moved out) - -These blocks encode 
*workflow rules* and *domain-level decisions* rather than storage mechanics. They can be preserved, but should move to service layer(s). - -1) Total reference semantics are domain/business logic -- `totalReferences` is derived from either history refs or item refs, and the repo mutates the model during persistence: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L367-L385) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L386-L413) - -Why this is service logic: -- It encodes a product/business definition (“history refs take priority”) and impacts UI/filters. -- The adapter shouldn’t be responsible for deciding business meaning; it should persist what it’s given. - -Suggested owner: -- A new `GroundTruthDerivationsService` (or fold into existing `CurationService` / “ground truth service” if present). - -Suggested signatures: -- `class GroundTruthDerivationsService:` - - `def compute_total_references(self, item: GroundTruthItem) -> int` - - `def apply_derived_fields(self, item: GroundTruthItem) -> GroundTruthItem` (sets `totalReferences`, possibly `questionLength`, etc.) - -2) Sampling allocation, quotas, and selection are assignment workflow -- The repo contains a full sampling/selection algorithm including: - - fetching already-assigned items first - - reading sampling allocation config - - quota computation via largest remainder - - per-dataset candidate queries - - round-robin interleave + final global fill - - shuffling to debias query ordering - [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1388-L1600) -- Quota computation helper is pure allocation math: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1681-L1716) - -Why this is service logic: -- These are product-level rules about how to distribute assignment opportunities. -- It is hard to test in isolation when buried in the persistence adapter. 
- -Suggested owner: -- `AssignmentService` already exists and is the natural owner. It currently orchestrates `self_assign()` and retries by excluding seen IDs: [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L44-L152) - -Suggested refactor: -- Move sampling algorithm out of repo into `AssignmentService` (or a new `AssignmentSamplingService` used by `AssignmentService`). - -Suggested signatures: -- `class AssignmentSamplingService:` - - `async def sample_candidates(self, *, user_id: str, limit: int, exclude_ids: list[str] | None = None) -> list[GroundTruthItem]` - - `def compute_quotas(self, weights: dict[str, float], k: int) -> dict[str, int]` - -Repository then exposes *only* persistence queries: -- `async def list_unassigned_candidates_global(self, *, user_id: str, limit: int, exclude_ids: list[str] | None) -> list[GroundTruthItem]` -- `async def list_unassigned_candidates_by_dataset_prefix(self, *, dataset_prefix: str, user_id: str, limit: int, exclude_ids: list[str] | None) -> list[GroundTruthItem]` - -3) Input validation of `user_id` belongs in API/service -- Repo rejects user IDs not matching a regex: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1718-L1743) - -Why this is service logic: -- Validation semantics (“allowed chars”) are part of API contract; the repo should not have to know. - -Suggested owner: -- `AssignmentService` (or API layer) should validate `user_id` before calling repository. - -Suggested signature: -- `def validate_user_id(self, user_id: str) -> None` (raise a typed error) or return `bool`. - - -### C) Emulator / compatibility hacks - -These blocks exist specifically because the emulator’s behavior differs from production Cosmos DB. 
- -1) Unicode/control-char sanitization, invalid backslash escaping, and restoration -- Smart punctuation replacements + escape/backslash handling helpers: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L29-L118) -- Recursive normalization (emulator-only) and restore (sentinel back to backslash): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L121-L219) -- The public “intent wrapper” `_ensure_utf8_strings()` used by writes: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L430-L454) - -Note: The repo also adds a *second* workaround by base64-encoding `refs[*].content` to avoid emulator rejection of “certain character sequences”: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L53-L104) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L148-L176) - -2) SQL feature gaps: emulator incompatibilities drive in-memory filtering -- `list_gt_paginated()` routes to emulator path when `tags` or `ref_url` are present and endpoint is localhost: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L748-L770) -- Emulator pagination path disables SQL tag/ref_url filters (no ARRAY_CONTAINS strategy / no EXISTS) then filters in memory: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L816-L912) - -This is consistent with the emulator limitations doc: -- [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md#L1-L39) - -3) Conditional assignment: patch in production, read-modify-replace in emulator -- Environment detection: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L671-L677) -- `assign_to()` routes to emulator vs production implementation: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1718-L1752) -- 
Production implementation uses `patch_item` with non-parameterized filter_predicate (string interpolation): [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1754-L1838) -- Emulator implementation uses read-modify-replace: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1840-L1980) - -Related design note: -- [backend/CONDITIONAL_PATCH_IMPLEMENTATION.md](backend/CONDITIONAL_PATCH_IMPLEMENTATION.md#L1-L52) - -4) Retry logic for emulator intermittent errors + payload sanitization retry -- `upsert_gt()` includes special retry paths for: - - `etag_mismatch` mapping - - intermittent emulator “jsonb type as object key” errors - - emulator invalid JSON payload errors triggering a sanitize-and-retry - [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1277-L1402) -- `delete_dataset()` includes emulator-only retry for jsonb/HTTP-format errors, plus retry on deleting curation doc: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1422-L1525) - ---- - -## Related emulator knobs and behaviors (outside cosmos_repo.py) - -1) Settings flags and Cosmos knobs -- Emulator flags + unicode escape toggle live in Settings: [backend/app/core/config.py](backend/app/core/config.py#L28-L56) - -2) DI container currently always uses `CosmosGroundTruthRepo` -- Repo wiring picks `CosmosGroundTruthRepo` and only uses endpoint scheme / `USE_COSMOS_EMULATOR` to decide AAD vs key auth, not to change repo class: [backend/app/container.py](backend/app/container.py#L86-L138) - -3) Tags repo exists separately and does not currently apply the unicode workaround -- `CosmosTagsRepo.save_global_tags()` does a plain upsert without `_ensure_utf8_strings`: [backend/app/adapters/repos/tags_repo.py](backend/app/adapters/repos/tags_repo.py#L93-L124) - -This matters because 
[backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md](backend/COSMOS_EMULATOR_UNICODE_WORKAROUND.md#L111-L128) claims tags repo applies normalization; code appears to have drifted. - ---- - -## Proposed refactor direction - -### Objective - -Create a clean production repo with no emulator branches in hot paths, and move emulator constraints into a separate implementation in `backend/app/adapters/repos/cosmos_emulator.py`. - -### Recommended shape - -1) Production repo remains `CosmosGroundTruthRepo` -- Keep only production-correct Cosmos SQL usage and patch-based assignment. -- Keep generic retry policy based on Cosmos SDK RetryOptions (already configured in connection policy). - -2) New emulator repo: `CosmosEmulatorGroundTruthRepo` -- Subclass `CosmosGroundTruthRepo` and override only the minimal behavior differences. -- Keep emulator-only sanitization and retry logic local to emulator class. - -3) Move B-category logic into services -- Sampling/quotas to `AssignmentService` (or `AssignmentSamplingService`) -- Derived fields like `totalReferences` to a derivations service - ---- - -## Exact override seams for `cosmos_emulator.py` - -### Suggested class name - -`CosmosEmulatorGroundTruthRepo` - -### Suggested constructor signature - -Keep it 1:1 with production to minimize DI churn: - -- `def __init__(self, endpoint: str, key: str | None, db_name: str, gt_container_name: str, assignments_container_name: str, connection_verify: bool | str | None = None, test_mode: bool = False, credential: Any | None = None) -> None` - -(Optionally add `*, emulator_flags: EmulatorFlags | None = None` only if you want to decouple from global `settings`.) - -### Minimal subclass surface (recommended) - -Override these methods/properties only: - -1) Environment detection -- `def is_cosmos_emulator_in_use(self) -> bool` - - Return `True` unconditionally in the emulator subclass to eliminate endpoint string checks. 
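A minimal sketch of that subclass shape follows. It assumes a stripped-down stand-in for `CosmosGroundTruthRepo` with no Cosmos client and no async `_init()`. Only the constructor parameters and `is_cosmos_emulator_in_use` come from the proposal. `choose_repo_class` and its `localhost`/`127.0.0.1` check are hypothetical illustrations of how a container could pick a class once both constructors match.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse


class CosmosGroundTruthRepo:
    """Stand-in for the production repo; stores settings only (no Cosmos client)."""

    def __init__(
        self,
        endpoint: str,
        key: str | None,
        db_name: str,
        gt_container_name: str,
        assignments_container_name: str,
        connection_verify: bool | str | None = None,
        test_mode: bool = False,
        credential: Any | None = None,
    ) -> None:
        self.endpoint = endpoint
        self.key = key
        self.db_name = db_name
        self.gt_container_name = gt_container_name
        self.assignments_container_name = assignments_container_name
        self.connection_verify = connection_verify
        self.test_mode = test_mode
        self.credential = credential

    def is_cosmos_emulator_in_use(self) -> bool:
        # Production class: no endpoint string sniffing in hot paths.
        return False


class CosmosEmulatorGroundTruthRepo(CosmosGroundTruthRepo):
    # No __init__ override, so the constructor stays 1:1 with production.

    def is_cosmos_emulator_in_use(self) -> bool:
        return True


def choose_repo_class(use_cosmos_emulator: bool, endpoint: str) -> type[CosmosGroundTruthRepo]:
    """Hypothetical container helper: emulator flag OR non-TLS local endpoint."""
    parsed = urlparse(endpoint)
    non_tls_local = parsed.scheme == "http" and parsed.hostname in {"localhost", "127.0.0.1"}
    if use_cosmos_emulator or non_tls_local:
        return CosmosEmulatorGroundTruthRepo
    return CosmosGroundTruthRepo
```

Because both classes share one `__init__`, the container passes the same arguments either way. The real container would still await `repo._init()` afterwards.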
- -2) Document transforms -- Add hook methods in the *production* base class (or override existing wrapper): - - `def _pre_write_transform(self, doc: dict[str, Any]) -> dict[str, Any]` - - `def _post_read_transform(self, doc: dict[str, Any]) -> dict[str, Any]` - -In the emulator subclass: -- `_pre_write_transform` applies: - - unicode/control-char sanitization - - backslash sentinel substitution - - base64 refs content encoding - - (optional) `json.dumps(..., ensure_ascii=True)` roundtrip if needed for emulator -- `_post_read_transform` applies restore + base64 decode - -These behaviors are currently spread across: -- [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L53-L219) -- Used in write paths like import/upsert/curation/assignment docs: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L494-L513) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1234-L1272) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1912-L1934) - -3) Pagination capability differences -- `async def list_gt_paginated(...)` - - Emulator subclass should route to `_list_gt_paginated_with_emulator` whenever `tags` or `ref_url` are present. - - Production base class keeps the direct SQL path. - -Currently: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L748-L770) - -4) Assignment method -- `async def assign_to(self, item_id: str, user_id: str) -> bool` - - Emulator subclass forces read-modify-replace flow. - - Production base forces patch flow. - -Currently: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1718-L1980) - -5) Emulator-only retry policy for deletes/upserts -- `async def upsert_gt(...)` and `async def delete_dataset(...)` - - Emulator subclass retains the intermittent emulator bug retries. 
- - Production base can keep ETag handling and rely on SDK retry options, avoiding emulator-specific message matching. - -Currently: -- Upsert retry + sanitize retry: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1277-L1402) -- Delete dataset retry: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1422-L1525) - -If you want an even smaller surface, introduce a single overridable policy method: -- `def _should_retry_emulator_exception(self, exc: Exception) -> bool` -And keep retry loops in base calling it. - ---- - -## DI container & invariants - -### DI wiring change required - -`Container.init_cosmos_repo()` currently always constructs `CosmosGroundTruthRepo`: [backend/app/container.py](backend/app/container.py#L86-L138) - -To adopt subclassing cleanly, `init_cosmos_repo` should choose: -- `CosmosEmulatorGroundTruthRepo` when `settings.USE_COSMOS_EMULATOR` is true or endpoint is non-TLS local emulator -- `CosmosGroundTruthRepo` otherwise - -Invariants to preserve: -- Same constructor args for both repos, so container swap is trivial. -- `await repo._init()` must still be called on startup (lifespan/startup path relies on async client binding). - -### Tests likely to be impacted - -1) Unicode tests import the private normalization function directly -- [backend/tests/unit/test_unicode_fix.py](backend/tests/unit/test_unicode_fix.py#L10-L12) - -If normalization moves to emulator module, either: -- keep `_normalize_unicode_for_cosmos` exported from cosmos_repo.py as a compatibility shim, or -- update tests to import from emulator module. 
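The `_pre_write_transform` / `_post_read_transform` hooks and the single `_should_retry_emulator_exception` predicate from the override seams could be sketched as follows. This sketch rests on several assumptions. The sentinel character, the retryable message substring, and the `run_with_retry` helper are hypothetical. Unicode/control-char sanitization is omitted. The real repo's async write paths are reduced to synchronous calls.

```python
from __future__ import annotations

import base64
import copy
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Hypothetical private-use sentinel; assumes it never occurs in real content.
BACKSLASH_SENTINEL = "\ue000"


def _map_strings(value: Any, fn: Callable[[str], str]) -> Any:
    """Return a copy of nested dicts/lists with fn applied to every string value."""
    if isinstance(value, str):
        return fn(value)
    if isinstance(value, dict):
        return {k: _map_strings(v, fn) for k, v in value.items()}
    if isinstance(value, list):
        return [_map_strings(v, fn) for v in value]
    return value


class CosmosGroundTruthRepo:
    """Production stand-in: identity hooks, no emulator-specific retries."""

    def _pre_write_transform(self, doc: dict[str, Any]) -> dict[str, Any]:
        return doc

    def _post_read_transform(self, doc: dict[str, Any]) -> dict[str, Any]:
        return doc

    def _should_retry_emulator_exception(self, exc: Exception) -> bool:
        return False  # production relies on the SDK's own retry options

    def run_with_retry(self, op: Callable[[], T], attempts: int = 3) -> T:
        """Retry loop lives in the base class; the subclass only decides *whether*."""
        for attempt in range(1, attempts + 1):
            try:
                return op()
            except Exception as exc:
                if attempt == attempts or not self._should_retry_emulator_exception(exc):
                    raise
        raise AssertionError("attempts must be >= 1")


class CosmosEmulatorGroundTruthRepo(CosmosGroundTruthRepo):
    _RETRYABLE_MARKERS = ("jsonb type as object key",)  # assumed message text

    def _pre_write_transform(self, doc: dict[str, Any]) -> dict[str, Any]:
        out = copy.deepcopy(doc)
        # Encode refs[*].content first; the base64 alphabet contains no backslashes.
        for ref in out.get("refs", []):
            if isinstance(ref.get("content"), str):
                ref["content"] = base64.b64encode(ref["content"].encode("utf-8")).decode("ascii")
        return _map_strings(out, lambda s: s.replace("\\", BACKSLASH_SENTINEL))

    def _post_read_transform(self, doc: dict[str, Any]) -> dict[str, Any]:
        out = _map_strings(doc, lambda s: s.replace(BACKSLASH_SENTINEL, "\\"))
        for ref in out.get("refs", []):
            if isinstance(ref.get("content"), str):
                ref["content"] = base64.b64decode(ref["content"]).decode("utf-8")
        return out

    def _should_retry_emulator_exception(self, exc: Exception) -> bool:
        return any(marker in str(exc) for marker in self._RETRYABLE_MARKERS)
```

The transforms must be exact inverses so that production reads of emulator-written data never occur. Ordering base64 before sentinel substitution keeps the two steps independent.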
- -2) Unit tests validate `_build_query_filter` tag clause uses `ARRAY_CONTAINS` -- [backend/tests/unit/test_cosmos_repo.py](backend/tests/unit/test_cosmos_repo.py#L33-L58) - -If you split production vs emulator query builders, keep production semantics in `CosmosGroundTruthRepo._build_query_filter` and put emulator differences behind `list_gt_paginated` routing (recommended), so tests remain valid. - -3) Assignment tests may depend on selection behavior -- `AssignmentService` already retries with `exclude_ids`; repo sampling also supports `exclude_ids` via query building. Refactor must maintain that exclusion contract. - ---- - -## Recommendation snapshot - -- Move B-category logic out of [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py): - - `sample_unassigned` + `_compute_quotas` → `AssignmentService` / `AssignmentSamplingService` - - `totalReferences` derivation → a derivations service (or domain model) - - `user_id` validation → API/service -- Keep A-category logic in the repo. -- Create `CosmosEmulatorGroundTruthRepo` in `cosmos_emulator.py` and concentrate C-category logic there, with the override seams listed above. diff --git a/.copilot-tracking/subagent/20260121/frontend-requirements-research.md b/.copilot-tracking/subagent/20260121/frontend-requirements-research.md deleted file mode 100644 index fac6dff..0000000 --- a/.copilot-tracking/subagent/20260121/frontend-requirements-research.md +++ /dev/null @@ -1,263 +0,0 @@ -# Frontend requirements research (from frontend docs) - -Date: 2026-01-21 -Scope: Research-only inference of **high-level** frontend UX requirements that match the existing system. 
- -## Sources reviewed - -- [frontend/README.md](../../frontend/README.md) -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md) -- [frontend/IMPLEMENTATION_SUMMARY.md](../../frontend/IMPLEMENTATION_SUMMARY.md) -- [frontend/BACKEND_API_CHANGES.md](../../frontend/BACKEND_API_CHANGES.md) -- [frontend/docs/MVP_REQUIREMENTS.md](../../frontend/docs/MVP_REQUIREMENTS.md) -- [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](../../frontend/docs/OBSERVABILITY_IMPLEMENTATION.md) -- [frontend/src/components/app/defaultCurateInstructions.md](../../frontend/src/components/app/defaultCurateInstructions.md) - -## Inferred high-level UX requirements - -### 1) Runtime configuration and local development - -- The frontend must support configuring the backend base URL, OpenAPI schema URL, and a dev-only user identifier via environment variables. -- In local development, the frontend should call backend APIs under `/v1/...` and rely on a dev proxy to avoid CORS. -- The UI should support a configurable default “self-serve assignment” limit. - -Evidence: -- [frontend/README.md](../../frontend/README.md#L20-L44) - -> - `VITE_API_BASE_URL` – backend base URL … -> - `VITE_OPENAPI_URL` – OpenAPI spec URL … -> - `VITE_DEV_USER_ID` – optional dev-only user id sent as `X-User-Id` -> - `VITE_SELF_SERVE_LIMIT` – optional default for self-serve assignments -> … all requests to `/v1/...` are proxied to `VITE_API_BASE_URL` … - -### 2) App shape: single-page, multi-pane curation workspace - -- The app is a single-page experience (no router required by default) with a multi-pane curation workspace. -- The primary workspace must separate concerns into: - - Left: queue of items - - Center: editor and actions - - Right: references (search vs selected) - - Additional views: stats, and other overlays/modals. 
- -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L70-L80) - -> “A single-page React app.” -> “UX separation: Left queue, center editor … right references pane … stats view, and modal overlays.” - -### 3) Assignment-based workflows and queue navigation - -- The primary worklist must be “assigned items” (the curator’s current work queue). -- The queue should: - - Display each item’s ID, status, version, and a truncated question. - - Support selecting an item to edit. - - Support refreshing/reloading the list. - - Highlight deleted items. -- The UI should provide a “self-serve assignments” action in/near the queue to request more assigned work, using a configurable limit. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L128-L148) - -> “items: list shown in Queue, updated on save/refresh” -> “viewMode: … ‘curate’ … ‘questions’ … ‘stats’” -> “Self-serve assignments – Queue offers a button to request more assignments (limit via `VITE_SELF_SERVE_LIMIT`).” - -### 4) Editing flow (single-turn baseline) - -- The editor must allow updating question/answer content for the current item. -- “Change category” is no longer required for saving; the UI should not block saving on that. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L86-L95) - -> “Change category: previously required when Q/A changed; no longer enforced.” - -### 5) References: search, add, select, visit/open, and annotate - -- The right panel must provide two distinct reference experiences: - - Search tab: search for candidate references and add them into the item. - - Selected tab: manage references already attached to the item. -- Search UX requirements: - - Display search results and allow adding individual results. - - Support multi-select add. - - Prevent duplicate additions by URL (disable add when URL already present; de-dup by URL). -- Selected references UX requirements: - - List attached references. - - Allow toggling which references are selected. 
- - Allow opening a reference (in a new tab) and marking it as visited. - - Allow capturing a “key paragraph” per selected reference and show a counter/length affordance. - - Allow removing a reference and undoing that removal within a time window. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L136-L144) - -> “ref opening: marks visited and opens in a new tab …” -> “Search tab: … supports multi-select Add … disabled when URL already present; de-dup by URL.” -> “Selected tab: … visit/open … key paragraph with counter; Remove supports Undo (8s window).” - -Additional evidence (curation guidance shown to users): -- [frontend/src/components/app/defaultCurateInstructions.md](../../frontend/src/components/app/defaultCurateInstructions.md#L1-L4) - -> “Include references you actually visited; for selected ones, write a key paragraph (≥ 40 chars).” - -### 6) Approval gating and validation - -- The UI must gate “Approve” based on reference completeness: - - Requires at least one selected reference. - - If references exist, all references must be visited. - - Selected references must have a key paragraph of at least 40 characters. - - Deleted items cannot be approved. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L75-L79) - -> “Approval constraints: … at least one selected reference … all refs visited; selected refs have ≥40 char key paragraph. Deleted items cannot be approved.” - -### 7) Save semantics and user feedback - -- Save must be idempotent and detect “no-op” updates (avoid re-saving when nothing changed). -- If there are no changes, the UI should communicate “No changes”. -- Status-only updates should not be treated as content changes (no need to present them as version bumps in UX). 
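Items 6 and 7 describe two pure checks: the Approve gate and the save fingerprint. Here is a minimal sketch in Python to match the other sketches in these notes, although the frontend itself is a React app. The `Item`/`Ref` shapes are simplified assumptions. The `"status-only"` outcome is a hypothetical way to honor the rule that status-only updates are not content changes.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field

MIN_KEY_PARAGRAPH_CHARS = 40


@dataclass
class Ref:
    url: str
    selected: bool = False
    visited: bool = False
    key_paragraph: str = ""


@dataclass
class Item:
    question: str
    answer: str
    refs: list[Ref] = field(default_factory=list)
    status: str = "draft"
    deleted: bool = False


def approval_blockers(item: Item) -> list[str]:
    """Reasons Approve is disabled; an empty list means the item is approvable."""
    blockers: list[str] = []
    selected = [r for r in item.refs if r.selected]
    if item.deleted:
        blockers.append("deleted items cannot be approved")
    if not selected:
        blockers.append("select at least one reference")
    if not all(r.visited for r in item.refs):
        blockers.append("visit every reference")
    if any(len(r.key_paragraph) < MIN_KEY_PARAGRAPH_CHARS for r in selected):
        blockers.append(f"key paragraphs need >= {MIN_KEY_PARAGRAPH_CHARS} characters")
    return blockers


def fingerprint(item: Item, *, include_status: bool = True) -> str:
    """Stable hash of the item state, optionally ignoring status."""
    payload = asdict(item)
    if not include_status:
        payload.pop("status")
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def classify_save(current: Item, saved: Item) -> str:
    if fingerprint(current) == fingerprint(saved):
        return "No changes"  # skip the request entirely
    if fingerprint(current, include_status=False) == fingerprint(saved, include_status=False):
        return "status-only"  # persist, but do not present it as a content edit
    return "content"
```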
- -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L140-L146) - -> “Save – computes state fingerprint; if unchanged: returns ‘No changes’.” - -### 8) Soft delete / restore workflows - -- Users must be able to soft-delete items and restore them. -- Deleted items should visibly indicate deletion and be non-approvable. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L145-L147) - -> “Soft delete – … deleted items show a banner and cannot be approved; restore supported.” - -### 9) Export UX - -- Export should trigger a backend-driven snapshot download (JSON) rather than an in-app export modal. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L145-L146) - -> “Export – triggers backend snapshot download … no in-app JSON modal.” - -### 10) Tags: manage existing tags on an item - -- The UI must support applying tags to the current ground-truth item. -- Tag creation is not required (and may not be supported by the backend); the UX should focus on selecting from a known set. -- Tag validation may be constrained by a fixed schema. - -Evidence: -- [frontend/docs/MVP_REQUIREMENTS.md](../../frontend/docs/MVP_REQUIREMENTS.md#L22-L27) - -> “get the known set of existing tags … (`GET /tags/schema`)” -> “allow the user to create new tags (no write endpoints for tags)” -> “apply the tags to the current ground truth …” -> “tag validation … fixed schema” - -### 11) Curation instructions - -- The UI must surface curation instructions as user-consumable markdown. -- Instructions are expected to be fetchable and writable per dataset (with concurrency control). 
- -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L86-L92) - -> `curationInstructions?: string` - -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L165-L168) - -> “InstructionsPane … collapsible curation instructions surfaced per item” - -- [frontend/docs/MVP_REQUIREMENTS.md](../../frontend/docs/MVP_REQUIREMENTS.md#L15-L18) - -> “get curation instructions (`GET /datasets/{datasetName}/curation-instructions`)” -> “write curation instructions (`PUT /datasets/{datasetName}/curation-instructions` with ETag)” - -### 12) Multi-turn curation (conversation history) - -- The UI must support multi-turn conversation editing in addition to classic single-turn Q/A. -- It must provide: - - A timeline view of conversation turns. - - Adding/editing/deleting turns. - - An optional “context” field for application/product context. - - A mode toggle (single-turn vs multi-turn) with auto-detection and persistence. -- Multi-turn approval adds requirements beyond single-turn: - - Must contain at least one user turn and one agent turn. - - All references must be marked with a relevance state. - - All “relevant” references must have key paragraphs ≥ 40 characters. - -Evidence: -- [frontend/IMPLEMENTATION_SUMMARY.md](../../frontend/IMPLEMENTATION_SUMMARY.md#L88-L112) - -> “Mode Toggle … Auto-detection … Persistence: Saves preference to localStorage” -> “Reference Relevance Tracking … Requires all references to be marked before approval …” -> “Application Context … Collapsible Editor …” - -- [frontend/IMPLEMENTATION_SUMMARY.md](../../frontend/IMPLEMENTATION_SUMMARY.md#L147-L158) - -> “Multi-Turn Approval Requirements … All references marked … All ‘relevant’ references have key paragraphs ≥40 chars …” - -### 13) Keyboard shortcuts - -- The app should support global shortcuts for primary curation actions: - - Cmd/Ctrl+S: save draft. - - Cmd/Ctrl+Enter: attempt approve (still gated by validation). 
- -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L184-L184) - -> “Keyboard shortcuts: Cmd/Ctrl+S saves draft; Cmd/Ctrl+Enter attempts approve (gated)” - -### 14) Error handling and user feedback surfaces - -- The UI should provide toast-based feedback for: - - Network failures (and keep state consistent). - - Undo interactions (reference removal undo window). - - Browser popup blocking when opening references in new tabs. - -Evidence: -- [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L215-L233) - -> “Undo delete window: 8 seconds via toast action …” -> “Network failures … show toast and keep state consistent” -> “Popup blocked on new tab: info toast prompts user” - -### 15) Telemetry / observability (optional, safe-by-default) - -- Telemetry must be opt-in, safe, and no-op when disabled (including demo mode). -- The UI should have a user-friendly error boundary for rendering failures, and log exceptions to telemetry when enabled. - -Evidence: -- [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](../../frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L13-L18) - -> “Opt-in … Telemetry is disabled by default …” -> “Safe: No-ops gracefully in demo mode or when configuration is missing” - -- [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](../../frontend/docs/OBSERVABILITY_IMPLEMENTATION.md#L79-L86) - -> “Error Boundary … Catches rendering errors … Renders a user-friendly fallback UI …” - -### 16) Demo mode - -- The UI must support a “demo mode” that toggles behavior at startup via environment variables. -- Demo mode should disable telemetry and may use mock providers/services. 
- -Evidence: -- [frontend/README.md](../../frontend/README.md#L74-L92) - -> “VITE_DEMO_MODE … to enable demo behavior” -> “Telemetry automatically no-ops in demo mode …” - -## Noted doc drift / open questions (for follow-up) - -- Search + generation backend availability appears inconsistent across docs: - - [frontend/docs/MVP_REQUIREMENTS.md](../../frontend/docs/MVP_REQUIREMENTS.md#L28-L36) states no backend search/LLM endpoints. - - [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L136-L145) describes search and generation flows calling backend (`searchReferences`, `callAgentChat`). - - This may be historical vs current behavior; confirm actual endpoints and desired UX when offline/backends are missing. - -- Export behavior differs by context: - - [frontend/CODEBASE.md](../../frontend/CODEBASE.md#L145-L146) states Export triggers snapshot download. - - Multi-turn export expansion is described as part of model/export logic in [frontend/IMPLEMENTATION_SUMMARY.md](../../frontend/IMPLEMENTATION_SUMMARY.md#L110-L112). Confirm whether export expansion is implemented in frontend, backend, or both. diff --git a/.copilot-tracking/subagent/20260121/prd-requirements-research.md b/.copilot-tracking/subagent/20260121/prd-requirements-research.md deleted file mode 100644 index 9eebea7..0000000 --- a/.copilot-tracking/subagent/20260121/prd-requirements-research.md +++ /dev/null @@ -1,176 +0,0 @@ -# PRD Requirements Research — High-level requirements consistent with current system - -Date: 2026-01-21 - -## Scope and method - -This report extracts **high-level “shall/should/may” product requirements** from the PRD sources in this repo and then labels each requirement: - -- **Matches existing system**: Yes / No / Unclear -- With a brief justification grounded in **current backend/frontend docs and code**. 
- -Primary requirement sources used: - -- [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md) (MVP requirements) -- [prd-genericize.json](prd-genericize.json) (genericization PRD) -- [prd.json](prd.json) (backlog items / future requirements) - -Notes: - -- [ralph/ralph-prd.txt](ralph/ralph-prd.txt) appears to be **agent execution instructions**, not product requirements. -- [BUSINESS_VALUE.md](BUSINESS_VALUE.md) is treated as **goals/KPIs**, not normative requirements. - ---- - -## Supported / consistent requirements (candidate “current PRD”) - -### R-001 — Product-agnostic configuration - -- Requirement (shall): The system shall be **product-agnostic**, removing hard-coded product/vendor branding and domain-specific content. -- Requirement (shall): The system shall make **branding** configurable. -- Requirement (shall): The system shall make **trusted reference domains** configurable. -- Requirement (shall): The system shall support a **generic demo mode** (generic sample data). -- Requirement (should): The system should make **manual tags** configurable. -- Primary evidence: [prd-genericize.json](prd-genericize.json#L13-L18), [prd-genericize.json](prd-genericize.json#L39-L45) -- Matches existing system: **Yes** -- System evidence: [frontend/src/config/branding.ts](frontend/src/config/branding.ts#L11), [frontend/src/services/runtimeConfig.ts](frontend/src/services/runtimeConfig.ts#L49), [backend/app/main.py](backend/app/main.py#L44), [frontend/src/config/demo.ts](frontend/src/config/demo.ts#L2-L13) - -### R-002 — Bulk import ground-truth items - -- Requirement (shall): The system shall allow a curator/admin to **bulk import** generated ground-truth items via an API. -- Requirement (should): The system should support importing **negative cases** via the same mechanism. 
-- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L142-L143) -- Matches existing system: **Yes** -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L103) - -### R-003 — Assignment visibility isolation - -- Requirement (shall): The system shall ensure an SME only sees **their assigned work** (and cannot access other SMEs’ assignments without override). -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L137-L139) -- Matches existing system: **Yes** (documented) -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L111-L112) - -### R-004 — Self-serve assignment (pull model) - -- Requirement (shall): The system shall allow SMEs to **self-serve** (request) a limited number of items to work on. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L44-L44) -- Matches existing system: **Yes** -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L111) - -### R-005 — SME review actions (draft/save/approve/delete) - -- Requirement (shall): The system shall allow an SME to **edit and save**, **approve**, or **delete** an assigned item. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L147-L147) -- Matches existing system: **Yes** -- System evidence: [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L29-L160) - -### R-006 — Snapshot & export of approved items - -- Requirement (shall): The system shall support a **weekly snapshot** and export an immutable JSON artifact containing **approved items + metadata**. 
-- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L193-L195) -- Matches existing system: **Yes** -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L108), [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L352-L352) - -### R-007 — Controlled-vocabulary tagging (apply tags) - -- Requirement (shall): The system shall allow an SME to apply **multiple tags from a controlled list** to an item, and those tags shall be reflected in exports. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L167-L169) -- Matches existing system: **Yes** (apply + schema retrieval) -- System evidence: [backend/app/api/v1/tags.py](backend/app/api/v1/tags.py#L32), [backend/tests/integration/test_tags_schema_api.py](backend/tests/integration/test_tags_schema_api.py#L6), [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L41-L46) - -### R-008 — Soft delete for ground-truth items - -- Requirement (shall): The system shall support **soft deletion** of items (hidden from default views/exports while retained for history), and allow deletion via the review workflow. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L173-L175) -- Matches existing system: **Partial / Unclear** (soft delete exists; restore/cleanup requirements appear incomplete) -- System evidence: [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1253-L1254), [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L16) - -### R-009 — Aggregate stats endpoint - -- Requirement (should): The system should provide a stats endpoint for progress/visibility. 
-- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L303-L303) -- Matches existing system: **Yes (aggregate)** -- System evidence: [backend/app/api/v1/stats.py](backend/app/api/v1/stats.py#L11-L14) - ---- - -## Out-of-scope / Not yet supported (per current system) - -These are high-level requirements present in PRD sources, but **do not currently match** what the repo documents/implements. - -### O-001 — LLM answer generation endpoint/workflow - -- Requirement (must/shall): SMEs shall be able to generate an answer using an LLM given the question + relevant context. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L150-L150) -- Matches existing system: **No** -- System evidence: [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L33-L36) - -### O-002 — AI Search integration for attaching/detaching references - -- Requirement (must/shall): The UI shall connect to AI Search and allow SMEs to attach/detach relevant documents, persisting them into item metadata/exports. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L158-L161) -- Matches existing system: **No** -- System evidence: [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L29-L31) - -### O-003 — Tag administration (manage controlled vocabulary) - -- Requirement (must/shall): Admins shall be able to manage the controlled tag list. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L167-L167) -- Matches existing system: **No** -- System evidence: [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L24-L27) - -### O-004 — SME-specific stats - -- Requirement (must/shall): SMEs shall see statistics about *their assigned items* to track progress toward sprint goals. 
-- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L188-L188) -- Matches existing system: **No** (current stats is not per-user) -- System evidence: [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L48-L48), [backend/app/api/v1/stats.py](backend/app/api/v1/stats.py#L11-L14) - -### O-005 — Batch as a first-class concept - -- Requirement (must/shall): Items shall be grouped into batches, with a single assignee per batch. -- Primary evidence: [docs/ground-truth-curation-reqs.md](docs/ground-truth-curation-reqs.md#L182-L182) -- Matches existing system: **Unclear** (assignments exist; “batch” entity support is not clearly implemented/documented) -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L111-L112) - -### O-006 — Entra-based authentication / access control design + implementation - -- Requirement (should/shall): The system should support Entra-based access control (design and/or implementation stories are captured in PRD backlog). -- Primary evidence: [prd.json](prd.json#L166-L173) -- Matches existing system: **No** (explicitly documented as placeholder) -- System evidence: [backend/CODEBASE.md](backend/CODEBASE.md#L120-L120) - -### O-007 — Keyword search of ground-truth items - -- Requirement (should/shall): The system should provide keyword search over question/answer for locating items. -- Primary evidence: [prd.json](prd.json#L16-L16) -- Matches existing system: **No** -- System evidence: [frontend/docs/MVP_REQUIREMENTS.md](frontend/docs/MVP_REQUIREMENTS.md#L29-L31) - -### O-008 — PII detection in approval flow - -- Requirement (should/shall): The system should detect PII during (or before) approval to prevent sensitive data from entering the approved set. 
-- Primary evidence: [prd.json](prd.json#L94-L95) -- Matches existing system: **No** -- System evidence: (no current backend/frontend evidence found in docs indicating PII scanning) - -### O-009 — Duplicate detection / prevention - -- Requirement (should/shall): The system should detect duplicates (draft vs approved) and prevent SMEs from working on duplicates. -- Primary evidence: [prd.json](prd.json#L148-L155) -- Matches existing system: **No** -- System evidence: (no current backend/frontend evidence found in docs indicating duplicate detection) - -### O-010 — Chunking support - -- Requirement (should/shall): The system should support chunking (as described in backlog). -- Primary evidence: [prd.json](prd.json#L40-L41) -- Matches existing system: **No** -- System evidence: (no current backend/frontend evidence found in docs indicating chunking support) - ---- - -## Quick takeaways - -- The **core curation loop** (bulk import → self-serve assignments → SME approve/edit/delete → export snapshot) is well supported and consistently documented. -- “Stretch” requirements (AI Search attach/detach, LLM generation, RBAC/Entra, per-user stats, tag administration) are present in PRD sources but are **not yet supported** per current repo docs. diff --git a/.copilot-tracking/subagent/20260121/subagent-reference-audit.md b/.copilot-tracking/subagent/20260121/subagent-reference-audit.md deleted file mode 100644 index 92887f8..0000000 --- a/.copilot-tracking/subagent/20260121/subagent-reference-audit.md +++ /dev/null @@ -1,44 +0,0 @@ - -# Subagent Reference Audit (20260121) - -## Purpose -Verify whether all subagent research files in `.copilot-tracking/subagent/20260121/` are referenced by the top-level research document. 
- -## Inputs -- Subagent folder: `.copilot-tracking/subagent/20260121/` -- Top-level doc: `.copilot-tracking/research/20260121-high-level-requirements-research.md` -- Match rule: extract markdown-style links (and raw text occurrences) that include the substring `.copilot-tracking/subagent/20260121/`. - -## Files Present In Subagent Folder -- api-logic-research.md -- backend-requirements-research.md -- citation-validation.md -- consolidated-requirements-synthesis.md -- conventions-and-sources-research.md -- conventions-research.md -- cosmos-repo-research.md -- frontend-requirements-research.md -- prd-requirements-research.md -- synthesis-notes.md - -## Files Referenced By Top-Level Doc -(Links/mentions found in `.copilot-tracking/research/20260121-high-level-requirements-research.md` that reference `.copilot-tracking/subagent/20260121/`.) - -- prd-requirements-research.md - -## Present But Not Referenced -- api-logic-research.md -- backend-requirements-research.md -- citation-validation.md -- consolidated-requirements-synthesis.md -- conventions-and-sources-research.md -- conventions-research.md -- cosmos-repo-research.md -- frontend-requirements-research.md -- synthesis-notes.md - -## Referenced But Missing -- (none) - -## Notes -- This audit only checks for references using the specific prefix `.copilot-tracking/subagent/20260121/`. If the top-level doc references these files via different relative paths (or without the folder prefix), they will not be counted here. diff --git a/.copilot-tracking/subagent/20260121/synthesis-notes.md b/.copilot-tracking/subagent/20260121/synthesis-notes.md deleted file mode 100644 index 507466e..0000000 --- a/.copilot-tracking/subagent/20260121/synthesis-notes.md +++ /dev/null @@ -1,143 +0,0 @@ ---- -title: Synthesis — Refactor recommendations (API/service/repo boundaries + Cosmos emulator repo) -description: Consolidated, line-cited recommendations based on prior research notes. 
-author: GitHub Copilot (subagent) -ms.date: 2026-01-21 -ms.topic: reference ---- - -## 1) Consolidated responsibility boundary proposal (API vs service vs repo) - -### API layer (FastAPI routers) -**Owns:** HTTP surface area only: authn/authz, request parsing, basic request-shape validation, and mapping typed service errors to HTTP responses. - -**Concrete examples (current violations):** -- The SME update endpoint contains a full workflow: ownership enforcement, partial update semantics, approval/status transitions that clear assignment, history parsing (including embedded refs), ETag enforcement, computed tag application, persistence, and best-effort deletion of the assignment document — all inside the router handler in [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L78-L232). -- The general ground truth update endpoint repeats many of the same workflow concerns: status coercion, explicit business rules rejecting `computedTags` and legacy `tags`, history parsing, ETag enforcement, computed tag application, persistence, and then re-fetch for fresh ETag in [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L232-L369). - -**Good existing pattern to emulate:** -- Snapshot routes delegate domain work to `container.snapshot_service` and keep the handler thin in [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L105-L154). - -### Service layer -**Owns:** domain workflows/state transitions, cross-endpoint invariants, and shared parsing/normalization. - -**Recommended service boundaries (aligned to existing code):** -- **`GroundTruthUpdateService` (new):** consolidate “update item” workflows used by both the SME update route and the general update route. - - Should own: partial update policy, history parsing, tag-field acceptance policy, computed tag recomputation policy, and ETag policy (requirement + mismatch translation). 
- - Justification: the routers currently duplicate logic and apply tags/ETags similarly in [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L104-L198) and [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L241-L363). -- **`AssignmentService` (existing):** own assignment workflows; keep repo calls as persistence/atomic update primitives. - - Today `AssignmentService.self_assign` orchestrates retries and uses `repo.assign_to` + assignment-doc materialization in [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L44-L146). - - Repo currently owns a large “sampling allocation + quota + selection + shuffle” algorithm in `sample_unassigned` and `_compute_quotas`, which is domain workflow rather than persistence and should move into the service layer ([backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609), [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1649-L1680)). -- **`GroundTruthDerivationsService` (new) OR domain model responsibility:** derived-field computation currently lives in the Cosmos adapter. - - The repo computes and mutates `totalReferences` during persistence (`_compute_total_references` and `_to_doc`) in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L389-L443). This is a business definition ("history refs win") and should be owned above the storage adapter. - -### Repo layer (Cosmos adapters) -**Owns:** persistence mechanics only: Cosmos client/container I/O, query construction, paging, concurrency primitives (ETag usage), and minimal storage-centric validations. - -**Concrete repo responsibilities (current examples):** -- Interface surface is already formalized via `GroundTruthRepo` in [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py#L1-L55). 
-- Storage-centric query construction with safe parameterization belongs in the repo (e.g., tag and ref-url clauses, including emulator limitations) in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L500-L590). -- Cosmos pagination uses a direct SQL path with `ORDER BY` and a separate emulator path with in-memory filtering when needed in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L660-L911). - -**What should move out of the repo:** -- Domain validation of `user_id` currently happens inside `assign_to` (regex) in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1689-L1714). That rule is an API/service contract; repo should assume validated input. - -## 2) Recommended `cosmos_emulator.py` design (inherit vs wrapper) + override seams - -### Recommendation: subclass (inherit) a production repo -Create `CosmosEmulatorGroundTruthRepo(CosmosGroundTruthRepo)` in a new module `backend/app/adapters/repos/cosmos_emulator.py`. - -**Why inherit (vs wrapper) in this codebase:** -- The container currently constructs a concrete `CosmosGroundTruthRepo` and wires services immediately afterward in [backend/app/container.py](backend/app/container.py#L83-L161). Keeping a compatible constructor minimizes DI churn. -- Many emulator differences are already expressed as “same public method, different internal behavior” toggled by `is_cosmos_emulator_in_use()` in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L644-L647). A subclass can make that decision structural (class-level) instead of conditional branches in production code. - -### Exact override seams (methods/properties) to isolate emulator behavior - -1) **`is_cosmos_emulator_in_use()`** -- Base currently detects emulator via endpoint string in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L644-L647). 
-- Emulator subclass override: return `True` unconditionally. - -2) **`list_gt_paginated()` routing + emulator pagination path** -- Base method conditionally routes to `_list_gt_paginated_with_emulator` when tags/ref_url are present and emulator is in use in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L674-L707). -- Emulator subclass override: simplify to always use `_list_gt_paginated_with_emulator` when `tags` or `ref_url` are provided, eliminating endpoint checks from production. -- The emulator path explicitly disables SQL tag/ref_url filters and performs in-memory filtering due to emulator limitations in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L806-L911). - -3) **Query filter construction for prod-only SQL features** -- The `EXISTS(...)` ref-url filter is only injected when `include_ref_url=True` in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L565-L585). -- Emulator subclass should avoid ref-url SQL filters (continue doing in-memory filtering as implemented) by ensuring `include_ref_url=False` for emulator list operations. - -4) **`assign_to()` (patch vs read-modify-replace)** -- Base currently: - - validates `user_id` with a regex - - chooses patch vs read-modify-replace based on emulator detection - in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1689-L1714). -- Emulator subclass override: - - delegate validation to service (stop duplicating API contract here) - - always execute read-modify-replace (compatibility path) and avoid `patch_item` filter predicates. - -5) **Write-path normalization + emulator-specific retries (`upsert_gt`)** -- Base uses `COSMOS_DISABLE_UNICODE_ESCAPE` gating and applies `_ensure_utf8_strings` before upsert/replace in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1099-L1117). 
-- Base also includes emulator-specific retry behavior keyed off `is_cosmos_emulator_in_use()` and message matching for invalid JSON payload and intermittent jsonb errors in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1120-L1216). -- Emulator subclass override: keep these retries (and optionally strengthen them), while production base can be simplified over time to rely on SDK retry policy. - -6) **Delete-path retries (`delete_dataset`)** -- Base has emulator-only retry logic for intermittent errors and HTTP-format issues in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1235-L1360). -- Emulator subclass override: keep retries local to emulator repo. - -### Consolidating the Unicode/backslash/base64 workaround into the emulator repo -Right now the workaround is spread across: -- Normalization + base64 helpers (`_normalize_unicode_for_cosmos`, `_restore_unicode_from_cosmos`) in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L45-L176). -- A repo-level wrapper `_ensure_utf8_strings()` in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L361-L377). -- Multiple call sites (import, curation upsert, GT upsert) that apply the wrapper in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L448-L479) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1079-L1117). -- Read-path restore inside `_from_doc()` in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L446-L459). - -**Recommendation:** define explicit “transform seams” in the base class and override them in the emulator subclass. -- Base adds two protected methods: - - `_transform_doc_for_write(doc: dict[str, Any]) -> dict[str, Any]` - - `_transform_doc_for_read(doc: dict[str, Any]) -> dict[str, Any]` -- Base default implementations are identity. 
-- Emulator subclass overrides them to apply `_normalize_unicode_for_cosmos` / `_restore_unicode_from_cosmos` (and thus base64 encode/decode of `refs[*].content`). These behaviors already exist in the module and are gated by `settings.COSMOS_DISABLE_UNICODE_ESCAPE` in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L99-L176). - -This turns today’s scattered per-method checks into a single, testable seam. - -## 3) Step-by-step migration plan (minimize risk, 6–10 steps) - -1) **Introduce typed domain exceptions for stable HTTP mapping** - - Replace substring-based ValueError parsing in the assign endpoint with typed errors (router currently maps substrings) in [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L255-L323). - -2) **Add `GroundTruthUpdateService` with a single “update workflow” entrypoint** - - Start by moving the shared logic (ETag requirement + mismatch mapping, history parsing, computed tags application) out of both routes. - - Current duplicated workflow lives in [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L104-L198) and [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L241-L363). - -3) **Switch routers to call the service (thin handlers)** - - Keep request parsing/validation in the handlers; move the workflow and repo calls into the service. - - Use the snapshot route pattern as precedent (service-first) in [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L105-L154). - -4) **Extract parsing helpers into a shared module** - - Create reusable helpers for history parsing (including refs and expectedBehavior) since both handlers implement near-identical loops in [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L152-L187) and [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L300-L338). 
- -5) **Move assignment sampling logic out of the repo into service** - - Shift the allocation/quota/selection algorithm from `CosmosGroundTruthRepo.sample_unassigned` to `AssignmentService` (or a dedicated `AssignmentSamplingService`). - - Current algorithm is in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1409-L1609), with quota math in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1649-L1680). - -6) **Move derived-field computation (`totalReferences`) out of the repo** - - Stop mutating `GroundTruthItem.totalReferences` inside `_to_doc` and compute it in a derivations service before persistence. - - Current mutation happens in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L410-L443). - -7) **Introduce `CosmosEmulatorGroundTruthRepo` and select it in the container** - - Container already derives an emulator/non-TLS condition via `USE_COSMOS_EMULATOR` and endpoint scheme in [backend/app/container.py](backend/app/container.py#L110-L119). - - Add a class selection branch there (keep constructor signature compatible). - - Emulator flag is defined in settings in [backend/app/core/config.py](backend/app/core/config.py#L28-L45). - -8) **Centralize the document transform seam** - - Implement `_transform_doc_for_write/_transform_doc_for_read` and route existing `_ensure_utf8_strings` usage through it. - - Grounding: normalization functions and wrapper already exist in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L45-L176) and [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L361-L377). - -9) **Update tests to target the new seams (keep behavior identical first)** - - Keep production behavior unchanged; emulator behavior should remain behind `USE_COSMOS_EMULATOR` or localhost endpoint detection initially. 
- -## 4) Alternatives considered (brief) - -- **Flags-in-repo (status quo):** simplest, but keeps production and emulator concerns entangled (e.g., emulator routing in `list_gt_paginated`) in [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L674-L707). -- **Subclass (recommended):** isolates emulator-only behavior while keeping constructor + protocol stable (container wiring remains straightforward) in [backend/app/container.py](backend/app/container.py#L83-L161). -- **Strategy object / wrapper:** cleanest purity-wise (inject a “capabilities/transforms” strategy), but higher churn because many internal calls and helper methods aren’t easily intercepted without adding new seams. diff --git a/.copilot-tracking/subagent/20260122/architecture-refactoring-research.md b/.copilot-tracking/subagent/20260122/architecture-refactoring-research.md deleted file mode 100644 index 56bcc29..0000000 --- a/.copilot-tracking/subagent/20260122/architecture-refactoring-research.md +++ /dev/null @@ -1,321 +0,0 @@ -# Architecture Refactoring Research - -**Date:** 2026-01-22 -**Stories:** SA-746 (Refactor API logic into services), SA-424 (Refactor cosmos_repo.py) - -## Executive Summary - -The backend has significant duplicate logic between `assignments.py` and `ground_truths.py` API endpoints. The `cosmos_repo.py` file is 1,500+ lines and contains emulator-specific workarounds and business logic that should be extracted. The existing service layer pattern (`AssignmentService`, `CurationService`, etc.) provides a clear blueprint for refactoring. - ---- - -## 1. 
Current API Endpoint Structure - -### Assignments API (`/v1/assignments/`) - -| Endpoint | Method | Purpose | Frontend Usage | -|----------|--------|---------|----------------| -| `/self-serve` | POST | Bulk self-assignment | Yes - requestAssignmentsSelfServe | -| `/my` | GET | List user's assignments | Yes - getMyAssignments | -| `/{dataset}/{bucket}/{item_id}` | PUT | Update assigned item | Yes - updateAssignedGroundTruth | -| `/{dataset}/{bucket}/{item_id}/assign` | POST | Assign single item | Yes - assignItem | -| `/{dataset}/{bucket}/{item_id}/duplicate` | POST | Duplicate as rephrase | Yes - duplicateItem | - -### Ground Truths API (`/v1/ground-truths/`) - -| Endpoint | Method | Purpose | Frontend Usage | -|----------|--------|---------|----------------| -| `` | POST | Bulk import | No (admin) | -| `` | GET | List all (paginated) | Yes - listAllGroundTruths (Explorer) | -| `/snapshot` | POST/GET | Export snapshot | Yes - downloadSnapshot | -| `/{datasetName}` | GET | List by dataset | Unknown | -| `/{datasetName}/{bucket}/{item_id}` | GET | Get single item | Yes - getGroundTruth | -| `/{datasetName}/{bucket}/{item_id}` | PUT | Update item | Yes - restoreGroundTruth | -| `/{datasetName}/{bucket}/{item_id}` | DELETE | Soft delete | Yes - deleteGroundTruth | -| `/recompute-tags` | POST | Bulk tag recomputation | No (admin) | - ---- - -## 2. Duplicate Logic Analysis - -### 2.1 Item Update Logic (HIGH PRIORITY) - -Both `assignments.py:update_item()` and `ground_truths.py:update_ground_truth()` contain nearly identical logic: - -**Shared patterns (~80% overlap):** - -```python -# Both endpoints do: -1. Fetch item via container.repo.get_gt() -2. Apply field updates (edited_question, answer, comment, status, refs, manual_tags) -3. Handle history field parsing (identical HistoryItem conversion) -4. Handle ETag validation (If-Match header or body.etag) -5. Apply computed tags via apply_computed_tags() -6. Persist via container.repo.upsert_gt() -7. 
Re-fetch and return updated item -``` - -**Differences:** - -| Aspect | Assignments | Ground Truths | -|--------|------------|---------------| -| Authorization | `assignedTo == user` check | No assignment check | -| Status handling | Clears assignment on approve/delete | No assignment clearing | -| Payload model | `AssignmentUpdateRequest` (Pydantic) | `dict[str, Any]` (raw) | -| Assignment doc cleanup | Yes (deletes assignment doc) | No | -| `approve` flag | Convenience boolean | Not supported | - -### 2.2 History Parsing (MEDIUM PRIORITY) - -Identical history parsing code in both endpoints (~30 lines each): - -```python -# Duplicated in assignments.py:140-160 and ground_truths.py:280-305 -history_items = [] -for h in payload.history: - refs_data = h.get("refs") - refs_list = None - if refs_data is not None: - refs_list = [r if isinstance(r, Reference) else Reference(**r) for r in refs_data] - expected_behavior_data = h.get("expected_behavior") or h.get("expectedBehavior") - history_items.append(HistoryItem( - role=h["role"], - msg=h.get("msg") or h.get("content", ""), - refs=refs_list, - expected_behavior=expected_behavior_data if isinstance(expected_behavior_data, list) else None, - )) -it.history = history_items -``` - -### 2.3 Tag Handling (LOW PRIORITY) - -Both endpoints validate and set `manual_tags` with identical patterns: - -```python -if "manual_tags" in provided_fields: # or "manualTags" in payload - try: - it.manual_tags = payload.manual_tags or [] - except ValueError as e: - raise HTTPException(status_code=400, detail=str(e)) -``` - ---- - -## 3. 
cosmos_repo.py Analysis - -### 3.1 File Statistics - -- **Total lines:** 1,536 -- **Functions/methods:** 35+ -- **Contains:** Cosmos emulator workarounds, Unicode sanitization, business logic - -### 3.2 Logical Components (Candidates for Extraction) - -| Component | Lines | Description | Extract To | -|-----------|-------|-------------|------------| -| Unicode sanitization | 50-150 | `_sanitize_string_for_cosmos`, `_normalize_unicode_for_cosmos`, `_restore_unicode_from_cosmos` | `cosmos_emulator.py` or `unicode_utils.py` | -| Base64 encoding for refs | 151-200 | `_base64_encode_refs_content`, `_base64_decode_refs_content` | `cosmos_emulator.py` | -| Sort security validation | 600-650 | `SortSecurityError`, `_build_secure_sort_clause` | Keep in repo (security) | -| Quota computation | 1100-1150 | `_compute_quotas` (largest remainder method) | `AssignmentService` | -| Query building | 500-600 | `_build_query_filter` | Keep in repo (query concern) | -| Document conversion | 350-450 | `_to_doc`, `_from_doc`, `_to_curation_doc`, `_from_curation_doc` | Keep in repo | - -### 3.3 Business Logic in Repository (Should Move to Service) - -1. **`sample_unassigned()`** (lines 1000-1150) - - Contains allocation/weighting logic - - Calls `_compute_quotas()` (policy decision) - - Should be: Service orchestrates, repo just queries - -2. **`assign_to()`** (lines 1200-1350) - - Contains conditional assignment logic - - User validation regex check (security concern - keep in service) - - Different code paths for emulator vs production - -3. 
**Total reference calculation** (lines 380-390) - - `_compute_total_references()` is business logic - - Currently in `_to_doc()` - should move to domain model or service - -### 3.4 Emulator-Specific Code - -The following are emulator workarounds that could be isolated: - -```python -# Pattern: is_cosmos_emulator_in_use() checks -def is_cosmos_emulator_in_use(self) -> bool: - return "localhost" in self._endpoint or "127.0.0.1" in self._endpoint - -# Used in: -- list_gt_paginated() - routes to _list_gt_paginated_with_emulator() -- _get_filtered_count() - different counting strategy -- assign_to() - read-modify-replace vs patch -- upsert_gt() - retry logic for jsonb errors -- delete_dataset() - retry logic -``` - ---- - -## 4. Current Service Layer Structure - -### 4.1 Existing Services - -| Service | Location | Responsibility | -|---------|----------|----------------| -| `AssignmentService` | services/assignment_service.py | Self-assign, assign single, duplicate | -| `CurationService` | services/curation_service.py | Dataset curation instructions | -| `SnapshotService` | services/snapshot_service.py | Export snapshots | -| `TaggingService` | services/tagging_service.py | Tag validation, computed tags | -| `ValidationService` | services/validation_service.py | Bulk import validation | -| `SearchService` | services/search_service.py | Azure AI Search adapter | -| `TagRegistryService` | services/tag_registry_service.py | Tag registry management | -| `ChatService` | services/chat_service.py | AI chat functionality | - -### 4.2 Service Pattern Used - -```python -class AssignmentService: - def __init__(self, repo: GroundTruthRepo): - self.repo = repo - - async def self_assign(self, user_id: str, limit: int) -> list[GroundTruthItem]: - # Orchestrates repo calls - # Contains business logic (retry, shuffle, validation) - pass -``` - -### 4.3 Container Wiring - -```python -# container.py -self.assignment_service = AssignmentService(self.repo) -self.curation_service = 
CurationService(self.repo) -self.snapshot_service = SnapshotService(self.repo, ...) -``` - ---- - -## 5. Refactoring Recommendations - -### 5.1 Phase 1: Extract Update Logic to Service (SA-746) - -Create `GroundTruthService` with shared update logic: - -```python -# services/ground_truth_service.py -class GroundTruthService: - def __init__(self, repo: GroundTruthRepo): - self.repo = repo - - async def update_item( - self, - dataset: str, - bucket: UUID, - item_id: str, - updates: ItemUpdateDTO, - user_id: str | None, - etag: str | None, - *, - enforce_assignment: bool = False, - clear_assignment_on_complete: bool = False, - ) -> GroundTruthItem: - """Unified item update logic.""" - pass - - def parse_history(self, raw_history: list[dict]) -> list[HistoryItem]: - """Parse history from API payload.""" - pass -``` - -### 5.2 Phase 2: Split cosmos_repo.py (SA-424) - -**File structure:** - -``` -backend/app/adapters/repos/ -├── base.py # Protocol (unchanged) -├── cosmos_repo.py # Core repo (~800 lines) -├── cosmos_emulator.py # Emulator workarounds (~200 lines) -├── cosmos_unicode.py # Unicode sanitization (~100 lines) -└── tags_repo.py # Tags (unchanged) -``` - -**Extract to cosmos_emulator.py:** - -- `_base64_encode_refs_content()` -- `_base64_decode_refs_content()` -- `_sanitize_string_for_cosmos()` -- `_normalize_unicode_for_cosmos()` -- `_restore_unicode_from_cosmos()` -- `_list_gt_paginated_with_emulator()` (as standalone function) -- `_assign_to_with_read_modify_replace()` (as standalone function) - -**Move to service layer:** - -- `_compute_quotas()` → `AssignmentService` -- `_compute_total_references()` → Domain model (`GroundTruthItem.total_references` property) - -### 5.3 Phase 3: Consolidate API Endpoints (Optional) - -Consider making `assignments` endpoint a thin wrapper that: - -1. Validates assignment ownership -2. Calls `GroundTruthService.update_item()` with `enforce_assignment=True` -3. Handles assignment document cleanup - ---- - -## 6. 
Frontend Impact Assessment - -### Assignments Endpoints (All Used by Frontend) - -- `POST /self-serve` - Used for initial assignment -- `GET /my` - Used for loading assigned items -- `PUT /{...}` - Used for all SME edits -- `POST /{...}/assign` - Used for explicit item assignment -- `POST /{...}/duplicate` - Used for rephrase creation - -### Ground Truths Endpoints - -- `GET /` (paginated) - Used by Explorer view -- `GET /{...}` - Used for item detail fetch -- `PUT /{...}` - Used for restore from deleted -- `DELETE /{...}` - Used for soft delete -- `GET /snapshot` - Used for export download - -**Conclusion:** Both endpoint groups are actively used. Refactoring must preserve API contracts. - ---- - -## 7. Risk Assessment - -| Risk | Likelihood | Impact | Mitigation | -|------|------------|--------|------------| -| Breaking API contract | Low | High | Keep endpoint signatures identical | -| ETag behavior changes | Medium | High | Comprehensive integration tests | -| Emulator-specific regressions | Medium | Medium | Run test suite with emulator flag | -| Service layer adds latency | Low | Low | Profile before/after | - ---- - -## 8. Next Steps - -1. **Create spec** for `GroundTruthService` with unified update logic -2. **Define interface** for emulator compatibility layer -3. **Estimate effort** for each phase -4. 
**Prioritize** based on Jira story scope - ---- - -## Appendix: File Line Counts - -``` -backend/app/api/v1/assignments.py - 242 lines -backend/app/api/v1/ground_truths.py - 405 lines -backend/app/adapters/repos/cosmos_repo.py - 1,536 lines -backend/app/adapters/repos/base.py - 57 lines -backend/app/services/assignment_service.py - 210 lines -backend/app/services/curation_service.py - 35 lines -backend/app/services/tagging_service.py - 130 lines -backend/app/services/validation_service.py - 70 lines -backend/app/services/snapshot_service.py - 90 lines -``` diff --git a/.copilot-tracking/subagent/20260122/assignment-error-feedback-research.md b/.copilot-tracking/subagent/20260122/assignment-error-feedback-research.md deleted file mode 100644 index 8a5ba58..0000000 --- a/.copilot-tracking/subagent/20260122/assignment-error-feedback-research.md +++ /dev/null @@ -1,233 +0,0 @@ -# Assignment Error Feedback Research - -**Date:** 2025-01-22 -**Topic:** assignment-error-feedback -**Status:** Complete - -## Executive Summary - -The assignment error feedback system has partial implementation. The backend returns appropriate status codes (409 for conflicts) with generic messages, but the frontend displays generic "Failed to assign item" errors instead of the backend's specific messages. The toast notification system is in place and supports actionable buttons, but is not leveraged for assignment conflict scenarios. - ---- - -## Research Findings - -### 1. Backend Response Structure for "Already Assigned" Failure - -**Location:** [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L258-L300) - -The `assign_item` endpoint handles assignment errors: - -```python -@router.post("/{dataset}/{bucket}/{item_id}/assign", status_code=200) -async def assign_item(...) -> GroundTruthItem: - try: - assigned = await container.assignment_service.assign_single_item(...) 
- return assigned - except ValueError as e: - error_msg = str(e) - if "already assigned" in error_msg.lower(): - raise HTTPException( - status_code=409, - detail="This item is already assigned to another user.", - ) -``` - -**Service Layer:** [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L196-L207) - -```python -if ( - item.assignedTo - and item.assignedTo != user_id - and item.status == GroundTruthStatus.draft -): - raise ValueError("Item is already assigned to another user") -``` - -### 2. Status Codes and Error Payload - -| Scenario | Status Code | Detail Message | -|----------|-------------|----------------| -| Item already assigned to another user (draft) | **409 Conflict** | `"This item is already assigned to another user."` | -| Item not found | **404 Not Found** | `"The requested item could not be found or has been deleted."` | -| Other validation failures | **400 Bad Request** | `"Unable to assign this item. Please check the item status and try again."` | - -**Current Payload Structure:** -```json -{ - "detail": "This item is already assigned to another user." -} -``` - -**Gap Identified:** The payload does NOT include: -- Error code (e.g., `ASSIGNMENT_CONFLICT`) -- Current assignee identity (`assignedTo`) -- Structured error object - -The PRD (SA-825) explicitly requires: -> "Backend returns a specific status code (e.g., 409 Conflict) and a structured error payload (e.g., code + assignedTo) so the frontend can render the correct UX." - -### 3. Frontend Error Handling for Assignments - -**Location:** [frontend/src/demo.tsx](frontend/src/demo.tsx#L184-L213) - -```tsx -onAssign={async (item) => { - try { - await assignItem(item.datasetName, item.bucket, item.id); - toast("success", `Assigned ${item.id} for curation`); - } catch (error) { - const message = - error instanceof Error - ? 
error.message - : "Failed to assign item"; - toast("error", message); - } -}} -``` - -**Service Layer:** [frontend/src/services/assignments.ts](frontend/src/services/assignments.ts#L64-L76) - -```typescript -export async function assignItem( - dataset: string, - bucket: string, - itemId: string, -): Promise { - const { data, error } = await client.POST( - "/v1/assignments/{dataset}/{bucket}/{item_id}/assign", - { params: { path: { dataset, bucket, item_id: itemId } } }, - ); - if (error) throw error; - return data as unknown as GroundTruthItemOut; -} -``` - -**Gap Identified:** The frontend: -1. Throws the raw error object from `openapi-fetch` -2. Only extracts `error.message` which may not contain the backend's `detail` -3. Does NOT check status codes or parse structured error responses -4. Falls back to generic "Failed to assign item" message - -### 4. Toast/Notification System - -**Location:** [frontend/src/hooks/useToasts.ts](frontend/src/hooks/useToasts.ts) - -The toast system supports: -- **Types:** `success`, `error`, `info` -- **Actionable buttons:** `actionLabel` and `onAction` callback -- **Auto-dismiss:** Configurable duration (default 3500ms) - -```typescript -export type Toast = { - id: string; - kind: "success" | "error" | "info"; - msg: string; - actionLabel?: string; // ← Supports action buttons - onAction?: () => void; // ← Callback for action -}; - -const showToast = useCallback( - (kind: Toast["kind"], msg: string, opts?: ShowOptions) => { ... }, - [dismiss], -); -``` - -**Toast Component:** [frontend/src/components/common/Toasts.tsx](frontend/src/components/common/Toasts.tsx) - -The UI renders action buttons when provided: -```tsx -{t.actionLabel && t.onAction && ( - -)} -``` - -### 5. 
Assignment Logic Locations - -#### Backend - -| Component | File | Purpose | -|-----------|------|---------| -| API Route | [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py#L258) | `POST /v1/assignments/{dataset}/{bucket}/{item_id}/assign` | -| Service | [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L175) | `assign_single_item()` - validation & orchestration | -| Repository | [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1859) | `assign_to()` - database operations | -| Error Classes | [backend/app/core/errors.py](backend/app/core/errors.py) | `ConflictError(HTTPException)` - not currently used for assignments | - -#### Frontend - -| Component | File | Purpose | -|-----------|------|---------| -| Service | [frontend/src/services/assignments.ts](frontend/src/services/assignments.ts#L64) | `assignItem()` - API call | -| Main App | [frontend/src/demo.tsx](frontend/src/demo.tsx#L184) | `onAssign` handler with error display | -| Toast Hook | [frontend/src/hooks/useToasts.ts](frontend/src/hooks/useToasts.ts) | Toast state management | -| Toast UI | [frontend/src/components/common/Toasts.tsx](frontend/src/components/common/Toasts.tsx) | Toast rendering | - ---- - -## Gap Analysis - -| Requirement (SA-825) | Current State | Gap | -|---------------------|---------------|-----| -| Clear, specific error message in UI | Generic "Failed to assign item" | ❌ Backend message not surfaced | -| Toast includes action to view assignee | No action button shown | ❌ Not implemented | -| Assignee identity surfaced | Not included in error response | ❌ Backend doesn't return `assignedTo` | -| Structured error payload with code | Plain `detail` string only | ❌ No error code or assignee field | - ---- - -## Recommendations - -### Backend Changes - -1. 
**Enhance error response structure** in [assignments.py](backend/app/api/v1/assignments.py#L290-L295): - ```python - raise HTTPException( - status_code=409, - detail={ - "code": "ASSIGNMENT_CONFLICT", - "message": "This item is already assigned to another user.", - "assignedTo": item.assignedTo # Include current assignee - } - ) - ``` - -2. **Update OpenAPI spec** to document 409 response schema with error structure. - -### Frontend Changes - -1. **Parse error responses** in [assignments.ts](frontend/src/services/assignments.ts#L64-L76) to extract status code and detail: - ```typescript - const { error, response } = await client.POST(/* existing assign call */); - // openapi-fetch exposes the HTTP status on `response`; `error` is the parsed body - if (response.status === 409) { - throw new AssignmentConflictError(error?.detail); - } - ``` - -2. **Show specific toast with action** in [demo.tsx](frontend/src/demo.tsx#L206-L211): - ```typescript - toast("error", `Assigned to ${assignee}`, { - actionLabel: "View", - onAction: () => showAssigneeProfile(assignee) - }); - ``` - ---- - -## Related Documentation - -- [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md) - Endpoint specification -- [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md) - API change guidelines -- [prd-refined-2.json](prd-refined-2.json) - SA-825 requirements - -## Test Coverage - -Existing integration test: [backend/tests/integration/test_assignments_assign_single_cosmos.py](backend/tests/integration/test_assignments_assign_single_cosmos.py#L77-L93) - -```python -async def test_assign_single_item_already_assigned(...): - """Test assigning an item already assigned to another user returns 409.""" - # Verifies 409 status code for conflict scenario - r = await async_client.post(f"/v1/assignments/{ds}/{bucket}/{item_id}/assign", ...)
- assert r.status_code == 409 -``` diff --git a/.copilot-tracking/subagent/20260122/assignment-takeover-research.md b/.copilot-tracking/subagent/20260122/assignment-takeover-research.md deleted file mode 100644 index 866a30a..0000000 --- a/.copilot-tracking/subagent/20260122/assignment-takeover-research.md +++ /dev/null @@ -1,262 +0,0 @@ -# Assignment Takeover Research - -**Date:** 2026-01-22 -**Topic:** Assignment takeover system - allowing SMEs to reassign items currently assigned to others -**Issue Reference:** SA-721 - ---- - -## Executive Summary - -The current system **blocks** assignment of draft items that belong to another user (409 Conflict). There is **no existing force/takeover logic** in the codebase. The backend has a clear validation checkpoint that could be modified to accept a `force` parameter. The frontend uses `window.confirm()` for confirmation dialogs throughout the codebase. - ---- - -## 1. Current Assignment Flow and Data Model - -### Assignment Data Model - -**GroundTruthItem** (in [backend/app/domain/models.py](backend/app/domain/models.py)): -```python -assignedTo: Optional[str] = Field(default=None, alias="assignedTo") -assigned_at: Optional[datetime] = Field(default=None, alias="assignedAt") -``` - -**AssignmentDocument** (materialized view for fast per-user queries): -```python -class AssignmentDocument(BaseModel): - id: str # stable id: "||" - pk: str # SME user id (partition key) - ground_truth_id: str - datasetName: str - bucket: UUID - docType: str = "sme-assignment" - schemaVersion: str = "v1" -``` - -### Assignment Flow - -1. **Self-serve assignment** (`POST /v1/assignments/self-serve`): - - Samples unassigned items from the pool - - Assigns batch to requesting user - - Creates `AssignmentDocument` for each item - -2. 
**Single-item assignment** (`POST /v1/assignments/{dataset}/{bucket}/{item_id}/assign`): - - User explicitly selects an item to work on - - Validates assignability (see conflict handling below) - - Sets `assignedTo`, `assignedAt`, `status=draft` - - Creates/updates `AssignmentDocument` - ---- - -## 2. Backend Conflict Handling - -### Current Validation Logic - -Location: [backend/app/services/assignment_service.py#L199-L210](backend/app/services/assignment_service.py#L199-L210) - -```python -# Validate item can be assigned -# Don't allow assignment of items already assigned to another user in draft state -if ( - item.assignedTo - and item.assignedTo != user_id - and item.status == GroundTruthStatus.draft -): - logger.warning( - f"assignment_service.assign_single_item.already_assigned - ..." - ) - raise ValueError("Item is already assigned to another user") -``` - -### Assignment Rules (from [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md)): - -| Scenario | Current Behavior | -|----------|------------------| -| Unassigned draft items | Can be assigned ✅ | -| Items assigned to another user (draft) | **Cannot be assigned (409 Conflict)** ❌ | -| Skipped items | Can be reassigned ✅ | -| Approved items | Can be assigned (moves to draft) ✅ | -| Deleted items | Can be assigned (moves to draft) ✅ | - -### Repository Layer - -Location: [backend/app/adapters/repos/cosmos_repo.py#L1719](backend/app/adapters/repos/cosmos_repo.py#L1719) - -The `assign_to()` method is **state-agnostic** - it performs the assignment unconditionally. The state validation happens in the **service layer**, not the repository. - -Filter predicate in Cosmos patch operation: -```python -filter_predicate = ( - f"FROM c WHERE (c.assignedTo = null OR c.assignedTo = '' " - f"OR c.assignedTo = '{user_id}' OR c.status != 'draft')" -) -``` - -This prevents reassigning draft items at the database level too, but could be modified for force-assign scenarios. 
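As a minimal sketch, the service-layer check and the assignment-rules table above reduce to a single predicate, assuming a `force` flag like the one proposed later in this note. `Status`, `Item`, and `can_assign` are hypothetical stand-ins, not the real `GroundTruthItem` or `assign_single_item()` code. An empty `assignedTo` is treated as unassigned, matching the `c.assignedTo = ''` clause in the Cosmos filter predicate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    # Hypothetical subset of statuses from the assignment-rules table.
    draft = "draft"
    skipped = "skipped"
    approved = "approved"
    deleted = "deleted"


@dataclass
class Item:
    # Hypothetical stand-in for GroundTruthItem's assignment fields.
    assigned_to: Optional[str]
    status: Status


def can_assign(item: Item, user_id: str, force: bool = False) -> bool:
    """Return True if user_id may claim the item.

    Only one case is blocked: a draft item owned by someone else.
    `force` is the hypothetical takeover override.
    """
    # None and "" both count as "unassigned".
    owned_by_other = bool(item.assigned_to) and item.assigned_to != user_id
    if owned_by_other and item.status == Status.draft and not force:
        return False
    return True
```

Keeping the rule in one pure function makes it straightforward to relax for takeover. The Cosmos filter predicate would still need a matching change, because it enforces the same rule at the database level.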
- ---- - -## 3. Existing Force/Override Logic - -**Finding: No existing force/override parameter exists.** - -The current system has no mechanism to bypass the 409 Conflict for draft items assigned to others. The workaround mentioned in the PRD is: -> "delete the relevant assignment doc from cosmos and update the assignedTo field on the groundTruth doc" - ---- - -## 4. Frontend Confirmation Dialog Patterns - -The frontend uses **native `window.confirm()`** dialogs throughout. There are no custom modal confirmation components. - -### Examples Found - -1. **Unsaved changes warning** ([frontend/src/hooks/useGroundTruth.ts#L390](frontend/src/hooks/useGroundTruth.ts#L390)): - ```typescript - const confirmed = window.confirm( - "You have unsaved changes. Switching items will discard them. Continue?", - ); - ``` - -2. **Tag removal** ([frontend/src/components/app/editor/TagsEditor.tsx#L65](frontend/src/components/app/editor/TagsEditor.tsx#L65)): - ```typescript - const ok = window.confirm(`Remove tag "${tag}"?`); - ``` - -3. **Turn deletion** ([frontend/src/components/app/editor/MultiTurnEditor.tsx#L161](frontend/src/components/app/editor/MultiTurnEditor.tsx#L161)): - ```typescript - if (window.confirm("Are you sure you want to delete this turn?")) { - ``` - -4. **Reference removal** ([frontend/src/components/app/pages/ReferencesSection.tsx#L95](frontend/src/components/app/pages/ReferencesSection.tsx#L95)): - ```typescript - window.confirm(`Remove reference "${name}"? You can Undo for 8s.`) - ``` - -5. 
**External link confirmation** ([frontend/src/components/modals/InspectItemModal.tsx](frontend/src/components/modals/InspectItemModal.tsx)): - ```typescript - const confirmed = confirm( - `You are about to visit an external website:\n\n${parsedUrl.hostname}\n\nDo you want to continue?`, - ); - ``` - -### Modal Infrastructure - -- [frontend/src/hooks/useModalKeys.ts](frontend/src/hooks/useModalKeys.ts) - Keyboard handling for modals (Escape to close, Enter to confirm) -- [frontend/src/components/modals/ModalPortal.tsx](frontend/src/components/modals/ModalPortal.tsx) - Portal for rendering modals -- [frontend/src/components/modals/InspectItemModal.tsx](frontend/src/components/modals/InspectItemModal.tsx) - Example full modal implementation - ---- - -## 5. Assignment Document Structure in Cosmos - -### Container: `assignments` -- **Partition Key:** `/pk` (user ID with prefix `sme:{userId}`) - -### Document Structure -```json -{ - "id": "{datasetName}|{bucket}|{itemId}", - "pk": "sme:{userId}", - "ground_truth_id": "{itemId}", - "datasetName": "{datasetName}", - "bucket": "{uuid}", - "docType": "sme-assignment", - "schemaVersion": "v1" -} -``` - -### Related Operations - -- **Create/Update:** `repo.upsert_assignment_doc(user_id, item)` -- **Delete:** `repo.delete_assignment_doc(user_id, dataset, bucket, ground_truth_id)` -- **List by user:** `repo.list_assignments_by_user(user_id)` - ---- - -## 6. Implementation Recommendations - -### Backend Changes - -1. **Add `force` parameter to `assign_single_item`:** - ```python - async def assign_single_item( - self, dataset: str, bucket: UUID, item_id: str, user_id: str, - force: bool = False # NEW - ) -> GroundTruthItem: - ``` - -2. **Modify validation logic:** - ```python - if ( - item.assignedTo - and item.assignedTo != user_id - and item.status == GroundTruthStatus.draft - and not force # NEW: skip check if force=True - ): - raise ValueError("Item is already assigned to another user") - ``` - -3. 
**Clean up old assignment document:** - When force-assigning, delete the previous user's `AssignmentDocument` before creating the new one. - -4. **Update API endpoint:** - Accept `force` parameter in request body: - ```python - @router.post("/{dataset}/{bucket}/{item_id}/assign", status_code=200) - async def assign_item( - dataset: str, - bucket: UUID, - item_id: str, - body: dict[str, Any] = {}, # NEW: accept { force: true } - user: UserContext = Depends(get_current_user), - ) -> GroundTruthItem: - ``` - -### Frontend Changes - -1. **Catch 409 Conflict** in the assign service call -2. **Show confirmation dialog** with current assignee info: - ```typescript - const confirmed = window.confirm( - `This item is currently assigned to ${currentAssignee}. ` + - `Do you want to take over this assignment?` - ); - ``` -3. **Retry with `force: true`** if user confirms - -### API Contract - -**Request:** -```http -POST /v1/assignments/{dataset}/{bucket}/{item_id}/assign -Content-Type: application/json - -{ "force": true } -``` - -**Response:** Same as current (updated `GroundTruthItem`) - ---- - -## 7. Related Issues - -- **SA-721:** "GTC: Re-think assignment limitations (unassign, vacation, etc.)" - - Desired behavior from PRD: - 1. When a ground truth is already assigned to someone else, a different SME can choose "Assign to me anyway" - 2. UI prompts for confirmation before taking over the assignment - 3. 
After confirmation, assignment is transferred to the current user and the UI reflects the new assignee - ---- - -## Key Files Reference - -| Component | File | -|-----------|------| -| Assignment Service | [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py) | -| Assignment API Routes | [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py) | -| Cosmos Repository | [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py) | -| Domain Models | [backend/app/domain/models.py](backend/app/domain/models.py) | -| Frontend Assignment Service | [frontend/src/services/assignments.ts](frontend/src/services/assignments.ts) | -| Design Doc | [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md) | diff --git a/.copilot-tracking/subagent/20260122/assignment-workflow-research.md b/.copilot-tracking/subagent/20260122/assignment-workflow-research.md deleted file mode 100644 index 05f7aca..0000000 --- a/.copilot-tracking/subagent/20260122/assignment-workflow-research.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -topic: assignment-workflow -jtbd: JTBD-001 -date: 2026-01-22 -status: complete ---- - -# Research: Assignment Workflow - -## Context - -The assignment workflow enables users to request, claim, and complete curation work items with ownership and optimistic concurrency protections. - -## Sources Consulted - -### URLs -- (None) - -### Codebase -- [backend/CODEBASE.md](backend/CODEBASE.md): Documents the assignment endpoints and ETag/soft-delete conventions. -- [frontend/README.md](frontend/README.md): Describes dev user simulation header usage. - -### Documentation -- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates observed requirements and cites detailed sources. 
-- [backend/docs/api-change-checklist-assignments.md](backend/docs/api-change-checklist-assignments.md): Captures intended stable API semantics for assignment-related write paths. -- [backend/docs/assign-single-item-endpoint.md](backend/docs/assign-single-item-endpoint.md): Defines the single-item self-assign behavior and conflict protection. - -## Key Findings - -1. The system supports a self-serve assignment flow that returns items to work on, and a “my assignments” view scoped to the current user. -2. Assignment write paths enforce optimistic concurrency via ETag (If-Match or equivalent) and return stable conflict semantics. -3. Assignment ownership is enforced for mutation endpoints with a stable ownership error when violated. -4. Status transitions that represent completing work (approve/skip/delete) clear assignment fields atomically. -5. Doc-only gaps exist in PRD artifacts, but they are not treated as current requirements when not reflected in code. - -## Existing Patterns - -| Pattern | Location | Relevance | -|---------|----------|-----------| -| ETag-based optimistic concurrency | [backend/CODEBASE.md](backend/CODEBASE.md) | Defines write preconditions and conflict behavior | -| Dev user simulation via header | [frontend/README.md](frontend/README.md) | Supports per-user assignment semantics in development | - -## Open Questions - -- (None) - -## Recommendations for Spec - -- Specify assignment lifecycle states and ownership/ETag requirements as stable contracts. -- Specify what “my assignments” returns (draft items assigned to the current user). -- Specify expected error behavior for ETag mismatch and ownership violations. 
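The ETag and ownership guarantees listed in the key findings can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the backend's implementation: `InMemoryStore`, `Record`, `EtagMismatch`, and `OwnershipError` are hypothetical names, and the HTTP status codes the real API maps these errors to are not specified here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


class EtagMismatch(Exception):
    """Caller's If-Match value is stale (hypothetical error type)."""


class OwnershipError(Exception):
    """Caller is not the current assignee (hypothetical error type)."""


def _new_etag() -> str:
    return uuid.uuid4().hex


@dataclass
class Record:
    assigned_to: Optional[str]
    status: str = "draft"
    etag: str = field(default_factory=_new_etag)


class InMemoryStore:
    """Toy store illustrating If-Match checks; not the Cosmos repository."""

    def __init__(self) -> None:
        self._items: Dict[str, Record] = {}

    def put(self, item_id: str, record: Record) -> str:
        self._items[item_id] = record
        return record.etag

    def get(self, item_id: str) -> Record:
        return self._items[item_id]

    def complete(self, item_id: str, user_id: str, if_match: str, new_status: str) -> str:
        """Approve/skip/delete an item and return the new ETag."""
        rec = self._items[item_id]
        if rec.etag != if_match:
            raise EtagMismatch(item_id)
        if rec.assigned_to != user_id:
            raise OwnershipError(item_id)
        # The status change and the assignment clearing happen in one write.
        rec.status = new_status
        rec.assigned_to = None
        rec.etag = _new_etag()  # every successful write invalidates older ETags
        return rec.etag
```

A client that loses the race receives `EtagMismatch` and has to re-read the item to obtain the current ETag before retrying.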
diff --git a/.copilot-tracking/subagent/20260122/batch-validation-research.md b/.copilot-tracking/subagent/20260122/batch-validation-research.md deleted file mode 100644 index 3b50651..0000000 --- a/.copilot-tracking/subagent/20260122/batch-validation-research.md +++ /dev/null @@ -1,241 +0,0 @@ -# Batch Validation Research - -**Date:** 2026-01-22 -**Story:** SA-241 - Enhanced error information for batch import -**Status:** Complete - -## Research Questions Answered - -### 1. How does the current bulk import validate individual records? - -**Location:** [validation_service.py](../../../backend/app/services/validation_service.py) - -The validation flow has two stages: - -#### Stage 1: Pre-persistence validation (validation_service.py) - -```python -async def validate_bulk_items(items: list[GroundTruthItem]) -> dict[str, list[str]]: -``` - -- **Tag validation only**: Currently validates only `manualTags` against the tag registry -- **Concurrent validation**: Uses `asyncio.gather()` to validate all items concurrently -- **Caching**: Fetches tag registry once and passes to all validation calls -- **Error collection**: Returns `dict[item_id, list[errors]]` mapping - -**Current validation checks:** - -| Check | Field | Implementation | -|-------|-------|----------------| -| Tag existence | `manualTags` | Tags must exist in tag registry | -| Tag format | `manualTags` | Must match `group:value` pattern | -| Tag rules | `manualTags` | TAG_SCHEMA rules (e.g., uniqueness within group) | - -#### Stage 2: Persistence-time validation (cosmos_repo.py) - -```python -async def import_bulk_gt(self, items: list[GroundTruthItem], buckets: int | None = None) -> BulkImportResult: -``` - -- **409 Conflict**: Catches duplicate ID errors from Cosmos -- **Other Cosmos errors**: Generic error message with article URL and ID - -### 2. What error information is returned when records fail validation? 
- - **Response model:** `ImportBulkResponse` in [ground_truths.py#L30](../../../backend/app/api/v1/ground_truths.py#L30) - -```python -class ImportBulkResponse(BaseModel): - imported: int # Number of items successfully imported - errors: list[str] # List of error messages - uuids: list[str] # IDs in request order (includes failed items) -``` - -**Current error message formats:** - -| Source | Format | Example | -|--------|--------|---------| -| Tag validation | `"Item '{item_id}': Error {message}"` | `"Item 'test-2': Error Unknown tag 'invalid:tag'."` | -| Duplicate (409) | `"exists (article: {url}, id: {id})"` | `"exists (article: http://..., id: abc-123)"` | -| Cosmos error | `"create_failed (article: {url}, id: {id}): {message}"` | `"create_failed (article: unknown, id: xyz): RU exceeded"` | - -**Gaps identified:** - -1. No structured error format - errors are strings, not objects -2. No field-level error information -3. No row/index reference for correlation -4. No error code for programmatic handling -5. Pydantic validation errors (if any bypass) would return 422, not included in errors array - -### 3. Is Cosmos batch/transactional batch being used, or individual creates? - -**Answer: Individual creates (1-by-1)** - -**Location:** [cosmos_repo.py#L486](../../../backend/app/adapters/repos/cosmos_repo.py#L486) - -```python -# sequential create to keep simple and clear errors -for it in items: - doc = self._to_doc(it) - try: - await gt.create_item(doc) # Individual create - success += 1 - except CosmosHttpResponseError as e: - # Error handling... -``` - -**Current behavior:** - -- Items are created **sequentially** in a loop -- No transactional batch support -- Partial success is possible (some items succeed, some fail) -- No rollback capability - -**Cosmos SDK batch capabilities NOT used:** - -- `container.execute_item_batch()` - not used -- `TransactionalBatch` - not used -- Bulk executor - not used - -### 4. What's the current ImportBulkResponse structure?
- -**Location:** [models.py#L182](../../../backend/app/domain/models.py#L182) - -```python -class BulkImportResult(BaseModel): # Internal model - imported: int = 0 - errors: list[str] = Field(default_factory=list) - -class ImportBulkResponse(BaseModel): # API response - imported: int # Number of items successfully imported - errors: list[str] # List of error messages for failed items - uuids: list[str] # IDs in same order as request -``` - -**Example successful response:** - -```json -{ - "imported": 2, - "errors": [], - "uuids": ["item-1", "item-2"] -} -``` - -**Example partial failure response:** - -```json -{ - "imported": 1, - "errors": ["Item 'item-2': Error Unknown tag 'bad:tag'."], - "uuids": ["item-1", "item-2"] -} -``` - -## Current Error Handling Flow - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ import_bulk() │ -├─────────────────────────────────────────────────────────────────┤ -│ 1. Generate IDs for items missing them (randomname) │ -│ 2. validate_bulk_items() ─► Tag validation │ -│ ├─ Fetch tag registry once │ -│ ├─ Validate each item's manualTags │ -│ └─ Return dict[item_id, errors] │ -│ 3. Filter out invalid items │ -│ 4. Apply computed tags to valid items │ -│ 5. container.repo.import_bulk_gt() ─► Cosmos persistence │ -│ ├─ Loop: create_item() for each │ -│ ├─ Catch 409: append "exists" error │ -│ └─ Catch other: append "create_failed" error │ -│ 6. Merge validation errors + persistence errors │ -│ 7. 
Return ImportBulkResponse │ -└─────────────────────────────────────────────────────────────────┘ -``` - -## Identified Gaps for SA-241 - -### Gap 1: Unstructured error messages - -**Current:** Plain strings -**Needed:** Structured error objects with: - -- `index`: Row number in original request -- `itemId`: The item's ID -- `field`: Which field failed (if applicable) -- `code`: Error code for programmatic handling -- `message`: Human-readable message - -### Gap 2: No batch processing - -**Current:** Sequential `create_item()` calls -**Needed:** Cosmos transactional batch for: - -- Better performance (single network round-trip) -- Atomic operations within partition -- RU efficiency - -### Gap 3: Limited validation scope - -**Current:** Only manualTags validated -**Needed:** Consider validating: - -- Required fields -- Field length limits -- Reference URL format -- Custom business rules - -### Gap 4: No partial rollback capability - -**Current:** Items persist as they succeed -**Needed:** Consider all-or-nothing mode option - -### Gap 5: No validation summary - -**Current:** Just error list -**Needed:** Summary stats: - -- Total items received -- Validation failures count -- Persistence failures count -- By-field error breakdown - -## Recommendations for SA-241 - -1. **Define structured error model:** - - ```python - class ImportItemError(BaseModel): - index: int - itemId: str | None - field: str | None - code: str # e.g., "INVALID_TAG", "DUPLICATE_ID" - message: str - ``` - -2. **Enhance ImportBulkResponse:** - - ```python - class ImportBulkResponse(BaseModel): - imported: int - failed: int - total: int - errors: list[ImportItemError] # Structured errors - uuids: list[str] - ``` - -3. **Consider Cosmos batch operations** for performance (separate story) - -4.
**Add validation for additional fields** as needed - -## Files Analyzed - -| File | Purpose | -|------|---------| -| [ground_truths.py](../../../backend/app/api/v1/ground_truths.py) | API endpoint, response models | -| [validation_service.py](../../../backend/app/services/validation_service.py) | Pre-persistence validation | -| [cosmos_repo.py](../../../backend/app/adapters/repos/cosmos_repo.py) | Database operations | -| [models.py](../../../backend/app/domain/models.py) | Domain models | -| [tagging_service.py](../../../backend/app/services/tagging_service.py) | Tag validation logic | -| [test_bulk_import_tag_validation.py](../../../backend/tests/unit/test_bulk_import_tag_validation.py) | Test coverage | diff --git a/.copilot-tracking/subagent/20260122/ci-code-quality-research.md b/.copilot-tracking/subagent/20260122/ci-code-quality-research.md deleted file mode 100644 index d404446..0000000 --- a/.copilot-tracking/subagent/20260122/ci-code-quality-research.md +++ /dev/null @@ -1,219 +0,0 @@ -# CI Code Quality Research - -**Date:** 2026-01-22 -**Topic:** ci-code-quality -**Jira:** SA-745 - Enforce formatting and linters in CI, reconcile drift - ---- - -## Summary - -The repository has established linting and formatting tooling for both backend (Python) and frontend (TypeScript), with backend pre-commit hooks configured but **no frontend pre-push hooks**. There is **active drift in the frontend** that needs reconciliation before CI enforcement. - ---- - -## 1. 
Backend (Python) Configuration - -### Package Manager - -- **uv** - Modern Python package manager from Astral -- Lock file: [backend/uv.lock](backend/uv.lock) - -### Linting/Formatting Tools - -| Tool | Purpose | Configuration | -|------|---------|---------------| -| **Ruff** | Linting + formatting | [backend/pyproject.toml](backend/pyproject.toml) `[tool.ruff]` | -| **Black** | Formatting (legacy, likely superseded by ruff) | `[tool.black]` section | -| **ty** | Type checking | `[tool.ty]` section | -| **Vulture** | Dead code detection | `[tool.vulture]` section | - -### Ruff Configuration - -```toml -[tool.ruff] -line-length = 100 - -[tool.ruff.lint] -select = [ - "F", # Pyflakes (F401, F841, F811, etc.) - "ERA", # Commented code (ERA001) - "RUF059", # Unused unpacked variables -] -``` - -### Pre-commit Hooks (Backend Only) - -File: [backend/.pre-commit-config.yaml](backend/.pre-commit-config.yaml) - -| Hook | Stage | Scope | -|------|-------|-------| -| `ruff-format` | pre-commit | `^backend/.*\.py$` | -| `ruff` (lint + fix) | pre-commit | `^backend/.*\.py$` | -| `ty` | pre-commit | `^backend/app/.*\.py$` | -| `pytest` | **pre-push** | Backend tests | - -### Current Drift Status - -``` -✅ Backend lint: All checks passed! -✅ Backend format: 67 files already formatted -``` - -**No drift in backend.** - ---- - -## 2. 
Frontend (TypeScript) Configuration - -### Package Manager - -- **npm** - Standard Node.js package manager -- Lock file: [frontend/package-lock.json](frontend/package-lock.json) - -### Linting/Formatting Tools - -| Tool | Purpose | Configuration | -|------|---------|---------------| -| **Biome** | Linting + formatting | [frontend/biome.json](frontend/biome.json) | -| **TypeScript** | Type checking | `tsc -b` via npm script | -| **Knip** | Dead code detection | [frontend/knip.json](frontend/knip.json) | - -### Biome Configuration - -```json -{ - "formatter": { "enabled": true }, - "linter": { - "enabled": true, - "rules": { - "correctness": { - "noUnusedImports": "error", - "noUnusedVariables": "warn", - "noUnusedFunctionParameters": "warn", - "noUnusedPrivateClassMembers": "warn" - } - } - } -} -``` - -### NPM Scripts - -```json -{ - "lint": "biome check --write", - "typecheck": "tsc -b --pretty false" -} -``` - -### Pre-commit/Pre-push Hooks - -**None configured.** No husky, lefthook, or lint-staged packages present. - -### Current Drift Status - -``` -❌ Frontend: Found 31 errors (formatting + organize imports) -``` - -**Active drift detected:** - -- 2 config files need formatting (`biome.json`, `knip.json`) -- Multiple source files have import organization issues -- Formatting issues in `vitest.config.ts` and source files - ---- - -## 3. CI Workflow Analysis - -File: [.github/workflows/gtc-ci.yml](.github/workflows/gtc-ci.yml) - -### Current CI Checks - -| Check | Type | Status | -|-------|------|--------| -| Backend unit tests | pytest | ✅ Runs | -| Backend integration tests | pytest | ✅ Runs | -| `ty check app` | Type checking | ✅ Runs | -| OpenAPI spec freshness | git diff | ✅ Runs | -| Frontend types check | `api:types:check` | ✅ Runs | -| Frontend tests | vitest | ✅ Runs | -| **Backend lint/format** | ruff | ❌ **Not in CI** | -| **Frontend lint/format** | biome | ❌ **Not in CI** | - -### Missing CI Jobs - -1. **Backend linting:** `uv run ruff check app` -2. 
**Backend formatting:** `uv run ruff format app --check` -3. **Frontend linting:** `npx biome check` - ---- - -## 4. Recommendations for SA-745 - -### Phase 1: Reconcile Drift - -1. Run `npm run lint` in frontend to auto-fix 31 errors -2. Commit formatting fixes separately for clean history - -### Phase 2: Add CI Enforcement - -Add to `.github/workflows/gtc-ci.yml`: - -```yaml -- name: Backend lint - working-directory: backend - run: uv run ruff check app - -- name: Backend format check - working-directory: backend - run: uv run ruff format app --check - -- name: Frontend lint - working-directory: frontend - run: npx biome check -``` - -### Phase 3: Add Frontend Pre-push Hooks - -Options: - -1. **Husky** - Most popular, npm-based -2. **Lefthook** - Fast, language-agnostic -3. **Extend pre-commit** - Add frontend hooks to existing backend config - -Recommended: Extend existing `pre-commit` framework (already in dev dependencies) with frontend hooks. - -### Phase 4: Environment Alignment - -- Document required tool versions in README -- Consider adding `engines` field to `package.json` -- Ensure `pre-commit install` is documented in setup instructions - ---- - -## 5. File References - -| File | Purpose | -|------|---------| -| [backend/pyproject.toml](backend/pyproject.toml) | Python tools config | -| [backend/.pre-commit-config.yaml](backend/.pre-commit-config.yaml) | Pre-commit hooks | -| [frontend/biome.json](frontend/biome.json) | Biome linter/formatter config | -| [frontend/package.json](frontend/package.json) | NPM scripts and dependencies | -| [.github/workflows/gtc-ci.yml](.github/workflows/gtc-ci.yml) | CI workflow | - ---- - -## 6. 
Quick Fix Commands - -```bash -# Fix frontend drift -cd frontend && npm run lint - -# Verify backend is clean -cd backend && uv run ruff check app && uv run ruff format app --check - -# Install pre-commit hooks (backend) -cd backend && uv run pre-commit install -``` diff --git a/.copilot-tracking/subagent/20260122/code-conventions-research.md b/.copilot-tracking/subagent/20260122/code-conventions-research.md deleted file mode 100644 index 5996a64..0000000 --- a/.copilot-tracking/subagent/20260122/code-conventions-research.md +++ /dev/null @@ -1,250 +0,0 @@ -# Code Conventions Research - -**Research Date:** 2025-01-22 -**Related Jira Stories:** SA-249, SA-250, SA-245 - ---- - -## Executive Summary - -This research identifies patterns requiring standardization across three areas: -1. **Pydantic models vs JSON dump** - Limited issues; most API endpoints correctly return Pydantic models -2. **Exception handling** - Significant use of generic `Exception` catches that could use specific Cosmos error types -3. **Logging patterns** - Two `print()` statements in app code; mature logging infrastructure using `extra={}` pattern - ---- - -## 1. JSON Dumps vs Pydantic Models (SA-249) - -### Findings - -The codebase generally handles Pydantic models correctly. FastAPI endpoints return Pydantic models directly, letting FastAPI handle JSON serialization. 
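The split this section describes (typed models inside the app, string conversion only where data is written to Cosmos) can be sketched with the standard library. `GroundTruth`, `to_cosmos_doc`, and `from_cosmos_doc` are hypothetical names. In the real code the conversion is done by Pydantic's `model_dump(mode="json", by_alias=True)` together with the `str(bucket)` calls listed below.

```python
from dataclasses import dataclass
from typing import Any, Dict
from uuid import UUID


@dataclass
class GroundTruth:
    # Hypothetical, trimmed-down model; bucket stays a UUID inside the app.
    id: str
    dataset_name: str
    bucket: UUID


def to_cosmos_doc(gt: GroundTruth) -> Dict[str, Any]:
    # Stand-in for model_dump(mode="json", by_alias=True): camelCase keys and
    # JSON-safe values, produced only at the persistence boundary.
    return {"id": gt.id, "datasetName": gt.dataset_name, "bucket": str(gt.bucket)}


def from_cosmos_doc(doc: Dict[str, Any]) -> GroundTruth:
    # Reverse direction: restore the typed UUID as soon as data leaves Cosmos.
    return GroundTruth(
        id=doc["id"],
        dataset_name=doc["datasetName"],
        bucket=UUID(doc["bucket"]),
    )
```

Because the conversion happens in exactly two places, a round trip returns an equal object, and no caller outside the repository ever handles a string bucket.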
- -#### Locations Using `json.dumps()` or `model_dump()` - -| File | Line | Context | Assessment | -|------|------|---------|------------| -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L409) | 409 | `model_dump(mode="json", by_alias=True)` | **Appropriate** - Preparing data for Cosmos DB storage | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1068) | 1068 | `model_dump(mode="json", by_alias=True)` | **Appropriate** - Document upsert | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1917) | 1917 | `model_dump(mode="json", by_alias=True)` | **Appropriate** - Assignment document upsert | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L380) | 380 | `json.loads(json.dumps(sanitized, ensure_ascii=True))` | **Appropriate** - Unicode sanitization workaround | -| [snapshot_service.py](backend/app/services/snapshot_service.py#L81) | 81 | `model_dump(mode="json", ...)` for export items | **Appropriate** - Export formatting | -| [inference.py](backend/app/adapters/inference/inference.py#L772) | 772 | `json.dumps({"error": str(e)})` | **Review** - Error response in retrieval tool | - -#### Bucket UUID to String Coercion - -| File | Line | Context | -|------|------|---------| -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L411) | 411 | `d["bucket"] = str(d["bucket"])` - Converting for Cosmos storage | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1058) | 1058 | `str(bucket)` - Partition key construction | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1272) | 1272 | `str(it.bucket)` - Partition key for delete | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1762) | 1762 | `str(bucket)` - Partition key construction | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1837) | 1837 | `str(bucket)` - Partition key construction | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1907) | 1907 | `str(gt.bucket)` - Document ID 
construction |
-| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1975) | 1975 | `str(bucket)` - Item ID construction |
-
-**Assessment:** Bucket-to-string conversion happens at the repository layer for Cosmos DB compatibility. This is appropriate since Cosmos DB partition keys must be strings. The Pydantic models properly type `bucket` as `UUID`, and conversion only happens at persistence boundaries.
-
-### Recommendation
-
-- **No changes required** for JSON serialization patterns in API layer
-- Repository-level `model_dump()` and `str(bucket)` conversions are appropriate for persistence
-- Consider documenting the pattern: "Models remain typed; string conversion only at persistence boundary"
-
----
-
-## 2. Generic Exception Catches (SA-250)
-
-### Locations in App Code
-
-The codebase has extensive use of generic `Exception` catches. Most are intentional defensive patterns with pragmatic comments, but some could benefit from more specific error types.
-
-#### High-Priority (Cosmos-related operations)
-
-| File | Line | Context | Recommendation |
-|------|------|---------|----------------|
-| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L113) | 113 | Generic exception in repo | Use `CosmosHttpResponseError` |
-| [container.py](backend/app/container.py#L83) | 83 | Credential building | Keep generic (import failures) |
-| [container.py](backend/app/container.py#L127) | 127 | Search init | Keep generic (optional feature) |
-| [container.py](backend/app/container.py#L260) | 260 | Inference init | Keep generic (optional feature) |
-
-#### API Layer Exception Catches
-
-| File | Line | Context | Recommendation |
-|------|------|---------|----------------|
-| [search.py](backend/app/api/v1/search.py#L22) | 22 | Search endpoint | Add specific error handling |
-| [ground_truths.py](backend/app/api/v1/ground_truths.py#L308) | 308 | Status parsing | Keep generic (data validation) |
-| [ground_truths.py](backend/app/api/v1/ground_truths.py#L483) | 483 | Tag recompute | Add specific error types |
-| [assignments.py](backend/app/api/v1/assignments.py#L149) | 149 | Assignment update | Use `CosmosHttpResponseError` |
-| [assignments.py](backend/app/api/v1/assignments.py#L238) | 238 | Assignment update | Use `CosmosHttpResponseError` |
-| [chat.py](backend/app/api/v1/chat.py#L133) | 133 | Chat endpoint | Commented as safeguard |
-| [chat.py](backend/app/api/v1/chat.py#L151) | 151 | Chat endpoint | Keep generic (multi-service) |
-| [tags.py](backend/app/api/v1/tags.py#L83) | 83 | Tags endpoint | Add specific error types |
-
-#### Startup/Lifecycle (main.py)
-
-The [main.py](backend/app/main.py) file has numerous generic `Exception` catches (lines 77, 80, 114, 130, 155, 162, 170, 175, 197, 229). These are intentional "never block startup" patterns and should remain generic.
-
-#### Codebase Already Using CosmosHttpResponseError
-
-The codebase demonstrates proper usage in several places:
-
-```python
-# cosmos_repo.py
-from azure.cosmos.exceptions import CosmosHttpResponseError, CosmosResourceNotFoundError
-```
-
-Used correctly in:
-- [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L488) - Line 488
-- [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1060) - Line 1060
-- [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1091) - Line 1091
-
-### Recommendation
-
-1. **Keep generic** exceptions in:
-   - Startup/lifecycle code (main.py)
-   - Optional feature initialization (container.py)
-   - Third-party library error handling
-
-2. **Replace with specific** exceptions in:
-   - Repository operations interacting with Cosmos
-   - API endpoints that call Cosmos operations
-   - Use `CosmosHttpResponseError` and `CosmosResourceNotFoundError`
-
----
-
-## 3. Print Statements (SA-245)
-
-### Locations in App Code
-
-Only **2 print statements** exist in the main app code:
-
-| File | Line | Code | Recommendation |
-|------|------|------|----------------|
-| [main.py](backend/app/main.py#L122) | 122 | `print(APP_VERSION)` | Replace with `logger.info("app.version", extra={"version": APP_VERSION})` |
-| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L401) | 401 | `print(item.__repr__())` | Replace with `logger.error("repo.invalid_item", extra={"item": item.__repr__()})` |
-
-### Scripts with Print Statements (Lower Priority)
-
-Scripts in `backend/scripts/` use `print()` extensively for CLI output:
-- `cosmos_container_manager.py` - CLI progress output
-- `cosmos_export_import.py` - Migration logging
-- `delete_cosmos_emulator_dbs.py` - Cleanup status
-- `init_seed_data.py` - Seed data feedback
-
-**Assessment:** Script print statements are appropriate for CLI tools and don't need conversion.
-
----
-
-## 4. Logging Patterns Analysis
-
-### Current Architecture
-
-The codebase has a mature logging infrastructure in [app/core/logging.py](backend/app/core/logging.py):
-
-#### Key Components
-
-1. **Setup Function** (`setup_logging`):
-   - Configures root logger with structured format
-   - Suppresses noisy Azure SDK logs
-   - Format: `%(asctime)s %(levelname)s %(name)s user=%(user_id)s %(message)s`
-
-2. **Trace Context Filter** (`_TraceContextFilter`):
-   - Injects `trace_id`, `span_id`, `user_id` into every log record
-   - Integrates with OpenTelemetry when available
-
-3. **User Identity Context**:
-   - `ContextVar` for current user ID
-   - `set_current_user()` / `clear_current_user()` functions
-   - Middleware automatically populates from Easy Auth or headers
-
-4. **Log Record Factory** (`_install_log_record_factory`):
-   - Custom factory ensures `user_id` attribute always exists
-   - Prevents `KeyError` when using `extra={"user_id": ...}`
-
-### The "Extra Field" Pattern (SA-245)
-
-The `extra={}` parameter is used throughout for structured logging:
-
-```python
-# Example from assignment_service.py
-logger.info(
-    "self_assign.assigned",
-    extra=self._log_context(it.id, it.datasetName),
-)
-
-# Helper method creates consistent context
-def _log_context(self, item_id: str | None = None, dataset: str | None = None) -> dict[str, str]:
-    context: dict[str, str] = {}
-    if item_id:
-        context["item_id"] = item_id
-    if dataset:
-        context["dataset"] = dataset
-    return context
-```
-
-#### Locations Using `extra={}` Pattern
-
-| File | Count | Notes |
-|------|-------|-------|
-| [assignment_service.py](backend/app/services/assignment_service.py) | 14 | Consistent `_log_context()` helper |
-| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py) | 8+ | Various repo operations |
-| [search_service.py](backend/app/services/search_service.py) | 1 | Search results |
-| [tagging_service.py](backend/app/services/tagging_service.py) | 1 | Tag collision warning |
-| [validation_service.py](backend/app/services/validation_service.py) | 2 | Validation logging |
-
-### Current Issues with Extra Pattern
-
-1. **Reserved field collision**: The `user_id` field is reserved by the log record factory. Using `extra={"user_id": ...}` would cause issues (documented in [assignment_service.py](backend/app/services/assignment_service.py#L27-L34)).
-
-2. **Inconsistent key naming**: Some use `item_id`, others use `itemId`; some use `count`, others use `candidate_count`.
-
-3. **Missing context helpers**: Only `AssignmentService` has a `_log_context()` helper; other services construct extra dicts inline.
-
-### Recommendations
-
-1. **Standardize key names** across all services (snake_case recommended)
-2. **Create shared logging context helper** in `app/core/logging.py`
-3. **Document reserved keys** (`user_id`, `trace_id`, `span_id`)
-4. **Consider structured logging library** (e.g., `structlog`) for better JSON output in production
-
----
-
-## 5. Summary of Required Changes
-
-### Immediate (Low Effort)
-
-| Priority | File | Change |
-|----------|------|--------|
-| High | [main.py#L122](backend/app/main.py#L122) | Replace `print(APP_VERSION)` with logger |
-| High | [cosmos_repo.py#L401](backend/app/adapters/repos/cosmos_repo.py#L401) | Replace `print(item.__repr__())` with logger |
-
-### Medium-Term (Moderate Effort)
-
-| Priority | Scope | Change |
-|----------|-------|--------|
-| Medium | API endpoints | Replace generic `Exception` with `CosmosHttpResponseError` where appropriate |
-| Medium | Logging | Standardize extra field key naming convention |
-| Low | Logging | Create shared `_log_context()` helper |
-
-### No Changes Required
-
-- Pydantic model return patterns (already correct)
-- Bucket UUID-to-string conversion (appropriate at persistence layer)
-- Generic exceptions in startup/lifecycle code
-- Print statements in CLI scripts
-
----
-
-## Appendix: Files Referenced
-
-- [backend/app/main.py](backend/app/main.py)
-- [backend/app/core/logging.py](backend/app/core/logging.py)
-- [backend/app/container.py](backend/app/container.py)
-- [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py)
-- [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py)
-- [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py)
-- [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py)
-- [backend/app/api/v1/search.py](backend/app/api/v1/search.py)
-- [backend/app/api/v1/chat.py](backend/app/api/v1/chat.py)
-- [backend/app/api/v1/tags.py](backend/app/api/v1/tags.py)
diff --git a/.copilot-tracking/subagent/20260122/concurrency-control-research.md b/.copilot-tracking/subagent/20260122/concurrency-control-research.md
deleted file mode 100644
index a742fa6..0000000
--- a/.copilot-tracking/subagent/20260122/concurrency-control-research.md
+++ /dev/null
@@ -1,212 +0,0 @@
----
-topic: concurrency-control
-jtbd: JTBD-008
-date: 2026-01-22
-status: complete
----
-
-# Research: Concurrency Control
-
-## Context
-
-The concurrency control mechanism prevents race conditions during simultaneous updates. This research examines how GTC handles concurrent modifications to ground-truth items and assignments, identifies potential race conditions, and documents Azure Cosmos DB's concurrency mechanisms.
-
-## Sources Consulted
-
-### Codebase
-
-- [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py): Main Cosmos DB repository implementation with ETag-based optimistic concurrency
-- [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py): Assignment service with self-assign workflow
-- [backend/app/api/v1/assignments.py](backend/app/api/v1/assignments.py): Assignment API endpoints with ETag enforcement
-- [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py): Ground-truth API endpoints with ETag enforcement
-- [backend/docs/user-self-serve-plan.md](backend/docs/user-self-serve-plan.md): Design document for concurrent assignment handling
-
-### Documentation
-
-- [Azure Cosmos DB: Transactions and Optimistic Concurrency Control](https://learn.microsoft.com/en-us/azure/cosmos-db/database-transactions-optimistic-concurrency): Official Microsoft documentation on ETag-based OCC
-- [specs/assignment-workflow.md](specs/assignment-workflow.md): Spec documenting concurrency requirements (NFR-001)
-- [specs/data-persistence.md](specs/data-persistence.md): Spec documenting ETag enforcement requirement (FR-005)
-- [backend/CODEBASE.md](backend/CODEBASE.md): Documents ETag concurrency conventions
-
-### PR Review Comments
-
-- PR #21 review comments (URLs returned 404 - repository may be private or comments deleted)
-
-## Key Findings
-
-### 1. ETag-Based Optimistic Concurrency Is Implemented
-
-GTC uses Azure Cosmos DB's native `_etag` system property for optimistic concurrency control:
-
-- **All write paths require ETag**: Both `assignments.py` and `ground_truths.py` enforce ETag via `If-Match` header or `etag` body field
-- **HTTP 412 on mismatch**: Returns "ETag mismatch" when server ETag differs from client-provided ETag
-- **Conditional replace**: Uses `MatchConditions.IfNotModified` with `replace_item()` ([cosmos_repo.py#L854-L870](backend/app/adapters/repos/cosmos_repo.py#L854-L870))
-
-### 2. Assignment Uses Patch with Filter Predicate (Production)
-
-For production Cosmos DB, assignments use atomic patch operations with `filter_predicate`:
-
-```python
-# From cosmos_repo.py _assign_to_with_patch()
-filter_predicate = (
-    f"FROM c WHERE (c.assignedTo = null OR c.assignedTo = '' "
-    f"OR c.assignedTo = '{user_id}' OR c.status != 'draft')"
-)
-```
-
-This atomically enforces that items can only be assigned if:
-- Item is unassigned (`assignedTo = null` or empty)
-- Item is already assigned to requesting user
-- Item is not in draft state (allowing re-assignment of completed items)
-
-### 3. Emulator Uses Read-Modify-Replace Pattern
-
-For Cosmos DB emulator (which doesn't support `filter_predicate`), GTC falls back to a read-modify-replace pattern ([cosmos_repo.py#L1322-L1378](backend/app/adapters/repos/cosmos_repo.py#L1322-L1378)):
-
-```python
-# Conditional check happens in application code
-can_assign = (
-    not current_assigned_to
-    or current_assigned_to == ""
-    or current_assigned_to == user_id
-    or current_status != GroundTruthStatus.draft.value
-)
-```
-
-**Risk**: The emulator path has a TOCTOU window between read and replace.
-
-### 4. Assignment Document Cleanup Is Non-Atomic
-
-When assignments complete (approve/skip/delete), the workflow:
-1. Updates GroundTruthItem (clears `assignedTo`)
-2. Deletes AssignmentDocument (separate operation)
-
-If step 2 fails, orphaned assignment docs may exist. The code handles this gracefully by logging errors but not failing the request ([assignments.py#L220-L240](backend/app/api/v1/assignments.py#L220-L240)).
-
-### 5. Self-Assign Handles Contention via Retry
-
-The self-assign workflow ([assignment_service.py#L36-L101](backend/app/services/assignment_service.py#L36-L101)):
-- Samples candidates with 2x overfetch to handle contention
-- Retries once with exclusion list if initial pass is short
-- Individual assignment failures don't stop the batch
-
-## Race Condition Risks
-
-| Operation | Risk | Current Mitigation | Recommended Fix |
-|-----------|------|-------------------|-----------------|
-| Ground-truth update | Lost update if two users modify same item | ETag required on all writes; 412 on mismatch | **Adequate** - correctly implemented |
-| Self-serve assignment | Two users claim same item | Patch with `filter_predicate` (atomic) | **Adequate for production**; emulator has TOCTOU |
-| Single-item assign | Two users click assign simultaneously | Validates `assignedTo` before `assign_to()` | **Adequate** - `assign_to()` is atomic in production |
-| Status transition | Concurrent approve/skip/delete | ETag enforced; separate users blocked by ownership | **Adequate** - ownership + ETag |
-| Assignment doc cleanup | Orphaned docs if delete fails | Best-effort delete; logs error | **Low risk** - docs cleaned on next user query |
-| Curation instructions | Two users update dataset instructions | ETag-based conditional replace | **Adequate** |
-| Emulator assignment | TOCTOU between read and replace | None (emulator limitation) | Accept risk or use stored procedure |
-
-## Assignment Workflow Analysis
-
-### Current Flow
-
-```
-┌──────────────────────────────────────────────────────────────────┐
-│ Self-Serve Assignment                                            │
-├──────────────────────────────────────────────────────────────────┤
-│ 1. sample_unassigned(limit * 2)                                  │
-│    └─> Query for draft/skipped items where assignedTo is null    │
-│                                                                  │
-│ 2. For each candidate:                                           │
-│    ├─ assign_to(item_id, user_id)                                │
-│    │   └─> Patch with filter_predicate (atomic)                  │
-│    │       - Success: returns True                               │
-│    │       - 412/conflict: returns False                         │
-│    │                                                             │
-│    └─ If success: upsert_assignment_doc()                        │
-│       └─> Creates materialized view doc in assignments container │
-│                                                                  │
-│ 3. Retry once with exclude_ids if still below limit              │
-└──────────────────────────────────────────────────────────────────┘
-```
-
-### Race Scenarios
-
-**Scenario A: Two users request assignments simultaneously**
-- Both query `sample_unassigned()` and get overlapping candidate sets
-- Each calls `assign_to()` with `filter_predicate`
-- Cosmos DB ensures only one succeeds per item
-- Losing user's request returns `False`, moves to next candidate
-- **Result**: Safe - atomic at database level
-
-**Scenario B: User A assigns while User B updates same item**
-- User A holds item with ETag `E1`
-- User B assigns item (changes `assignedTo`)
-- User A submits update with `E1`
-- Cosmos rejects with 412 (ETag `E2` now on server)
-- **Result**: Safe - ETag prevents lost update
-
-**Scenario C: Two tabs approve same item**
-- Tab 1 and Tab 2 both load item with ETag `E1`
-- Tab 1 approves -> succeeds, ETag becomes `E2`
-- Tab 2 approves with `E1` -> 412 Precondition Failed
-- **Result**: Safe - user sees conflict error
-
-**Scenario D (Emulator only): Assignment TOCTOU**
-- User A reads item (unassigned)
-- User B reads item (unassigned)
-- User A writes `assignedTo=A` (succeeds)
-- User B writes `assignedTo=B` (succeeds - no ETag check)
-- **Result**: User A's assignment lost
-- **Mitigation**: Emulator is development-only; production uses atomic patch
-
-## Azure Cosmos DB Concurrency Mechanisms
-
-From official documentation:
-
-### 1. Optimistic Concurrency Control (OCC)
-- Every item has system-generated `_etag` property
-- Updated automatically on every write
-- Use `If-Match` header with `_etag` value for conditional writes
-- Server returns 412 Precondition Failed on mismatch
-
-### 2. Patch Operations with Filter Predicate
-- Atomic conditional update in single round-trip
-- Filter evaluated server-side before applying patch
-- Returns 412 if filter doesn't match
-
-### 3. Stored Procedures
-- ACID transactions within a logical partition
-- Automatic rollback on exception
-- Useful for multi-item atomic operations
-
-### 4. Status Code Summary
-
-| Code | Meaning | Retry? |
-|------|---------|--------|
-| 409 | Conflict (duplicate ID or unique constraint) | No |
-| 412 | Precondition Failed (ETag mismatch) | Read-then-retry |
-| 449 | Transient write conflict | Yes with backoff |
-
-## Recommendations for Spec
-
-### Must Include
-
-1. **ETag enforcement on all writes**: Document that all update/delete operations require valid ETag; missing or mismatched ETag returns HTTP 412
-2. **Assignment atomicity**: Document that production uses Cosmos DB patch with `filter_predicate` for atomic assignment
-3. **Ownership enforcement**: Document that only the assigned user can modify items in draft state
-4. **Error handling contract**: Define stable error codes for 412 (ETag mismatch) and 409 (assignment conflict)
-
-### Should Include
-
-5. **Emulator limitations**: Note that emulator path has reduced concurrency guarantees (acceptable for development)
-6. **Assignment document consistency**: Document that assignment docs are best-effort and may be orphaned temporarily
-7. **Self-assign retry behavior**: Document overfetch and retry strategy for contention handling
-
-### Nice to Have
-
-8. **Monitoring guidance**: Recommend logging 412/409 rates to detect contention hotspots
-9. **Client retry guidance**: Recommend exponential backoff on 412 with fresh read before retry
-10. **Future: Stored procedure for multi-item transactions**: If cross-item atomicity needed (e.g., assignment + assignment doc creation), consider stored procedure
-
-## Open Questions
-
-1. Should the spec define a maximum retry count for clients on 412?
-2. Is orphaned assignment document cleanup needed as a background job?
-3. Should the emulator path use ETag-based replace instead of unconditional replace for better parity?
diff --git a/.copilot-tracking/subagent/20260122/cosmos-indexing-research.md b/.copilot-tracking/subagent/20260122/cosmos-indexing-research.md
deleted file mode 100644
index acd2524..0000000
--- a/.copilot-tracking/subagent/20260122/cosmos-indexing-research.md
+++ /dev/null
@@ -1,259 +0,0 @@
----
-topic: cosmos-indexing
-jtbd: JTBD-008
-date: 2026-01-22
-status: complete
----
-
-# Research: Cosmos Indexing
-
-## Context
-
-The indexing strategy limits indexed fields to reduce write RU costs. This research examines the current Cosmos DB indexing policy, identifies queried fields, and recommends optimizations.
-
-## Sources Consulted
-
-### Codebase
-
-- [backend/scripts/indexing-policy.json](../../../backend/scripts/indexing-policy.json): The current indexing policy configuration
-- [backend/app/adapters/repos/cosmos_repo.py](../../../backend/app/adapters/repos/cosmos_repo.py): All Cosmos DB queries and field access patterns
-- [backend/app/domain/models.py](../../../backend/app/domain/models.py): Data model field definitions
-- [backend/scripts/emulator_init.sh](../../../backend/scripts/emulator_init.sh): Container creation with indexing policy
-- [.github/workflows/gtc-cd.yml](../../../.github/workflows/gtc-cd.yml): CI/CD indexing policy application
-
-### Documentation
-
-- [Azure Cosmos DB - Indexing policies](https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy): Comprehensive index configuration guide
-- [Azure Cosmos DB - Optimize request cost](https://learn.microsoft.com/en-us/azure/cosmos-db/optimize-cost-reads-writes): RU optimization best practices
-- [Azure Well-Architected Framework - Cosmos DB](https://learn.microsoft.com/en-us/azure/well-architected/service-guides/cosmos-db): Architecture recommendations
-
-## Key Findings
-
-### 1. Current Indexing Policy Uses Default "Index Everything" Strategy
-
-The current policy at [backend/scripts/indexing-policy.json](../../../backend/scripts/indexing-policy.json) indexes all paths:
-
-```json
-{
-  "indexingMode": "consistent",
-  "automatic": true,
-  "includedPaths": [{ "path": "/*" }],
-  "excludedPaths": [{ "path": "/\"_etag\"/?" }]
-}
-```
-
-**Impact**: Every field in every document is indexed, including large text fields that are never queried (e.g., `answer`, `contextUsedForGeneration`, `content` in refs).
-
-### 2. Eight Composite Indexes Defined
-
-The policy includes composite indexes for sorting operations:
-
-| Composite Index | Purpose | Used? |
-|----------------|---------|-------|
-| `[reviewedAt DESC, id ASC]` | Paginated list sorting | ✅ Yes |
-| `[updatedAt DESC, id ASC]` | Paginated list sorting | ✅ Yes |
-| `[reviewedAt ASC, id ASC]` | Ascending sort variant | ✅ Yes |
-| `[status ASC, reviewedAt DESC, id ASC]` | Filtered + sorted queries | ✅ Yes |
-| `[totalReferences ASC, id ASC]` | Reference count sorting | ✅ Yes |
-| `[totalReferences DESC, id ASC]` | Reference count sorting | ✅ Yes |
-| `[status ASC, totalReferences ASC, id ASC]` | Filtered + sorted by refs | ✅ Yes |
-| `[status ASC, totalReferences DESC, id ASC]` | Filtered + sorted by refs | ✅ Yes |
-
-All composite indexes appear to be actively used by the `_build_secure_sort_clause` method.
-
-### 3. Fields Actually Used in Queries
-
-Analysis of [cosmos_repo.py](../../../backend/app/adapters/repos/cosmos_repo.py) reveals these field access patterns:
-
-#### Filter Fields (WHERE clauses)
-
-| Field | Query Pattern | Frequency |
-|-------|--------------|-----------|
-| `docType` | Equality filter | Every query |
-| `status` | Equality filter | High |
-| `datasetName` | Equality/STARTSWITH | High |
-| `id` | Equality/STARTSWITH | High |
-| `assignedTo` | Equality/IS_NULL | Medium |
-| `manualTags` | ARRAY_CONTAINS | Medium |
-| `computedTags` | ARRAY_CONTAINS | Medium |
-| `refs[].url` | EXISTS + CONTAINS (subquery) | Low |
-| `history[].refs[].url` | EXISTS + CONTAINS (nested) | Low |
-
-#### Sort Fields (ORDER BY clauses)
-
-| Field | Direction |
-|-------|-----------|
-| `reviewedAt` | ASC, DESC |
-| `updatedAt` | DESC |
-| `totalReferences` | ASC, DESC |
-| `id` | ASC (secondary sort) |
-| `datasetName` | ASC (list_datasets) |
-
-#### Read-Only Fields (Never Filtered/Sorted)
-
-These fields are fetched but never appear in WHERE or ORDER BY:
-
-- `answer` (large text)
-- `synthQuestion`, `editedQuestion` (text)
-- `contextUsedForGeneration` (large text)
-- `contextSource`, `modelUsedForGeneration` (text)
-- `comment` (text)
-- `refs[].content` (large text, often base64-encoded)
-- `refs[].keyExcerpt`, `refs[].title` (text)
-- `history[].msg` (text)
-- `semanticClusterNumber`, `weight`, `samplingBucket`, `questionLength` (numeric)
-- `schemaVersion`, `bucket` (metadata)
-- `assignedAt`, `updatedBy` (audit fields)
-
-### 4. Partition Key Strategy
-
-The container uses MultiHash hierarchical partition key: `[/datasetName, /bucket]`
-
-**Important**: Per Microsoft documentation, partition key paths are NOT automatically indexed even with `/*`. They must be explicitly included for efficient filtering queries.
-
-### 5. Full-Text Indexes Not Configured
-
-The `fullTextIndexes` array is empty. The [keyword-search-research.md](./keyword-search-research.md) recommends adding full-text indexes for `synthQuestion`, `editedQuestion`, `answer`.
-
-## Current State
-
-### Indexing Policy Summary
-
-- **Mode**: Consistent (synchronous indexing)
-- **Strategy**: Index all paths (`/*`)
-- **Exclusions**: Only `_etag`
-- **Composite indexes**: 8 defined, all actively used
-- **Full-text indexes**: None
-- **Vector indexes**: None
-
-### Estimated Storage Overhead
-
-With `/*` indexing and large text fields:
-- `answer`: Up to several KB per item
-- `contextUsedForGeneration`: Can be large
-- `refs[].content`: Often thousands of characters
-- `history[].msg`: Variable, can be large
-
-Index size could be **50-100%+ of data size** due to indexing these large text fields.
-
-## Query Analysis
-
-### Query Efficiency Assessment
-
-| Query Type | Indexed Fields Used | Efficiency |
-|-----------|---------------------|------------|
-| Paginated list | docType, status, reviewedAt | ✅ Optimal with composite |
-| Dataset filter | datasetName | ✅ Efficient |
-| ID search | id (STARTSWITH) | ✅ Efficient |
-| Tag filter | manualTags, computedTags | ⚠️ ARRAY_CONTAINS has limitations |
-| Ref URL search | refs[].url | ⚠️ EXISTS subquery, in-memory for emulator |
-| Assignment queries | status, assignedTo | ✅ Efficient |
-| Stats (count) | status | ✅ Efficient |
-
-### Fields Indexed But Never Queried
-
-These paths are indexed but provide no query benefit:
-
-1. `/answer/?` - Large text, never filtered
-2. `/synthQuestion/?` - Never filtered (could benefit from full-text)
-3. `/editedQuestion/?` - Never filtered (could benefit from full-text)
-4. `/contextUsedForGeneration/?` - Never filtered
-5. `/contextSource/?` - Never filtered
-6. `/modelUsedForGeneration/?` - Never filtered
-7. `/comment/?` - Never filtered
-8. `/refs/[]/content/?` - Never filtered
-9. `/refs/[]/keyExcerpt/?` - Never filtered
-10. `/refs/[]/title/?` - Never filtered
-11. `/history/[]/msg/?` - Never filtered
-12. `/history/[]/role/?` - Never filtered
-13. `/semanticClusterNumber/?` - Never filtered
-14. `/weight/?` - Never filtered
-15. `/samplingBucket/?` - Never filtered
-16. `/questionLength/?` - Never filtered
-17. `/schemaVersion/?` - Never filtered
-18. `/assignedAt/?` - Never filtered
-19. `/updatedBy/?` - Never filtered
-20. `/curationInstructions/?` - Never filtered
-
-## Recommendations for Spec
-
-### 1. Switch to Explicit Inclusion Strategy
-
-Instead of `/*`, explicitly include only queried paths:
-
-```json
-{
-  "indexingMode": "consistent",
-  "automatic": true,
-  "includedPaths": [
-    { "path": "/docType/?" },
-    { "path": "/status/?" },
-    { "path": "/datasetName/?" },
-    { "path": "/id/?" },
-    { "path": "/assignedTo/?" },
-    { "path": "/reviewedAt/?" },
-    { "path": "/updatedAt/?" },
-    { "path": "/totalReferences/?" },
-    { "path": "/manualTags/[]" },
-    { "path": "/computedTags/[]" },
-    { "path": "/refs/[]/url/?" }
-  ],
-  "excludedPaths": [
-    { "path": "/*" }
-  ]
-}
-```
-
-**Estimated RU savings**: 20-40% reduction in write RU costs based on Microsoft documentation stating that write costs correlate directly with indexed property count.
-
-### 2. Keep Existing Composite Indexes
-
-All 8 composite indexes are actively used. No changes needed.
-
-### 3. Add Missing Index for tagCount (Future)
-
-Per [explorer-sorting-research.md](./explorer-sorting-research.md), add composite index for `tagCount` sorting when that feature is implemented.
-
-### 4. Consider Full-Text Indexes (Future)
-
-Per [keyword-search-research.md](./keyword-search-research.md), add full-text indexes when implementing search:
-
-```json
-{
-  "fullTextIndexes": [
-    { "path": "/synthQuestion" },
-    { "path": "/editedQuestion" },
-    { "path": "/answer" }
-  ]
-}
-```
-
-### 5. Monitor and Measure
-
-- Use Azure Monitor to track RU consumption before/after policy changes
-- Monitor index transformation progress during policy updates
-- Test query performance with the new policy before production deployment
-
-### 6. Implementation Approach
-
-1. **Test in emulator first**: Apply new policy to dev/test environments
-2. **Run query performance tests**: Verify all queries still perform acceptably
-3. **Apply incrementally**: Index transformation happens online but consumes RUs
-4. **Monitor transformation**: Track progress via SDK or portal
-
-## Potential RU Savings
-
-Based on Microsoft documentation:
-
-- **Write operations**: "Inserting a 1-KB item without indexing costs around ~5.5 RUs. Replacing an item costs two times the charge."
-- **Indexing overhead**: Each indexed property adds to write RU cost
-- **Large text fields**: Indexing multi-KB text fields significantly increases write costs
-
-**Conservative estimate**: Excluding 15-20 never-queried paths (especially large text fields) could reduce write RUs by **20-40%**.
-
-## References
-
-- [Indexing policies in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy)
-- [Optimize request cost in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/optimize-cost-reads-writes)
-- [Composite indexes in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#composite-indexes)
-- [SA-242 Story](https://jira.example.com/browse/SA-242)
diff --git a/.copilot-tracking/subagent/20260122/curation-editor-research.md b/.copilot-tracking/subagent/20260122/curation-editor-research.md
deleted file mode 100644
index 957f63e..0000000
--- a/.copilot-tracking/subagent/20260122/curation-editor-research.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-topic: curation-editor
-jtbd: JTBD-001
-date: 2026-01-22
-status: complete
----
-
-# Research: Curation Editor
-
-## Context
-
-The curation editor provides the main workflow to edit ground-truth content (single-turn or multi-turn), apply tags, and transition items through draft/approved/skipped/deleted states.
-
-## Sources Consulted
-
-### URLs
-- (None)
-
-### Codebase
-- [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts): Maps single-turn items into a multi-turn history format and maps references across top-level and per-turn refs.
-- [frontend/src/services/tags.ts](frontend/src/services/tags.ts): Defines tag schema fetch and exclusive-group validation in the UI.
-
-### Documentation
-- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates editor and multi-turn behavior requirements.
-- [frontend/CODEBASE.md](frontend/CODEBASE.md): Documents the curation workspace layout and approval gating constraints.
-- [backend/CODEBASE.md](backend/CODEBASE.md): Documents API behaviors, including camelCase output and ETag concurrency.
-- [backend/docs/multi-turn-refs.md](backend/docs/multi-turn-refs.md): Documents backward-compatible storage and editing semantics for multi-turn refs.
-- [backend/docs/tagging_plan.md](backend/docs/tagging_plan.md): Documents tag normalization expectations.
-
-## Key Findings
-
-1. The UI treats all items as multi-turn in its internal model, converting legacy single-turn records into an initial two-message history.
-2. The editor supports both top-level references and per-history-turn references, and maps them into a unified reference list for user workflows.
-3. Approval is gated by reference completeness rules (at least one selected reference, all references visited, key paragraph constraints).
-4. Tagging includes manual and computed tags, and the UI enforces “exclusive group” constraints based on backend-provided schema.
-5. Documentation includes some conflicts (for example, tag write paths); when code does not reflect a doc claim, it is treated as doc-only.
-
-## Existing Patterns
-
-| Pattern | Location | Relevance |
-|---------|----------|-----------|
-| Single-turn to multi-turn normalization | [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts) | Defines current UI behavior and backward compatibility |
-| Exclusive tag group validation | [frontend/src/services/tags.ts](frontend/src/services/tags.ts) | Defines validation expectations for tag selection |
-
-## Open Questions
-
-- (None)
-
-## Recommendations for Spec
-
-- Specify the multi-turn normalization rule as a frontend behavior and compatibility expectation.
-- Specify tag behaviors in terms of observable constraints (exclusive groups, manual vs computed sets).
-- Specify approval gating rules as UX invariants.
diff --git a/.copilot-tracking/subagent/20260122/data-persistence-research.md b/.copilot-tracking/subagent/20260122/data-persistence-research.md
deleted file mode 100644
index 1e56daf..0000000
--- a/.copilot-tracking/subagent/20260122/data-persistence-research.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-topic: data-persistence
-jtbd: JTBD-001
-date: 2026-01-22
-status: complete
----
-
-# Research: Data Persistence
-
-## Context
-
-The persistence layer abstracts storage behind a repository protocol with Azure Cosmos DB as the primary backend.
-
-## Sources Consulted
-
-### URLs
-- (None)
-
-### Codebase
-- [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py): Defines the `GroundTruthRepo` protocol that abstracts storage operations.
-- [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py): Implements the Cosmos DB repository.
-- [backend/app/main.py](backend/app/main.py): Shows lifespan initialization for Cosmos repo; does not block startup on failure.
-
-### Documentation
-- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates persistence and Cosmos emulator requirements.
-- [backend/CODEBASE.md](backend/CODEBASE.md): Documents layered architecture and configuration for Cosmos.
-- [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md): Documents emulator query limitations and test gating.
-- [backend/docs/cosmos-emulator-unicode-workaround.md](backend/docs/cosmos-emulator-unicode-workaround.md): Documents optional Unicode escape workaround.
-
-## Key Findings
-
-1. The backend defines a `GroundTruthRepo` protocol to abstract storage, enabling in-memory and Cosmos backends.
-2. The Cosmos implementation is the production backend and is initialized during app lifespan.
-3. Startup does not block if Cosmos initialization fails; this supports emulator-not-ready scenarios.
-4. The Cosmos emulator has query limitations (for example, lack of `ARRAY_CONTAINS`), and incompatible tests are gated/skipped.
-5. An optional Unicode escape workaround exists for emulator-only invalid escape failures.
-
-## Existing Patterns
-
-| Pattern | Location | Relevance |
-|---------|----------|-----------|
-| Repository protocol abstraction | [backend/app/adapters/repos/base.py](backend/app/adapters/repos/base.py) | Defines interface for pluggable storage |
-| Non-blocking lifespan init | [backend/app/main.py](backend/app/main.py) | Supports graceful degradation when emulator is unavailable |
-
-## Open Questions
-
-- (None)
-
-## Recommendations for Spec
-
-- Specify that storage is abstracted via a repository protocol with Cosmos as the primary backend.
-- Specify non-blocking startup behavior when Cosmos is unavailable.
-- Specify that emulator-incompatible behaviors are gated or skipped in tests.
diff --git a/.copilot-tracking/subagent/20260122/dependency-injection-research.md b/.copilot-tracking/subagent/20260122/dependency-injection-research.md
deleted file mode 100644
index 278d74d..0000000
--- a/.copilot-tracking/subagent/20260122/dependency-injection-research.md
+++ /dev/null
@@ -1,260 +0,0 @@
-# Dependency Injection Research: SA-238
-
-**Research Date:** 2026-01-22
-**Topic:** Refactoring to use FastAPI dependency injection for config and cosmos
-
----
-
-## 1. Current Architecture Analysis
-
-### 1.1 Container.py Overview
-
-The [container.py](backend/app/container.py) file implements a **Service Locator** pattern (not true DI):
-
-```python
-class Container:
-    repo: GroundTruthRepo
-    assignment_service: AssignmentService
-    search_service: SearchService
-    snapshot_service: SnapshotService
-    curation_service: CurationService
-    tag_registry_service: TagRegistryService
-    # ... more services
-
-container = Container()  # Global singleton
-```
-
-**Key characteristics:**
-
-- Single global `container` instance created at module import time
-- Services initialized lazily via explicit `init_*()` methods
-- Cosmos repo created via `init_cosmos_repo(db_name)` or `startup_cosmos(db_name)`
-- Services store direct references to other services and repos
-
-### 1.2 Service Instantiation Flow
-
-1. **App startup** ([main.py](backend/app/main.py#L60-L78)):
-   - `lifespan()` async context manager calls `container.startup_cosmos()`
-   - This creates repo instances and wires services
-
-2. **Container initialization methods**:
-   - `init_cosmos_repo()` - Creates Cosmos repo and dependent services
-   - `init_search()` - Configures Azure AI Search adapter
-   - `init_chat()` - Configures agent inference service
-
-### 1.3 Endpoint Access Pattern
-
-Endpoints access services via **direct module import** of the global container:
-
-```python
-# In every API router file
-from app.container import container
-
-@router.post("")
-async def import_bulk(...):
-    result = await container.repo.import_bulk_gt(gt_items, buckets=buckets)
-```
-
-This pattern repeats across all 16+ files that import `container`.
-
----
-
-## 2. Existing FastAPI `Depends()` Usage
-
-The codebase **already uses** `Depends()` extensively for authentication:
-
-| File | Usage Pattern |
-|------|---------------|
-| [ground_truths.py](backend/app/api/v1/ground_truths.py) | `user: UserContext = Depends(get_current_user)` |
-| [assignments.py](backend/app/api/v1/assignments.py) | `user: UserContext = Depends(get_current_user)` |
-| [search.py](backend/app/api/v1/search.py) | `user: UserContext = Depends(get_current_user)` |
-| [chat.py](backend/app/api/v1/chat.py) | `principal: Principal = Depends(require_user)` |
-| [main.py](backend/app/main.py#L181) | `dependencies=[Depends(require_user)]` on routes |
-
-**24+ usages** of `Depends()` found, all for authentication.
- -**No services** are currently injected via `Depends()`. - ---- - -## 3. Configuration Access Pattern - -### 3.1 Settings Module ([config.py](backend/app/core/config.py)) - -Configuration uses **Pydantic Settings** with a global singleton: - -```python -class Settings(BaseSettings): - model_config = SettingsConfigDict(env_prefix="GTC_", ...) - - COSMOS_ENDPOINT: str | None = None - COSMOS_KEY: SecretStr | None = None - # ... 60+ settings - -settings = Settings() # Global singleton -``` - -### 3.2 Settings Access - -Settings are accessed via direct import throughout: - -```python -from app.core.config import settings - -# Container uses it -if settings.COSMOS_ENDPOINT: - ... - -# Services use it -if settings.CHAT_ENABLED: - ... -``` - ---- - -## 4. Pain Points Identified - -### 4.1 Testing Complexity - -**Integration tests** require extensive fixtures to manage container state: - -From [tests/integration/conftest.py](backend/tests/integration/conftest.py#L87-L124): - -```python -@pytest.fixture(scope="function") -async def configure_repo_for_test_db(require_cosmos_backend, test_db_name, init_emulator_containers): - # Close any previous Cosmos async client - try: - prev_repo = getattr(container, "repo", None) - client = getattr(prev_repo, "_client", None) - if client is not None: - # Manual cleanup... - except Exception: - pass - container.init_cosmos_repo(db_name=test_db_name) -``` - -**Unit tests** create fake repos and directly mutate container: - -From [tests/unit/conftest.py](backend/tests/unit/conftest.py#L59-L130): - -```python -container.repo = _NoopMemoryRepo() -container.assignment_service = AssignmentService(container.repo) -container.snapshot_service = SnapshotService(container.repo, ...) -# ... 
manual wiring of all services -``` - -### 4.2 Service Coupling - -The [validation_service.py](backend/app/services/validation_service.py) directly imports container: - -```python -from app.container import container - -async def validate_ground_truth_item(item, valid_tags_cache=None): - if valid_tags_cache is None: - valid_tags_cache = set(await container.tag_registry_service.list_tags()) -``` - -This creates a **hidden dependency** that's hard to mock without modifying the global container. - -### 4.3 Async Initialization Complexity - -Container uses `cast(ServiceType, None)` as placeholder until async init: - -```python -self.repo = cast(GroundTruthRepo, None) -self.assignment_service = cast(AssignmentService, None) -``` - -This leads to potential `None` access if initialization order is wrong. - ---- - -## 5. What FastAPI DI Would Provide - -### 5.1 Benefits - -| Current Approach | FastAPI DI Alternative | -|------------------|------------------------| -| Global mutable singleton | Request-scoped or cached dependencies | -| Manual container wiring in tests | `app.dependency_overrides[dep] = mock` | -| Import-time coupling | Runtime injection | -| Settings passed around manually | `Annotated[Settings, Depends(get_settings)]` | - -### 5.2 Example Transformation - -**Current:** -```python -from app.container import container - -@router.post("") -async def import_bulk(items: list[GroundTruthItem]): - result = await container.repo.import_bulk_gt(items) -``` - -**With FastAPI DI:** -```python -def get_repo() -> GroundTruthRepo: - return container.repo # Or create fresh - -@router.post("") -async def import_bulk( - items: list[GroundTruthItem], - repo: GroundTruthRepo = Depends(get_repo) -): - result = await repo.import_bulk_gt(items) -``` - -**Test override:** -```python -async def test_import(): - app.dependency_overrides[get_repo] = lambda: MockRepo() - # Test now uses MockRepo without touching global container -``` - ---- - -## 6. 
Assessment - -### 6.1 Current Approach Works - -The current Service Locator pattern is: - -- **Consistent** - Used uniformly across all endpoints -- **Simple** - One import gives access to all services -- **Tested** - Extensive test coverage exists -- **Functional** - No reported bugs related to DI - -### 6.2 Migration Complexity - -A full FastAPI DI migration would require: - -1. Creating `Depends()` functions for each service (~8 services) -2. Updating all endpoint signatures (~50+ endpoints) -3. Rewriting test fixtures to use `dependency_overrides` -4. Managing async initialization differently (lifespan vs per-request) - -### 6.3 Recommendation - -**Status: Consider deferring or partial adoption** - -The current approach is working. Potential improvements without full migration: - -1. **Partial adoption**: Use `Depends()` for new endpoints -2. **Settings injection**: Create `get_settings()` dependency for easier testing -3. **Service injection for validation_service**: Remove direct container import - ---- - -## 7. Summary - -| Question | Finding | -|----------|---------| -| What does container.py do? | Service Locator with lazy initialization, holds all service singletons | -| How are services accessed? | Direct import of global `container` instance | -| What config objects exist? | Single `Settings` Pydantic model, global `settings` instance | -| Pain points? | Test complexity, service coupling, async init management | -| FastAPI DI already used? | Yes, but only for auth (`get_current_user`, `require_user`) | -| Migration worth it? 
| Partial adoption may be sufficient; full migration is high effort | diff --git a/.copilot-tracking/subagent/20260122/docs-content-strategy-research.md b/.copilot-tracking/subagent/20260122/docs-content-strategy-research.md deleted file mode 100644 index 0e9a776..0000000 --- a/.copilot-tracking/subagent/20260122/docs-content-strategy-research.md +++ /dev/null @@ -1,250 +0,0 @@ -# Documentation Content Strategy Research - -## Overview - -This research assesses the current documentation landscape for Ground Truth Curator, identifying audience fit, staleness, and organization recommendations. - ---- - -## 1. Documentation Inventory - -### Root Level - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| [README.md](README.md) | Developers | **Stub** | Single-line placeholder only | -| [AGENTS.md](AGENTS.md) | AI Agents | Current | Jujutsu workflow instructions | -| [BUSINESS_VALUE.md](BUSINESS_VALUE.md) | Stakeholders/SMEs | Current | Value proposition and KPIs | - -### Backend (`backend/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| [README.md](backend/README.md) | Developers | **Current** | Comprehensive local setup guide | -| [CODEBASE.md](backend/CODEBASE.md) | Developers | **Current** | Architecture map, contracts, extension points | - -#### Backend Docs (`backend/docs/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| export-pipeline.md | Developers | **Current** | Export API and storage backends | -| OBSERVABILITY_IMPLEMENTATION.md | Developers/Ops | Current | Telemetry setup | -| api-write-consolidation-plan.md | Developers | **Stale/Plan** | AI-generated implementation plan | -| api-write-consolidation-plan.v2.md | Developers | **Stale/Plan** | Superseded plan version | -| fastapi-implementation-plan.md | Developers | **Stale/Plan** | Original MVP implementation plan | -| drift_cleanup.md | Developers | **Stale/Plan** | API drift analysis (completed work) | -| 
tagging_plan.md | Developers | Partially current | Tag behavior reference | -| cosmos-emulator-limitations.md | Developers | Current | Emulator workarounds | -| cosmos-emulator-unicode-workaround.md | Developers | Current | Unicode escape fix | -| todos.md | Developers | **Stale** | Old MVP checklist | -| multi-turn-refs.md | Developers | Current | Multi-turn data model | -| history-tags-feature.md | Developers | Current | History item tags | -| user-self-serve-plan.md | Developers | **Stale/Plan** | Implemented feature | -| assign-single-item-endpoint.md | Developers | **Stale/Plan** | Endpoint design doc | -| pytest-fastapi-cosmos-emulator-best-practices.md | Developers | Current | Testing guidance | - -### Frontend (`frontend/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| [README.md](frontend/README.md) | Developers | **Current** | Local dev guide | -| [CODEBASE.md](frontend/CODEBASE.md) | Developers | **Current** | Architecture map and contracts | - -#### Frontend Docs (`frontend/docs/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| CONNECT_TO_BACKEND.md | Developers | Current | API types generation guide | -| MVP_REQUIREMENTS.md | Developers/SMEs | **Partially stale** | Original MVP checklist (some items done) | -| REFACTORING_PLAN.md | Developers | **Stale/Plan** | Completed refactor | -| OBSERVABILITY_IMPLEMENTATION.md | Developers | Current | Frontend telemetry | -| connecting-e2e-best-practices.md | Developers | Current | E2E testing patterns | - -#### Frontend Plans (`frontend/plans/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| multi-turn-curation-plan.md | Developers | **Stale/Plan** | Implementation plan (in progress) | -| e2e-backend-integration-plan.md | Developers | **Stale/Plan** | Completed integration | -| playwright-e2e-test-plan.md | Developers | **Stale/Plan** | Test setup plan | -| keyboard-shortcuts-plan.md | Developers | 
**Stale/Plan** | Implemented feature | -| agent-integration-plan.md | Developers | **Stale/Plan** | LLM integration plan | -| telemetry-observability-plan.md | Developers | **Stale/Plan** | Implemented feature | -| *-plan.md (remaining) | Developers | **Stale/Plan** | Various implementation plans | - -### Docs Folder (`docs/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| ground-truth-curation-reqs.md | Developers/SMEs | **Canonical** | MVP requirements and data model | -| computed-tags-design.md | Developers | **Current** | Tag architecture and export pipeline | -| manual-tags-design.md | Developers | Current | Manual tag system | -| frontend-runtime-configuration.md | Developers | Current | Runtime config | -| json-export-migration-plan.md | Developers | **Stale/Plan** | Completed migration | - -### Specs Folder (`specs/`) - -| File | Audience | Status | Notes | -|------|----------|--------|-------| -| _index.md | All | **Current** | Spec index by JTBD | -| assignment-workflow.md | Developers/SMEs | Draft | Current-state spec | -| explorer-view.md | Developers/SMEs | Draft | Current-state spec | -| curation-editor.md | Developers/SMEs | Draft | Current-state spec | -| reference-management.md | Developers/SMEs | Draft | Current-state spec | -| export-snapshots.md | Developers/SMEs | Draft | Current-state spec | -| data-persistence.md | Developers | Draft | Cosmos backend spec | -| observability-operations.md | Developers/Ops | Draft | Health and telemetry spec | -| *-enhancement specs | Developers | Draft | Future feature specs | - ---- - -## 2. 
Staleness Assessment - -### Categories - -**Current (Authoritative)** -- Backend README.md and CODEBASE.md -- Frontend README.md and CODEBASE.md -- Export pipeline docs -- Emulator workarounds -- Testing best practices -- Specs index and current-state specs - -**Stale/Plan Documents (AI-generated or completed work)** -- `backend/docs/fastapi-implementation-plan.md` - original MVP plan, now implemented -- `backend/docs/api-write-consolidation-plan*.md` - API redesign, mostly complete -- `backend/docs/drift_cleanup.md` - analysis of completed cleanup -- `backend/docs/user-self-serve-plan.md` - implemented -- `backend/docs/todos.md` - outdated checklist -- `frontend/docs/REFACTORING_PLAN.md` - completed refactor -- `frontend/plans/*.md` - most are completed implementation plans -- `docs/json-export-migration-plan.md` - completed migration - -**Partially Stale** -- `frontend/docs/MVP_REQUIREMENTS.md` - contains done items mixed with remaining work -- `docs/ground-truth-curation-reqs.md` - canonical but has outdated "todo" items - -### Drift Patterns - -1. **AI-generated plans remain after implementation** - Plans in `frontend/plans/` and `backend/docs/` were created to guide implementation but weren't archived after completion. - -2. **Checklists not updated** - MVP_REQUIREMENTS.md and todos.md have checkboxes that don't reflect current state. - -3. **Multiple versions** - api-write-consolidation-plan.md has v1 and v2 without clear indication which is canonical. - ---- - -## 3. 
Audience Analysis - -### Developer Audience - -**Well served by:** -- Backend/frontend README.md - local setup -- Backend/frontend CODEBASE.md - architecture understanding -- Export pipeline and emulator docs - specific technical guidance -- Specs folder - system behavior documentation - -**Gaps:** -- No consolidated "Getting Started" guide across the full stack -- No API reference (relies on OpenAPI spec) -- No contribution guide -- Architecture diagrams scattered or missing - -### SME/Curator Audience - -**Well served by:** -- BUSINESS_VALUE.md - value proposition -- ground-truth-curation-reqs.md - requirements context -- Current-state specs - system behavior documentation - -**Gaps:** -- **No user guide** - SMEs have no documentation for using the curation UI -- **No workflow guide** - No step-by-step curation workflow documentation -- **No onboarding material** - New SMEs must learn by exploration - -### Ops/Admin Audience - -**Partially served by:** -- Observability implementation docs -- Backend README deployment section - -**Gaps:** -- No runbook for production operations -- No incident response documentation -- Limited deployment documentation - ---- - -## 4. 
Content Organization Recommendations - -### Recommended Structure - -``` -docs/ -├── README.md # Documentation hub (NEW) -├── getting-started/ -│ ├── quickstart.md # Full-stack setup (NEW) -│ ├── developer-setup.md # Detailed dev environment -│ └── sme-onboarding.md # SME getting started (NEW) -├── user-guides/ -│ ├── curation-workflow.md # SME curation guide (NEW) -│ ├── tagging-guide.md # How to use tags (NEW) -│ └── export-guide.md # Export procedures (NEW) -├── architecture/ -│ ├── overview.md # System architecture (NEW) -│ ├── data-model.md # Consolidated from reqs -│ ├── api-reference.md # Link to OpenAPI -│ └── backend-internals.md # From CODEBASE.md -├── operations/ -│ ├── deployment.md # Deploy to Azure (NEW) -│ ├── monitoring.md # Observability guide -│ └── troubleshooting.md # Common issues (NEW) -├── contributing/ -│ ├── CONTRIBUTING.md # Contribution guide (NEW) -│ └── code-conventions.md # From specs -└── archive/ - └── plans/ # Move completed plans here -``` - -### Migration Actions - -1. **Create docs hub** - New README.md in docs/ with navigation - -2. **Create SME documentation** - Priority: curation-workflow.md and sme-onboarding.md - -3. **Archive stale plans** - Move completed implementation plans to `docs/archive/plans/` - -4. **Consolidate duplicates** - Merge api-write-consolidation-plan versions - -5. **Update checklists** - Either update or archive MVP_REQUIREMENTS.md and todos.md - -6. **Promote specs** - Current-state specs are good; link from docs hub - ---- - -## 5. Summary - -### Current State - -| Category | Count | Status | -|----------|-------|--------| -| Current/authoritative docs | 15 | Good coverage for developers | -| Stale plan documents | 12+ | Need archival | -| SME-focused docs | 0 | **Critical gap** | -| Ops documentation | 2 | Partial coverage | - -### Priorities - -1. **High: Create SME user guide** - No documentation for the primary user persona -2. 
**High: Archive stale plans** - Reduce confusion about authoritative sources -3. **Medium: Create docs hub** - Improve discoverability -4. **Medium: Getting started guide** - Reduce onboarding friction -5. **Low: Ops runbook** - Needed for production but can follow launch - -### Key Findings - -- **Developer docs are strong** - README and CODEBASE files provide good guidance -- **SME docs are absent** - Critical gap for the primary user audience -- **Plan documents create noise** - 12+ stale plans remain in active locations -- **Specs are well-organized** - JTBD-based spec structure is effective -- **No contribution guide** - Missing standard OSS documentation diff --git a/.copilot-tracking/subagent/20260122/docs-infrastructure-research.md b/.copilot-tracking/subagent/20260122/docs-infrastructure-research.md deleted file mode 100644 index 006543d..0000000 --- a/.copilot-tracking/subagent/20260122/docs-infrastructure-research.md +++ /dev/null @@ -1,167 +0,0 @@ ---- -title: Documentation Infrastructure Research -description: Research findings on current documentation state and MkDocs setup requirements -author: copilot -ms.date: 2026-01-22 -status: complete ---- - -## Summary - -The repository has **no existing MkDocs configuration**. Documentation is scattered across multiple locations with no unified build system. Setting up MkDocs requires creating the configuration from scratch. - -## Research Findings - -### 1. 
Existing Documentation Files - -**Root-level documentation:** - -| File | Purpose | -|------|---------| -| [README.md](../../../README.md) | Minimal project title only | -| [AGENTS.md](../../../AGENTS.md) | Jujutsu version control workflow instructions | -| [BUSINESS_VALUE.md](../../../BUSINESS_VALUE.md) | Business value documentation | - -**`docs/` folder (5 files + 1 subfolder):** - -| File | Description | -|------|-------------| -| computed-tags-design.md | Tag computation design | -| manual-tags-design.md | Manual tagging design | -| frontend-runtime-configuration.md | Frontend config guide | -| ground-truth-curation-reqs.md | Requirements document | -| json-export-migration-plan.md | Export migration plan | -| images/ | Image assets | -| specs/ | Empty subfolder | - -**`specs/` folder (26 specification files):** - -Organized specifications with an `_index.md` index file covering: - -- JTBD-001: Current-state system specs (7 topics) -- JTBD-002: Curation enhancements (7 topics) -- JTBD-003: Search and filtering (3 topics) -- JTBD-004: Data integrity and security (4 topics) -- JTBD-005: Code quality (4 topics) - -**`backend/docs/` folder (17 files):** - -Technical documentation including: - -- API change checklists and consolidation plans -- Cosmos emulator documentation and workarounds -- Feature plans (tagging, history, multi-turn refs) -- Best practices guides - -**`frontend/docs/` folder (5 files):** - -- CONNECT_TO_BACKEND.md -- MVP_REQUIREMENTS.md -- OBSERVABILITY_IMPLEMENTATION.md -- REFACTORING_PLAN.md -- connecting-e2e-best-practices.md - -**Component READMEs:** - -- [backend/README.md](../../../backend/README.md) - Comprehensive setup guide (~300 lines) -- [frontend/README.md](../../../frontend/README.md) - Development guide (~100 lines) -- backend/scripts/README.md -- scripts/README.md - -### 2. MkDocs Configuration Status - -**No `mkdocs.yml` exists.** File search returned no results. - -### 3. 
Existing Build Tooling - -**Root level:** No package.json exists at repository root. - -**`backend/pyproject.toml`:** - -- Uses `uv` for package management -- No documentation-related scripts or dependencies -- Dependencies: FastAPI, pytest, ruff, black (no mkdocs/sphinx) - -**`frontend/package.json`:** - -- Standard Vite/React scripts (dev, build, lint, test) -- No documentation scripts -- No documentation dependencies - -### 4. Documentation Structure Assessment - -| Location | File Count | Content Type | -|----------|------------|--------------| -| Root | 3 | Project overview | -| docs/ | 5 | Design docs, requirements | -| specs/ | 26 | Feature specifications | -| backend/docs/ | 17 | Technical guides | -| frontend/docs/ | 5 | Frontend guides | -| .copilot-tracking/ | 50+ | Research artifacts | - -**Total unique documentation files:** ~106 markdown files - -## What Needs to Be Set Up - -### Required for MkDocs - -1. **Create `mkdocs.yml`** at repository root with: - - Site metadata (name, description, repo URL) - - Theme configuration (recommend Material for MkDocs) - - Navigation structure organizing scattered docs - - Plugin configuration (search, etc.) - -2. **Add MkDocs dependencies** to `backend/pyproject.toml`: - - ```toml - [project.optional-dependencies] - docs = [ - "mkdocs>=1.6", - "mkdocs-material>=9.5", - ] - ``` - -3. **Create navigation structure** to unify: - - Root README as landing page - - `docs/` as design documentation - - `specs/` as specifications section - - `backend/docs/` as backend technical docs - - `frontend/docs/` as frontend technical docs - - Component READMEs as quickstart guides - -4. 
**Add scripts** for build/serve: - - `uv run mkdocs serve` for local development - - `uv run mkdocs build` for static site generation - -### Recommended Navigation Structure - -```yaml -nav: - - Home: index.md - - Getting Started: - - Backend Setup: backend/README.md - - Frontend Setup: frontend/README.md - - Specifications: - - Overview: specs/_index.md - - Current State: specs/assignment-workflow.md - # ... other specs - - Design Docs: - - Tags Design: docs/manual-tags-design.md - # ... other design docs - - Backend Reference: - - API Plans: backend/docs/api-write-consolidation-plan.md - # ... other backend docs - - Frontend Reference: - - Connect to Backend: frontend/docs/CONNECT_TO_BACKEND.md - # ... other frontend docs -``` - -## Key Findings Summary - -| Question | Answer | -|----------|--------| -| MkDocs configuration exists? | **No** | -| Documentation build tooling? | **None** | -| Documentation locations | 5+ scattered locations | -| Total markdown files | ~106 | -| Setup complexity | Medium (organize existing content) | diff --git a/.copilot-tracking/subagent/20260122/dos-prevention-research.md b/.copilot-tracking/subagent/20260122/dos-prevention-research.md deleted file mode 100644 index 38e33b0..0000000 --- a/.copilot-tracking/subagent/20260122/dos-prevention-research.md +++ /dev/null @@ -1,160 +0,0 @@ -# DoS Prevention Research: Bulk Import Endpoint - -**Date:** 2026-01-22 -**Story:** SA-409 -**Topic:** DoS vulnerability in bulk import endpoint - -## Executive Summary - -The bulk import endpoint (`POST /v1/ground_truths`) accepts an unbounded list of `GroundTruthItem` objects with **no size validation**. This creates a critical DoS vulnerability where attackers can exhaust server memory/CPU by submitting arbitrarily large payloads. No rate limiting middleware exists in the codebase. - -## Research Findings - -### 1. 
Current Bulk Import Endpoint - -**Location:** [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L55-L119) - -```python -@router.post("", response_model=ImportBulkResponse) -async def import_bulk( - items: list[GroundTruthItem], # ← NO SIZE LIMIT - user: UserContext = Depends(get_current_user), - buckets: int | None = Query(default=None, ge=1, le=50), - approve: bool = Query( - default=False, - description="If true, mark all imported items as approved and set review metadata.", - ), -) -> ImportBulkResponse: -``` - -**Confirmed gaps:** - -- No `max_length` constraint on the `items` list parameter -- No validation of list size before processing -- No request body size limit configured -- Iterates over entire list twice (ID assignment + validation) before any persistence - -### 2. Rate Limiting Libraries for FastAPI - -| Library | Description | Pros | Cons | -|---------|-------------|------|------| -| **slowapi** | FastAPI-friendly, based on limits | Drop-in, Redis support, decorator-based | Adds dependency | -| **fastapi-limiter** | Redis-based rate limiting | Async-native | Requires Redis | -| **starlette-throttle** | Starlette middleware | Simple | Less maintained | -| **Custom middleware** | Roll your own | Full control, no deps | More code to maintain | - -**Recommendation:** `slowapi` - mature, FastAPI-native, supports memory and Redis backends. - -### 3. 
Configuration Patterns in GTC - -**Settings location:** [backend/app/core/config.py](backend/app/core/config.py) - -The codebase uses `pydantic-settings` with: - -- Environment variable prefix: `GTC_` -- Type-safe settings via `Settings` class -- Field validation with `Field()` and `model_validator` - -**Existing pagination settings pattern to follow:** - -```python -# Pagination settings -PAGINATION_MAX_LIMIT: int = Field( - default=100, description="Maximum items per page for list queries" -) -PAGINATION_MIN_LIMIT: int = Field(default=1, description="Minimum items per page") -PAGINATION_TAG_FETCH_MAX: int = Field( - default=500, - description="Maximum items to fetch for tag filtering queries (memory safeguard)", -) -``` - -**Recommended new settings:** - -```python -# DoS prevention settings -BULK_IMPORT_MAX_ITEMS: int = Field( - default=1000, description="Maximum items per bulk import request" -) -RATE_LIMIT_REQUESTS: int = Field( - default=100, description="Rate limit: requests per window" -) -RATE_LIMIT_WINDOW_SECONDS: int = Field( - default=60, description="Rate limit window in seconds" -) -``` - -### 4. Existing Security Middleware - -**Location:** [backend/app/main.py](backend/app/main.py) - -Current middleware stack: - -1. **Easy Auth middleware** (`install_ezauth_middleware`) - Authentication via Azure Container Apps -2. **User logging middleware** (`user_logging_middleware`) - Request logging with user context - -**No existing:** - -- Rate limiting middleware -- Request body size validation -- DoS protection middleware - -**CORS note:** CORS is handled at platform level (Azure Container Apps), not in code. - -### 5. Request Body Size - -FastAPI/Starlette default has no body size limit. Uvicorn default is unlimited. 
This should be addressed at multiple levels: - -- Application level: Validate list length in endpoint -- Server level: Configure `--limit-max-body-size` in Uvicorn (bytes) -- Platform level: Azure Container Apps ingress limits - -## Gap Analysis - -| Control | Current State | Required | -|---------|--------------|----------| -| Batch size limit | ❌ None | ✅ Configurable max items | -| Rate limiting | ❌ None | ✅ Per-user/IP throttling | -| Request body size | ❌ Unlimited | ✅ Configurable max bytes | -| Validation before processing | ⚠️ Partial | ✅ Early rejection | - -## Recommended Implementation - -### Phase 1: Immediate (Batch Size Limit) - -1. Add `BULK_IMPORT_MAX_ITEMS` to `Settings` class -2. Add validation at start of `import_bulk`: - -```python -if len(items) > settings.BULK_IMPORT_MAX_ITEMS: - raise HTTPException( - status_code=400, - detail=f"Batch size {len(items)} exceeds maximum of {settings.BULK_IMPORT_MAX_ITEMS}" - ) -``` - -### Phase 2: Rate Limiting - -1. Add `slowapi` dependency to `pyproject.toml` -2. Configure rate limiter in `main.py` -3. Apply rate limit decorator to bulk endpoints - -### Phase 3: Server-Level Protection - -1. Configure Uvicorn `--limit-max-body-size` -2. 
Review Azure Container Apps ingress settings - -## Files to Modify - -| File | Change | -|------|--------| -| `backend/app/core/config.py` | Add DoS prevention settings | -| `backend/app/api/v1/ground_truths.py` | Add batch size validation | -| `backend/pyproject.toml` | Add slowapi dependency (Phase 2) | -| `backend/app/main.py` | Install rate limiting middleware (Phase 2) | - -## References - -- [slowapi documentation](https://github.com/laurents/slowapi) -- [FastAPI request body size](https://fastapi.tiangolo.com/advanced/request-body/) -- [OWASP DoS Prevention](https://owasp.org/www-community/attacks/Denial_of_Service) diff --git a/.copilot-tracking/subagent/20260122/draft-duplicate-detection-research.md b/.copilot-tracking/subagent/20260122/draft-duplicate-detection-research.md deleted file mode 100644 index 7226d03..0000000 --- a/.copilot-tracking/subagent/20260122/draft-duplicate-detection-research.md +++ /dev/null @@ -1,287 +0,0 @@ -# Draft Duplicate Detection Research - -**Date:** 2026-01-22 -**Topic:** Draft duplicate detection system for warning SMEs about potential duplicates - ---- - -## Research Questions and Findings - -### 1. 
Data Model for Ground Truth Items (Draft vs Approved Status) - -**Backend Model:** [backend/app/domain/models.py](backend/app/domain/models.py) - -The `GroundTruthItem` class defines the core data model: - -```python -class GroundTruthItem(BaseModel): - id: str - datasetName: str - bucket: Optional[UUID] = None - status: GroundTruthStatus = GroundTruthStatus.draft # Default is draft - docType: str = "ground-truth-item" - schemaVersion: str = "v2" - - # Question/Answer fields - synth_question: str = Field(alias="synthQuestion") # Original synthesized question - edited_question: Optional[str] = Field(default=None, alias="editedQuestion") # User-edited version - answer: Optional[str] = None - refs: list[Reference] = [] - - # Multi-turn support - history: Optional[list[HistoryItem]] = None - - # Tags - manual_tags: list[str] = [] - computed_tags: list[str] = [] -``` - -**Status Enum:** [backend/app/domain/enums.py](backend/app/domain/enums.py) - -```python -class GroundTruthStatus(str, Enum): - draft = "draft" - approved = "approved" - deleted = "deleted" - skipped = "skipped" -``` - -**Frontend Model:** [frontend/src/models/groundTruth.ts](frontend/src/models/groundTruth.ts) - -```typescript -export type GroundTruthItem = { - id: string; - question: string; // Maps to editedQuestion or synthQuestion - answer: string; - history?: ConversationTurn[]; - references: Reference[]; - status: "draft" | "approved" | "skipped" | "deleted"; - deleted?: boolean; // Soft delete flag - // ... -}; -``` - ---- - -### 2. 
Fields for Duplicate Comparison - -**Primary Comparison Candidates:** - -| Field | Backend Name | Frontend Name | Notes | -|-------|-------------|---------------|-------| -| Original Question | `synthQuestion` | N/A (mapped to `question`) | The AI-generated/imported question text | -| Edited Question | `editedQuestion` | `question` | User-curated question (takes precedence if set) | -| Answer | `answer` | `answer` | The curated answer text | -| Multi-turn History | `history` | `history` | Array of `{role, msg, refs}` for conversation turns | - -**Effective Question Logic:** -- Backend: `synthQuestion` is the original; `editedQuestion` is the user's edited version -- Frontend: Uses `editedQuestion || synthQuestion` as `question` -- For duplicate detection: Compare `editedQuestion || synthQuestion` between items - -**Fingerprint/Signature Logic:** [frontend/src/hooks/useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts#L113-L135) - -The `stateSignature` function shows what fields define item identity: -```typescript -function stateSignature(it: GroundTruthItem): string { - return JSON.stringify({ - id: it.id, - question: (it.question || "").trim(), - answer: (it.answer || "").trim(), - history: it.history || [], - references: refs, // sorted by id - manualTags: [...(it.manualTags || [])].sort(), - status: it.status, - deleted: !!it.deleted, - }); -} -``` - -**Recommended Comparison Fields for Duplicate Detection:** -1. **Question text** (normalized): `(editedQuestion || synthQuestion).trim().toLowerCase()` -2. **Answer text** (normalized): `answer.trim().toLowerCase()` -3. **Multi-turn content**: Concatenated `history[*].msg` for all turns - ---- - -### 3. 
Existing Duplicate Detection Logic - -**Finding: NO existing duplicate detection logic exists.** - -Grep search for `duplicate|similarity|compare` found: -- References to Jira tickets requesting the feature (SA-534, SA-535) -- Tag registry duplicate key prevention (unrelated) -- Reference deduplication within a single item (not cross-item) - -**Existing Validation Service:** [backend/app/services/validation_service.py](backend/app/services/validation_service.py) - -Current validation only checks: -- Manual tag values against the tag registry -- No duplicate item detection - -**Jira Context:** -- **SA-534:** "GTC: Duplicate Detection and Prevention for Drafts" (Spike, MVP label) -- **SA-535:** "GTC: One time pass duplicate removal from drafts/approved" - -Both tickets indicate the requirement: *"As an SME I want to avoid working on draft items that are duplicates of approved items."* - ---- - -### 4. Import/Creation Flow for Draft Items - -**Bulk Import Endpoint:** [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L54-L114) - -```python -@router.post("", response_model=ImportBulkResponse) -async def import_bulk( - items: list[GroundTruthItem], - buckets: int | None = Query(default=None), - approve: bool = Query(default=False), -) -> ImportBulkResponse: -``` - -**Current Import Flow:** -1. Items received via POST `/v1/ground-truths` -2. Generate IDs for items without one (randomname) -3. Validate items via `validate_bulk_items()` (tags only) -4. Optionally set approval metadata if `approve=true` -5. Apply computed tags -6. Persist via `container.repo.import_bulk_gt()` - -**Insertion Point for Duplicate Detection:** -- After step 2 (ID generation), before step 5 (persistence) -- Or as a pre-import validation step - -**Single Item Assignment:** [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py#L175-L220) - -When an SME assigns an item to themselves: -1. Fetch the item -2. 
Validate item can be assigned (not assigned to another user in draft) -3. Set `status = draft`, `assignedTo = user` -4. Create assignment document - -**Insertion Point:** Before or after step 3, check for duplicates against approved items. - ---- - -### 5. Warning/Notification Patterns in UI - -**Toast System:** [frontend/src/hooks/useToasts.ts](frontend/src/hooks/useToasts.ts) - -```typescript -export type Toast = { - id: string; - kind: "success" | "error" | "info"; - msg: string; - actionLabel?: string; - onAction?: () => void; -}; - -export function useToasts() { - // showToast(kind, msg, opts) - // opts: { duration, actionLabel, onAction } -} -``` - -**Toast Component:** [frontend/src/components/common/Toasts.tsx](frontend/src/components/common/Toasts.tsx) - -- Displays in bottom-right corner -- Color-coded by kind (success=emerald, error=rose, info=violet) -- Supports action buttons for interactive toasts - -**Usage Pattern for Warnings:** -```typescript -showToast("info", "This draft may duplicate an approved item", { - duration: 8000, - actionLabel: "View Similar", - onAction: () => openSimilarItemsModal() -}); -``` - -**Alert Icon Component:** [frontend/src/components/app/QueueSidebar.tsx](frontend/src/components/app/QueueSidebar.tsx#L181) - -Uses `CircleAlert` from lucide-react for inline warnings: -```tsx - unsaved -``` - ---- - -## Implementation Recommendations - -### Backend Duplicate Detection Service - -Create `backend/app/services/duplicate_detection_service.py`: - -```python -class DuplicateDetectionService: - async def find_similar_approved( - self, - item: GroundTruthItem, - threshold: float = 0.9 - ) -> list[GroundTruthItem]: - """Find approved items similar to the given draft item.""" - pass - - async def check_bulk_for_duplicates( - self, - items: list[GroundTruthItem] - ) -> dict[str, list[str]]: - """Check a batch of items for duplicates. Returns {item_id: [similar_ids]}.""" - pass -``` - -### Comparison Strategies - -1. 
**Exact Match:** Normalize and compare question text directly -2. **Fuzzy Match:** Use Levenshtein distance or similar -3. **Semantic Match:** Embed questions and use cosine similarity (future) - -### API Response Extension - -Extend `ImportBulkResponse` to include warnings: - -```python -class ImportBulkResponse(BaseModel): - imported: int - errors: list[str] - uuids: list[str] - warnings: list[DuplicateWarning] = [] # NEW - -class DuplicateWarning(BaseModel): - draft_id: str - similar_approved_ids: list[str] - similarity_score: float -``` - -### Frontend Integration - -1. **On Import:** Show summary of potential duplicates -2. **On Assignment:** Toast warning if assigned item resembles approved -3. **In Editor:** Badge or inline warning in sidebar for flagged items - ---- - -## Summary - -| Question | Finding | -|----------|---------| -| Data model for draft/approved? | `GroundTruthStatus` enum with `draft`, `approved`, `deleted`, `skipped` | -| Fields for comparison? | `synthQuestion`, `editedQuestion`, `answer`, `history[*].msg` | -| Existing duplicate detection? | **None** - feature is requested in Jira (SA-534, SA-535) | -| Import/creation flow? | Bulk import via POST `/v1/ground-truths`; single assign via assignment service | -| UI warning patterns? 
| Toast system with `success/error/info` kinds; `CircleAlert` icon for inline warnings | - ---- - -## Files Referenced - -- [backend/app/domain/models.py](backend/app/domain/models.py) - GroundTruthItem model -- [backend/app/domain/enums.py](backend/app/domain/enums.py) - GroundTruthStatus enum -- [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py) - Import endpoint -- [backend/app/services/validation_service.py](backend/app/services/validation_service.py) - Current validation -- [backend/app/services/assignment_service.py](backend/app/services/assignment_service.py) - Assignment flow -- [frontend/src/models/groundTruth.ts](frontend/src/models/groundTruth.ts) - Frontend model -- [frontend/src/hooks/useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts) - State signature logic -- [frontend/src/hooks/useToasts.ts](frontend/src/hooks/useToasts.ts) - Toast system -- [frontend/src/components/common/Toasts.tsx](frontend/src/components/common/Toasts.tsx) - Toast component diff --git a/.copilot-tracking/subagent/20260122/explorer-sorting-research.md b/.copilot-tracking/subagent/20260122/explorer-sorting-research.md deleted file mode 100644 index 06227bb..0000000 --- a/.copilot-tracking/subagent/20260122/explorer-sorting-research.md +++ /dev/null @@ -1,257 +0,0 @@ -# Explorer Sorting System Research - -## Context - -Research into how the Explorer component implements column sorting, sort state management, visual indicators, and backend integration. 
- -## Sources Consulted - -### Codebase - -- [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx): Main Explorer component with sorting logic -- [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts): API service with sort parameter handling -- [backend/app/domain/enums.py](backend/app/domain/enums.py): `SortField` and `SortOrder` enum definitions -- [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py): API endpoint accepting sort parameters -- [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py): Cosmos DB ORDER BY implementation - ---- - -## 1. Current Column Sorting Implementation - -### Frontend Sort State - -The Explorer manages sort state with two pieces of React state: - -```typescript -type SortColumn = "refs" | "reviewedAt" | "hasAnswer" | null; -type SortDirection = "asc" | "desc"; - -const [sortColumn, setSortColumn] = useState(null); -const [sortDirection, setSortDirection] = useState("desc"); -``` - -### Sort Handler Logic - -The `handleSort` function implements a three-state toggle: - -1. **First click**: Set column, direction = `desc` -2. **Second click (same column)**: Toggle direction to `asc` -3. **Third click (same column)**: Clear sort (column = `null`, direction = `desc`) - -```typescript -const handleSort = (column: "refs" | "reviewedAt" | "hasAnswer") => { - if (sortColumn === column) { - if (sortDirection === "desc") { - setSortDirection("asc"); - } else { - setSortColumn(null); - setSortDirection("desc"); - } - } else { - setSortColumn(column); - setSortDirection("desc"); - } -}; -``` - ---- - -## 2. Available Sorting Options - -### Frontend Sortable Columns - -| Column | UI Label | API Parameter | -|--------|----------|---------------| -| `refs` | Refs | `totalReferences` | -| `reviewedAt` | Reviewed | `reviewedAt` | -| `hasAnswer` | Answer? 
| `hasAnswer` | - -### Backend SortField Enum - -```python -class SortField(str, Enum): - reviewed_at = "reviewedAt" - updated_at = "updatedAt" - id = "id" - has_answer = "hasAnswer" - totalReferences = "totalReferences" -``` - -### Backend SortOrder Enum - -```python -class SortOrder(str, Enum): - asc = "asc" - desc = "desc" -``` - -### Default Sort - -- **Backend default**: `reviewedAt DESC` -- **Frontend default**: No sort applied (column = `null`) - ---- - -## 3. Visual Sort Indicator Implementation - -### Indicator Design - -Sort indicators use arrow symbols displayed inline with column headers: - -- **Descending**: `↓` -- **Ascending**: `↑` - -### Two-State Visual System - -The Explorer shows two distinct indicator states: - -1. **Applied filter (violet)**: Shows the sort currently active in the backend response -2. **Pending filter (amber, 50% opacity)**: Shows a selected but unapplied sort - -```tsx -{appliedFilter.sortColumn === "refs" && ( - - {appliedFilter.sortDirection === "desc" ? "↓" : "↑"} - -)} -{sortColumn === "refs" && sortColumn !== appliedFilter.sortColumn && ( - - {sortDirection === "desc" ? "↓" : "↑"} - -)} -``` - -### Known Issue - -SA-361 reports that the ascending sort visual indicator does not update correctly for the Answer column. The code structure appears correct, so the bug may be in the conditional rendering logic or state synchronization. - ---- - -## 4. Tag Count as a Sortable Field - -### Current Status - -**Tag count is NOT currently a sortable field.** - -### Backlog Item - -SA-684 requests this feature: - -> "GTC: Ability to sort by tag number effectively" -> -> As a GTC user, I would like to be able to sort by tags descending to find ground truths that have fewer tags than expected to be able to find items needing review. - -### Implementation Requirements - -To add tag count sorting: - -#### Backend Changes - -1. Add `tagCount` to `SortField` enum: - ```python - class SortField(str, Enum): - # existing... 
- tag_count = "tagCount" - ``` - -2. Add computed field or stored property `tagCount` to documents (similar to `totalReferences` backfill pattern) - -3. Add field mapping in `_build_secure_sort_clause`: - ```python - secure_field_map = { - # existing... - SortField.tag_count: "c.tagCount", - } - ``` - -4. Create backfill script (follow `backfill_total_references.py` pattern) - -5. Update Cosmos DB indexing policy to include `tagCount` - -#### Frontend Changes - -1. Add `"tagCount"` to `SortColumn` type -2. Add sortable column header in table -3. Map frontend column name to API parameter - ---- - -## 5. Sorting Passed to Backend API - -### Frontend Service Call - -The Explorer builds API parameters from applied filter state: - -```typescript -const sortByParam = - appliedFilter.sortColumn === "refs" - ? "totalReferences" - : appliedFilter.sortColumn; - -const params = { - // ...filters - sortBy: sortByParam, - sortOrder: sortByParam ? appliedFilter.sortDirection : undefined, - page: safePage, - limit: itemsPerPage, -}; - -listAllGroundTruths(params); -``` - -### Service Layer - -`groundTruths.ts` passes parameters to the generated API client: - -```typescript -if (params.sortBy) - query.sortBy = params.sortBy as components["schemas"]["SortField"]; -if (params.sortOrder) query.sortOrder = params.sortOrder; -``` - -### API Endpoint - -`GET /v1/ground-truths` accepts query parameters: - -```python -sort_by: SortField = Query(default=SortField.reviewed_at.value, alias="sortBy"), -sort_order: SortOrder = Query(default=SortOrder.desc.value, alias="sortOrder"), -``` - -### Cosmos DB Query - -The repository builds a secure ORDER BY clause: - -```python -def _build_secure_sort_clause(self, sort_field: SortField, sort_direction: SortOrder) -> str: - secure_field_map = { - SortField.id: "c.id", - SortField.updated_at: "c.updatedAt", - SortField.reviewed_at: "c.reviewedAt", - SortField.has_answer: "c.reviewedAt", - SortField.totalReferences: "c.totalReferences", - } - # 
...builds "ORDER BY c.field ASC/DESC" -``` - -A secondary sort by `c.id ASC` is added for stable pagination when the primary sort field is not `id`. - ---- - -## Key Findings Summary - -| Question | Answer | -|----------|--------| -| How is sorting implemented? | React state (`sortColumn`, `sortDirection`) with three-state toggle handler | -| Available sort options? | `refs` (totalReferences), `reviewedAt`, `hasAnswer` | -| Sort direction indicator? | Arrow symbols (↓/↑), violet=applied, amber=pending | -| Is tag count sortable? | No - requested in SA-684, not yet implemented | -| How is sort passed to API? | `sortBy` and `sortOrder` query params to `GET /v1/ground-truths` | - ---- - -## Recommendations - -1. **SA-361 bug fix**: Investigate why ascending sort visual for Answer column doesn't update -2. **SA-684 implementation**: Follow `totalReferences` pattern for computed `tagCount` field -3. **Consider**: Adding `updatedAt` as a frontend sortable column (already supported by backend) diff --git a/.copilot-tracking/subagent/20260122/explorer-state-preservation-research.md b/.copilot-tracking/subagent/20260122/explorer-state-preservation-research.md deleted file mode 100644 index 7857b45..0000000 --- a/.copilot-tracking/subagent/20260122/explorer-state-preservation-research.md +++ /dev/null @@ -1,203 +0,0 @@ -# Explorer State Preservation Research - -**Research Date:** 2026-01-22 -**Related Issue:** SA-364 - GTC Explorer: Assign from explorer switches to curation view, losing filters - ---- - -## 1. 
Current Explorer Component Structure - -### Primary Component -- **File:** [src/components/app/QuestionsExplorer.tsx](../../../frontend/src/components/app/QuestionsExplorer.tsx) -- **Type:** Functional component with internal state management -- **Purpose:** Displays ground truth items in a filterable, sortable table with actions (Assign, Inspect, Delete) - -### Component Hierarchy -``` -App.tsx -└── GTAppDemo (demo.tsx) - ├── AppHeader - ├── QuestionsExplorer (viewMode === "questions") - ├── CuratePane (viewMode === "curate") - └── StatsPage (viewMode === "stats") -``` - -### Key Interfaces - -```typescript -interface FilterState { - status: FilterType; // "all" | "draft" | "approved" | "skipped" | "deleted" - dataset: string; // dataset name or "all" - tags: string[]; // array of selected tags (AND logic) - itemId: string; // item ID filter text - refUrl: string; // reference URL filter text - sortColumn: SortColumn; // "refs" | "reviewedAt" | "hasAnswer" | null - sortDirection: SortDirection; // "asc" | "desc" -} -``` - ---- - -## 2. Filter State Management Analysis - -### Current Implementation: Local Component State - -All filter state is managed via `useState` hooks **inside** `QuestionsExplorer`: - -```typescript -// Filter state (unapplied - UI inputs) -const [activeFilter, setActiveFilter] = useState("all"); -const [selectedDataset, setSelectedDataset] = useState("all"); -const [selectedTags, setSelectedTags] = useState([]); -const [itemIdFilter, setItemIdFilter] = useState(""); -const [referenceUrlFilter, setReferenceUrlFilter] = useState(""); -const [sortColumn, setSortColumn] = useState(null); -const [sortDirection, setSortDirection] = useState("desc"); -const [itemsPerPage, setItemsPerPage] = useState(25); - -// Applied filter state (what was last sent to backend) -const [appliedFilter, setAppliedFilter] = useState({...}); -const [currentPage, setCurrentPage] = useState(1); -``` - -### Two-Phase Filter Pattern -1. 
**Unapplied state:** User modifies filters in UI -2. **Applied state:** User clicks "Apply Filters" button to execute query -3. `hasUnappliedChanges` computed via `useMemo` to track dirty state - -### Problems with Current Approach -- **No state lifting:** Filter state is entirely local to `QuestionsExplorer` -- **No persistence:** When component unmounts (view switch), all state is lost -- **No URL sync:** Filters are not reflected in URL params -- **No context provider:** No shared state mechanism across views - ---- - -## 3. Navigation Actions That Cause State Loss - -### Identified Navigation Triggers - -| Action | Code Location | Effect | -|--------|--------------|--------| -| **Assign button** | `demo.tsx:207-229` | Calls `assignItem()`, then `setViewMode("curate")` | -| **Header toggle** | `AppHeader.tsx:48-56` | Toggles between "curate" and "questions" | -| **Stats button** | `AppHeader.tsx:57-64` | Sets `viewMode` to "stats" | - -### Critical Code Path (Assign Action) - -```typescript -// demo.tsx lines 207-229 -onAssign={async (item) => { - // ...validation... - await assignItem(item.datasetName, item.bucket, item.id); - await gt.refreshList(); - await gt.selectItem(item.id); - setViewMode("curate"); // <-- CAUSES UNMOUNT OF QuestionsExplorer - toast("success", `Assigned ${item.id} for curation`); -}} -``` - -**Root cause:** `setViewMode("curate")` triggers React to unmount `QuestionsExplorer` and mount `CuratePane`, destroying all local filter state. - ---- - -## 4. 
State Persistence Mechanisms - -### Current State: **None implemented** - -#### localStorage -- **Usage:** Commented out in `CuratePane.tsx` (line 163) -- **Status:** Not active for any feature - -#### URL State / Query Parameters -- **Routing library:** **None** - app uses simple `viewMode` state switching -- **URL params:** Not used for any state persistence -- **`package.json`:** No `react-router`, `@tanstack/router`, or similar - -#### Context API -- **Existing contexts:** None for filter/view state -- **Pattern:** App uses prop drilling from `GTAppDemo` to children - -#### Session/Browser APIs -- `sessionStorage`: Not used -- `history.pushState/replaceState`: Not used - ---- - -## 5. Routing Architecture - -### Current Implementation: **No Routing Library** - -The application uses a simple state-based view switching pattern: - -```typescript -// demo.tsx -const [viewMode, setViewMode] = useState<"curate" | "questions" | "stats">("curate"); - -// Conditional rendering -{viewMode === "stats" && } -{viewMode === "questions" && } -{viewMode === "curate" && } -``` - -### Implications -- No URL-based navigation -- No browser back/forward support -- No deep linking capability -- No route-based code splitting - ---- - -## 6. 
Summary & Recommendations - -### Key Findings - -| Finding | Status | Impact | -|---------|--------|--------| -| Filter state is local to component | ✅ Confirmed | State lost on unmount | -| No routing library | ✅ Confirmed | No URL-based persistence | -| No localStorage usage | ✅ Confirmed | No browser persistence | -| No Context for filters | ✅ Confirmed | No cross-view sharing | -| Assign triggers view switch | ✅ Confirmed | Direct cause of SA-364 | - -### Recommended Solutions (Priority Order) - -#### Option A: Lift State to Parent (Minimal Change) -- Move `FilterState` to `GTAppDemo` -- Pass as props to `QuestionsExplorer` -- State survives view switches -- **Effort:** Low | **Risk:** Low - -#### Option B: URL Query Parameters (Better UX) -- Sync filter state to URL search params -- Use `URLSearchParams` API directly (no router needed) -- Enables deep linking and back/forward -- **Effort:** Medium | **Risk:** Low - -#### Option C: Context + localStorage (Full Persistence) -- Create `ExplorerFilterContext` -- Persist to localStorage on change -- Restore on mount -- **Effort:** Medium | **Risk:** Low - -#### Option D: Add React Router (Future-Proof) -- Integrate routing library -- Route-based view switching -- URL state via loader/search params -- **Effort:** High | **Risk:** Medium - -### Alternative Quick Fix -Per SA-364 proposed solution #1: -> "Do not automatically switch to the curation view when making an assignment from the explorer" - -This would involve removing `setViewMode("curate")` from the assign handler, keeping user in Explorer after assignment. However, this may not match desired UX if user wants to immediately curate the assigned item. 
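Option B needs no router. The standard `URLSearchParams` API can round-trip the explorer's filter through the query string. Below is a minimal sketch under stated assumptions, not the project's code:

- The types copy the documented `FilterState` interface.
- The helper names `filterToSearch` and `searchToFilter` are hypothetical.
- The short query keys `tag`, `sort` and `dir` are hypothetical.

```typescript
// Types copied from the documented FilterState interface.
type FilterType = "all" | "draft" | "approved" | "skipped" | "deleted";
type SortKey = "refs" | "reviewedAt" | "hasAnswer";
type SortColumn = SortKey | null;
type SortDirection = "asc" | "desc";

interface FilterState {
  status: FilterType;
  dataset: string;
  tags: string[];
  itemId: string;
  refUrl: string;
  sortColumn: SortColumn;
  sortDirection: SortDirection;
}

const DEFAULT_FILTER: FilterState = {
  status: "all",
  dataset: "all",
  tags: [],
  itemId: "",
  refUrl: "",
  sortColumn: null,
  sortDirection: "desc",
};

const STATUSES: string[] = ["all", "draft", "approved", "skipped", "deleted"];
const SORT_KEYS: string[] = ["refs", "reviewedAt", "hasAnswer"];

// Hypothetical helper: write only non-default values so shared URLs stay short.
function filterToSearch(f: FilterState): string {
  const p = new URLSearchParams();
  if (f.status !== DEFAULT_FILTER.status) p.set("status", f.status);
  if (f.dataset !== DEFAULT_FILTER.dataset) p.set("dataset", f.dataset);
  for (const tag of f.tags) p.append("tag", tag); // one repeated key per tag
  if (f.itemId) p.set("itemId", f.itemId);
  if (f.refUrl) p.set("refUrl", f.refUrl);
  if (f.sortColumn) {
    p.set("sort", f.sortColumn);
    p.set("dir", f.sortDirection);
  }
  return p.toString();
}

// Hypothetical helper: read a query string back, falling back to defaults
// for missing or unrecognized values.
function searchToFilter(search: string): FilterState {
  const p = new URLSearchParams(search); // a leading "?" is ignored
  const status = p.get("status");
  const sort = p.get("sort");
  return {
    status: status !== null && STATUSES.indexOf(status) !== -1 ? (status as FilterType) : "all",
    dataset: p.get("dataset") ?? "all",
    tags: p.getAll("tag"),
    itemId: p.get("itemId") ?? "",
    refUrl: p.get("refUrl") ?? "",
    sortColumn: sort !== null && SORT_KEYS.indexOf(sort) !== -1 ? (sort as SortKey) : null,
    sortDirection: p.get("dir") === "asc" ? "asc" : "desc",
  };
}
```

How `GTAppDemo` could use it:

- After each Apply, write the applied filter with `history.replaceState(null, "", "?" + filterToSearch(applied))`.
- When `QuestionsExplorer` mounts again after a view switch, restore it with `searchToFilter(window.location.search)`.

`replaceState` avoids adding a history entry per Apply. Using `pushState` instead would let back/forward step through filter changes.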
- ---- - -## Files Referenced - -- [frontend/src/demo.tsx](../../../frontend/src/demo.tsx) - Main app container -- [frontend/src/components/app/QuestionsExplorer.tsx](../../../frontend/src/components/app/QuestionsExplorer.tsx) - Explorer component -- [frontend/src/components/app/AppHeader.tsx](../../../frontend/src/components/app/AppHeader.tsx) - Navigation header -- [frontend/package.json](../../../frontend/package.json) - Dependencies -- [prd-refined-2.json](../../../prd-refined-2.json) - Issue SA-364 definition diff --git a/.copilot-tracking/subagent/20260122/explorer-view-research.md b/.copilot-tracking/subagent/20260122/explorer-view-research.md deleted file mode 100644 index 8a8d9d8..0000000 --- a/.copilot-tracking/subagent/20260122/explorer-view-research.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -topic: explorer-view -jtbd: JTBD-001 -date: 2026-01-22 -status: complete ---- - -# Research: Explorer View - -## Context - -The explorer view enables browsing and filtering ground-truth items outside the assigned queue, and initiating actions such as inspection, assignment, and deletion. - -## Sources Consulted - -### URLs -- (None) - -### Codebase -- [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx): Implements an explorer UI with filtering (status/dataset/tags/itemId/refUrl), sorting, pagination, and item actions. -- [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts): Implements `listAllGroundTruths()` and maps API payloads into the frontend model. -- [frontend/src/services/tags.ts](frontend/src/services/tags.ts): Fetches manual/computed tags and validates exclusive tag groups. - -### Documentation -- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Provides repo-wide behavioral requirements context. - -## Key Findings - -1. 
The explorer supports server-backed listing (`GET /v1/ground-truths`) with query parameters for status, dataset, tags, itemId, refUrl, sorting, and pagination. -2. The explorer fetches and displays available datasets and tags to drive filtering. -3. The explorer UI provides per-item “inspect”, “assign”, and “delete” actions. -4. The explorer assumes the backend provides pagination metadata when listing items. -5. The documentation does not yet cover searching and browsing, so the explorer implementation is the current source of truth for that behavior. - -## Existing Patterns - -| Pattern | Location | Relevance | -|---------|----------|-----------| -| Filter state vs applied filter state | [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx) | Implements explicit Apply behavior and avoids unnecessary calls | -| Server-side sorting and pagination | [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx) | Assumes backend performs sorting and returns pagination | -| List API wrapper mapping wire schema to UI model | [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts) | Defines frontend expectations for list payload shape | - -## Open Questions - -- (None) - -## Recommendations for Spec - -- Specify supported explorer filters and sorting fields as observable UI capabilities. -- Specify that the list view uses server-backed pagination when available. -- Specify that explorer actions (inspect/assign/delete) are initiated from the UI but depend on backend support.
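The "filter state vs applied filter state" pattern reduces to a dirty check between two filter snapshots. Below is a minimal sketch, assuming the `FilterState` field set recorded in the state-preservation research.

- `hasUnappliedChanges` is written as a plain function standing in for the component's `useMemo`.
- Comparing tags order-insensitively is an assumption. Selected tags are ANDed, so reordering alone should not enable Apply.

```typescript
type SortColumn = "refs" | "reviewedAt" | "hasAnswer" | null;

interface FilterState {
  status: "all" | "draft" | "approved" | "skipped" | "deleted";
  dataset: string;
  tags: string[];
  itemId: string;
  refUrl: string;
  sortColumn: SortColumn;
  sortDirection: "asc" | "desc";
}

// Order-insensitive comparison of two tag lists (assumption: order has no meaning).
function sameTags(a: string[], b: string[]): boolean {
  const sa = a.slice().sort();
  const sb = b.slice().sort();
  return sa.length === sb.length && sa.every((t, i) => t === sb[i]);
}

// True when the pending UI filter differs from the last filter sent to the backend.
function hasUnappliedChanges(pending: FilterState, applied: FilterState): boolean {
  return (
    pending.status !== applied.status ||
    pending.dataset !== applied.dataset ||
    pending.itemId !== applied.itemId ||
    pending.refUrl !== applied.refUrl ||
    pending.sortColumn !== applied.sortColumn ||
    pending.sortDirection !== applied.sortDirection ||
    !sameTags(pending.tags, applied.tags)
  );
}
```

Inside a component this would typically be wrapped as `useMemo(() => hasUnappliedChanges(pending, appliedFilter), [pending, appliedFilter])`. The Apply button, and with it the backend call, would then only activate on a real change.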
diff --git a/.copilot-tracking/subagent/20260122/export-snapshots-research.md b/.copilot-tracking/subagent/20260122/export-snapshots-research.md deleted file mode 100644 index 4554699..0000000 --- a/.copilot-tracking/subagent/20260122/export-snapshots-research.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -topic: export-snapshots -jtbd: JTBD-001 -date: 2026-01-22 -status: complete ---- - -# Research: Export Snapshots - -## Context - -The export system generates downloadable JSON snapshots of curated data in configurable formats. - -## Sources Consulted - -### URLs -- (None) - -### Codebase -- [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts): Includes `downloadSnapshot` function that triggers a browser download of JSON. -- [backend/app/services/snapshot_service.py](backend/app/services/snapshot_service.py): Implements snapshot export logic. - -### Documentation -- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates export/snapshot requirements. -- [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md): Documents attachment and artifact export modes, defaults, and manifest requirements. - -## Key Findings - -1. The backend supports two export modes: `attachment` (single JSON file) and `artifact` (per-item JSON files + manifest). -2. The snapshot download endpoint returns a JSON document for browser download with Content-Disposition header. -3. Artifact exports include a manifest with a stable `schemaVersion` and snapshot metadata. -4. Export processors run before formatting and may merge tag fields into a single `tags` array. -5. The frontend triggers download via a service function that invokes the snapshot endpoint. 
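Findings 2 and 5 meet in the client: the frontend service receives the snapshot response and needs a filename for the browser download. Below is a minimal sketch of deriving that name from the response header. It assumes the backend sets a plain `filename=` parameter. `filenameFromContentDisposition` is a hypothetical helper, not the existing `downloadSnapshot` code. It deliberately ignores the RFC 6266 `filename*=` extended form.

```typescript
// Hypothetical helper: pull the filename out of a Content-Disposition value,
// e.g. 'attachment; filename="snapshot.json"'. Only the plain `filename=`
// parameter (quoted or unquoted) is handled; `filename*=` is not decoded.
function filenameFromContentDisposition(
  header: string | null,
  fallback: string,
): string {
  if (!header) return fallback;
  // Requires "=" right after "filename", so "filename*=" is skipped.
  const match = /filename="?([^";]+)"?/i.exec(header);
  const name = match?.[1];
  return name ? name.trim() : fallback;
}
```

A caller would pass `response.headers.get("Content-Disposition")` from the snapshot response. It would then use the result as the `download` attribute of a temporary link. If the API is served from a different origin than the frontend, browsers expose `Content-Disposition` to script only when the server lists it in `Access-Control-Expose-Headers`.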
- -## Existing Patterns - -| Pattern | Location | Relevance | -|---------|----------|-----------| -| Snapshot export endpoint with Content-Disposition | [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md) | Defines wire behavior for download | -| Manifest with schemaVersion | [backend/docs/export-pipeline.md](backend/docs/export-pipeline.md) | Defines contract for artifact mode | - -## Open Questions - -- (None) - -## Recommendations for Spec - -- Specify supported export modes (attachment, artifact) and their default behavior. -- Specify that attachment mode returns a single JSON document with download headers. -- Specify that artifact mode includes a manifest with `schemaVersion`. diff --git a/.copilot-tracking/subagent/20260122/inspection-performance-research.md b/.copilot-tracking/subagent/20260122/inspection-performance-research.md deleted file mode 100644 index 4df4f4e..0000000 --- a/.copilot-tracking/subagent/20260122/inspection-performance-research.md +++ /dev/null @@ -1,197 +0,0 @@ -# Inspection Performance Research - -**Date:** 2026-01-22 -**Topic:** Caching and memoization patterns for inspection modals - -## 1. 
InspectItemModal Implementation - -**Location:** [frontend/src/components/modals/InspectItemModal.tsx](frontend/src/components/modals/InspectItemModal.tsx) - -### Data Fetching Pattern - -The `InspectItemModal` component fetches complete item data on every open: - -```tsx -// Lines 62-111 -useEffect(() => { - if (!isOpen || !item) { - setCompleteItem(null); - setLoadError(null); - return; - } - - // Always fetch fresh data to ensure we get complete conversation history - setIsLoading(true); - setLoadError(null); - - (async () => { - const completeItemData = await getGroundTruth( - item.datasetName || "", - item.bucket || "", - item.id, - ); - setCompleteItem(completeItemData); - })() -}, [isOpen, item]); -``` - -### Data Fetched - -- Complete `GroundTruthItem` via `getGroundTruth()` API call -- Runtime configuration for trusted reference domains -- Uses `MultiTurnEditor` component in read-only mode to display conversation - -### Performance Issue - -**No caching of previously viewed items.** Each time the modal opens for the same item, a fresh API call is made. The comment explicitly states "Always fetch fresh data" but this is unnecessary for recently viewed items in a read-only context. - -## 2. 
TurnReferencesModal Implementation - -**Location:** [frontend/src/components/app/editor/TurnReferencesModal.tsx](frontend/src/components/app/editor/TurnReferencesModal.tsx) - -### References Computation - -The modal filters references for a specific turn on every render: - -```tsx -// Line 88 - computed on every render -const turnRefs = references.filter((r) => r.messageIndex === messageIndex); -``` - -Additional computed values on every render: -```tsx -// Line 91 - set computed on every render -const urlsInTurn = new Set(turnRefs.map((r) => normalizeUrl(r.url))); -``` - -### Performance Issue - -**No memoization for references filtering.** The `turnRefs` filter and `urlsInTurn` Set are recomputed on every render, even when `references` and `messageIndex` haven't changed. - -## 3. Existing Caching Patterns - -### Service-Level Caching - -| Service | Caching Pattern | TTL | -|---------|-----------------|-----| -| [datasets.ts](frontend/src/services/datasets.ts) | In-memory cache with TTL | 5 minutes | -| [runtimeConfig.ts](frontend/src/services/runtimeConfig.ts) | Single-fetch cache (permanent) | Forever | - -**datasets.ts example:** -```typescript -const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes -let datasetsCache: { data: string[] | null; timestamp: number } = { - data: null, - timestamp: 0, -}; -``` - -**runtimeConfig.ts example:** -```typescript -let cachedConfig: RuntimeConfig | null = null; -let configPromise: Promise | null = null; - -export async function getRuntimeConfig(): Promise { - if (cachedConfig) return cachedConfig; - if (configPromise) return configPromise; - // ... fetch and cache -} -``` - -### No Ground Truth Item Caching - -The `groundTruths.ts` service has **no caching mechanism** for individual items. Each `getGroundTruth()` call makes a fresh API request. - -## 4. 
React Query / Data Fetching Library Status - -**React Query is NOT currently in use.** - -The `package.json` shows no `@tanstack/query` or `react-query` dependency: - -```json -"dependencies": { - "@microsoft/applicationinsights-web": "^3.0.4", - "openapi-fetch": "^0.9.8", - "react": "^19.1.1", - // ... no react-query -} -``` - -The reference in [connecting-e2e-best-practices.md](frontend/docs/connecting-e2e-best-practices.md) is documentation/guidance, not actual implementation. - -**Current data fetching approach:** -- Direct `fetch()` calls via `openapi-fetch` client -- Manual state management with `useState`/`useEffect` -- No automatic caching, deduplication, or stale-while-revalidate patterns - -## 5. Existing Memoization Patterns - -### useCallback Usage - -Found in multiple hooks: - -| File | Pattern | -|------|---------| -| [useReferencesSearch.ts](frontend/src/hooks/useReferencesSearch.ts) | `runSearch`, `clearResults` wrapped in `useCallback` | -| [useTags.ts](frontend/src/hooks/useTags.ts) | `refresh`, `ensureTag`, `filter` wrapped in `useCallback` | -| [useToasts.ts](frontend/src/hooks/useToasts.ts) | `dismiss`, `clear`, `showToast` wrapped in `useCallback` | -| [useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts) | Extensive `useCallback` usage for all actions | - -### useMemo Usage - -Found in components: - -| File | Pattern | -|------|---------| -| [QueueSidebar.tsx](frontend/src/components/app/QueueSidebar.tsx) | `ids` memoized | -| [QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx) | `hasUnappliedChanges`, `displayItems` memoized | -| [TagsEditor.tsx](frontend/src/components/app/editor/TagsEditor.tsx) | `suggestions` memoized | -| [InstructionsPane.tsx](frontend/src/components/app/InstructionsPane.tsx) | Memoization used | -| [useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts) | `qaChanged`, `canApprove`, `hasUnsaved` memoized | - -### Gaps in Memoization - -**InspectItemModal:** No `useMemo` or `useCallback` 
hooks used -**TurnReferencesModal:** No `useMemo` for `turnRefs` or `urlsInTurn` computations - -## 6. Recommendations - -### Immediate Optimizations - -1. **Add item cache to InspectItemModal:** - - Implement LRU cache for recently viewed items - - Cache key: `${datasetName}:${bucket}:${id}` - - Suggested TTL: 2-5 minutes or LRU with 10-20 item limit - -2. **Memoize TurnReferencesModal computations:** - ```tsx - const turnRefs = useMemo( - () => references.filter((r) => r.messageIndex === messageIndex), - [references, messageIndex] - ); - - const urlsInTurn = useMemo( - () => new Set(turnRefs.map((r) => normalizeUrl(r.url))), - [turnRefs] - ); - ``` - -### Medium-Term Improvements - -3. **Service-level item caching:** - - Add caching to `groundTruths.ts` similar to `datasets.ts` pattern - - Consider cache invalidation on save operations - -4. **Consider React Query adoption:** - - Provides automatic caching, deduplication, background refetch - - Simpler code for cache management - - Already documented in best practices - -## Summary - -| Component | Issue | Severity | -|-----------|-------|----------| -| InspectItemModal | No item caching - fetches on every open | High | -| TurnReferencesModal | No memoization for references filter | Medium | -| groundTruths.ts | No service-level item cache | Medium | -| Overall | No React Query adoption | Low | diff --git a/.copilot-tracking/subagent/20260122/keyword-search-research.md b/.copilot-tracking/subagent/20260122/keyword-search-research.md deleted file mode 100644 index 6e2fc0b..0000000 --- a/.copilot-tracking/subagent/20260122/keyword-search-research.md +++ /dev/null @@ -1,162 +0,0 @@ -# Keyword Search Research - -## Research Questions Answered - -### 1. How does the Explorer currently fetch and display ground truth items? 
- -The Explorer component ([frontend/src/components/app/QuestionsExplorer.tsx](../../../frontend/src/components/app/QuestionsExplorer.tsx)) fetches data via `listAllGroundTruths()` from the groundTruths service. Key behaviors: - -- **Server-side pagination and filtering**: Uses `GET /v1/ground-truths` with query parameters -- **Filter state vs applied state**: Separates filter UI state from applied/committed filters to batch changes -- **Explicit Apply button**: Users must click "Apply Filters" to send filter changes to backend -- **Parameters supported**: `status`, `dataset`, `tags`, `itemId`, `refUrl`, `sortBy`, `sortOrder`, `page`, `limit` - -### 2. What data fields exist on ground truth items that would need to be searched? - -From [frontend/src/models/groundTruth.ts](../../../frontend/src/models/groundTruth.ts) and [backend/app/domain/models.py](../../../backend/app/domain/models.py): - -**Primary text fields for keyword search:** -| Field | Type | Description | -|-------|------|-------------| -| `question` | string | The question text (derived from `synthQuestion` or `editedQuestion`) | -| `answer` | string | The answer text | -| `history` | ConversationTurn[] | Multi-turn conversation history | -| `history[].content` (msg) | string | Individual turn content (user or agent) | -| `comment` | string | Free-form curator notes | - -**History/ConversationTurn structure (multi-turn):** -```typescript -type ConversationTurn = { - role: "user" | "agent"; - content: string; - expectedBehavior?: ExpectedBehavior[]; -}; -``` - -**Backend HistoryItem model:** -```python -class HistoryItem(BaseModel): - role: HistoryItemRole # User or Assistant - msg: str - refs: Optional[list[Reference]] = None - expected_behavior: Optional[list[ExpectedBehavior]] -``` - -### 3. Is there any existing search functionality in the frontend or backend? 
- -**Backend search service exists but serves a different purpose:** - -- **File:** [backend/app/api/v1/search.py](../../../backend/app/api/v1/search.py) -- **Endpoint:** `GET /v1/search?q=&top=` -- **Purpose:** Queries an external AI Search index for reference documents (not ground truth items) -- **Implementation:** Delegates to `SearchService.query()` which uses a `SearchAdapter` for external search backends - -**Current filtering in Explorer (not keyword search):** -- `itemId`: Case-sensitive partial match on item ID -- `refUrl`: Case-sensitive partial match on reference URLs (item-level and history-level) -- `tags`: Filter by manual/computed tags (AND logic) -- `status`, `dataset`: Exact match filters - -**No existing keyword search for question/answer/history text content.** - -### 4. What API endpoints does the Explorer use to fetch items? - -| Endpoint | Method | Purpose | -|----------|--------|---------| -| `GET /v1/ground-truths` | GET | List/filter ground truths with pagination | -| `GET /v1/ground-truths/{datasetName}/{bucket}/{item_id}` | GET | Get single item by ID | -| `PUT /v1/ground-truths/{datasetName}/{bucket}/{item_id}` | PUT | Update item | -| `DELETE /v1/ground-truths/{datasetName}/{bucket}/{item_id}` | DELETE | Soft-delete item | - -**List endpoint query parameters:** -- `status`: Filter by status (draft, approved, skipped, deleted) -- `dataset`: Filter by dataset name -- `tags`: Comma-separated list of tags (AND logic) -- `itemId`: Partial ID match -- `refUrl`: Partial reference URL match -- `sortBy`: Sort field (reviewedAt, totalReferences, hasAnswer) -- `sortOrder`: asc or desc -- `page`, `limit`: Pagination - -### 5. How is the data structured in Cosmos DB and what indexes exist? 
- -**Container structure:** -- Uses MultiHash partition key: `[/datasetName, /bucket]` -- Ground truth items have `docType: "ground-truth-item"` - -**Indexing policy** from [backend/scripts/indexing-policy.json](../../../backend/scripts/indexing-policy.json): - -```json -{ - "indexingMode": "consistent", - "automatic": true, - "includedPaths": [{ "path": "/*" }], - "excludedPaths": [{ "path": "/\"_etag\"/?" }], - "compositeIndexes": [ - // For sorting by reviewedAt, updatedAt, totalReferences - [{"path": "/reviewedAt", "order": "descending"}, {"path": "/id", "order": "ascending"}], - [{"path": "/totalReferences", "order": "descending"}, {"path": "/id", "order": "ascending"}], - // ... more composite indexes for combined status + sort scenarios - ], - "fullTextIndexes": [] // Currently empty - no full-text search indexes -} -``` - -**Key finding:** `fullTextIndexes` array is empty. Cosmos DB does support full-text search via `FullTextContains()` function, but requires explicit full-text indexing configuration. - ---- - -## Key Findings Summary - -### Current State -1. **No keyword search exists** for searching text content in questions, answers, or multi-turn history -2. Explorer supports filtering by ID, URL, tags, status, and dataset - but not text content -3. An external search service exists but searches reference documents, not ground truth items -4. 
Cosmos DB full-text indexing is not currently configured - -### Fields to Search -For comprehensive keyword search across all conversation text: -- `synthQuestion` / `editedQuestion` (question text) -- `answer` -- `history[*].msg` (all turn content - both user and agent messages) -- Optionally: `comment` (curator notes) - -### Implementation Considerations - -**Option A: In-memory filtering (simple, limited scale)** -- Fetch all items matching other filters, filter in memory -- Pros: No infrastructure changes -- Cons: Poor performance with large datasets, RU cost for fetching all items - -**Option B: Cosmos DB full-text search** -- Add full-text indexes to indexing policy -- Use `FullTextContains()` or `FullTextScore()` in queries -- Pros: Native Cosmos support, server-side filtering -- Cons: Requires index configuration, may not work with Cosmos emulator - -**Option C: Azure AI Search integration** -- Index ground truth items in Azure AI Search -- Leverage existing `SearchService` pattern -- Pros: Advanced search capabilities, ranking -- Cons: Additional infrastructure, sync complexity - -### Recommended Next Steps -1. Determine scale requirements (how many items, how often searched) -2. Decide on search scope (question only vs all text fields vs multi-turn history) -3. Evaluate Cosmos DB full-text search feasibility (emulator compatibility) -4. 
Design API contract for keyword search parameter - ---- - -## Sources Consulted - -### Codebase Files -- [frontend/src/components/app/QuestionsExplorer.tsx](../../../frontend/src/components/app/QuestionsExplorer.tsx) - Explorer component implementation -- [frontend/src/models/groundTruth.ts](../../../frontend/src/models/groundTruth.ts) - Frontend data model -- [frontend/src/services/groundTruths.ts](../../../frontend/src/services/groundTruths.ts) - API service layer -- [backend/app/api/v1/ground_truths.py](../../../backend/app/api/v1/ground_truths.py) - Ground truths API endpoints -- [backend/app/api/v1/search.py](../../../backend/app/api/v1/search.py) - Existing search endpoint -- [backend/app/domain/models.py](../../../backend/app/domain/models.py) - Backend data models -- [backend/app/services/search_service.py](../../../backend/app/services/search_service.py) - Search service implementation -- [backend/app/adapters/repos/cosmos_repo.py](../../../backend/app/adapters/repos/cosmos_repo.py) - Cosmos DB repository -- [backend/scripts/indexing-policy.json](../../../backend/scripts/indexing-policy.json) - Cosmos DB index configuration diff --git a/.copilot-tracking/subagent/20260122/modal-keyboard-handling-research.md b/.copilot-tracking/subagent/20260122/modal-keyboard-handling-research.md deleted file mode 100644 index 3cbca53..0000000 --- a/.copilot-tracking/subagent/20260122/modal-keyboard-handling-research.md +++ /dev/null @@ -1,201 +0,0 @@ -# Modal Keyboard Handling Research - -**Date:** 2026-01-22 -**Topic:** modal-keyboard-handling -**Status:** Complete - -## Key Findings Summary - -| Question | Finding | -|----------|---------| -| Modal/dialog library | Custom implementation using React Portals (`createPortal`) | -| TurnReferencesModal location | [TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx) | -| Keyboard handling approach | Per-modal `onKeyDown` handlers + 
`useModalKeys` hook | -| Global keyboard system | Yes
- `useGlobalHotkeys` and `ReferencesTabs` listeners | -| Input field handling | `stopPropagation()` pattern used inconsistently | - ---- - -## 1. Modal/Dialog Component Library - -**Finding:** The project uses a **custom modal system** built on React Portals - no third-party dialog library. - -### Components - -| File | Purpose | -|------|---------| -| [ModalPortal.tsx](../../../frontend/src/components/modals/ModalPortal.tsx) | Portal wrapper rendering to `#modal-root` | -| [InspectItemModal.tsx](../../../frontend/src/components/modals/InspectItemModal.tsx) | Read-only item inspection modal | -| [TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx) | Reference management modal | -| [TagsModal.tsx](../../../frontend/src/components/app/editor/TagsModal.tsx) | Tag management modal | - -### Portal Target - -```html -<div id="modal-root"></div> -``` - ---- - -## 2. TurnReferencesModal Implementation - -**Location:** [frontend/src/components/app/editor/TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx) - -### Structure - -```tsx -
{/* Backdrop */} -
-
-``` - -### Current Keyboard Handling (Lines 395-400) - -```tsx -onKeyDown={(e) => { - // Allow Escape to close, but let other keys pass through - if (e.key === "Escape") { - e.stopPropagation(); - onClose(); - } -}} -``` - -### Input Field Handler (Lines 442-447) - -```tsx -<input - /* ...other props */ - onKeyDown={(e) => { - if (e.key === "Enter") { - e.preventDefault(); - handleSearchSubmit(); - } - }} -/> -``` - -**Issue:** Does NOT use `useModalKeys` hook or call `stopPropagation()` for the search input. - ---- - -## 3. Global Keyboard Shortcut System - -### useGlobalHotkeys Hook - -**Location:** [frontend/src/hooks/useGlobalHotkeys.ts](../../../frontend/src/hooks/useGlobalHotkeys.ts) - -```typescript -// Handles: Cmd/Ctrl+S (save draft), Cmd/Ctrl+Enter (approve) -// Checks isEditable before handling Enter -window.addEventListener("keydown", onKeyDown); -``` - -### useModalKeys Hook - -**Location:** [frontend/src/hooks/useModalKeys.ts](../../../frontend/src/hooks/useModalKeys.ts) - -```typescript -// Handles: Escape (close), Enter (confirm if not busy) -// Checks isEditable before handling Enter -// Used by InspectItemModal but NOT TurnReferencesModal -``` - -### ReferencesTabs Global Listener - -**Location:** [frontend/src/components/app/ReferencesPanel/ReferencesTabs.tsx](../../../frontend/src/components/app/ReferencesPanel/ReferencesTabs.tsx#L59-L76) - -```typescript -// Handles: Cmd/Ctrl+1 (search tab), Cmd/Ctrl+2 (selected tab) -// Checks isEditable before processing -window.addEventListener("keydown", onKeyDown); -``` - ---- - -## 4.
Input Field Focus and Event Handling - -### Pattern Analysis - -| Component | Pattern | Issue | -|-----------|---------|-------| -| TagsModal | `onKeyDown={(e) => e.stopPropagation()}` on outer div | ✅ Prevents ALL key events from propagating | -| TurnReferencesModal | Only stops propagation for Escape | ⚠️ Other keys may leak to global listeners | -| InspectItemModal | Uses `useModalKeys` hook | ✅ Hook checks `isEditable` | - -### TagsModal Pattern (Best Practice Found) - -```tsx -
<div - onClick={(e) => e.stopPropagation()} - onKeyDown={(e) => e.stopPropagation()} // Blocks all keyboard events - role="dialog" - aria-modal="true" -> -``` - -### TurnReferencesModal Pattern (Current) - -```tsx -
<div - onClick={(e) => e.stopPropagation()} - onKeyDown={(e) => { - if (e.key === "Escape") { // Only Escape is handled - e.stopPropagation(); - onClose(); - } - }} - role="dialog" -> -``` - ---- - -## 5. Potential Issues Identified - -### Issue 1: Inconsistent `stopPropagation()` Usage - -- **TagsModal** blocks ALL keyboard events from propagating -- **TurnReferencesModal** only blocks Escape - other keys like `Cmd+1`, `Cmd+2` may trigger `ReferencesTabs` tab switching - -### Issue 2: Missing `useModalKeys` Hook - -- **TurnReferencesModal** implements its own partial keyboard handling -- **InspectItemModal** uses the standardized `useModalKeys` hook -- This creates inconsistent behavior across modals - -### Issue 3: Global Listener Race Conditions - -Multiple global `keydown` listeners exist: -1. `useGlobalHotkeys` (save/approve) -2. `useModalKeys` (escape/enter) -3. `ReferencesTabs` (tab switching) - -Each checks `isEditable` independently. Listeners on the same target run in registration order, so their relative order depends on component mount order rather than on any explicit priority. - ---- - -## 6. Recommendations - -1. **Standardize keyboard handling** - Update TurnReferencesModal to use `useModalKeys` hook -2. **Block all events on modal container** - Add `onKeyDown={(e) => e.stopPropagation()}` to prevent global listener interference -3. **Keep input-specific handlers** - Let Enter in search input trigger search, not modal close -4.
**Consider event delegation** - Centralize keyboard event handling to avoid race conditions - ---- - -## Files Referenced - -- [frontend/src/components/app/editor/TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx) -- [frontend/src/components/app/editor/TagsModal.tsx](../../../frontend/src/components/app/editor/TagsModal.tsx) -- [frontend/src/components/modals/ModalPortal.tsx](../../../frontend/src/components/modals/ModalPortal.tsx) -- [frontend/src/components/modals/InspectItemModal.tsx](../../../frontend/src/components/modals/InspectItemModal.tsx) -- [frontend/src/hooks/useModalKeys.ts](../../../frontend/src/hooks/useModalKeys.ts) -- [frontend/src/hooks/useGlobalHotkeys.ts](../../../frontend/src/hooks/useGlobalHotkeys.ts) -- [frontend/src/components/app/ReferencesPanel/ReferencesTabs.tsx](../../../frontend/src/components/app/ReferencesPanel/ReferencesTabs.tsx) -- [frontend/index.html](../../../frontend/index.html) (line 12 - modal-root div) diff --git a/.copilot-tracking/subagent/20260122/observability-operations-research.md b/.copilot-tracking/subagent/20260122/observability-operations-research.md deleted file mode 100644 index a5a9e4a..0000000 --- a/.copilot-tracking/subagent/20260122/observability-operations-research.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -topic: observability-operations -jtbd: JTBD-001 -date: 2026-01-22 -status: complete ---- - -# Research: Observability and Operations - -## Context - -The observability and operations system provides opt-in telemetry, error handling, health endpoints, and demo-safe operation modes. - -## Sources Consulted - -### URLs -- (None) - -### Codebase -- [backend/app/main.py](backend/app/main.py): Defines `GET /healthz` endpoint returning repo/backend info. -- [frontend/src/services/telemetry.ts](frontend/src/services/telemetry.ts): Implements opt-in telemetry with safe no-op behavior. 
- -### Documentation -- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates observability requirements. -- [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md): Documents telemetry opt-in policy, error boundaries, and safe-by-default behavior. -- [frontend/README.md](frontend/README.md): Describes demo mode and telemetry configuration. - -## Key Findings - -1. The backend exposes a `GET /healthz` endpoint that returns repository and backend status. -2. Client telemetry is opt-in, disabled by default, and safe-by-default (no-op in demo mode or when configuration is missing). -3. The UI provides an error boundary that catches rendering errors and shows a user-friendly fallback. -4. Demo mode disables or safely no-ops telemetry and can use mock providers. -5. Telemetry integration with Application Insights is available when configured. - -## Existing Patterns - -| Pattern | Location | Relevance | -|---------|----------|-----------| -| Health endpoint | [backend/app/main.py](backend/app/main.py) | Defines operational status check | -| Opt-in telemetry with no-op fallback | [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md) | Defines safe-by-default policy | -| Error boundary | [frontend/docs/OBSERVABILITY_IMPLEMENTATION.md](frontend/docs/OBSERVABILITY_IMPLEMENTATION.md) | Defines graceful error handling UX | - -## Open Questions - -- (None) - -## Recommendations for Spec - -- Specify that the backend exposes a health endpoint at `GET /healthz`. -- Specify that client telemetry is opt-in and safe-by-default. -- Specify that the UI provides an error boundary for rendering failures. 
diff --git a/.copilot-tracking/subagent/20260122/partial-updates-research.md b/.copilot-tracking/subagent/20260122/partial-updates-research.md deleted file mode 100644 index 552d73f..0000000 --- a/.copilot-tracking/subagent/20260122/partial-updates-research.md +++ /dev/null @@ -1,275 +0,0 @@ -# Partial Updates Research: SA-244 - -**Research Date:** 2026-01-22 -**Topic:** Cosmos DB Partial Document Updates (Patch Operations) -**JTBD:** Help optimize GTC performance and Cosmos usage - ---- - -## Executive Summary - -The GTC codebase currently uses **full document replacement** (`replace_item`, `upsert_item`) for most updates, but already has **one working patch implementation** for assignment operations. Expanding partial updates to additional operations would reduce network bandwidth, improve latency, and potentially lower RU consumption for common update patterns. - ---- - -## Current Codebase Analysis - -### 1. Update Methods Currently Used - -| Method | Location | Usage | -|--------|----------|-------| -| `replace_item` | [cosmos_repo.py#L1113](backend/app/adapters/repos/cosmos_repo.py#L1113) | Main GT update with ETag | -| `replace_item` | [cosmos_repo.py#L1158](backend/app/adapters/repos/cosmos_repo.py#L1158) | GT update in retry loop | -| `replace_item` | [cosmos_repo.py#L1869](backend/app/adapters/repos/cosmos_repo.py#L1869) | Assignment fallback (emulator) | -| `upsert_item` | [cosmos_repo.py#L1126](backend/app/adapters/repos/cosmos_repo.py#L1126) | Create-if-missing fallback | -| `upsert_item` | [cosmos_repo.py#L1214](backend/app/adapters/repos/cosmos_repo.py#L1214) | Non-ETag updates | -| `upsert_item` | [tags_repo.py#L140](backend/app/adapters/repos/tags_repo.py#L140) | Tags document updates | -| **`patch_item`** | [cosmos_repo.py#L1784](backend/app/adapters/repos/cosmos_repo.py#L1784) | Assignment operations ✅ | - -### 2. 
Existing Patch Implementation - -The `assign_to` method at line 1784 already uses patch operations successfully: - -```python -patch_operations = [ - {"op": "set", "path": "/assignedTo", "value": user_id}, - {"op": "set", "path": "/assignedAt", "value": now}, - {"op": "set", "path": "/status", "value": GroundTruthStatus.draft.value}, - {"op": "set", "path": "/updatedAt", "value": now}, -] - -await gt.patch_item( - item=item_id, - partition_key=partition_key, - patch_operations=patch_operations, - filter_predicate=filter_predicate, -) -``` - -This demonstrates the pattern is already in production use with conditional updates. - -### 3. Main Update Operations in the Codebase - -| Operation | Fields Changed | Current Method | Patch Candidate? | -|-----------|----------------|----------------|------------------| -| SME assignment | `assignedTo`, `assignedAt`, `status`, `updatedAt` | `patch_item` ✅ | Already using patch | -| Status change | `status`, `updatedAt` | `replace_item` | ✅ High priority | -| Answer approval | `status`, `reviewed_at`, `updatedBy`, `assignedTo`, `assignedAt` | `upsert_gt` | ✅ High priority | -| Edit answer | `answer`, `edited_question`, `comment`, `updatedAt` | `upsert_gt` | ✅ Medium priority | -| Add/update refs | `refs`, `totalReferences`, `updatedAt` | `upsert_gt` | ⚠️ Complex (array operations) | -| Update tags | `manualTags`, `updatedAt` | `upsert_gt` | ✅ Medium priority | -| Update history | `history` | `upsert_gt` | ⚠️ Complex (nested arrays) | -| Curation instructions | Full document | `upsert_item` | ❌ Usually full doc | -| Global tags | `tags` array | `upsert_item` | ⚠️ Could use add/remove | - ---- - -## Azure Cosmos DB Patch API Capabilities - -### Supported Operations - -| Operation | Description | Use Case | -|-----------|-------------|----------| -| `set` | Set field value (creates if missing) | Status updates, field edits | -| `add` | Add to array or create field | Adding tags, refs | -| `replace` | Replace existing value (fails 
if missing) | Strict updates | -| `remove` | Remove field or array element | Clearing assignments | -| `incr` | Increment numeric field | Counters | -| `move` | Move value between paths | Field migrations | - -### Key Limitations - -1. **Max 10 operations** per patch request -2. **Item must exist** - patch_item fails if item not found (unlike upsert) -3. **No parameterized filter predicates** - SQL injection risk requires careful escaping -4. **System fields immutable** - Cannot patch `_id`, `_ts`, `_etag`, `_rid` -5. **Emulator compatibility** - May need fallback path (as implemented for assign_to) - -### Python SDK Syntax - -```python -# Single operation -operations = [{"op": "set", "path": "/status", "value": "approved"}] - -# Multiple operations -operations = [ - {"op": "set", "path": "/status", "value": "approved"}, - {"op": "set", "path": "/reviewedAt", "value": now}, - {"op": "set", "path": "/assignedTo", "value": None}, - {"op": "remove", "path": "/assignedAt"}, -] - -# With conditional predicate -response = await container.patch_item( - item=item_id, - partition_key=partition_key, - patch_operations=operations, - filter_predicate="FROM c WHERE c.status = 'draft'", - etag=etag, - match_condition=MatchConditions.IfNotModified -) -``` - ---- - -## RU Cost Analysis - -### Microsoft Documentation Findings - -From the [FAQ](https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update-faq): - -> "Partial Document Update is normalized into request unit billing in the same way as other database operations. **Users shouldn't expect a significant reduction in RU.**" - -### Key Performance Benefits - -While RU cost may not dramatically decrease, partial updates provide: - -1. **Reduced Network Bandwidth** - Only changed fields transmitted -2. **Lower End-to-End Latency** - Smaller payloads, faster processing -3. **Atomic Conditional Updates** - Server-side filter predicates -4. **Multi-Region Conflict Resolution** - Automatic path-level merging -5. 
**Reduced Client CPU** - No read-modify-write cycle needed - -### Estimated Impact for GTC - -| Document Type | Typical Size | Fields Updated | Bandwidth Savings | -|---------------|--------------|----------------|-------------------| -| GroundTruthItem | 5-50 KB | 2-4 fields | 80-95% | -| CurationInstructions | 1-5 KB | Full document | None | -| Tags document | <1 KB | tags array | Minimal | -| AssignmentDocument | <1 KB | Full document | Minimal | - -For large GroundTruthItems with extensive history/refs, the bandwidth savings could be significant. - ---- - -## Recommended Opportunities - -### Priority 1: Status/Assignment Updates (High Impact, Low Risk) - -**Target:** `upsert_gt` when only status-related fields change - -```python -# New method: patch_status -async def patch_status( - self, item_id: str, partition_key: list, - status: GroundTruthStatus, - assigned_to: str | None = None, - reviewed_at: datetime | None = None, - updated_by: str | None = None -) -> bool: - now = datetime.now(timezone.utc).isoformat() - operations = [ - {"op": "set", "path": "/status", "value": status.value}, - {"op": "set", "path": "/updatedAt", "value": now}, - ] - if assigned_to is not None: - operations.append({"op": "set", "path": "/assignedTo", "value": assigned_to}) - # ... 
etc - return await self._patch_with_fallback(item_id, partition_key, operations) -``` - -**API Endpoints Affected:** -- `PUT /v1/assignments/{dataset}/{bucket}/{item_id}` (approval) -- `PUT /v1/ground-truths/{dataset}/{bucket}/{item_id}` (status change) - -### Priority 2: Field-Specific Updates (Medium Impact) - -**Target:** Single-field updates like `edited_question`, `answer`, `comment` - -```python -async def patch_fields( - self, item_id: str, partition_key: list, - fields: dict[str, Any], etag: str | None = None -) -> GroundTruthItem: - now = datetime.now(timezone.utc).isoformat() - operations = [ - {"op": "set", "path": f"/{k}", "value": v} - for k, v in fields.items() - ] - operations.append({"op": "set", "path": "/updatedAt", "value": now}) - # ... -``` - -### Priority 3: Tags Updates (Medium Impact) - -**Target:** `tags_repo.py` operations - -```python -# Instead of read-modify-write: -operations = [{"op": "add", "path": "/tags/-", "value": new_tag}] -``` - -### Lower Priority / Complex Cases - -- **References array** - Complex nested updates, may need full replacement -- **History array** - Deep nesting with refs inside, likely needs full document -- **Curation instructions** - Usually full document updates - ---- - -## Implementation Considerations - -### 1. Emulator Compatibility - -The existing `assign_to` implementation shows the pattern: -- Try `patch_item` first -- Fall back to read-modify-replace for emulator - -```python -if self.is_cosmos_emulator_in_use(): - return await self._assign_to_with_read_modify_replace(item_id, user_id) -return await self._assign_to_with_patch(item_id, user_id) -``` - -### 2. ETag Handling - -Patch operations support ETag for optimistic concurrency: - -```python -from azure.core import MatchConditions - -await container.patch_item( - item=item_id, - partition_key=pk, - patch_operations=ops, - etag=etag, - match_condition=MatchConditions.IfNotModified -) -``` - -### 3.
Error Handling - -- **412 Precondition Failed** - Filter predicate not satisfied -- **404 Not Found** - Item doesn't exist (patch_item requires existence) -- **400 Bad Request** - Invalid path or operation - -### 4. Testing Strategy - -1. Unit tests for patch operation building -2. Integration tests against emulator (with fallback verification) -3. Integration tests against live Cosmos (if available) - ---- - -## Conclusion - -The codebase already has a working patch implementation for assignments. Expanding this pattern to status updates and field-specific edits would: - -1. **Reduce network bandwidth** by 80-95% for large documents -2. **Improve latency** for common update operations -3. **Enable atomic conditional updates** without read-modify-write cycles -4. **Simplify conflict resolution** in multi-region scenarios - -**Recommended next steps:** -1. Extract common patch helper method from `assign_to` -2. Implement `patch_status` for approval/status changes -3. Implement `patch_fields` for targeted field updates -4. 
Add comprehensive emulator fallback testing - ---- - -## References - -- [Partial document update in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update) -- [Get started with partial document update](https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update-getting-started) -- [Partial document update FAQ](https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update-faq) -- [Python SDK ContainerProxy.patch_item](https://learn.microsoft.com/en-us/python/api/azure-cosmos/azure.cosmos.containerproxy) -- Existing implementation: [cosmos_repo.py#L1765-L1810](backend/app/adapters/repos/cosmos_repo.py#L1765) diff --git a/.copilot-tracking/subagent/20260122/pii-detection-research.md b/.copilot-tracking/subagent/20260122/pii-detection-research.md deleted file mode 100644 index d9ebdf2..0000000 --- a/.copilot-tracking/subagent/20260122/pii-detection-research.md +++ /dev/null @@ -1,379 +0,0 @@ -# PII Detection Research - -**Date:** 2026-01-22 -**Story:** SA-669 - GTC Needs PII Check -**Status:** Research Complete - -## Executive Summary - -This document captures research findings for implementing PII detection in the Ground Truth Curator's bulk import flow. The feature should scan imported content for personally identifiable information (email addresses and phone numbers first) and warn users without blocking import. - ---- - -## 1. Current Import Flow Analysis - -### Bulk Import Endpoint - -**File:** [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L54-L114) - -The `import_bulk()` endpoint processes ground truth items through these steps: - -``` -1. Receive items via POST /v1/ground-truths -2. Generate IDs for items without one (randomname) -3. Validate items via validate_bulk_items() ← CURRENT VALIDATION HOOK -4. Filter invalid items, collect errors -5. Optionally set approval metadata if approve=true -6. Apply computed tags to each item -7. 
Persist via container.repo.import_bulk_gt() -8. Return ImportBulkResponse with imported count, errors, and uuids -``` - -### Current Validation Service - -**File:** [backend/app/services/validation_service.py](backend/app/services/validation_service.py) - -The validation service currently: - -- Validates manual tags against the tag registry -- Returns a dict mapping item ID to list of validation errors -- Uses async/concurrent validation for performance -- Pre-fetches tag registry once for all items (efficiency pattern) - -**Key functions:** - -- `validate_ground_truth_item(item, valid_tags_cache)` - validates single item -- `validate_bulk_items(items)` - validates list concurrently - -### Fields Containing Scannable Content - -From [backend/app/domain/models.py](backend/app/domain/models.py#L52-L120): - -| Field | Type | Description | PII Scan Priority | -|-------|------|-------------|-------------------| -| `synth_question` | str | Primary question text | **High** | -| `edited_question` | str | User-edited question | **High** | -| `answer` | str | Answer content | **High** | -| `comment` | str | Curator notes | **High** | -| `history[].msg` | str | Multi-turn messages | **High** | -| `refs[].content` | str | Reference content | Medium | -| `refs[].keyExcerpt` | str | Key excerpt text | Medium | -| `contextUsedForGeneration` | str | Context source | Medium | - ---- - -## 2. 
Python PII Detection Libraries - -### Microsoft Presidio (Recommended) - -**Package:** `presidio-analyzer` -**Repository:** https://github.com/microsoft/presidio -**License:** MIT - -**Pros:** - -- Microsoft-maintained, enterprise-grade -- Extensible recognizer architecture -- Supports custom patterns and ML models -- Good out-of-box support for email, phone, SSN, credit cards -- Active maintenance and community - -**Cons:** - -- Heavier dependency footprint (spaCy optional but recommended) -- Requires model downloads for best accuracy - -**Usage Example:** - -```python -from presidio_analyzer import AnalyzerEngine - -analyzer = AnalyzerEngine() -results = analyzer.analyze( - text="Contact john.doe@example.com or call 555-123-4567", - entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], - language="en" -) -# Returns list of RecognizerResult with entity_type, start, end, score -``` - -### Scrubadub - -**Package:** `scrubadub` -**Repository:** https://github.com/datascopeanalytics/scrubadub - -**Pros:** - -- Lightweight, pure Python -- Simple API -- Good for basic patterns - -**Cons:** - -- Less actively maintained -- Fewer entity types -- Lower accuracy than Presidio - -### Regex-Only Approach - -For MVP/Phase 1, simple regex patterns could suffice: - -```python -import re - -EMAIL_PATTERN = re.compile( - r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' -) -PHONE_PATTERN = re.compile( - r'\b(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b' -) -``` - -**Pros:** Zero dependencies, fast, simple -**Cons:** Higher false positive/negative rates, harder to extend - -### Recommendation - -**Phase 1:** Start with regex patterns for email and phone (per story requirements) -**Phase 2:** Migrate to Presidio for broader PII coverage and better accuracy - ---- - -## 3. Patterns to Detect (Per SA-669) - -Story states: "Detection focuses on high-signal patterns first (email addresses and phone numbers)."
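Building on the regex-only option, the Phase 1 scan for these two patterns might look like the minimal sketch below. `PIIMatch`, `mask`, `scan_text`, and `scan_fields` are hypothetical names and are not part of the codebase. The first-and-last-character masking rule is also an assumption. `item_id` is left out so the scanner stays independent of `GroundTruthItem`. The TLD class is written as `[A-Za-z]`, so a literal `|` is not accepted as part of a domain.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Phase 1 patterns (SA-669): email addresses and US-style phone numbers.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE_PATTERN = re.compile(
    r"\b(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b"
)
PATTERNS = {"email": EMAIL_PATTERN, "phone": PHONE_PATTERN}


@dataclass
class PIIMatch:
    """Hypothetical result type mirroring the proposed PIIWarning fields (minus item_id)."""
    field: str
    pattern_type: str
    position: int  # character offset of the match within the field
    snippet: str   # masked match, so the warning does not echo the raw PII


def mask(value: str) -> str:
    """Keep only the first and last character of a match (an assumed masking rule)."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def scan_text(field: str, text: str) -> List[PIIMatch]:
    """Return every email/phone match in one text field, ordered by position."""
    matches = [
        PIIMatch(field, pattern_type, m.start(), mask(m.group()))
        for pattern_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda match: match.position)


def scan_fields(fields: Dict[str, Optional[str]]) -> List[PIIMatch]:
    """Scan several named fields; empty or missing (None) fields are skipped."""
    warnings: List[PIIMatch] = []
    for field, text in fields.items():
        if text:
            warnings.extend(scan_text(field, text))
    return warnings


if __name__ == "__main__":
    sample = {"answer": "Contact john.doe@example.com or call 555-123-4567", "comment": None}
    for w in scan_fields(sample):
        print(w.field, w.pattern_type, w.position, w.snippet)
```

The scan only produces warnings and never rejects an item. That matches the warn-but-import requirement. A caller would build the `fields` dict from the scannable fields listed earlier (for example `history[0].msg`) and attach `item_id` when converting each match into a `PIIWarning`.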
- -### Phase 1 Patterns - -| Pattern | Regex | Examples | -|---------|-------|----------| -| Email | `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}` | user@domain.com | -| Phone (US) | `(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}` | 555-123-4567, (555) 123-4567 | - -### Future Patterns (Phase 2+) - -- SSN: `\d{3}-\d{2}-\d{4}` -- Credit card: Luhn-validated card numbers (13 to 19 digits; 16 is most common) -- IP addresses: `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` -- Names (ML-based via Presidio) - ---- - -## 4. Warning Flow Design - -### Requirements from SA-669 - -> "If potential PII is detected, the system warns the user but still allows the import to proceed." - -### Proposed Response Model - -Extend `ImportBulkResponse` to include PII warnings: - -```python -from pydantic import BaseModel, Field - -class PIIWarning(BaseModel): - item_id: str - field: str - pattern_type: str # "email", "phone", etc. - snippet: str # Masked snippet showing context - position: int # Character position in field - -class ImportBulkResponse(BaseModel): - imported: int - errors: list[str] - uuids: list[str] - pii_warnings: list[PIIWarning] = Field(default_factory=list) # NEW -``` - -### Flow Diagram - -``` -POST /v1/ground-truths - │ - ▼ - Generate IDs - │ - ▼ - Validate Tags (existing) - │ - ▼ -┌────────────────────────┐ -│ PII Detection (NEW) │ -│ - Scan text fields │ -│ - Collect warnings │ -│ - Continue import │ -└────────────────────────┘ - │ - ▼ - Apply Computed Tags - │ - ▼ - Persist to Cosmos DB - │ - ▼ - Return Response with: - - imported count - - errors - - uuids - - pii_warnings ◄── NEW -``` - ---- - -## 5.
Recommended Integration Points - -### Option A: Extend `validation_service.py` - -Add PII scanning alongside tag validation: - -```python -# validation_service.py - -import asyncio - -async def scan_for_pii(item: GroundTruthItem) -> list[PIIWarning]: - """Scan item content fields for PII patterns.""" - warnings = [] - fields_to_scan = [ - ("synthQuestion", item.synth_question), - ("editedQuestion", item.edited_question), - ("answer", item.answer), - ("comment", item.comment), - ] - - # Also scan history messages - for idx, turn in enumerate(item.history or []): - fields_to_scan.append((f"history[{idx}].msg", turn.msg)) - - for field_name, content in fields_to_scan: - if content: - warnings.extend(_detect_pii_in_text(item.id, field_name, content)) - - return warnings - -async def validate_bulk_items_with_pii( - items: list[GroundTruthItem] -) -> tuple[dict[str, list[str]], list[PIIWarning]]: - """Validate items and scan for PII.""" - validation_errors = await validate_bulk_items(items) - - # gather() runs these coroutines one after another on the event loop; - # regex scanning is CPU-bound and never awaits, so there is no real concurrency - pii_tasks = [scan_for_pii(item) for item in items] - pii_results = await asyncio.gather(*pii_tasks) - - all_warnings = [] - for warnings in pii_results: - all_warnings.extend(warnings) - - return validation_errors, all_warnings -``` - -### Option B: New `pii_service.py` - -Create a dedicated service (better separation of concerns): - -```python -# app/services/pii_service.py - -import re - -class PIIDetectionService: - def __init__(self): - self._email_pattern = re.compile(...) - self._phone_pattern = re.compile(...) - - def scan_text(self, text: str) -> list[PIIMatch]: - """Scan text for PII patterns.""" - ... - - async def scan_item(self, item: GroundTruthItem) -> list[PIIWarning]: - """Scan all text fields in a ground truth item.""" - ... 
- - async def scan_bulk(self, items: list[GroundTruthItem]) -> list[PIIWarning]: - """Scan multiple items concurrently.""" - ... -``` - -### Recommendation - -**Option B (new service)** is preferred because: - -1.
Follows existing service patterns (see `tagging_service.py`, `search_service.py`) -2. Easier to test in isolation -3. Cleaner separation from tag validation concerns -4. Easier to evolve (e.g., swap regex for Presidio later) - ---- - -## 6. Implementation Checklist - -### Backend Changes - -- [ ] Create `app/services/pii_service.py` with regex-based detection -- [ ] Add `PIIWarning` model to `app/domain/models.py` -- [ ] Extend `ImportBulkResponse` with `pii_warnings` field -- [ ] Call PII service in `import_bulk()` endpoint -- [ ] Add unit tests for PII detection patterns -- [ ] Add integration tests for bulk import with PII warnings - -### Configuration - -- [ ] Add `PII_DETECTION_ENABLED` feature flag (default: True) -- [ ] Add `PII_PATTERNS` config for enabled pattern types - -### Documentation - -- [ ] Update API docs with new response field -- [ ] Document PII detection patterns and limitations - ---- - -## 7. Test Cases - -### Unit Tests - -```python -# scan_for_pii is a coroutine (Option A), so the tests must await it -async def test_detect_email_in_question(): - item = GroundTruthItem( - synthQuestion="Contact support@company.com for help" - ) - warnings = await scan_for_pii(item) - assert len(warnings) == 1 - assert warnings[0].pattern_type == "email" - -async def test_detect_phone_in_answer(): - item = GroundTruthItem( - answer="Call us at 555-123-4567" - ) - warnings = await scan_for_pii(item) - assert len(warnings) == 1 - assert warnings[0].pattern_type == "phone" - -async def test_no_pii_returns_empty(): - item = GroundTruthItem( - synthQuestion="How do I reset my password?"
- ) - warnings = await scan_for_pii(item) - assert len(warnings) == 0 -``` - -### Integration Tests - -```python -async def test_bulk_import_returns_pii_warnings(async_client): - payload = [{ - "datasetName": "test", - "synthQuestion": "Email john@example.com for details" - }] - response = await async_client.post("/v1/ground-truths", json=payload) - assert response.status_code == 200 - data = response.json() - assert data["imported"] == 1 # Import succeeds - assert len(data["pii_warnings"]) == 1 # Warning returned -``` - ---- - -## 8. References - -- **Story:** SA-669 - GTC Needs PII Check -- **Presidio Docs:** https://microsoft.github.io/presidio/ -- **Existing Validation:** [backend/app/services/validation_service.py](backend/app/services/validation_service.py) -- **Import Endpoint:** [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py) -- **Domain Models:** [backend/app/domain/models.py](backend/app/domain/models.py) diff --git a/.copilot-tracking/subagent/20260122/query-optimization-research.md b/.copilot-tracking/subagent/20260122/query-optimization-research.md deleted file mode 100644 index 41ed0d9..0000000 --- a/.copilot-tracking/subagent/20260122/query-optimization-research.md +++ /dev/null @@ -1,277 +0,0 @@ ---- -topic: query-optimization -jtbd: JTBD-008 -date: 2026-01-22 -status: complete -stories: SA-247, SA-248 ---- - -# Research: Query Optimization - -## Context - -The query optimization effort replaces expensive cross-partition queries with efficient patterns. This research identifies all Cosmos DB queries in the GTC codebase, analyzes their partition key usage, and provides recommendations for optimization.
- -## Sources Consulted - -### Codebase - -- [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py): Main repository with all Cosmos DB queries -- [config.py](backend/app/core/config.py): Configuration including pagination limits and `PAGINATION_TAG_FETCH_MAX` -- [assignments.py](backend/app/api/v1/assignments.py): Assignment API endpoints -- [tags_repo.py](backend/app/adapters/repos/tags_repo.py): Tags repository (uses point reads) - -### Documentation - -- [Optimize request cost in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/optimize-cost-reads-writes): Point reads cost ~1 RU/KB, queries vary significantly -- [Query an Azure Cosmos DB container](https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-query-container): Cross-partition queries fan out to all physical partitions -- [Partitioning and horizontal scaling](https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview): Partition key selection best practices - -## Key Findings - -### 1. Partition Key Strategy - -**Current Strategy**: MultiHash hierarchical key on `[/datasetName, /bucket]` - -```python -# From cosmos_repo.py line 205 -Partition key strategy: MultiHash hierarchical key on [/datasetName, /bucket]. -The `bucket` field is a UUID and is stored as its string representation. -``` - -**Implications**: - -- Single-partition queries require BOTH `datasetName` AND `bucket` values -- Queries filtering only by `datasetName` are still cross-partition (across buckets) -- Queries without either filter scan ALL partitions - -### 2. 
The Arbitrary 200 Limit (SA-248) - -Found in multiple locations as `min(limit, 200)` or `min(take, 200)`: - -| Location | Line | Context | -|----------|------|---------| -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1405) | 1405 | `list_unassigned()`: `max_item_count=min(limit, 200)` | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1623) | 1623 | `_query_unassigned_by_selector()`: `max_item_count=min(take, 200)` | -| [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L1678) | 1678 | `_query_unassigned_global_excluding_user()`: `max_item_count=min(take, 200)` | - -**Issue**: This hardcoded 200 limit caps how many unassigned items can be fetched per query, but: - -- It's undocumented and appears arbitrary -- The comment doesn't explain why 200 was chosen -- May cause issues if more items are needed for fair sampling across datasets -- Creates inconsistency with `PAGINATION_TAG_FETCH_MAX` (500) in config - -### 3. Cross-Partition Queries Identified - -**`enable_scan_in_query=True` appears 16+ times**, indicating cross-partition queries: - -| Category | Count | Notes | -|----------|-------|-------| -| Total cross-partition queries | 16 | All use `enable_scan_in_query=True` | -| Single-partition operations | 2 | Assignment lookups use `enable_scan_in_query=False` | -| Point reads | 3 | `read_item()` calls with full partition key | - -## Expensive Query Inventory - -| Query Location | Query Pattern | Issue | Recommendation | -|----------------|---------------|-------|----------------| -| [cosmos_repo.py:525](backend/app/adapters/repos/cosmos_repo.py#L525) | `list_all_gt()` - `SELECT * FROM c` with optional status filter | Full container scan, no partition key filter | Add pagination, consider batch processing | -| [cosmos_repo.py:759](backend/app/adapters/repos/cosmos_repo.py#L759) | `list_gt_paginated()` - ORDER BY with OFFSET/LIMIT | Cross-partition with sort | Already optimized with server-side pagination | -| 
[cosmos_repo.py:1350](backend/app/adapters/repos/cosmos_repo.py#L1350) | `stats()` - `SELECT c.status FROM c` | Full container scan for counts | Use Change Feed or materialized view | -| [cosmos_repo.py:1375](backend/app/adapters/repos/cosmos_repo.py#L1375) | `list_datasets()` - `SELECT DISTINCT VALUE c.datasetName` | Full container scan | Cache results, use Change Feed | -| [cosmos_repo.py:1404](backend/app/adapters/repos/cosmos_repo.py#L1404) | `list_unassigned()` - status filter only | Cross-partition, capped at 200 | Could use composite index | -| [cosmos_repo.py:1742](backend/app/adapters/repos/cosmos_repo.py#L1742) | `assign_to()` - `SELECT TOP 1 ... WHERE c.id = @id` | Cross-partition lookup by ID only | **Should use point read if PK known** | -| [cosmos_repo.py:1818](backend/app/adapters/repos/cosmos_repo.py#L1818) | `_assign_to_with_read_modify_replace()` - `SELECT TOP 1 * FROM c WHERE c.id = @id` | Cross-partition for emulator | Inherent emulator limitation | -| [cosmos_repo.py:1896](backend/app/adapters/repos/cosmos_repo.py#L1896) | `list_assigned()` - filter by `assignedTo` | Cross-partition by user | Consider separate index or container | -| [cosmos_repo.py:_get_filtered_count](backend/app/adapters/repos/cosmos_repo.py#L944) | `SELECT VALUE COUNT(1)` | Cross-partition aggregation | Cache or use Change Feed | - -## Point Read Opportunities - -Per Microsoft documentation, point reads cost ~1 RU per KB vs queries which can cost 3-10+ RU: - -| Current Pattern | Location | Optimization | -|-----------------|----------|--------------| -| Query by ID for assignment | Line 1742 | If `datasetName` and `bucket` are available, use `read_item()` | -| Get item after upsert | Multiple | Already uses `get_gt()` with point read ✓ | - -**Already optimized**: - -- `get_gt()` (line 1058) - Uses `read_item()` with full partition key -- `get_curation_instructions()` (line 1086) - Uses `read_item()` -- Tags repo (line 121) - Uses `read_item()` - -## Arbitrary Limit Analysis 
(SA-248) - -### Current Behavior - -The 200 limit appears in three methods related to unassigned item sampling: - -```python -# cosmos_repo.py line 1405 -max_item_count=min(limit, 200) - -# cosmos_repo.py line 1623 -max_item_count=min(take, 200) - -# cosmos_repo.py line 1678 -max_item_count=min(take, 200) -``` - -### Implications - -1. **Fairness**: When sampling across datasets with different sizes, the 200 cap may prevent fair distribution -2. **Performance**: The limit exists to prevent runaway queries but lacks documentation -3. **Inconsistency**: Config has `PAGINATION_TAG_FETCH_MAX=500` but these use hardcoded 200 -4. **No server-side continuation**: If more items are needed, the code breaks out of the loop rather than using continuation tokens - -### Recommendation - -1. Make the limit configurable via `Settings` (e.g., `SAMPLING_QUERY_MAX_ITEMS`) -2. Document the rationale (RU budget, memory constraints, etc.) -3. Consider using continuation tokens for larger sampling needs -4. Align with `PAGINATION_TAG_FETCH_MAX` or document why they differ - -## Recommendations for Spec - -### High Priority - -1. **Replace ID-only queries with point reads** when partition key is available - - `assign_to()` queries by ID then patches; if caller provides dataset/bucket, use point read - - Estimated savings: ~2-5 RU per operation - -2. **Make the 200 limit configurable** - - Add `SAMPLING_QUERY_MAX_ITEMS` to config - - Document the tradeoff between RU cost and sampling fairness - -3. **Add composite indexes** for common query patterns: - - `(status, assignedTo)` for unassigned queries - - `(datasetName, status)` for dataset-scoped queries - -### Medium Priority - -4. **Cache `stats()` results** using Change Feed or time-based invalidation - - Currently scans entire container for 3 counts - - Could use materialized counters updated via Change Feed - -5. 
**Cache `list_datasets()` results** - - Dataset list changes infrequently - - Use TTL-based cache or invalidate on import - -6. **Use continuation tokens** in sampling methods instead of hard caps - - More robust for larger datasets - - Better RU efficiency with pagination - -### Low Priority - -7. **Consider secondary container** for assignment tracking - - Current cross-partition `list_assigned()` could be single-partition with PK=`userId` - - Already have `assignments` container but it duplicates data - -8. **Monitor RU consumption** per query type - - Add diagnostics logging for RU charges - - Identify optimization candidates based on actual usage - -## Query Efficiency Summary - -| Query Type | Count | Partition Efficiency | Action Needed | -|------------|-------|---------------------|---------------| -| Point reads | 3 | ✅ Single partition | None | -| Single-partition queries | 2 | ✅ Single partition | None | -| Cross-partition with filter | 10 | ⚠️ Partial | Add indexes | -| Full container scans | 4 | ❌ All partitions | Cache or redesign | - -## RU Monitoring Status - -### Current State - -**No RU monitoring implemented.** The codebase does not capture or log Request Unit (RU) consumption from Cosmos DB queries. - -The observability implementation ([OBSERVABILITY_IMPLEMENTATION.md](backend/docs/OBSERVABILITY_IMPLEMENTATION.md)) uses OpenTelemetry with Azure Monitor but does not include Cosmos DB RU metrics. 
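Cosmos DB reports the RU charge of each response in the `x-ms-request-charge` header, as a string. A query that returns several pages therefore costs the sum of those values, not just the last one. The sketch below assumes the page headers are already available as plain mappings and shows that aggregation. It also adds a per-operation accumulator for spotting the most expensive query types. `total_request_charge` and `RUByOperation` are hypothetical names, not part of the codebase or the Cosmos SDK.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Mapping

REQUEST_CHARGE_HEADER = "x-ms-request-charge"


def total_request_charge(page_headers: Iterable[Mapping[str, str]]) -> float:
    """Sum the RU charge reported on each response page of a single query.

    The header value is a string; pages without it contribute 0.
    """
    return sum(
        (float(h.get(REQUEST_CHARGE_HEADER, "0")) for h in page_headers), 0.0
    )


class RUByOperation:
    """Hypothetical accumulator of RU totals and call counts per operation name."""

    def __init__(self) -> None:
        self.totals: defaultdict[str, float] = defaultdict(float)
        self.calls: defaultdict[str, int] = defaultdict(int)

    def record(self, operation: str, charge: float) -> None:
        self.totals[operation] += charge
        self.calls[operation] += 1

    def most_expensive(self, n: int = 3) -> list[tuple[str, float]]:
        """Operations with the highest cumulative RU cost, largest first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Keying by a logical operation name such as `stats` or `list_datasets` gives the per-query-type view that the "Monitor RU consumption" recommendation asks for. It does not depend on the text of each query.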
- -### Recommendation - -Add RU logging for expensive operations: - -```python -async def _execute_query_with_metrics( - self, - query: str, - parameters: list, - operation_name: str -) -> tuple[list, float]: - """Execute query and log RU consumption summed across all result pages.""" - items = [] - total_ru = 0.0 - - iterator = self._gt_container.query_items( - query=query, - parameters=parameters, - enable_scan_in_query=True, - ) - - # Each result page is a separate response with its own x-ms-request-charge - # header, so iterate page by page and add the charges up. Assumes the async - # SDK exposes the latest headers via client_connection.last_response_headers - # (shared per client, so not reliable under concurrent requests). - async for page in iterator.by_page(): - async for item in page: - items.append(item) - headers = self._gt_container.client_connection.last_response_headers - total_ru += float(headers.get('x-ms-request-charge', 0)) - - self._logger.info( - "cosmos.query.metrics", - extra={ - "operation": operation_name, - "ru_charge": total_ru, - "item_count": len(items), - } - ) - - return items, total_ru -``` - -## Indexing Policy Analysis - -The current indexing policy ([indexing-policy.json](backend/scripts/indexing-policy.json)) includes composite indexes for common sort patterns but lacks optimization for assignment queries: - -**Current composite indexes**: -- `reviewedAt` + `id` (both directions) -- `updatedAt` + `id` -- `status` + `reviewedAt` + `id` -- `totalReferences` + `id` (both directions) -- `status` + `totalReferences` + `id` - -**Recommended additions**: -```json -[ - {"path": "/status", "order": "ascending"}, - {"path": "/assignedTo", "order": "ascending"} -] -``` - -This would optimize the `list_unassigned()` and `list_assigned()` queries that filter by status and assignedTo. - -## Implementation Priorities - -### Phase 1 (SA-248 - Immediate) -1. Remove `min(limit, 200)` cap from sampling methods -2. 
Add configurable `SAMPLING_QUERY_MAX_ITEMS` setting -3. Use continuation tokens for proper pagination - -### Phase 2 (SA-247 - Short-term) -1. Add RU logging for expensive queries -2. Cache `stats()` and `list_datasets()` results -3. Add composite index for `(status, assignedTo)` - -### Phase 3 (Future) -1. Consider global secondary index for status-only queries -2.
Evaluate Change Feed for materialized views -3. Implement automatic query analysis/alerting - -## References - -- [Azure Cosmos DB Query Optimization](https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-query-container#avoid-cross-partition-queries) -- [Partition Key Design Best Practices](https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview) -- [Request Units in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/request-units) -- [Composite Indexes](https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#composite-indexes) -- Codebase: [cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py), [config.py](backend/app/core/config.py), [indexing-policy.json](backend/scripts/indexing-policy.json) diff --git a/.copilot-tracking/subagent/20260122/reference-identity-research.md b/.copilot-tracking/subagent/20260122/reference-identity-research.md deleted file mode 100644 index ac3eab8..0000000 --- a/.copilot-tracking/subagent/20260122/reference-identity-research.md +++ /dev/null @@ -1,172 +0,0 @@ -# Reference Identity Research - -**Date:** 2026-01-22 -**Topic:** Reference identity system using chunk ID from search index as primary uniqueness key - -## Executive Summary - -The current reference system uses **URL as the primary de-duplication key** in the frontend, with a secondary `id` field that is assigned at display time (sequential `ref_0`, `ref_1`, etc.) rather than from the search index. The search index **does provide a chunk ID** (via `chunk_id` field) from the inference adapter, but it is only partially propagated through the system. - -## Findings - -### 1. 
Current Reference Data Model - -#### Backend Model ([backend/app/domain/models.py](../../../backend/app/domain/models.py#L13-L35)) - -```python -class Reference(BaseModel): - url: str = Field(description="Reference URL (required, non-empty)") - title: str | None = Field(default=None) - content: str | None = None - keyExcerpt: str | None = None - type: str | None = None - bonus: bool = False - messageIndex: Optional[int] = None -``` - -**Key observation:** The backend `Reference` model has **no `id` field**. URL is the only required identifier. - -#### Frontend Model ([frontend/src/models/groundTruth.ts](../../../frontend/src/models/groundTruth.ts#L16-L27)) - -```typescript -export type Reference = { - id: string; // Required in frontend - title?: string; - url: string; // Required - snippet?: string; - visitedAt?: string | null; - keyParagraph?: string; - bonus?: boolean; - messageIndex?: number; -}; -``` - -**Key observation:** Frontend requires an `id` field, but this is **generated locally** and not persisted. - -### 2. URL-Based De-duplication Location - -#### Primary De-duplication: [frontend/src/models/gtHelpers.ts](../../../frontend/src/models/gtHelpers.ts#L33-L46) - -```typescript -export function dedupeReferences( - existing: Reference[], - chosen: Reference[], -): Reference[] { - const makeKey = (r: Reference) => - r.messageIndex !== undefined ? `${r.url}::turn${r.messageIndex}` : r.url; - - const map = new Map(existing.map((r) => [makeKey(r), r] as const)); - for (const r of chosen) { - const key = makeKey(r); - if (!map.has(key)) { - map.set(key, r); - } - } - return Array.from(map.values()); -} -``` - -**De-duplication key:** `URL` (or `URL::turnN` for multi-turn contexts) - -#### TurnReferencesModal duplicate check: [frontend/src/components/app/editor/TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx#L85) - -```typescript -const urlsInTurn = new Set(turnRefs.map((r) => normalizeUrl(r.url))); -``` - -### 3. 
Backend Storage/Persistence - -References are stored as part of `GroundTruthItem` documents in Cosmos DB: - -- **Top-level refs:** `GroundTruthItem.refs: list[Reference]` -- **Turn-level refs:** `GroundTruthItem.history[].refs: list[Reference]` - -The backend persists all reference fields **except** the frontend-only `id`. The Reference model validates that URL cannot be empty ([backend/app/domain/models.py](../../../backend/app/domain/models.py#L31-L34)). - -### 4. Search Index Fields - Chunk ID Availability - -#### Chat/Inference Adapter: [backend/app/adapters/gtc_inference_adapter.py](../../../backend/app/adapters/gtc_inference_adapter.py#L103-L128) - -```python -def _extract_references(self, calls: list[dict[str, Any]]) -> list[dict[str, Any]]: - for call in calls: - results = call.get("results", []) - for doc in results: - ref = { - "id": doc.get("chunk_id") or doc.get("id"), # ✅ chunk_id IS available - "title": doc.get("title"), - "url": doc.get("url"), - "snippet": doc.get("content"), - } - references.append(ref) -``` - -**The chunk ID is extracted from search results** as `chunk_id` (preferred) or `id` (fallback). - -#### Azure AI Search Tool Processing: [backend/app/adapters/inference/inference.py](../../../backend/app/adapters/inference/inference.py#L419) - -```python -call["results"].append({"title": titles[i], "url": urls[i], "chunk_id": ids[i]}) -``` - -**The search index provides:** `titles[]`, `urls[]`, `ids[]` (chunk IDs) in the metadata. - -#### Frontend Search Service: [frontend/src/services/search.ts](../../../frontend/src/services/search.ts#L24-L53) - -```typescript -function mapWireToReference(x: SearchResultWire): Reference | null { - // ... 
- let id: string = randId("ref"); // Default: random ID - if (typeof o.id === "string" && o.id) id = o.id; // Use provided ID if available - else if (doc && typeof doc.id === "string") id = doc.id as string; - return { id, title, url, snippet, visitedAt: null, keyParagraph: "" }; -} -``` - -**Current behavior:** Uses the ID from search results if available, but falls back to random ID. - -### 5. Downstream Systems Affected by Identity Key Change - -| System | Current Usage | Impact of Change | -|--------|---------------|------------------| -| **De-duplication** ([gtHelpers.ts](../../../frontend/src/models/gtHelpers.ts)) | Uses URL | Must switch to chunk ID | -| **Reference Updates** ([useReferencesEditor.ts](../../../frontend/src/hooks/useReferencesEditor.ts)) | Uses `ref.id` for patch targeting | Would use chunk ID instead | -| **Export Pipeline** ([backend/app/exports/pipeline.py](../../../backend/app/exports/pipeline.py)) | Outputs refs with URL as key field | May need to include chunk ID | -| **API Ground Truth Mapping** ([groundTruths.ts](../../../frontend/src/services/groundTruths.ts#L63-L100)) | Generates sequential `ref_N` IDs | Would need to preserve chunk ID from storage | -| **Turn References Modal** ([TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx)) | Checks URL for duplicates | Would check chunk ID | -| **SelectedTab** ([SelectedTab.tsx](../../../frontend/src/components/app/ReferencesPanel/SelectedTab.tsx)) | Displays and manages by `ref.id` | Unchanged (uses existing id field) | - -### 6. 
Gap Analysis - -| Layer | Current State | Required for Chunk ID Identity | -|-------|--------------|-------------------------------| -| **Search Index** | ✅ Provides `chunk_id` | No change needed | -| **Inference Adapter** | ✅ Extracts `chunk_id` as `id` | No change needed | -| **Backend Reference Model** | ❌ No `id` field | Add optional `id` field | -| **Frontend Search Service** | ⚠️ Uses `id` if present, fallback to random | Ensure consistent propagation | -| **API Mapping** | ❌ Generates sequential IDs | Preserve chunk ID from storage | -| **De-duplication** | ❌ Uses URL | Switch to chunk ID | -| **Backend Persistence** | ❌ Doesn't store `id` | Store chunk ID in Reference | - -## Recommendations - -1. **Add `id` field to backend Reference model** (optional, string) -2. **Persist chunk ID** when saving references from chat/search -3. **Update de-duplication logic** to use `id` (chunk ID) instead of URL -4. **Update API mapping** to preserve stored chunk ID instead of generating sequential IDs -5. 
**Maintain URL as fallback** for legacy data without chunk IDs - -## Files Referenced - -- [backend/app/domain/models.py](../../../backend/app/domain/models.py) - Backend Reference model -- [frontend/src/models/groundTruth.ts](../../../frontend/src/models/groundTruth.ts) - Frontend Reference type -- [frontend/src/models/gtHelpers.ts](../../../frontend/src/models/gtHelpers.ts) - De-duplication logic -- [frontend/src/services/search.ts](../../../frontend/src/services/search.ts) - Search result mapping -- [frontend/src/services/groundTruths.ts](../../../frontend/src/services/groundTruths.ts) - API-to-frontend mapping -- [frontend/src/hooks/useReferencesEditor.ts](../../../frontend/src/hooks/useReferencesEditor.ts) - Reference editing hook -- [frontend/src/components/app/editor/TurnReferencesModal.tsx](../../../frontend/src/components/app/editor/TurnReferencesModal.tsx) - Turn references UI -- [frontend/src/components/app/ReferencesPanel/SelectedTab.tsx](../../../frontend/src/components/app/ReferencesPanel/SelectedTab.tsx) - Selected references UI -- [backend/app/adapters/gtc_inference_adapter.py](../../../backend/app/adapters/gtc_inference_adapter.py) - Inference adapter -- [backend/app/adapters/inference/inference.py](../../../backend/app/adapters/inference/inference.py) - Azure AI Search processing -- [backend/app/exports/pipeline.py](../../../backend/app/exports/pipeline.py) - Export pipeline -- [specs/reference-management.md](../../../specs/reference-management.md) - Reference management spec diff --git a/.copilot-tracking/subagent/20260122/reference-management-research.md b/.copilot-tracking/subagent/20260122/reference-management-research.md deleted file mode 100644 index ee3cc9a..0000000 --- a/.copilot-tracking/subagent/20260122/reference-management-research.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -topic: reference-management -jtbd: JTBD-001 -date: 2026-01-22 -status: complete ---- - -# Research: Reference Management - -## Context - -The reference management 
system supports adding, visiting, annotating, and removing supporting references that back ground-truth items. - -## Sources Consulted - -### URLs -- (None) - -### Codebase -- [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts): Maps top-level and per-history references into a unified reference list with `id`, `title`, `url`, `snippet`, `keyParagraph`, `visitedAt`, `bonus`, and `messageIndex`. -- [frontend/CODEBASE.md](frontend/CODEBASE.md): Documents reference workflow behaviors including search, URL de-duplication, visited tracking, and key-paragraph editing. - -### Documentation -- [.copilot-tracking/research/20260121-high-level-requirements-research.md](.copilot-tracking/research/20260121-high-level-requirements-research.md): Consolidates reference-related requirements and notes documentation gaps. -- [frontend/src/components/app/defaultCurateInstructions.md](frontend/src/components/app/defaultCurateInstructions.md): Contains user-facing curation instructions including key paragraph constraints. - -## Key Findings - -1. References include a `keyParagraph` field with a minimum length constraint (≥40 characters) for approval eligibility. -2. The UI tracks whether a reference has been visited (opened in a new tab) and uses this for approval gating. -3. URL de-duplication is performed in the UI to prevent duplicate references. -4. The frontend model unifies top-level `refs` and per-history `refs` into one reference list. -5. References can be marked as "bonus" and can be associated with specific conversation turns via `messageIndex`. 
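These findings describe approval-gating rules: every selected reference must be visited and must have a key paragraph of at least 40 characters. The spec recommendations add a third rule, that at least one reference is selected. All three can be expressed as a small check. The real check lives in the TypeScript frontend. This Python version is only a sketch of the same rules, and `Ref`, `approval_blockers`, and the message wording are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_KEY_PARAGRAPH_CHARS = 40  # threshold stated in the findings


@dataclass
class Ref:
    """Hypothetical, trimmed-down view of a selected reference."""
    url: str
    key_paragraph: str = ""
    visited_at: str | None = None  # set once the curator opens the link


def approval_blockers(selected: list[Ref]) -> list[str]:
    """Return the reasons approval is blocked; an empty list means eligible."""
    if not selected:
        return ["Select at least one reference"]
    problems: list[str] = []
    for ref in selected:
        if ref.visited_at is None:
            problems.append(f"{ref.url}: not visited")
        if len(ref.key_paragraph) < MIN_KEY_PARAGRAPH_CHARS:
            problems.append(
                f"{ref.url}: key paragraph shorter than {MIN_KEY_PARAGRAPH_CHARS} characters"
            )
    return problems
```

The check returns a list of reasons rather than a boolean, so the UI can explain exactly which reference still blocks approval.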
- -## Existing Patterns - -| Pattern | Location | Relevance | -|---------|----------|-----------| -| Reference mapping and normalization | [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts) | Defines current shape of reference objects in UI | -| Approval gating on reference completeness | [frontend/CODEBASE.md](frontend/CODEBASE.md) | Defines behavioral constraints for saving/approving | - -## Open Questions - -- (None) - -## Recommendations for Spec - -- Specify the reference data shape (id, title, url, snippet, keyParagraph, visitedAt, bonus, messageIndex). -- Specify the approval gating rules: at least one selected reference, all visited, keyParagraph ≥40 chars. -- Specify URL de-duplication as a UI behavior. diff --git a/.copilot-tracking/subagent/20260122/tag-filtering-research.md b/.copilot-tracking/subagent/20260122/tag-filtering-research.md deleted file mode 100644 index 42f4d62..0000000 --- a/.copilot-tracking/subagent/20260122/tag-filtering-research.md +++ /dev/null @@ -1,240 +0,0 @@ -# Research: Tag Filtering System - -**Topic:** tag-filtering -**Date:** 2026-01-22 -**Status:** Complete - -## Summary - -The tag filtering system in Ground Truth Curator allows users to filter items by tags in the Explorer view. Currently, the system supports **include-only** filtering with AND logic. A planned enhancement (SA-363) will add tri-state selection (include/exclude/neutral) and boolean logic for advanced filtering. - -## Key Findings - -### 1. Current Explorer Tag Filter UI - -**Location:** [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx) - -The Explorer component maintains tag filter state: - -```typescript -// Filter state (unapplied) -const [selectedTags, setSelectedTags] = useState([]); - -// Applied filter state -const [appliedFilter, setAppliedFilter] = useState({ - status: "all", - dataset: "all", - tags: [], - // ... 
-});
-```
-
-**Current UI behavior:**
-- Tags are displayed in a collapsible section "Filter by Tags"
-- Manual tags and computed tags are shown separately (manual in violet, computed in slate with lock icon)
-- Clicking a tag toggles it between selected (include) and unselected (neutral)
-- Selected tags show a badge count and "Clear all" option
-- Multiple selected tags use **AND logic** ("items must have ALL selected tags")
-- Tags fetched via `fetchTagsWithComputed()` which returns `{ manualTags: string[], computedTags: string[] }`
-
-### 2. Tag State Management
-
-**Tag toggle function:**
-```typescript
-const handleTagToggle = (tag: string) => {
-  setSelectedTags((prev) =>
-    prev.includes(tag) ? prev.filter((t) => t !== tag) : [...prev, tag],
-  );
-};
-```
-
-**Current limitation:** Binary state only (selected vs unselected) - no exclusion state.
-
-### 3. Tag-Related Fields on Ground Truth Items
-
-**Location:** [backend/app/domain/models.py](backend/app/domain/models.py#L76-L86)
-
-```python
-class GroundTruthItem(BaseModel):
-    # Tag fields: manualTags are user-provided, computedTags are system-generated
-    manual_tags: list[str] = Field(default_factory=list, alias="manualTags")
-    computed_tags: list[str] = Field(default_factory=list, alias="computedTags")
-
-    @computed_field
-    @property
-    def tags(self) -> list[str]:
-        """Return a merged, sorted view of manual and computed tags."""
-        merged = set(self.manual_tags or []) | set(self.computed_tags or [])
-        return sorted(merged)
-```
-
-**Key points:**
-- `manualTags`: User-applied tags (editable)
-- `computedTags`: System-generated tags from plugins (read-only)
-- `tags`: Computed property merging both (for backward compatibility)
-
-### 4. Backend API Tag Filtering
-
-**Location:** [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L162-L230)
-
-The `list_all_ground_truths` endpoint accepts tags as a comma-separated string:
-
-```python
-@router.get("", response_model=GroundTruthListResponse)
-async def list_all_ground_truths(
-    tags: str | None = Query(default=None),
-    # ...
-):
-    # Tag validation
-    MAX_TAGS_PER_QUERY = 10
-    MAX_TAG_LENGTH = 100
-
-    if tags is not None:
-        raw_tags = [tag.strip() for tag in tags.split(",")]
-        cleaned = [tag for tag in raw_tags if tag]
-        # Validation checks...
-        tag_list = cleaned if cleaned else None
-```
-
-**Frontend sends tags:**
-```typescript
-// In groundTruths.ts
-if (params.tags?.length) query.tags = params.tags.join(",");
-```
-
-### 5. Cosmos DB Query for Tag Filtering
-
-**Location:** [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py#L562-L571)
-
-```python
-def _build_query_filter(self, ..., tags: list[str] | None, ...):
-    if include_tags and tags:
-        normalized = [tag for tag in (tag.strip() for tag in tags) if tag]
-        for idx, tag in enumerate(normalized):
-            pname = f"@tag{idx}"
-            # Search across manualTags and computedTags
-            clauses.append(
-                f"(ARRAY_CONTAINS(c.manualTags, {pname}) OR "
-                f"ARRAY_CONTAINS(c.computedTags, {pname}))"
-            )
-            params.append({"name": pname, "value": tag})
-```
-
-**Current query pattern:**
-- Each tag becomes an AND clause
-- Searches both `manualTags` and `computedTags` arrays
-- Uses `ARRAY_CONTAINS()` function (not supported in Cosmos Emulator)
-
-### 6. Emulator Limitation
-
-**Location:** [backend/docs/cosmos-emulator-limitations.md](backend/docs/cosmos-emulator-limitations.md)
-
-> `ARRAY_CONTAINS SQL Function Not Supported` - Tag filtering tests must be skipped on emulator and run against real Cosmos DB.
-
-## Current Filter Capabilities
-
-| Capability | Status | Notes |
-|------------|--------|-------|
-| Include tags (AND) | ✅ Supported | Items must have ALL selected tags |
-| Exclude tags (NOT) | ❌ Not supported | Planned in SA-363 |
-| OR logic | ❌ Not supported | Planned in SA-363 |
-| Boolean expressions | ❌ Not supported | Planned in SA-363 |
-| Manual tags | ✅ Supported | Violet styling in UI |
-| Computed tags | ✅ Supported | Slate styling with lock icon |
-
-## Patterns Supporting Tri-State Selection (SA-363)
-
-### Frontend Changes Needed
-
-1. **State structure change:**
-```typescript
-// Current: string[] (selected tags)
-// Proposed: Map or similar
-interface TagFilterState {
-  include: string[];
-  exclude: string[];
-}
-```
-
-2. **UI toggle pattern:**
-- Click 1: Neutral → Include (checkmark)
-- Click 2: Include → Exclude (X indicator)
-- Click 3: Exclude → Neutral (cleared)
-
-3. **Query parameter format:**
-```typescript
-// Option A: Separate params
-tags=tag1,tag2&excludeTags=tag3,tag4
-
-// Option B: Prefixed syntax
-tags=+tag1,+tag2,-tag3,-tag4
-```
-
-### Backend Changes Needed
-
-1. **API parameter changes:**
-```python
-@router.get("")
-async def list_all_ground_truths(
-    tags: str | None = Query(default=None),  # Include tags
-    exclude_tags: str | None = Query(default=None, alias="excludeTags"),  # New
-):
-```
-
-2. **Cosmos query for exclusion:**
-```python
-# NOT ARRAY_CONTAINS pattern
-for idx, tag in enumerate(excluded_tags):
-    pname = f"@excludeTag{idx}"
-    clauses.append(
-        f"NOT (ARRAY_CONTAINS(c.manualTags, {pname}) OR "
-        f"ARRAY_CONTAINS(c.computedTags, {pname}))"
-    )
-```
-
-### Advanced Boolean Logic (SA-363)
-
-The PRD specifies support for:
-```
-has frequency:common AND NOT(has difficulty:easy)
-```
-
-This would require:
-1. A query DSL parser on the backend
-2. Translation to Cosmos SQL WHERE clauses
-3. Frontend text input with validation
-
-## Recommendations for Implementation
-
-1. **Phase 1: Tri-state UI**
-   - Update `selectedTags` to `tagFilters: Map`
-   - Add visual indicators for include/exclude states
-   - Implement three-click toggle pattern
-
-2. **Phase 2: Backend exclude support**
-   - Add `excludeTags` query parameter
-   - Update `_build_query_filter()` with NOT clauses
-   - Add integration tests (requires real Cosmos DB)
-
-3. **Phase 3: Boolean query input (optional)**
-   - Add text input for advanced queries
-   - Implement parser with AND/OR/NOT/parentheses
-   - Add validation and error display
-
-## Related Files
-
-| File | Purpose |
-|------|---------|
-| [frontend/src/components/app/QuestionsExplorer.tsx](frontend/src/components/app/QuestionsExplorer.tsx) | Explorer UI with tag filter |
-| [frontend/src/services/tags.ts](frontend/src/services/tags.ts) | Tag fetching and validation |
-| [frontend/src/services/groundTruths.ts](frontend/src/services/groundTruths.ts) | API calls with tag params |
-| [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py) | List endpoint with tag filtering |
-| [backend/app/adapters/repos/cosmos_repo.py](backend/app/adapters/repos/cosmos_repo.py) | Cosmos queries with ARRAY_CONTAINS |
-| [backend/app/domain/models.py](backend/app/domain/models.py) | GroundTruthItem with tag fields |
-| [prd-refined-2.json](prd-refined-2.json) | SA-363 requirements for tri-state |
-
-## Open Questions
-
-1. Should the URL encoding for tag filters use separate params or a prefix syntax?
-2. How to handle emulator limitations for exclude queries in development?
-3. Should the boolean query input be a separate mode or integrated with chip selection?
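The tag-filtering notes above propose two things: a tri-state toggle (neutral → include → exclude → neutral) and "Option A" query serialization. Both can be sketched as follows.

This is a minimal illustration that assumes the `TagFilterState` shape from those notes. `cycleTag` and `toTagQuery` are hypothetical helpers, not existing functions in the codebase. `excludeTags` is the proposed parameter name, not an implemented API field.

```typescript
// Hypothetical sketch of the SA-363 tri-state tag filter.
// TagFilterState mirrors the interface proposed in the research notes;
// cycleTag and toTagQuery are illustrative names only.
interface TagFilterState {
  include: string[];
  exclude: string[];
}

// Advance one tag through the proposed cycle: neutral -> include -> exclude -> neutral.
function cycleTag(state: TagFilterState, tag: string): TagFilterState {
  if (state.include.includes(tag)) {
    // Include -> Exclude
    return {
      include: state.include.filter((t) => t !== tag),
      exclude: [...state.exclude, tag],
    };
  }
  if (state.exclude.includes(tag)) {
    // Exclude -> Neutral
    return {
      include: state.include,
      exclude: state.exclude.filter((t) => t !== tag),
    };
  }
  // Neutral -> Include
  return { include: [...state.include, tag], exclude: state.exclude };
}

// Serialize using "Option A": separate comma-separated params.
// excludeTags is the proposed name, not an existing backend parameter.
function toTagQuery(state: TagFilterState): Record<string, string> {
  const query: Record<string, string> = {};
  if (state.include.length) query.tags = state.include.join(",");
  if (state.exclude.length) query.excludeTags = state.exclude.join(",");
  return query;
}

// Usage: clicking the same tag twice moves it from include to exclude.
let filters: TagFilterState = { include: [], exclude: [] };
filters = cycleTag(filters, "difficulty:easy");
filters = cycleTag(filters, "frequency:common");
filters = cycleTag(filters, "difficulty:easy");
console.log(toTagQuery(filters));
```

The helper returns new objects rather than mutating its input. This keeps it compatible with React functional state updates, in the same style as the existing `handleTagToggle`.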
diff --git a/.copilot-tracking/subagent/20260122/tag-glossary-research.md b/.copilot-tracking/subagent/20260122/tag-glossary-research.md
deleted file mode 100644
index 495ec3e..0000000
--- a/.copilot-tracking/subagent/20260122/tag-glossary-research.md
+++ /dev/null
@@ -1,209 +0,0 @@
-# Tag Glossary Research
-
-**Date:** 2026-01-22
-**Topic:** tag-glossary
-
-## Summary
-
-The system has a well-designed tag architecture with clear separation between **manual tags** (user-editable) and **computed tags** (auto-generated). However, there is **no existing infrastructure for tag definitions, descriptions, or a glossary**. Tags are stored as simple strings without metadata.
-
----
-
-## 1. Current Tag System Overview
-
-### Manual Tags
-
-Manual tags are user-editable and stored in the `manualTags` field on ground truth items. Default manual tags are configured via JSON file.
-
-**Location:** [backend/app/domain/manual_tags.json](backend/app/domain/manual_tags.json)
-
-**Current Manual Tag Groups:**
-
-| Group | Exclusive | Values |
-|-------|-----------|--------|
-| source | Yes | sme, sa, synthetic, sme_curated, user, other |
-| answerability | Yes | answerable, not_answerable, should_not_answer |
-| topic | No | general, compatibility, install, license, performance, security |
-| intent | No | informational, action, feedback, clarification, other |
-| expertise | Yes | expert, novice |
-| difficulty | Yes | easy, medium, hard |
-
-### Computed Tags
-
-Computed tags are auto-generated by plugins and stored in `computedTags` field. They are read-only in the UI.
-
-**Plugin Location:** [backend/app/plugins/computed_tags/](backend/app/plugins/computed_tags/)
-
-**Current Computed Tag Plugins:**
-
-| Plugin | Tag Key | Description (from code comments) |
-|--------|---------|----------------------------------|
-| DatasetPlugin | `dataset:_dynamic` | Tags documents with their dataset name |
-| QuestionLengthShortPlugin | `question_length:short` | Questions with ≤10 words |
-| QuestionLengthMediumPlugin | `question_length:medium` | Questions with 11-30 words |
-| QuestionLengthLongPlugin | `question_length:long` | Questions with >30 words |
-| SingleTurnPlugin | `turns:singleturn` | Documents with no/minimal history |
-| MultiTurnPlugin | `turns:multiturn` | Documents with >2 history turns |
-| NoAnswerPlugin | `answer:no_answer` | Ground truth answer is "NO_ANSWER" |
-| RetrievalBehaviorNoRefsPlugin | `retrieval_behavior:no_refs` | Zero references |
-| RetrievalBehaviorSinglePlugin | `retrieval_behavior:single` | Exactly one reference |
-| RetrievalBehaviorTwoRefsPlugin | `retrieval_behavior:two_refs` | Exactly two references |
-| RetrievalBehaviorRichPlugin | `retrieval_behavior:rich` | Three or more references |
-| ReferenceTypeArticlePlugin | `reference_type:article` | Contains CS# pattern URL |
-| ReferenceTypeHelpcenterPlugin | `reference_type:helpcenter` | Contains /help URL |
-
----
-
-## 2. Tag Definition Storage
-
-### Current State
-
-- **Manual Tags:** Stored in JSON config ([manual_tags.json](backend/app/domain/manual_tags.json)) with only `group`, `tags`, and `mutuallyExclusive` fields
-- **Computed Tags:** Defined in Python plugin classes with descriptions only in docstrings
-- **No description/definition field** exists in any tag model
-- **No glossary endpoint** or UI exists
-
-### Key Files
-
-| Purpose | Location |
-|---------|----------|
-| Manual tag config | [backend/app/domain/manual_tags.json](backend/app/domain/manual_tags.json) |
-| Manual tag provider | [backend/app/domain/manual_tags_provider.py](backend/app/domain/manual_tags_provider.py) |
-| Tag schema & rules | [backend/app/domain/tags.py](backend/app/domain/tags.py) |
-| Tag API endpoints | [backend/app/api/v1/tags.py](backend/app/api/v1/tags.py) |
-| Computed tag base | [backend/app/plugins/base.py](backend/app/plugins/base.py) |
-| Plugin registry | [backend/app/plugins/registry.py](backend/app/plugins/registry.py) |
-
----
-
-## 3. UI Tag Display
-
-### Components
-
-| Component | Location | Purpose |
-|-----------|----------|---------|
-| TagChip | [frontend/src/components/common/TagChip.tsx](frontend/src/components/common/TagChip.tsx) | Display individual tag with computed vs manual styling |
-| TagsEditor | [frontend/src/components/app/editor/TagsEditor.tsx](frontend/src/components/app/editor/TagsEditor.tsx) | Add/remove manual tags, display computed tags (read-only) |
-| InspectItemModal | [frontend/src/components/modals/InspectItemModal.tsx](frontend/src/components/modals/InspectItemModal.tsx) | Shows tags in item inspection view |
-
-### Tag Service
-
-[frontend/src/services/tags.ts](frontend/src/services/tags.ts) provides:
-- `fetchTagSchema()` - Get tag groups with exclusive rules
-- `fetchTagsWithComputed()` - Get manual and computed tags separately
-- `validateExclusiveTags()` - Validate exclusive group rules
-- `addTags()` - Add new manual tags to global registry
-
-### Current UI Behavior
-
-1. **Computed tags:** Displayed with lock icon and slate color scheme; read-only
-2. **Manual tags:** Displayed with violet color scheme; removable with X button
-3. **TagsEditor:** Shows "Auto-generated" label for computed tags section
-4. **No tooltips or definitions** are displayed for any tags
-
----
-
-## 4. API Endpoints
-
-| Endpoint | Purpose |
-|----------|---------|
-| `GET /v1/tags/schema` | Returns tag groups with values and exclusive rules |
-| `GET /v1/tags` | Returns `{ tags: [...], computedTags: [...] }` |
-| `POST /v1/tags` | Add tags to global registry |
-| `DELETE /v1/tags` | Remove tags from global registry |
-
-### Schema Response Shape
-
-```typescript
-interface TagSchemaResponse {
-  version: string; // "v1"
-  groups: Array<{
-    name: string;
-    values: string[];
-    exclusive: boolean;
-    depends_on: Array<{ group: string; value: string }>;
-  }>;
-}
-```
-
-**Note:** No `description` field exists in the schema response.
-
----
-
-## 5. Gaps for Glossary Feature
-
-### Missing Infrastructure
-
-| Gap | Description | Impact |
-|-----|-------------|--------|
-| No description field in manual tag config | JSON only has group/tags/exclusive | Cannot store manual tag definitions |
-| No metadata in ComputedTagPlugin | Only `tag_key` and `compute()` | Computed tag descriptions only in docstrings |
-| No glossary API endpoint | No way to fetch all tag definitions | Frontend cannot display definitions |
-| No UI for viewing definitions | Tags displayed without context | Users don't know what tags mean |
-| No UI for managing definitions | No admin interface | Definitions cannot be edited |
-
-### Required Changes
-
-#### Backend
-
-1. **Extend manual tag JSON schema:**
-   ```json
-   {
-     "group": "source",
-     "description": "Where the ground truth originated",
-     "tags": [
-       { "value": "sme", "description": "Created by subject matter expert" },
-       { "value": "synthetic", "description": "AI-generated content" }
-     ]
-   }
-   ```
-
-2. **Add metadata to ComputedTagPlugin:**
-   ```python
-   class ComputedTagPlugin(ABC):
-       @property
-       @abstractmethod
-       def tag_key(self) -> str: ...
-
-       @property
-       @abstractmethod
-       def description(self) -> str:
-           """Human-readable description for glossary."""
-           ...
-   ```
-
-3. **Create glossary API endpoint:**
-   - `GET /v1/tags/glossary` returning all tags with definitions
-   - Merge manual tag definitions with computed tag descriptions
-
-#### Frontend
-
-1. **TagChip enhancement:** Add tooltip with tag definition on hover
-2. **Glossary component:** Full-page or modal view of all tag definitions
-3. **TagsEditor enhancement:** Show definition when selecting tags
-4. **Admin UI (optional):** Allow editing manual tag definitions
-
----
-
-## 6. Design Recommendations
-
-### Minimal Viable Glossary
-
-1. Add `description` field to manual tag JSON (backward compatible)
-2. Add `description` property to `ComputedTagPlugin` base class
-3. Create `GET /v1/tags/glossary` endpoint merging both sources
-4. Add tooltips to `TagChip` component showing definitions
-
-### Full Glossary Feature
-
-1. All above plus:
-2. Dedicated glossary page/modal in UI
-3. Admin interface for managing definitions
-4. Consider storing definitions in database for runtime updates
-
----
-
-## References
-
-- [Computed Tags Design](docs/computed-tags-design.md) - Full architecture
-- [Manual Tags Design](docs/manual-tags-design.md) - Provider pattern and validation
diff --git a/.copilot-tracking/subagent/20260122/validation-error-clarity-research.md b/.copilot-tracking/subagent/20260122/validation-error-clarity-research.md
deleted file mode 100644
index f6e5b65..0000000
--- a/.copilot-tracking/subagent/20260122/validation-error-clarity-research.md
+++ /dev/null
@@ -1,257 +0,0 @@
-# Validation Error Clarity Research
-
-**Date:** 2026-01-22
-**Jira Reference:** SA-334 "Key Paragraph too large for generation error is not clear to the user"
-
----
-
-## Executive Summary
-
-The validation error clarity system has **significant gaps**. The 2000-character limit for key paragraphs is enforced **only in the frontend UI** (character counter display) but **not in the backend validation**. When errors occur, the frontend displays generic messages because `mapApiErrorToMessage()` extracts only the `detail` or `message` field from API errors without semantic mapping to user-friendly guidance.
-
----
-
-## Research Questions & Findings
-
-### 1. What is the key paragraph validation in the backend (2000 char limit)?
-
-**Finding: The 2000-character limit is NOT enforced in the backend.**
-
-- Backend `Reference` model at [backend/app/domain/models.py](backend/app/domain/models.py#L12-L24):
-  ```python
-  class Reference(BaseModel):
-      url: str = Field(description="Reference URL (required, non-empty)")
-      title: str | None = Field(default=None)
-      content: str | None = None
-      keyExcerpt: str | None = None  # <-- No max_length validation
-      type: str | None = None
-      bonus: bool = False
-      messageIndex: Optional[int] = None
-  ```
-
-- The `keyExcerpt` field (maps to `keyParagraph` in frontend) has **no length constraints** defined.
-
-- The 2000-character limit exists **only in the frontend UI display** at [frontend/src/components/app/editor/TurnReferencesModal.tsx](frontend/src/components/app/editor/TurnReferencesModal.tsx#L341):
-  ```tsx
-
-    {len}/40 (2000 max)
-
-  ```
-
-- This is purely informational - **no validation prevents submission of longer text**.
-
-### 2. How does the backend return validation errors?
-
-**Finding: Generic HTTPException pattern with `detail` field.**
-
-The backend uses FastAPI's `HTTPException` with a `detail` parameter:
-
-- Example from [backend/app/api/v1/ground_truths.py](backend/app/api/v1/ground_truths.py#L234-L236):
-  ```python
-  raise HTTPException(
-      status_code=400,
-      detail=f"Tag '{tag[:50]}...' exceeds maximum length of {MAX_TAG_LENGTH} characters.",
-  )
-  ```
-
-- Validation errors return HTTP 422 with `HTTPValidationError` schema containing:
-  ```json
-  {
-    "detail": [
-      {
-        "type": "string",
-        "loc": ["body", "field_name"],
-        "msg": "validation error message",
-        "input": "..."
-      }
-    ]
-  }
-  ```
-
-- Chat endpoint uses safe error messages at [backend/app/api/v1/chat.py](backend/app/api/v1/chat.py#L20-L24):
-  ```python
-  SAFE_ERROR_MESSAGES = {
-      "invalid_input": "Invalid request format",
-      "service_unavailable": "Service temporarily unavailable",
-      "processing_error": "Unable to process request",
-  }
-  ```
-
-### 3. How does the frontend currently display validation errors?
-
-**Finding: Generic error display with minimal user guidance.**
-
-- Error mapping utility at [frontend/src/services/http.ts](frontend/src/services/http.ts#L26-L36):
-  ```typescript
-  export function mapApiErrorToMessage(err: unknown): string {
-    const e = err as Partial }>;
-    if (e && typeof e === "object" && typeof e.status === "number") {
-      const data = e.data as Record | undefined;
-      const detail =
-        (typeof data?.detail === "string" && data.detail) ||
-        (typeof data?.message === "string" && data.message) ||
-        "";
-      return `${e.status} ${e.statusText ?? "Error"}${detail ? ` – ${detail}` : ""}`;
-    }
-    return "Network or unexpected error";
-  }
-  ```
-
-- The `save()` function in [frontend/src/hooks/useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts#L248-L252) returns errors as-is:
-  ```typescript
-  } catch (e) {
-    const msg = e instanceof Error ? e.message : String(e);
-    return { ok: false, error: msg };
-  }
-  ```
-
-- **No error transformation** maps technical errors to user-friendly messages with remediation guidance.
-
-### 4. What error message mapping/transformation exists?
-
-**Finding: No semantic error mapping exists in either layer.**
-
-- Frontend does **not** have an error code registry or mapping table.
-- Backend uses `detail` strings directly without error codes.
-- The only pattern observed is `TagsModal.tsx` which has local `validationError` state for immediate UI feedback, but this doesn't apply to save operations.
-
-### 5. Where is key paragraph handled in the UI?
-
-**Locations identified:**
-
-| Component | File | Lines | Purpose |
-|-----------|------|-------|---------|
-| `TurnReferencesModal` | [frontend/src/components/app/editor/TurnReferencesModal.tsx](frontend/src/components/app/editor/TurnReferencesModal.tsx#L304-L341) | 304-341 | Primary key paragraph editor with character counter |
-| `useGroundTruth` | [frontend/src/hooks/useGroundTruth.ts](frontend/src/hooks/useGroundTruth.ts#L122) | 122, 154 | Trims keyParagraph in reference mapping |
-| API mapper | [frontend/src/adapters/apiMapper.ts](frontend/src/adapters/apiMapper.ts#L34) | 34, 64, 117, 127, 155 | Maps between `keyParagraph` (frontend) and `keyExcerpt` (backend) |
-
-**Key UI behavior:**
-- Character counter shows `{len}/40 (2000 max)` but this is **advisory only**
-- Textarea has no `maxLength` attribute
-- No client-side validation before submission
-
----
-
-## Gap Analysis
-
-### Current State vs Desired State
-
-| Aspect | Current State | Desired State |
-|--------|--------------|---------------|
-| Backend validation | None for keyExcerpt length | 2000 char limit enforced |
-| Error format | Generic `detail` string | Structured error with code, field, and remediation |
-| Frontend mapping | Pass-through display | Semantic mapping to user-friendly messages |
-| UI feedback | Post-submission error | Real-time validation + clear guidance |
-
-### Root Causes of SA-334
-
-1. **Missing backend validation**: The 2000-character limit mentioned in SA-334 doesn't exist in backend code
-2. **Generic error handling**: `mapApiErrorToMessage()` produces messages like `"400 Bad Request – Invalid request format"` without context
-3. **No error code system**: Cannot map backend errors to specific UI guidance
-4. **Frontend-only limit display**: The `(2000 max)` indicator suggests a limit that isn't enforced
-
----
-
-## Recommendations
-
-### Immediate (SA-334 Fix)
-
-1. **Add backend validation** for `keyExcerpt`:
-   ```python
-   # backend/app/domain/models.py
-   keyExcerpt: str | None = Field(default=None, max_length=2000)
-   ```
-
-2. **Add frontend validation** in `TurnReferencesModal.tsx`:
-   ```tsx
-   // Add maxLength to textarea
-