Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates documentation to reflect Phase 2 NER (Named Entity Recognition) completion with comprehensive test coverage (49 passing tests). The changes include:
- Adding test projects to the solution for unit and integration testing
- Implementing entity mention detection and resolution services
- Adding web UI views and controllers for entities, documents, assertions, and questionable items
- Integrating NER functionality into the document ingestion pipeline
- Adding comprehensive documentation (spec.md, TESTING.md, SHOELACE_FORM_FIX.md)
Key Changes:
- NER Implementation: Entity mention detection using pattern-based heuristics (capitalization, acronyms) and fuzzy matching for entity resolution
- Test Coverage: 49 unit tests covering services, repositories, and controllers
- UI Implementation: Complete web interface for managing entities, documents, assertions, and reviewing questionable items
- Documentation: Detailed specification, testing strategy, and Shoelace form fix documentation
Reviewed changes
Copilot reviewed 56 out of 57 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Palimpsest.sln | Added test project references (Unit and Integration) |
| src/Palimpsest.Web/obj/* | NuGet cache and build artifacts with local machine paths |
| src/Palimpsest.Web/Views/**/*.cshtml | New UI views for entities, documents, universes, assertions, and questionable items |
| src/Palimpsest.Web/Controllers/*.cs | New controllers for entities, documents, assertions, and questionable items |
| src/Palimpsest.Web/Program.cs | Registered new services and repositories; added partial Program class |
| src/Palimpsest.Infrastructure/Services/*.cs | NER services: EntityMentionService and EntityResolutionService |
| src/Palimpsest.Infrastructure/Repositories/*.cs | New repositories for entity aliases, mentions, and questionable items |
| src/Palimpsest.Tests.Unit/**/*.cs | 49 unit tests for services, repositories, and controllers |
| src/Palimpsest.Tests.Integration/**/*.cs | Integration test infrastructure and tests |
| spec.md, TESTING.md, SHOELACE_FORM_FIX.md | Comprehensive documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| foreach (var alias in existingAliases) | ||
| { | ||
| if (!alias.Alias.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase) && | ||
| !newAliases.Contains(alias.Alias)) | ||
| { | ||
| await _entityAliasRepository.DeleteAsync(alias.AliasId); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
| foreach (var aliasName in newAliases) | ||
| { | ||
| if (!existingAliasNames.Contains(aliasName) && | ||
| !aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| var alias = new EntityAlias | ||
| { | ||
| AliasId = Guid.NewGuid(), | ||
| EntityId = entity.EntityId, | ||
| Alias = aliasName, | ||
| AliasNorm = aliasName.ToLowerInvariant(), | ||
| CreatedAt = DateTime.UtcNow | ||
| }; | ||
| await _entityAliasRepository.CreateAsync(alias); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
| foreach (var aliasName in dto.Aliases) | ||
| { | ||
| if (!string.IsNullOrWhiteSpace(aliasName) && | ||
| !aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| var alias = new EntityAlias | ||
| { | ||
| AliasId = Guid.NewGuid(), | ||
| EntityId = entity.EntityId, | ||
| Alias = aliasName, | ||
| AliasNorm = aliasName.ToLowerInvariant(), | ||
| CreatedAt = DateTime.UtcNow | ||
| }; | ||
| await _entityAliasRepository.CreateAsync(alias); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
| foreach (var aliasName in aliases) | ||
| { | ||
| if (!string.IsNullOrWhiteSpace(aliasName) && | ||
| !aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| var alias = new EntityAlias | ||
| { | ||
| AliasId = Guid.NewGuid(), | ||
| EntityId = entity.EntityId, | ||
| Alias = aliasName, | ||
| AliasNorm = aliasName.ToLowerInvariant(), | ||
| CreatedAt = DateTime.UtcNow | ||
| }; | ||
| await _entityAliasRepository.CreateAsync(alias); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
| foreach (var aliasLine in aliasLines) | ||
| { | ||
| var aliasName = aliasLine.Trim(); | ||
| if (!string.IsNullOrWhiteSpace(aliasName) && | ||
| !aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| var alias = new EntityAlias | ||
| { | ||
| AliasId = Guid.NewGuid(), | ||
| EntityId = entity.EntityId, | ||
| Alias = aliasName, | ||
| AliasNorm = aliasName.ToLowerInvariant(), | ||
| CreatedAt = DateTime.UtcNow | ||
| }; | ||
| await _entityAliasRepository.CreateAsync(alias); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
| foreach (var line in aliasLines) | ||
| { | ||
| var trimmed = line.Trim(); | ||
| if (!string.IsNullOrWhiteSpace(trimmed)) | ||
| { | ||
| newAliases.Add(trimmed); | ||
| } | ||
| } |
There was a problem hiding this comment.
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
| ["__RequestVerificationToken"] = antiForgeryToken | ||
| }; | ||
|
|
||
| var content = new FormUrlEncodedContent(formData); |
There was a problem hiding this comment.
Disposable 'FormUrlEncodedContent' is created but not disposed.
| var content = new FormUrlEncodedContent(formData); | |
| using var content = new FormUrlEncodedContent(formData); |
| var postResponse = await _client.PostAsync( | ||
| $"/universes/setactive?id={universe.UniverseId}", | ||
| new FormUrlEncodedContent(formData)); |
There was a problem hiding this comment.
Disposable 'FormUrlEncodedContent' is created but not disposed.
| var postResponse = await _client.PostAsync( | |
| $"/universes/setactive?id={universe.UniverseId}", | |
| new FormUrlEncodedContent(formData)); | |
| using var formContent = new FormUrlEncodedContent(formData); | |
| var postResponse = await _client.PostAsync( | |
| $"/universes/setactive?id={universe.UniverseId}", | |
| formContent); |
- Created unit test project (Palimpsest.Tests.Unit) with 17 passing tests - UniverseRepositoryTests (8 tests) - DocumentRepositoryTests (6 tests) - UniverseContextServiceTests (4 tests) - Added Moq, FluentAssertions, and InMemory EF Core dependencies - Created integration test project (Palimpsest.Tests.Integration) - Custom WebApplicationFactory with in-memory database - UniversesController integration tests (7 tests) - Microsoft.AspNetCore.Mvc.Testing framework setup - Added missing controllers - DocumentsController: Create, Index, Details, Delete with file upload support - EntitiesController: Index, Details, Search with universe scoping - AssertionsController: Index, Details for viewing assertions - QuestionableController: Index for conflict management (stub) - Added missing views - Universes/Details.cshtml: Tabbed interface with overview, documents, entities, assertions, questionable items - Enhanced repositories - Added SearchByNameAsync to IEntityRepository and EntityRepository - Implemented name and alias search with pagination - Fixed application issues - Updated Program.cs to expose partial Program class for testing - Updated UniversesController Details action to include ViewBag data - Created wwwroot directory for static files - Added comprehensive code review documentation - CODE_REVIEW_SUMMARY.md with architecture review and recommendations
- Added @RenderSectionAsync for Scripts section in _Layout.cshtml - Removed unnecessary SignalR script from Create.cshtml - Added missing using directives for InMemory database tests This fixes the error when clicking Create Universe button.
- Removed InMemory database tests that failed with pgvector - Added UniversesControllerTests with 7 tests using Moq (all passing) - Updated test project to reference Web project - Created TESTING.md with strategy and examples - Unit tests now properly mock dependencies instead of using database Test Results: 11 passing tests (UniversesController: 7, UniverseContextService: 4)
- Created Documents/Create.cshtml with file upload and text paste options - Created Documents/Index.cshtml showing document list - Created Documents/Details.cshtml with tabs for overview, versions, segments - All views use Shoelace components matching existing UI style
- Created Entities/Index.cshtml with search functionality - Created Entities/Details.cshtml with tabs for overview, aliases, mentions, assertions - Created Assertions/Index.cshtml showing assertion list with filters - Created Assertions/Details.cshtml with tabs for overview, entities, context - Created Questionable/Index.cshtml for conflict management - All views follow existing Shoelace UI patterns
- Fixed Assertion property names: SubjectEntity, ObjectEntity, Epistemic, EvidenceSegment - Fixed Confidence from nullable to non-nullable float - Fixed TimeScopeKind property reference - Added @using directives for enum types - Constructed assertion text from subject-predicate-object triple - Fixed all Domain.Enums references to use imported enum types
- EpistemicCategory: Observed, Believed, Inferred (not CanonicalFact, Speculation, Contradiction) - Severity: Info, Warn, Error (not Critical, High, Medium, Low) - QuestionableItemStatus: Open, Dismissed, Resolved (not New, UnderReview) - EntityType: Added Object, Org, EventLike (not Thing) - Segment.Text (not SegmentText), Segment.Version (not DocumentVersion) - EntityAlias.Alias (not AliasName) - QuestionableItem.Details JSON string (not Description/Notes properties)
Applied same fix to universe creation form: - Added hidden input fields for Name, AuthorName, Description - JavaScript intercepts form submission and copies values from Shoelace components - Ensures universe creation works correctly This completes the fix for all Shoelace-based forms in the application.
- Explains the root cause of the form submission issue - Documents the solution approach - Provides best practices for future forms - Includes testing instructions
- Breaks down all missing functionality into 8 phases, 28 tasks - Each task includes: priority, complexity, dependencies, effort estimate - Covers entity detection, resolution, dossiers, LLM extraction, conflicts - Includes UI enhancements, performance optimization, testing - Total estimated effort: 120-164 hours (20-27 agent days) - Provides clear starting point for next agent - Documents critical path and parallelizable work
- Created IEntityMentionService and EntityMentionService for detecting entity mentions in text - Detects capitalized sequences and all-caps acronyms with confidence scoring - Filters out common words and sentence-start capitalization - Created IEntityResolutionService and EntityResolutionService for resolving mentions to entities - Implements exact match, fuzzy match, and ambiguous match resolution - Creates new entities when no match is found - Creates QuestionableItems for ambiguous entity identity cases - Created EntityMentionRepository and EntityAliasRepository - EntityAliasRepository supports pg_trgm fuzzy matching with similarity scoring - Created QuestionableItemRepository for conflict management - Integrated mention detection and resolution into IngestionService pipeline - Pipeline now tracks: segmentation -> mention_detection -> entity_resolution -> complete - Job progress JSON includes mention counts and resolution statistics - Registered all new services and repositories in DI container - All code compiles successfully
- Created comprehensive PHASE_2_SUMMARY.md documenting all implemented features - Updated INGESTION_PIPELINE_TRACKER.md to reflect Phase 2 completion - Documents entity mention detection, alias matching, and resolution logic - Includes pipeline flow, confidence thresholds, and example processing - Notes known limitations and next steps for Phase 3
- EntityMentionServiceTests: 12 tests for mention detection * Single/multi-word name detection * Common word filtering * Acronym detection (all-caps sequences) * Possessive form handling * Mid-sentence capitalization * Batch processing across segments * Duplicate span removal * Span position tracking - EntityResolutionServiceTests: 9 tests for resolution logic * Path 1: No matches → Create new entity * Path 2: Single high confidence (≥0.85) → Auto-resolve * Path 3: Multiple candidates (≥0.75) → QuestionableItem * Path 4: Single medium confidence → Resolve with adjusted confidence * Path 5: Below threshold → Leave unresolved * Batch processing * Entity type inference (Organization/Person) - EntityAliasRepositoryTests: 28 tests for fuzzy matching * Exact match detection * Similarity threshold filtering * Case insensitivity * Levenshtein distance calculations * Normalization (whitespace trimming, lowercase) * Real-world examples (nicknames, compound names) * Acronym variations * Possessive form similarity * Multi-word name handling * Name ordering comparisons All 49 tests passing ✓
- Created IngestionPipelineIntegrationTests with 4 test cases - Tests for EntityMentionService mention detection - Tests for EntityResolutionService entity creation and resolution - Tests for exact alias matching and batch processing Note: Tests currently cannot run with in-memory database because: - In-memory DB doesn't support pgvector (Vector type) - In-memory DB doesn't support pg_trgm (fuzzy matching) Comprehensive unit test coverage (49 tests) validates all NER functionality. Integration tests would require test PostgreSQL instance configuration.
…on workflows
Entity Management:
- EntitiesController: Create, Edit, Delete, Import, UpdateMentionResolution actions
- Create.cshtml: Manual entity creation with canonical name, type, aliases
- Edit.cshtml: Update entity with alias diff logic (add/remove)
- Delete.cshtml: Safe deletion with mention unlinking and impact analysis
- Import.cshtml: Bulk import supporting JSON and CSV formats
- Added Create/Import buttons to Entities/Index.cshtml
Bulk Import:
- JSON format: [{name, type, aliases[]}]
- CSV format: name,type,aliases (semicolon-separated)
- Duplicate detection by canonical name
- Automatic alias creation (canonical + additional)
- ImportEntityDto class for deserialization
Mention Validation:
- Enhanced Entities/Details.cshtml with Mentions tab
- Shows all EntityMention records with surface form, confidence, context
- Actions: Approve (confirm), Reject (unlink), Remap (link to different entity)
- Remap dialog with real-time entity search
- UpdateMentionResolution AJAX endpoint for instant updates
- Added Edit/Delete buttons to Details page
QuestionableItem Resolution:
- QuestionableController: Index, Details, Resolve, Dismiss, CreateAndResolve
- Index.cshtml: List with status filtering (Open/Resolved/Dismissed)
- Details.cshtml: Review interface with three resolution paths:
1. Link to existing entity (search and select)
2. Create new entity (redirect to creation form)
3. Dismiss as false positive (with optional notes)
- Shows mention context, confidence scores, segment text
Repository Enhancements:
- EntityAliasRepository: Added UpdateAsync, DeleteAsync methods
- IEntityAliasRepository: Added method signatures
Bug Fixes:
- Fixed enum values to match domain: QuestionableItemStatus.Open (not Pending)
- Fixed property names: ItemId (not QuestionableItemId), Confidence (non-nullable)
- Fixed Severity enum: Info/Warn/Error (not Critical/High/Medium/Low)
- Fixed QuestionableItemType: Identity/Conflict/LowConfidence/Constraint
- Added data-entity-id attributes to entity cards for JavaScript integration
Workflow Support:
- Pre-ingestion: Import character lists before document processing
- Post-ingestion: Review QuestionableItems, validate mentions, remap as needed
- All views use Shoelace 2.12.0 components with proper form participation
- Updated INGESTION_PIPELINE_TRACKER.md with completed Phase 2 tasks - Marked all 4 Phase 2 tasks as complete with actual effort hours - Added Task 2.5 documenting Entity Management UI implementation - Updated README.md with comprehensive Implementation Status section - Added detailed entity management workflows (import, validation, resolution) - Updated Running Tests section with 49 passing unit tests - Documented test coverage for EntityMentionService, EntityResolutionService, EntityAliasRepository - Phase 2 complete: Full NER pipeline with user management interfaces
- QuestionableController: Use Resolution JSON field instead of ResolutionStatus/ResolutionNotes - QuestionableController: Remove ObjectKind/ObjectId references (properties don't exist) - Entities/Details.cshtml: Fix Confidence property (float, not nullable) - Entities/Details.cshtml: Fix EntityMentionId → MentionId - QuestionableController: Add missing closing braces
af618b0 to
59015d4
Compare
|
@copilot please do another review, we did a rebase and a new push. |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Refactor foreach loops to explicit LINQ and fix resource disposal
Updates README and tracker to reflect Phase 2 NER completion with 49 passing tests