Skip to content

Update documentation for Phase 2 completion#8

Closed
JDRay42 wants to merge 23 commits intomainfrom
docs/phase2-completion-update
Closed

Update documentation for Phase 2 completion#8
JDRay42 wants to merge 23 commits intomainfrom
docs/phase2-completion-update

Conversation

@JDRay42
Copy link
Copy Markdown
Owner

@JDRay42 JDRay42 commented Jan 6, 2026

Updates README and tracker to reflect Phase 2 NER completion with 49 passing tests

Copilot AI review requested due to automatic review settings January 6, 2026 00:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates documentation to reflect Phase 2 NER (Named Entity Recognition) completion with comprehensive test coverage (49 passing tests). The changes include:

  • Adding test projects to the solution for unit and integration testing
  • Implementing entity mention detection and resolution services
  • Adding web UI views and controllers for entities, documents, assertions, and questionable items
  • Integrating NER functionality into the document ingestion pipeline
  • Adding comprehensive documentation (spec.md, TESTING.md, SHOELACE_FORM_FIX.md)

Key Changes:

  • NER Implementation: Entity mention detection using pattern-based heuristics (capitalization, acronyms) and fuzzy matching for entity resolution
  • Test Coverage: 49 unit tests covering services, repositories, and controllers
  • UI Implementation: Complete web interface for managing entities, documents, assertions, and reviewing questionable items
  • Documentation: Detailed specification, testing strategy, and Shoelace form fix documentation

Reviewed changes

Copilot reviewed 56 out of 57 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/Palimpsest.sln Added test project references (Unit and Integration)
src/Palimpsest.Web/obj/* NuGet cache and build artifacts with local machine paths
src/Palimpsest.Web/Views/**/*.cshtml New UI views for entities, documents, universes, assertions, and questionable items
src/Palimpsest.Web/Controllers/*.cs New controllers for entities, documents, assertions, and questionable items
src/Palimpsest.Web/Program.cs Registered new services and repositories; added partial Program class
src/Palimpsest.Infrastructure/Services/*.cs NER services: EntityMentionService and EntityResolutionService
src/Palimpsest.Infrastructure/Repositories/*.cs New repositories for entity aliases, mentions, and questionable items
src/Palimpsest.Tests.Unit/**/*.cs 49 unit tests for services, repositories, and controllers
src/Palimpsest.Tests.Integration/**/*.cs Integration test infrastructure and tests
spec.md, TESTING.md, SHOELACE_FORM_FIX.md Comprehensive documentation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +251 to +258
foreach (var alias in existingAliases)
{
if (!alias.Alias.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase) &&
!newAliases.Contains(alias.Alias))
{
await _entityAliasRepository.DeleteAsync(alias.AliasId);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +265 to +280
foreach (var aliasName in newAliases)
{
if (!existingAliasNames.Contains(aliasName) &&
!aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase))
{
var alias = new EntityAlias
{
AliasId = Guid.NewGuid(),
EntityId = entity.EntityId,
Alias = aliasName,
AliasNorm = aliasName.ToLowerInvariant(),
CreatedAt = DateTime.UtcNow
};
await _entityAliasRepository.CreateAsync(alias);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +441 to +456
foreach (var aliasName in dto.Aliases)
{
if (!string.IsNullOrWhiteSpace(aliasName) &&
!aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase))
{
var alias = new EntityAlias
{
AliasId = Guid.NewGuid(),
EntityId = entity.EntityId,
Alias = aliasName,
AliasNorm = aliasName.ToLowerInvariant(),
CreatedAt = DateTime.UtcNow
};
await _entityAliasRepository.CreateAsync(alias);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +518 to +533
foreach (var aliasName in aliases)
{
if (!string.IsNullOrWhiteSpace(aliasName) &&
!aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase))
{
var alias = new EntityAlias
{
AliasId = Guid.NewGuid(),
EntityId = entity.EntityId,
Alias = aliasName,
AliasNorm = aliasName.ToLowerInvariant(),
CreatedAt = DateTime.UtcNow
};
await _entityAliasRepository.CreateAsync(alias);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +149
foreach (var aliasLine in aliasLines)
{
var aliasName = aliasLine.Trim();
if (!string.IsNullOrWhiteSpace(aliasName) &&
!aliasName.Equals(entity.CanonicalName, StringComparison.OrdinalIgnoreCase))
{
var alias = new EntityAlias
{
AliasId = Guid.NewGuid(),
EntityId = entity.EntityId,
Alias = aliasName,
AliasNorm = aliasName.ToLowerInvariant(),
CreatedAt = DateTime.UtcNow
};
await _entityAliasRepository.CreateAsync(alias);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +240 to +247
foreach (var line in aliasLines)
{
var trimmed = line.Trim();
if (!string.IsNullOrWhiteSpace(trimmed))
{
newAliases.Add(trimmed);
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.

Copilot uses AI. Check for mistakes.
["__RequestVerificationToken"] = antiForgeryToken
};

var content = new FormUrlEncodedContent(formData);
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disposable 'FormUrlEncodedContent' is created but not disposed.

Suggested change
var content = new FormUrlEncodedContent(formData);
using var content = new FormUrlEncodedContent(formData);

Copilot uses AI. Check for mistakes.
Comment on lines +173 to +175
var postResponse = await _client.PostAsync(
$"/universes/setactive?id={universe.UniverseId}",
new FormUrlEncodedContent(formData));
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disposable 'FormUrlEncodedContent' is created but not disposed.

Suggested change
var postResponse = await _client.PostAsync(
$"/universes/setactive?id={universe.UniverseId}",
new FormUrlEncodedContent(formData));
using var formContent = new FormUrlEncodedContent(formData);
var postResponse = await _client.PostAsync(
$"/universes/setactive?id={universe.UniverseId}",
formContent);

Copilot uses AI. Check for mistakes.
JDRay42 added 17 commits January 5, 2026 17:52
- Created unit test project (Palimpsest.Tests.Unit) with 17 passing tests
  - UniverseRepositoryTests (8 tests)
  - DocumentRepositoryTests (6 tests)
  - UniverseContextServiceTests (4 tests)
  - Added Moq, FluentAssertions, and InMemory EF Core dependencies

- Created integration test project (Palimpsest.Tests.Integration)
  - Custom WebApplicationFactory with in-memory database
  - UniversesController integration tests (7 tests)
  - Microsoft.AspNetCore.Mvc.Testing framework setup

- Added missing controllers
  - DocumentsController: Create, Index, Details, Delete with file upload support
  - EntitiesController: Index, Details, Search with universe scoping
  - AssertionsController: Index, Details for viewing assertions
  - QuestionableController: Index for conflict management (stub)

- Added missing views
  - Universes/Details.cshtml: Tabbed interface with overview, documents, entities, assertions, questionable items

- Enhanced repositories
  - Added SearchByNameAsync to IEntityRepository and EntityRepository
  - Implemented name and alias search with pagination

- Fixed application issues
  - Updated Program.cs to expose partial Program class for testing
  - Updated UniversesController Details action to include ViewBag data
  - Created wwwroot directory for static files

- Added comprehensive code review documentation
  - CODE_REVIEW_SUMMARY.md with architecture review and recommendations
- Added @RenderSectionAsync for Scripts section in _Layout.cshtml
- Removed unnecessary SignalR script from Create.cshtml
- Added missing using directives for InMemory database tests

This fixes the error when clicking Create Universe button.
- Removed InMemory database tests that failed with pgvector
- Added UniversesControllerTests with 7 tests using Moq (all passing)
- Updated test project to reference Web project
- Created TESTING.md with strategy and examples
- Unit tests now properly mock dependencies instead of using database

Test Results: 11 passing tests (UniversesController: 7, UniverseContextService: 4)
- Created Documents/Create.cshtml with file upload and text paste options
- Created Documents/Index.cshtml showing document list
- Created Documents/Details.cshtml with tabs for overview, versions, segments
- All views use Shoelace components matching existing UI style
- Created Entities/Index.cshtml with search functionality
- Created Entities/Details.cshtml with tabs for overview, aliases, mentions, assertions
- Created Assertions/Index.cshtml showing assertion list with filters
- Created Assertions/Details.cshtml with tabs for overview, entities, context
- Created Questionable/Index.cshtml for conflict management
- All views follow existing Shoelace UI patterns
- Fixed Assertion property names: SubjectEntity, ObjectEntity, Epistemic, EvidenceSegment
- Fixed Confidence from nullable to non-nullable float
- Fixed TimeScopeKind property reference
- Added @using directives for enum types
- Constructed assertion text from subject-predicate-object triple
- Fixed all Domain.Enums references to use imported enum types
- EpistemicCategory: Observed, Believed, Inferred (not CanonicalFact, Speculation, Contradiction)
- Severity: Info, Warn, Error (not Critical, High, Medium, Low)
- QuestionableItemStatus: Open, Dismissed, Resolved (not New, UnderReview)
- EntityType: Added Object, Org, EventLike (not Thing)
- Segment.Text (not SegmentText), Segment.Version (not DocumentVersion)
- EntityAlias.Alias (not AliasName)
- QuestionableItem.Details JSON string (not Description/Notes properties)
Applied same fix to universe creation form:
- Added hidden input fields for Name, AuthorName, Description
- JavaScript intercepts form submission and copies values from Shoelace components
- Ensures universe creation works correctly

This completes the fix for all Shoelace-based forms in the application.
- Explains the root cause of the form submission issue
- Documents the solution approach
- Provides best practices for future forms
- Includes testing instructions
- Breaks down all missing functionality into 8 phases, 28 tasks
- Each task includes: priority, complexity, dependencies, effort estimate
- Covers entity detection, resolution, dossiers, LLM extraction, conflicts
- Includes UI enhancements, performance optimization, testing
- Total estimated effort: 120-164 hours (20-27 agent days)
- Provides clear starting point for next agent
- Documents critical path and parallelizable work
- Created IEntityMentionService and EntityMentionService for detecting entity mentions in text
- Detects capitalized sequences and all-caps acronyms with confidence scoring
- Filters out common words and sentence-start capitalization
- Created IEntityResolutionService and EntityResolutionService for resolving mentions to entities
- Implements exact match, fuzzy match, and ambiguous match resolution
- Creates new entities when no match is found
- Creates QuestionableItems for ambiguous entity identity cases
- Created EntityMentionRepository and EntityAliasRepository
- EntityAliasRepository supports pg_trgm fuzzy matching with similarity scoring
- Created QuestionableItemRepository for conflict management
- Integrated mention detection and resolution into IngestionService pipeline
- Pipeline now tracks: segmentation -> mention_detection -> entity_resolution -> complete
- Job progress JSON includes mention counts and resolution statistics
- Registered all new services and repositories in DI container
- All code compiles successfully
- Created comprehensive PHASE_2_SUMMARY.md documenting all implemented features
- Updated INGESTION_PIPELINE_TRACKER.md to reflect Phase 2 completion
- Documents entity mention detection, alias matching, and resolution logic
- Includes pipeline flow, confidence thresholds, and example processing
- Notes known limitations and next steps for Phase 3
- EntityMentionServiceTests: 12 tests for mention detection
  * Single/multi-word name detection
  * Common word filtering
  * Acronym detection (all-caps sequences)
  * Possessive form handling
  * Mid-sentence capitalization
  * Batch processing across segments
  * Duplicate span removal
  * Span position tracking

- EntityResolutionServiceTests: 9 tests for resolution logic
  * Path 1: No matches → Create new entity
  * Path 2: Single high confidence (≥0.85) → Auto-resolve
  * Path 3: Multiple candidates (≥0.75) → QuestionableItem
  * Path 4: Single medium confidence → Resolve with adjusted confidence
  * Path 5: Below threshold → Leave unresolved
  * Batch processing
  * Entity type inference (Organization/Person)

- EntityAliasRepositoryTests: 28 tests for fuzzy matching
  * Exact match detection
  * Similarity threshold filtering
  * Case insensitivity
  * Levenshtein distance calculations
  * Normalization (whitespace trimming, lowercase)
  * Real-world examples (nicknames, compound names)
  * Acronym variations
  * Possessive form similarity
  * Multi-word name handling
  * Name ordering comparisons

All 49 tests passing ✓
- Created IngestionPipelineIntegrationTests with 4 test cases
- Tests for EntityMentionService mention detection
- Tests for EntityResolutionService entity creation and resolution
- Tests for exact alias matching and batch processing

Note: Tests currently cannot run with in-memory database because:
- In-memory DB doesn't support pgvector (Vector type)
- In-memory DB doesn't support pg_trgm (fuzzy matching)

Comprehensive unit test coverage (49 tests) validates all NER functionality.
Integration tests would require test PostgreSQL instance configuration.
…on workflows

Entity Management:
- EntitiesController: Create, Edit, Delete, Import, UpdateMentionResolution actions
- Create.cshtml: Manual entity creation with canonical name, type, aliases
- Edit.cshtml: Update entity with alias diff logic (add/remove)
- Delete.cshtml: Safe deletion with mention unlinking and impact analysis
- Import.cshtml: Bulk import supporting JSON and CSV formats
- Added Create/Import buttons to Entities/Index.cshtml

Bulk Import:
- JSON format: [{name, type, aliases[]}]
- CSV format: name,type,aliases (semicolon-separated)
- Duplicate detection by canonical name
- Automatic alias creation (canonical + additional)
- ImportEntityDto class for deserialization

Mention Validation:
- Enhanced Entities/Details.cshtml with Mentions tab
- Shows all EntityMention records with surface form, confidence, context
- Actions: Approve (confirm), Reject (unlink), Remap (link to different entity)
- Remap dialog with real-time entity search
- UpdateMentionResolution AJAX endpoint for instant updates
- Added Edit/Delete buttons to Details page

QuestionableItem Resolution:
- QuestionableController: Index, Details, Resolve, Dismiss, CreateAndResolve
- Index.cshtml: List with status filtering (Open/Resolved/Dismissed)
- Details.cshtml: Review interface with three resolution paths:
  1. Link to existing entity (search and select)
  2. Create new entity (redirect to creation form)
  3. Dismiss as false positive (with optional notes)
- Shows mention context, confidence scores, segment text

Repository Enhancements:
- EntityAliasRepository: Added UpdateAsync, DeleteAsync methods
- IEntityAliasRepository: Added method signatures

Bug Fixes:
- Fixed enum values to match domain: QuestionableItemStatus.Open (not Pending)
- Fixed property names: ItemId (not QuestionableItemId), Confidence (non-nullable)
- Fixed Severity enum: Info/Warn/Error (not Critical/High/Medium/Low)
- Fixed QuestionableItemType: Identity/Conflict/LowConfidence/Constraint
- Added data-entity-id attributes to entity cards for JavaScript integration

Workflow Support:
- Pre-ingestion: Import character lists before document processing
- Post-ingestion: Review QuestionableItems, validate mentions, remap as needed
- All views use Shoelace 2.12.0 components with proper form participation
- Updated INGESTION_PIPELINE_TRACKER.md with completed Phase 2 tasks
- Marked all 4 Phase 2 tasks as complete with actual effort hours
- Added Task 2.5 documenting Entity Management UI implementation
- Updated README.md with comprehensive Implementation Status section
- Added detailed entity management workflows (import, validation, resolution)
- Updated Running Tests section with 49 passing unit tests
- Documented test coverage for EntityMentionService, EntityResolutionService, EntityAliasRepository
- Phase 2 complete: Full NER pipeline with user management interfaces
- QuestionableController: Use Resolution JSON field instead of ResolutionStatus/ResolutionNotes
- QuestionableController: Remove ObjectKind/ObjectId references (properties don't exist)
- Entities/Details.cshtml: Fix Confidence property (float, not nullable)
- Entities/Details.cshtml: Fix EntityMentionId → MentionId
- QuestionableController: Add missing closing braces
@JDRay42 JDRay42 force-pushed the docs/phase2-completion-update branch from af618b0 to 59015d4 Compare January 6, 2026 02:00
@JDRay42
Copy link
Copy Markdown
Owner Author

JDRay42 commented Jan 6, 2026

@copilot please do another review, we did a rebase and a new push.

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 6, 2026

@JDRay42 I've opened a new pull request, #9, to work on those changes. Once the pull request is ready, I'll request review from you.

@JDRay42
Copy link
Copy Markdown
Owner Author

JDRay42 commented Jan 6, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 6, 2026

@JDRay42 I've opened a new pull request, #10, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits January 6, 2026 18:35
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Copilot AI and others added 3 commits January 6, 2026 18:39
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Co-authored-by: JDRay42 <10964706+JDRay42@users.noreply.github.com>
Refactor foreach loops to explicit LINQ and fix resource disposal
@JDRay42 JDRay42 closed this Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants