-
Notifications
You must be signed in to change notification settings - Fork 0
Integrate 3-level hierarchy with knowledge graph domain and dual-pass architecture #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
research-developer
wants to merge
20
commits into
main
Choose a base branch
from
dataset-kg-3level
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
4441471
Implement KnowledgeGraphTripleDataset for relational reasoning
research-developer 2b5f1f2
Add comprehensive tests for KnowledgeGraphTripleDataset
research-developer 467d2ff
Add knowledge graph dataset example and visualization
research-developer 46cdacc
Add evaluation metrics for knowledge graph reasoning
research-developer 9128a19
Add implementation summary and documentation
research-developer e0acbc5
Merge main foundation components (NSM-17, NSM-15, NSM-16) into KG branch
research-developer e20e45e
Merge hierarchical model (pooling + WHY/WHAT) from main
research-developer a718fca
Merge branch 'main' into dataset-knowledge-graph
research-developer 58d7ba0
Add NSM-23 training script and metrics for knowledge graph domain
research-developer b0bf9ea
Merge branch 'main' into dataset-knowledge-graph
research-developer 62423f9
Fix KG dataset: Add balanced negative sampling (50/50 split)
research-developer ddc8ff8
Merge validation framework from main into dataset-knowledge-graph
research-developer 9dacbe9
Update .gitignore to ignore generated files (NSM-28)
research-developer cb99aa3
Fix KG label shape to match test expectations
research-developer 295d918
Enable 3-level hierarchy for knowledge graph domain (Phase 1.5)
research-developer 64c4432
Improve 3-level hierarchy documentation and metrics
research-developer 8812248
Merge main into dataset-kg-3level
research-developer 3e2f6b6
Add merge verification test for single-pass and dual-pass modes
research-developer 622468a
Make dual-pass mode the default for 3-level hierarchy
research-developer 076b687
Fix critical issues identified in PR review
research-developer File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,245 @@ | ||
| # Knowledge Graph Dataset Implementation Summary | ||
|
|
||
| ## Overview | ||
|
|
||
| Successfully implemented a comprehensive Knowledge Graph dataset generator for NSM Phase 1 exploration (NSM-10). This is one of three parallel dataset explorations to empirically validate the best domain for 2-level hierarchical reasoning. | ||
|
|
||
| ## What Was Implemented | ||
|
|
||
| ### 1. KnowledgeGraphTripleDataset (`nsm/data/knowledge_graph_dataset.py`) | ||
|
|
||
| **Core Features:** | ||
| - Generates 20K synthetic triples across 5K entities | ||
| - 50+ predicate types spanning biographical, geographic, creative, and conceptual relations | ||
| - 2-level hierarchy: L1 (facts/instances) and L2 (types/categories) | ||
| - Confidence scores 0.5-1.0 for partial observability | ||
| - 6 entity categories: People, Places, Organizations, Concepts, Awards, Dates | ||
|
|
||
| **Entity Generation:** | ||
| - Named entities: Einstein, Curie, Newton, Paris, London, MIT, Harvard, etc. | ||
| - Rich biographical data: born_in, works_at, won, studied_at | ||
| - Geographic hierarchies: city → country → continent | ||
| - Type hierarchies: Person → Living_Being → Entity | ||
|
|
||
| **Reasoning Support:** | ||
| - Multi-hop query generation (2-hop paths) | ||
| - Type consistency checking pairs | ||
| - Analogical reasoning support | ||
| - Link prediction labels | ||
|
|
||
| ### 2. Comprehensive Tests (`tests/data/test_kg_dataset.py`) | ||
|
|
||
| **21 Test Cases:** | ||
| - Dataset generation and initialization | ||
| - Triple structure validation | ||
| - Level distribution (L1 vs L2) | ||
| - Confidence variance and diversity | ||
| - Predicate type coverage (50+) | ||
| - Entity diversity and categories | ||
| - Named entity inclusion | ||
| - Multi-hop reasoning paths | ||
| - Type hierarchy validation | ||
| - PyG interface compliance | ||
| - Caching and reproducibility | ||
|
|
||
| **Test Results:** | ||
| - ✅ 21/21 tests passing | ||
| - ✅ 98% code coverage | ||
| - ✅ Reproducibility verified | ||
| - ✅ PyG DataLoader compatible | ||
|
|
||
| ### 3. Example Script (`examples/knowledge_graph_example.py`) | ||
|
|
||
| **16 Demonstration Sections:** | ||
| 1. Dataset creation and statistics | ||
| 2. Sample triples (L1 and L2) | ||
| 3. Predicate type distribution | ||
| 4. Entity category breakdown | ||
| 5. Named entity examples | ||
| 6. PyG graph structure | ||
| 7. Multi-hop reasoning queries | ||
| 8. Type consistency pairs | ||
| 9. Biographical reasoning chains | ||
| 10. Type hierarchy display | ||
| 11. Confidence distribution | ||
| 12. PyG DataLoader batching | ||
| 13. Reasoning pattern examples | ||
| 14. Instance-of relations | ||
| 15. Geographic chains | ||
| 16. Professional relations | ||
|
|
||
| ### 4. Evaluation Metrics (`nsm/evaluation/kg_metrics.py`) | ||
|
|
||
| **Metrics Implemented:** | ||
| - **Link Prediction:** Hits@K, MRR, Mean/Median Rank | ||
| - **Analogical Reasoning:** A:B :: C:D with vector arithmetic | ||
| - **Type Consistency:** Precision, Recall, F1, Confusion Matrix | ||
| - **Multi-hop Reasoning:** Exact match, Hits@K, Average Precision | ||
| - **Calibration:** ECE, MCE, Calibration Curves | ||
|
|
||
| ## Design Decisions | ||
|
|
||
| ### 1. Entity-Centric Knowledge Representation | ||
| **Rationale:** Knowledge graphs excel at entity relationships and type hierarchies, making them ideal for testing hierarchical abstraction in NSM. | ||
|
|
||
| ### 2. 50+ Predicate Types | ||
| **Rationale:** Rich relation vocabulary enables diverse reasoning patterns and tests R-GCN's basis decomposition (NSM-17). | ||
|
|
||
| ### 3. Confidence Variance (0.5-1.0) | ||
| **Rationale:** Partial observability tests NSM's confidence propagation (NSM-12) and provenance semiring implementation. | ||
|
|
||
| ### 4. Named Entity Inclusion | ||
| **Rationale:** Real-world entities (Einstein, Paris) make debugging and interpretation easier during development. | ||
|
|
||
| ### 5. Reproducible Generation with Seeds | ||
| **Rationale:** Essential for comparing across exploration branches (NSM-10, NSM-12, NSM-11). | ||
|
|
||
| ## Integration Points | ||
|
|
||
| ### With NSM-18 (PyG Infrastructure): | ||
| - ✅ Extends `BaseSemanticTripleDataset` | ||
| - ✅ Uses `GraphConstructor` for graph building | ||
| - ✅ Compatible with `TripleVocabulary` | ||
| - ✅ Returns PyG `Data` objects | ||
|
|
||
| ### With NSM-17 (R-GCN): | ||
| - ✅ Edge types for 50+ predicates | ||
| - ✅ Confidence as edge attributes | ||
| - ✅ Typed relations ready for basis decomposition | ||
|
|
||
| ### With NSM-12 (Confidence Exploration): | ||
| - ✅ Wide confidence range (0.5-1.0) | ||
| - ✅ Product semiring evaluation ready | ||
| - ✅ Calibration metrics implemented | ||
|
|
||
| ### With NSM-14 (Training Loop): | ||
| - ✅ Link prediction labels | ||
| - ✅ Batch loading compatible | ||
| - ✅ Evaluation metrics ready | ||
|
|
||
| ## Testing Results | ||
|
|
||
| ``` | ||
| ======================== 21 passed, 3 warnings in 4.43s ======================== | ||
|
|
||
| Coverage: | ||
| nsm/data/knowledge_graph_dataset.py: 98% | ||
| nsm/data/dataset.py: 69% | ||
|
|
||
| Key Metrics: | ||
| - 5000 triples generated | ||
| - 1298+ unique entities | ||
| - 66 predicates (50+ expected, extras from random generation) | ||
| - L1/L2 ratio: ~87%/13% (facts vs types) | ||
| - Average confidence: 0.77 | ||
| ``` | ||
|
|
||
| ## Commits Made | ||
|
|
||
| 1. **4441471** - Implement KnowledgeGraphTripleDataset for relational reasoning | ||
| - Core dataset class with entity/predicate generation | ||
| - Multi-hop query support | ||
| - Type hierarchy implementation | ||
| - Fix for PyTorch 2.6 weights_only parameter | ||
|
|
||
| 2. **2b5f1f2** - Add comprehensive tests for KnowledgeGraphTripleDataset | ||
| - 21 test cases covering all functionality | ||
| - 98% code coverage | ||
| - Reproducibility and caching tests | ||
|
|
||
| 3. **467d2ff** - Add knowledge graph dataset example and visualization | ||
| - 16 demonstration sections | ||
| - Reasoning chain examples | ||
| - PyG DataLoader integration | ||
|
|
||
| 4. **46cdacc** - Add evaluation metrics for knowledge graph reasoning | ||
| - Link prediction (Hits@K, MRR) | ||
| - Analogical reasoning | ||
| - Type consistency checking | ||
| - Calibration metrics (ECE/MCE) | ||
|
|
||
| ## Next Steps for NSM-10 Evaluation | ||
|
|
||
| ### Comparison Criteria (from CLAUDE.md): | ||
| 1. **Task accuracy (40%):** Link prediction, type inference | ||
| 2. **Calibration (20%):** ECE on confidence scores | ||
| 3. **Multi-hop (20%):** 2-hop reasoning accuracy | ||
| 4. **Interpretability (20%):** Debugging and explainability | ||
|
|
||
| ### Evaluation Protocol: | ||
| ```bash | ||
| # Run evaluation suite | ||
| python -m tests.evaluation_suite --dataset knowledge_graph --output results/kg.json | ||
|
|
||
| # Compare with other branches | ||
| python compare_results.py results/kg.json results/planning.json results/causal.json | ||
| ``` | ||
|
|
||
| ### Expected Strengths: | ||
| - ✅ Rich predicate diversity (50+ types) | ||
| - ✅ Clear type hierarchies (instance_of, subclass_of) | ||
| - ✅ Multi-hop paths (2-hop queries) | ||
| - ✅ Entity-centric interpretability | ||
|
|
||
| ### Potential Weaknesses: | ||
| - ⚠️ Less hierarchical structure than planning domain | ||
| - ⚠️ May need deeper hierarchies for full NSM evaluation | ||
| - ⚠️ Random relation generation may create noise | ||
|
|
||
| ## Files Changed | ||
|
|
||
| ``` | ||
| nsm/data/knowledge_graph_dataset.py (new, 682 lines) | ||
| nsm/data/dataset.py (modified, +1 line for weights_only fix) | ||
| tests/data/test_kg_dataset.py (new, 394 lines) | ||
| examples/knowledge_graph_example.py (new, 216 lines) | ||
| nsm/evaluation/__init__.py (new, 13 lines) | ||
| nsm/evaluation/kg_metrics.py (new, 449 lines) | ||
| ``` | ||
|
|
||
| **Total:** 1,755 lines of new code | ||
|
|
||
| ## Domain Properties | ||
|
|
||
| ### Level 1 (Facts/Instances): | ||
| - **Biographical:** born_in, died_in, works_at, studied_at, won | ||
| - **Geographic:** located_in, capital_of, borders, adjacent_to | ||
| - **Creative:** created, authored, composed, designed, invented | ||
| - **Professional:** employed_by, founded, leads, member_of | ||
| - **Temporal:** occurred_in, started_on, ended_on | ||
|
|
||
| ### Level 2 (Types/Categories): | ||
| - **Type hierarchy:** instance_of, subclass_of, category_of | ||
| - **Generalizations:** typically_has, usually_in, commonly_has | ||
| - **Abstract:** related_to, similar_to, implies, requires, enables | ||
|
|
||
| ### Mathematical Foundation: | ||
| ``` | ||
| Knowledge Graph G = (E, R, T) where: | ||
| - E: Set of entities (5K) | ||
| - R: Set of typed relations (50+) | ||
| - T ⊆ E × R × E: Set of triples (20K) | ||
| - L: Level function L: T → {1, 2} | ||
| - C: Confidence function C: T → [0.5, 1.0] | ||
| ``` | ||
|
|
||
| ## Conclusion | ||
|
|
||
| ✅ **Implementation Complete** | ||
| - Fully functional KG dataset generator | ||
| - Comprehensive test coverage (21/21 passing) | ||
| - Rich evaluation metrics | ||
| - Ready for NSM-10 parallel exploration | ||
|
|
||
| ✅ **NSM-18 Integration Verified** | ||
| - Compatible with BaseSemanticTripleDataset | ||
| - PyG Data objects working | ||
| - Vocabulary and graph construction validated | ||
|
|
||
| ✅ **Ready for Evaluation** | ||
| - Evaluation metrics implemented | ||
| - Comparison protocol defined | ||
| - Documentation complete | ||
|
|
||
| **Branch:** dataset-knowledge-graph | ||
| **Status:** ✅ Ready for evaluation and PR (once NSM-10 exploration complete) | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Branch-Specific Summary Document Not Ignored
This summary document should be ignored according to the .gitignore rules added in this same PR (line 146: "*_SUMMARY.md"). The file KG_IMPLEMENTATION_SUMMARY.md matches this pattern and should not be committed. This appears to be a branch-specific summary document that was meant to be excluded per NSM-26.