Fix references system: proper loading, structure, and source tracking #193
Merged
- Fix references.yml not being loaded (removed always-false guard condition)
- Fix references.yml YAML structure (proper indentation, all keys under references:)
- Fix YAML syntax error (escaped quotes in fetch_land_use entry)
- Fix key mismatches: filtered_events_square → filter_events_within_square, iccp_climate_data → iccp_climate_model, added DestinE mapping
- Expand IPCC references with all AR6 working group reports
- Modify query_rag() to return (response, sources) tuple for source tracking
- Update RAG agents to only add references for actually used sources
- Add reference fields to ERA5 tools and get_data_components outputs
- Collect and propagate references through data_analysis_agent to final output
- Merge agent-collected references into references['used'] for display
koldunovn approved these changes on Feb 4, 2026
Fix references system and add proper source tracking
Summary
Fixed a critical bug where references.yml was never loaded because of an always-false guard condition
Fixed YAML structure issues that prevented reference lookup
Implemented source tracking for RAG queries so that only documents actually retrieved are cited
Added reference propagation from tools through agents to final output
Changes
References Loading Fix
climsight.py: Removed the `if not references` guard, whose condition was always false because the dict was pre-initialized
references.yml: Fixed YAML structure - all keys now properly indented under references:, fixed escaped quotes
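The loading bug can be illustrated with a minimal sketch. The shape of the pre-initialized dict and the `load_references` helper are assumptions made for illustration; the actual code in climsight.py may look different.

```python
# Hypothetical pre-initialized structure; the real shape in climsight.py may differ.
references = {"references": {}, "used": []}


def load_references():
    """Stand-in for parsing references.yml; returns one hypothetical entry."""
    return {"references": {"era5": "ERA5 reference text"}, "used": []}


# Before: a non-empty dict is truthy, so `not references` is always False
# and the load never runs.
loaded_before = False
if not references:
    references = load_references()
    loaded_before = True

# After: the guard is gone and the file is loaded unconditionally.
references = load_references()
```

The point of the sketch is that a dict with any keys is truthy in Python, even when every value is empty, so a guard of this form only fires for a completely empty dict.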
Key Mismatches Fixed
climsight_engine.py:
filtered_events_square → filter_events_within_square
iccp_climate_data → iccp_climate_model
Added DestinE → destine_climate_model mapping
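A mismatch of this kind fails silently, because a lookup with a stale key simply finds nothing. As a rough sketch (not part of this PR), a small check can list the lookup keys that have no entry in the parsed references. The `find_missing_keys` helper and the placeholder values are hypothetical; only the key names come from the change list above.

```python
def find_missing_keys(lookup_keys, references):
    """Return the lookup keys that have no entry in the references mapping."""
    return sorted(k for k in lookup_keys if k not in references)


# Hypothetical subset of references.yml, already parsed into a dict.
references = {
    "filter_events_within_square": "...",
    "iccp_climate_model": "...",
    "destine_climate_model": "...",
}

old_keys = ["filtered_events_square", "iccp_climate_data"]
new_keys = [
    "filter_events_within_square",
    "iccp_climate_model",
    "destine_climate_model",
]

missing_old = find_missing_keys(old_keys, references)  # both old keys are absent
missing_new = find_missing_keys(new_keys, references)  # nothing is missing
```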
RAG Source Tracking
rag.py: query_rag() now returns (response, sources_list) tuple with actual document sources
climsight_engine.py: RAG agents now add references only for documents actually retrieved, not for every reference in the category
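The source-tracking idea can be sketched as follows. This is an illustration under stated assumptions, not the code in rag.py: the retriever callable, the document dict layout (`text`, `source`), and the file names are all hypothetical, and the language model call is replaced by a simple join.

```python
def query_rag(question, retriever):
    """Sketch: return the answer together with the sources of retrieved docs."""
    docs = retriever(question)
    sources = []
    for doc in docs:
        src = doc.get("source")
        if src and src not in sources:  # keep order, skip duplicates
            sources.append(src)
    response = " ".join(doc["text"] for doc in docs)  # stand-in for the LLM call
    return response, sources


def fake_retriever(question):
    """Hypothetical retriever returning three chunks from two documents."""
    return [
        {"text": "Chunk A.", "source": "report_1.pdf"},
        {"text": "Chunk B.", "source": "report_1.pdf"},
        {"text": "Chunk C.", "source": "report_3.pdf"},
    ]


# Hypothetical category of references keyed by source document.
category_references = {
    "report_1.pdf": "Reference for report 1",
    "report_2.pdf": "Reference for report 2",
    "report_3.pdf": "Reference for report 3",
}

response, sources = query_rag("example question", fake_retriever)
# Cite only what was retrieved, not the whole category.
used = [category_references[s] for s in sources if s in category_references]
```

In this sketch the reference for report_2.pdf is never cited, because no chunk from that document was retrieved.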
Tool Reference Propagation
era5_climatology_tool.py: Added reference field to output
era5_retrieval_tool.py: Added reference field to both cache hit and fresh download outputs
get_data_components.py: Added references list from data sources with fallback references
data_analysis_agent.py: Collects references from all tool outputs and propagates via state.references
climsight_engine.py: Merges output['references'] into references['used'] for display in References tab
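The propagation path described above can be sketched roughly as below. The helper names `collect_references` and `merge_used`, the tool output dicts, and the reference strings are assumptions for illustration; only the `reference` and `references` field names and the `references['used']` target come from the description.

```python
def collect_references(tool_outputs):
    """Gather references from tool outputs, keeping order and dropping duplicates."""
    collected = []
    for output in tool_outputs:
        refs = []
        if "reference" in output:  # single reference, as in the ERA5 tools
            refs.append(output["reference"])
        refs.extend(output.get("references", []))  # list, as in get_data_components
        for ref in refs:
            if ref not in collected:
                collected.append(ref)
    return collected


def merge_used(references, new_refs):
    """Merge agent-collected references into references['used'] without duplicates."""
    used = references.setdefault("used", [])
    for ref in new_refs:
        if ref not in used:
            used.append(ref)
    return references


# Hypothetical tool outputs.
tool_outputs = [
    {"data": "...", "reference": "ERA5 reference"},
    {"data": "...", "references": ["ERA5 reference", "Land use reference"]},
    {"data": "..."},  # a tool that reports no reference
]

references = {"used": ["IPCC AR6 reference"]}
collected = collect_references(tool_outputs)
references = merge_used(references, collected)
```

Deduplicating at both steps keeps the displayed list stable when several tools report the same source.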