
Fix references system: proper loading, structure, and source tracking #193

Merged
koldunovn merged 1 commit into main from references on Feb 4, 2026



@kuivi commented Feb 4, 2026

  • Fix references.yml not being loaded (removed always-false guard condition)
  • Fix references.yml YAML structure (proper indentation, all keys under references:)
  • Fix YAML syntax error (escaped quotes in fetch_land_use entry)
  • Fix key mismatches: filtered_events_square → filter_events_within_square, iccp_climate_data → iccp_climate_model, added DestinE mapping
  • Expand IPCC references with all AR6 working group reports
  • Modify query_rag() to return (response, sources) tuple for source tracking
  • Update RAG agents to only add references for actually used sources
  • Add reference fields to ERA5 tools and get_data_components outputs
  • Collect and propagate references through data_analysis_agent to final output
  • Merge agent-collected references into references['used'] for display

Fix references system and add proper source tracking

Summary

  • Fixed a critical bug where references.yml was never loaded because its guard condition was always false
  • Fixed YAML structure issues that prevented proper reference lookup
  • Implemented source tracking for RAG queries so that only documents actually used are cited
  • Added reference propagation from tools through agents to the final output

Changes

References Loading Fix

  • climsight.py: Removed the `if not references` guard, which was always false because the dict was pre-initialized
  • references.yml: Fixed the YAML structure so that all keys are properly indented under references:, and fixed escaped quotes
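
To illustrate why the guard never fired, here is a minimal sketch rather than the actual climsight.py code. It assumes the dict was pre-initialized with at least one key (a hypothetical `used` list), which makes it truthy before anything is loaded. The function names and `fake_loader` are hypothetical stand-ins for parsing references.yml.

```python
from typing import Callable, Dict


def load_references_buggy(load_yaml: Callable[[], Dict]) -> Dict:
    # Assumption: the dict is pre-initialized with at least one key,
    # so it is already truthy before anything is loaded.
    references = {"used": []}
    if not references:  # never true for a non-empty dict
        references.update(load_yaml())
    return references


def load_references_fixed(load_yaml: Callable[[], Dict]) -> Dict:
    references = {"used": []}
    references.update(load_yaml())  # always load, no guard
    return references


def fake_loader() -> Dict:
    # Hypothetical stand-in for parsing references.yml; placeholder entry.
    return {"references": {"example_key": "Example citation"}}


print("references" in load_references_buggy(fake_loader))  # False
print("references" in load_references_fixed(fake_loader))  # True
```

In Python a non-empty dict is truthy, so `not references` evaluates to False and the guarded load is skipped silently.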

Key Mismatches Fixed

  • climsight_engine.py:
      • filtered_events_square → filter_events_within_square
      • iccp_climate_data → iccp_climate_model
      • Added DestinE → destine_climate_model mapping
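
A mismatch like this can be caught with a small consistency check between the keys the engine looks up and the keys defined in references.yml. The sketch below is an illustration and not part of this PR: `missing_reference_keys` is a hypothetical helper, and the citation values are placeholders. Only the key names come from the list above.

```python
from typing import Dict, Iterable, List


def missing_reference_keys(lookup_keys: Iterable[str],
                           references: Dict[str, object]) -> List[str]:
    """Return the lookup keys that have no entry in the references mapping."""
    return sorted(set(k for k in lookup_keys if k not in references))


# Placeholder entries; only the key names come from the PR description.
references = {
    "filter_events_within_square": "placeholder citation",
    "iccp_climate_model": "placeholder citation",
    "destine_climate_model": "placeholder citation",
}

old_keys = ["filtered_events_square", "iccp_climate_data"]
new_keys = [
    "filter_events_within_square",
    "iccp_climate_model",
    "destine_climate_model",
]

print(missing_reference_keys(old_keys, references))
# ['filtered_events_square', 'iccp_climate_data']
print(missing_reference_keys(new_keys, references))
# []
```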

RAG Source Tracking

  • rag.py: query_rag() now returns a (response, sources_list) tuple containing the actual document sources
  • climsight_engine.py: RAG agents add references only for the documents actually retrieved, not for every reference in the category
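
The tuple-return pattern can be sketched as follows. This is not the real rag.py: `query_rag_sketch`, `references_for_sources`, the `retrieve`/`generate` callables, and the `source` field on each document are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, str]  # hypothetical shape: {"text": ..., "source": ...}


def query_rag_sketch(
    question: str,
    retrieve: Callable[[str], List[Document]],
    generate: Callable[[str, List[Document]], str],
) -> Tuple[str, List[str]]:
    """Return the answer together with the de-duplicated sources retrieved."""
    docs = retrieve(question)
    response = generate(question, docs)
    sources: List[str] = []
    for doc in docs:
        source = doc.get("source")
        if source and source not in sources:
            sources.append(source)
    return response, sources


def references_for_sources(sources: List[str],
                           category_references: Dict[str, str]) -> List[str]:
    """Keep only the citations whose source was actually retrieved."""
    return [category_references[s] for s in sources if s in category_references]


def demo_retrieve(question: str) -> List[Document]:
    # Two chunks come from the same file; it should be cited only once.
    return [
        {"text": "chunk 1", "source": "report_a.pdf"},
        {"text": "chunk 2", "source": "report_a.pdf"},
        {"text": "chunk 3", "source": "report_b.pdf"},
    ]


def demo_generate(question: str, docs: List[Document]) -> str:
    return f"answer based on {len(docs)} chunks"


category = {
    "report_a.pdf": "Citation A (placeholder)",
    "report_b.pdf": "Citation B (placeholder)",
    "report_c.pdf": "Citation C (placeholder)",
}

response, sources = query_rag_sketch("example question", demo_retrieve, demo_generate)
used = references_for_sources(sources, category)
print(sources)  # ['report_a.pdf', 'report_b.pdf']
print(used)     # ['Citation A (placeholder)', 'Citation B (placeholder)']
```

Here report_c.pdf belongs to the category but was never retrieved, so its citation is left out.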

Tool Reference Propagation

  • era5_climatology_tool.py: Added a reference field to the output
  • era5_retrieval_tool.py: Added a reference field to both the cache-hit and fresh-download outputs
  • get_data_components.py: Added a references list built from the data sources, with fallback references
  • data_analysis_agent.py: Collects references from all tool outputs and propagates them via state.references
  • climsight_engine.py: Merges output['references'] into references['used'] for display in the References tab
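
The collect-and-merge flow described above might look roughly like this. It is a sketch under assumptions: tool outputs are modelled as plain dicts carrying either a `reference` string or a `references` list, and `collect_references` and `merge_used` are hypothetical helper names, not functions from the codebase.

```python
from typing import Dict, List


def collect_references(tool_outputs: List[Dict]) -> List[str]:
    """Gather references from tool outputs, preserving order, without duplicates."""
    collected: List[str] = []
    for output in tool_outputs:
        candidates: List[str] = []
        if output.get("reference"):
            candidates.append(output["reference"])
        candidates.extend(output.get("references", []))
        for ref in candidates:
            if ref not in collected:
                collected.append(ref)
    return collected


def merge_used(references: Dict, new_refs: List[str]) -> Dict:
    """Append new references to references['used'], skipping ones already there."""
    used = references.setdefault("used", [])
    for ref in new_refs:
        if ref not in used:
            used.append(ref)
    return references


# Placeholder outputs; real tools return richer structures.
tool_outputs = [
    {"result": "...", "reference": "ERA5 citation (placeholder)"},
    {"result": "...", "references": ["Source A (placeholder)",
                                     "ERA5 citation (placeholder)"]},
    {"result": "..."},  # a tool that reports no reference
]

references = {"used": ["Existing citation (placeholder)"]}
merge_used(references, collect_references(tool_outputs))
print(references["used"])
# ['Existing citation (placeholder)', 'ERA5 citation (placeholder)', 'Source A (placeholder)']
```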

@kuivi requested review from dmpantiu and koldunovn February 4, 2026 15:49
@koldunovn merged commit c199dbd into main Feb 4, 2026
4 checks passed