Merged
…d data analysis

This commit restructures the multi-agent workflow to improve modularity and enable true parallel execution of information gathering agents.

Key Changes:
- Split smart_agent responsibilities:
  * smart_agent: Info gathering only (Wikipedia, RAG, ECOCROP)
  * data_analysis_agent: Data extraction & visualization (stub for now)
- Updated routing logic in climsight_engine.py:
  * smart_agent now runs in true parallel with other agents (not triggered from data_agent)
  * Removed route_fromdata() function
  * All parallel agents → data_analysis_agent → combine_agent
- Fixed NoSessionContext errors:
  * Added exception handling in stream_handler.py
  * Added exception handling in streamlit_interface.py
  * Parallel agents can now safely call update_progress() from worker threads
- Added data_analysis_response field to AgentState

Files modified: 5
Files added: 1
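The NoSessionContext fix can be pictured with a minimal sketch. It is not the code from stream_handler.py or streamlit_interface.py: the `NoSessionContext` class here is a local stand-in for the UI framework's exception, and `ui_update_progress`, `safe_update_progress`, `run_agent`, and `run_parallel` are hypothetical names chosen for illustration. The idea is only that a progress call made from a worker thread is wrapped so a missing session is logged and skipped instead of aborting the agent.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)


class NoSessionContext(Exception):
    """Local stand-in for the UI framework's 'no session in this thread' error."""


def ui_update_progress(message):
    """Hypothetical UI call; here it always fails, as it would off the session thread."""
    raise NoSessionContext("no session context in worker thread")


def safe_update_progress(message, ui_call=ui_update_progress):
    """Report progress, but never let a missing session crash the caller."""
    try:
        ui_call(message)
        return True
    except NoSessionContext:
        # The UI cannot be reached from this thread; log and carry on.
        logger.debug("Progress update skipped outside session: %s", message)
        return False


def run_agent(name):
    """Hypothetical worker: reports progress, then returns its result."""
    safe_update_progress(f"{name} started")
    return f"{name} done"


def run_parallel(names):
    """Run all agents in worker threads and return results in input order."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(run_agent, names))
```

With this shape, the parallel agents finish normally even though every progress update is rejected by the UI layer.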
- Removed prompt sections referencing get_data_components and python_repl
- Removed get_data_components tool definition (244 lines)
- Removed tool output processing for get_data_components and python_repl
- Prompt now correctly matches available tools (wikipedia_search, RAG_search, ECOCROP_search only)

This fixes the mismatch where the prompt instructed the model to use tools that were no longer exposed in the agent's tool list.
…l improvements

Major changes:
- Implement full data_analysis_agent with dynamic tool prompt based on config
- Make ERA5 mandatory when enabled - use as ground truth for climate model validation
- Remove redundant get_data_components when Python_REPL is enabled
- Preserve user query in analysis_brief with "USER QUESTION:" header
- Add configurable filter step via use_filter_step in config
- Accumulate multiple Wikipedia/RAG results in smart_agent (was overwriting)
- Add Wikipedia call limit (10 max) to prevent excessive API calls

New files:
- agent_helpers.py: Helper utilities for tool-based agents
- sandbox_utils.py: Sandbox directory management
- config.py: Configuration utilities
- utils.py: Logging and history utilities
- tools/era5_retrieval_tool.py: ERA5 data download tool
- tools/get_data_components.py: Climate data extraction tool
- tools/visualization_tools.py: File listing and helper tools
- tools/reflection_tools.py: Agent reflection tool
- tools/package_tools.py: Package installation tool

Type changes in AgentState:
- wikipedia_tool_response: str -> list (accumulate multiple results)
- rag_search_response: str -> list (accumulate multiple results)
- Added: data_analysis_prompt_text, data_analysis_images, thread_id, uuid_main_dir, results_dir, climate_data_dir, era5_data_dir

Python REPL improvements:
- Rewrite to use JupyterKernelExecutor (PangaeaGPT pattern)
- Auto-load datasets from sandbox paths
- Better plot detection and results directory handling
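The str -> list change and the Wikipedia call limit can be sketched together. This is an illustration under assumptions, not the smart_agent code: the state is modelled as a plain dict, and the `wikipedia_calls` counter, `new_state`, and `record_wikipedia_result` are hypothetical names. Only the field name `wikipedia_tool_response` and the limit of 10 come from the commit message.

```python
MAX_WIKIPEDIA_CALLS = 10  # limit stated in the commit message


def new_state():
    """Hypothetical minimal state: a result list plus a call counter."""
    return {"wikipedia_tool_response": [], "wikipedia_calls": 0}


def record_wikipedia_result(state, result):
    """Append a result instead of overwriting; refuse once the cap is reached."""
    if state["wikipedia_calls"] >= MAX_WIKIPEDIA_CALLS:
        return False  # caller should stop issuing Wikipedia requests
    state["wikipedia_calls"] += 1
    state["wikipedia_tool_response"].append(result)
    return True
```

Keeping the field as a list means every earlier result survives a later call, which is what the "was overwriting" note refers to.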
- Fix NextGEMSProvider: normalize negative longitudes to 0-360 range before KDTree query (fixes wrong temperatures for western hemisphere)
- Remove "Provide additional information" toggle, always show extra info
- Minor config and gitignore updates
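Why the longitude normalization matters can be shown with a small sketch. It does not use the NextGEMSProvider code or a real KDTree; `normalize_longitude` and `nearest_grid_longitude` are hypothetical helpers, and a plain absolute-difference search stands in for the tree lookup. The assumption is that the grid stores longitudes in the 0-360 convention while the query arrives in the -180 to 180 convention.

```python
def normalize_longitude(lon_deg):
    """Map any longitude in degrees onto the [0, 360) convention."""
    return lon_deg % 360.0


def nearest_grid_longitude(lon_deg, grid_lons):
    """Nearest grid longitude by plain absolute difference (stand-in for a tree query)."""
    return min(grid_lons, key=lambda g: abs(g - lon_deg))


# Coarse hypothetical grid stored in the 0-360 convention.
GRID = [0.0, 90.0, 180.0, 270.0]

# A western-hemisphere query point at 75 degrees west.
raw = nearest_grid_longitude(-75.0, GRID)
fixed = nearest_grid_longitude(normalize_longitude(-75.0), GRID)
```

Without normalization the search lands on the 0.0 column, which is 75 degrees away on the globe; after normalization it lands on 270.0, which is 15 degrees away.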
- Add DestinE config (time periods, variable mapping/suffixes) and provider implementation with unstructured grid interpolation and unit conversions
- Wire DestinE into provider factory and availability list
- Gate ERA5 retrieval tool on Arraylake API key, pass via config from Streamlit UI
- Allow era5 retrieval tool creation with bound API key while keeping env-based fallback
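The gating and fallback behaviour described for the ERA5 tool might look roughly like the sketch below. Everything named here is an assumption for illustration: the config key `arraylake_api_key`, the environment variable `ARRAYLAKE_API_KEY`, and the functions `resolve_api_key` and `make_era5_tool` are not taken from the PR, and the tool body is a placeholder that performs no download.

```python
import os

ENV_VAR = "ARRAYLAKE_API_KEY"  # hypothetical environment variable name


def resolve_api_key(config, env=os.environ):
    """Prefer the key passed via config; fall back to the environment."""
    key = (config or {}).get("arraylake_api_key") or env.get(ENV_VAR)
    return key or None


def make_era5_tool(config, env=os.environ):
    """Return a tool bound to the resolved key, or None when no key is available."""
    api_key = resolve_api_key(config, env)
    if api_key is None:
        return None  # tool is gated off without a key

    def retrieve(variable):
        # Placeholder: a real tool would authenticate with api_key and fetch data.
        return {"variable": variable, "authenticated": bool(api_key)}

    return retrieve
```

Binding the key in a closure keeps it out of the tool's call arguments, while the environment lookup preserves the older behaviour for runs that do not go through the UI.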
- Fix regex to handle leading whitespace before code blocks (python_repl.py)
- Reset is_initialized flag on kernel restart to re-run initialization (python_repl.py)
- Add sandbox_path parameter for relative path resolution (image_viewer.py)
- Pass sandbox_path in data_analysis_agent.py call
- Implement atomic writes for ERA5 cache to prevent corruption (era5_retrieval_tool.py)
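A common way to get atomic cache writes is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that general pattern with the standard library; it is not the code in era5_retrieval_tool.py, and `atomic_write_bytes` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        # Clean up the partial temp file; the existing cache file is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is interrupted mid-write, only the temporary file is affected, so a half-written cache entry is never picked up on the next run.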
koldunovn
approved these changes
Feb 3, 2026
a new data-analysis pipeline, ERA5 tooling, and a new DestinE climate data source, plus sandboxed analysis outputs and UI/CLI wiring
Summary
This PR introduces a major refactoring of the agent architecture, separating the smart agent into focused information-gathering and data-analysis components, while adding comprehensive ERA5 observational data integration and a new high-resolution DestinE climate data provider.
Changes
New Data Analysis Agent & Workflow Integration
Sandbox & Session Management
ERA5 Tooling
New Analysis Tools
Python REPL Upgrade
DestinE Climate Data Provider
Smart Agent Refactoring
Configuration & Dependencies
UI/CLI Updates
Diff Summary
28 files changed, 3601 insertions(+), 655 deletions(-)
New files (11):
Modified files (17):
Authors: @dmpantiu, @kuivi
Co-authored-by: @dmpantiu
Co-authored-by: @kuivi
p.s.
Optional: download DestinE data (large ~12 GB, not downloaded by default)
python download_data.py DestinE