Merged
5 changes: 3 additions & 2 deletions .gitignore
@@ -2,7 +2,7 @@ data_climate_foresight.tar
data/
*.ipynb
__pycache__
climsight.log
climsight_evaluation.log
cache/
evaluation/evaluation_report.txt
@@ -12,4 +12,5 @@ rag_articles/
.*
*.log
venv311
venv
tmp/
149 changes: 54 additions & 95 deletions README.md
@@ -2,8 +2,7 @@

ClimSight is an advanced tool that integrates Large Language Models (LLMs) with climate data to provide localized climate insights for decision-making. ClimSight transforms complex climate data into actionable insights for agriculture, urban planning, disaster management, and policy development.

The target audience includes researchers, providers of climate services, policymakers, agricultural planners, urban developers, and other stakeholders who require detailed climate information to support decision-making. ClimSight is designed to democratize access to climate data, empowering users with insights relevant to their specific contexts.

![Image](https://github.com/user-attachments/assets/f9f89735-ef08-4c91-bc03-112c8e4c0896)

@@ -15,61 +14,11 @@ ClimSight distinguishes itself through several key advancements:
- **Real-World Applications**: ClimSight is validated through practical examples, such as assessing climate risks for specific agricultural activities and urban planning scenarios.


## Installation

You can use ClimSight in three ways:
1. Run a pre-built Docker container (simplest approach)
2. Build and run a Docker container from source
3. Install the Python package (via pip or conda/mamba)
### Recommended: Building from source with conda/mamba

ClimSight requires an OpenAI API key unless it is run in the `skipLLMCall` test mode. The key is needed only when running the application, not during installation.

## Batch Processing

For batch processing of climate questions, the `sequential` directory contains specialized tools for generating, validating, and processing questions in bulk. These tools are particularly useful for research and analysis requiring multiple climate queries. See the [sequential/README.md](sequential/README.md) for detailed usage instructions.

## 1. Running with Docker (Pre-built Container)

The simplest way to get started is with our pre-built Docker container:

```bash
# Make sure your OpenAI API key is set as an environment variable
export OPENAI_API_KEY="your-api-key-here"

# Pull and run the container
docker pull koldunovn/climsight:stable
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY koldunovn/climsight:stable
```

Then open `http://localhost:8501/` in your browser.

## 2. Building and Running from Source with Docker

If you prefer to build from the latest source:

```bash
# Clone the repository
git clone https://github.com/CliDyn/climsight.git
cd climsight

# Download required data
python download_data.py

# Build and run the container
docker build -t climsight .
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY climsight
```

Visit `http://localhost:8501/` in your browser once the container is running.

For testing without OpenAI API calls:
```bash
docker run -p 8501:8501 -e STREAMLIT_ARGS="skipLLMCall" climsight
```

## 3. Python Package Installation

### Option A: Building from source with conda/mamba
This is the recommended installation method to get the latest features and updates.

```bash
# Clone the repository
@@ -82,47 +31,53 @@ conda activate climsight

# Download required data
python download_data.py

# Optional: download DestinE data (large ~12 GB, not downloaded by default)
python download_data.py DestinE
```

### Option B: Using pip
### Alternative: Using pip from source

It's recommended to create a virtual environment to avoid dependency conflicts:
```bash
# Option 1: Install from source
# Clone the repository
git clone https://github.com/CliDyn/climsight.git
cd climsight

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install ClimSight
pip install -e .
# Install dependencies
pip install -r requirements.txt

# Download required data
python download_data.py

# Optional: download DestinE data (large ~12 GB, not downloaded by default)
python download_data.py DestinE
```

Or if you prefer to set up without cloning the repository:
### Running with Docker (Stable Release v1.0.0)

The Docker container provides a stable release (v1.0.0) of ClimSight. For the latest features, please install from source as described above.

```bash
# Option 2: Install from PyPI
# Create and activate a virtual environment
python -m venv climsight_env
source climsight_env/bin/activate # On Windows: climsight_env\Scripts\activate
# Make sure your OpenAI API key is set as an environment variable
export OPENAI_API_KEY="your-api-key-here"

# Install the package
pip install climsight
# Pull and run the container
docker pull koldunovn/climsight:stable
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY koldunovn/climsight:stable
```

# Create a directory for data
mkdir -p climsight
cd climsight
Then open `http://localhost:8501/` in your browser.

# Download necessary configuration files
wget https://raw.githubusercontent.com/CliDyn/climsight/main/data_sources.yml
wget https://raw.githubusercontent.com/CliDyn/climsight/main/download_data.py
wget https://raw.githubusercontent.com/CliDyn/climsight/main/config.yml
### Using pip from PyPI (Stable Release v1.0.0)

# Download the required data (about 8 GB)
python download_data.py
The PyPI package provides a stable release (v1.0.0) of ClimSight. For the latest features, please install from source as described above.

```bash
pip install climsight
```

## Configuration
@@ -131,50 +86,54 @@ ClimSight will automatically use a `config.yml` file from the current directory.

```yaml
# Key settings you can modify in config.yml:
# - LLM model (gpt-4, ...)
# - LLM model (gpt-4, gpt-5, ...)
# - Climate data sources
# - RAG database configuration
# - Agent parameters
# - ERA5 data retrieval settings
```
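Since `config.yml` is plain YAML, these settings can also be inspected programmatically. A minimal sketch, assuming PyYAML is available; the key names mirror the examples above, and the fallback defaults are illustrative rather than ClimSight's actual behavior:

```python
# Read config.yml (if present) and report which combine-agent model is configured.
# Assumes PyYAML; the default values below are illustrative only.
import yaml
from pathlib import Path

defaults = {"llm_combine": {"model_type": "openai", "model_name": "gpt-4"}}

cfg_path = Path("config.yml")
if cfg_path.exists():
    with cfg_path.open() as f:
        config = yaml.safe_load(f)
else:
    config = defaults

model_name = config.get("llm_combine", {}).get("model_name", "gpt-4")
print(f"Combine-agent model: {model_name}")
```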
## Running ClimSight

### If installed with conda/mamba from source:
## API Keys

```bash
# Run from the repository root
streamlit run src/climsight/climsight.py
```
### OpenAI API Key

### If installed with pip:
ClimSight requires an OpenAI API key for LLM functionality. You can set it as an environment variable:

```bash
# Make sure you're in the directory with your data and config
climsight
export OPENAI_API_KEY="your-api-key-here"
```

You can optionally set your OpenAI API key as an environment variable:
Alternatively, you can enter your API key directly in the browser interface when prompted.
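The environment-variable-first pattern described above can be sketched in a few lines; `resolve_api_key` is a hypothetical helper for illustration, not part of ClimSight's code:

```python
# Illustrative env-var-first key resolution; not ClimSight's actual implementation.
import os

def resolve_api_key(ui_value=None):
    """Prefer OPENAI_API_KEY from the environment, fall back to a UI-provided value."""
    return os.environ.get("OPENAI_API_KEY") or ui_value

# With the variable unset, a key typed into the browser interface would be used instead.
os.environ.pop("OPENAI_API_KEY", None)
print(resolve_api_key("key-from-browser"))  # → key-from-browser
```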

### Arraylake API Key (Optional - for ERA5 Data)

If you want to use ERA5 time series data retrieval (enabled via the "Enable ERA5 data" toggle in the UI), you need an Arraylake API key from [Earthmover](https://earthmover.io/). This allows downloading ERA5 reanalysis data for detailed historical climate analysis.

```bash
export OPENAI_API_KEY="your-api-key-here"
export ARRAYLAKE_API_KEY="your-arraylake-api-key-here"
```

Otherwise, you can enter your API key directly in the browser interface when prompted.
You can also enter the Arraylake API key in the browser interface when the ERA5 data option is enabled.

### Testing without an OpenAI API key:
## Running ClimSight

```bash
# From source:
streamlit run src/climsight/climsight.py skipLLMCall

# Or if installed with pip:
climsight skipLLMCall
# Run from the repository root
streamlit run src/climsight/climsight.py
```
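If ClimSight was installed from PyPI rather than from source, the package's console command can be run instead, from a directory containing the config and data; passing `skipLLMCall` starts the app without making any OpenAI calls, which is useful for testing:

```shell
# Run from the directory holding config.yml and the downloaded data
climsight

# Test mode: no OpenAI API key required
climsight skipLLMCall
```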

The application will open in your browser automatically. Just type your climate-related questions and press "Generate" to get insights.

<img width="800" alt="ClimSight Interface" src="https://github.com/koldunovn/climsight/assets/3407313/569a4c38-a601-4014-b10d-bd34c59b91bb">

## Batch Processing

For batch processing of climate questions, the `sequential` directory contains specialized tools for generating, validating, and processing questions in bulk. These tools are particularly useful for research and analysis requiring multiple climate queries. See the [sequential/README.md](sequential/README.md) for detailed usage instructions.

## Citation

If you use or refer to ClimSight in your work, please cite:

Kuznetsov, I., Jost, A.A., Pantiukhin, D. et al. Transforming climate services with LLMs and multi-source data integration. _npj Clim. Action_ **4**, 97 (2025). https://doi.org/10.1038/s44168-025-00300-y

Koldunov, N., Jung, T. Local climate services for all, courtesy of large language models. _Commun Earth Environ_ **5**, 13 (2024). https://doi.org/10.1038/s43247-023-01199-1
61 changes: 58 additions & 3 deletions config.yml
@@ -2,16 +2,27 @@
#model_type: "openai"  # options: "openai" / "local" / "aitta"
llm_rag:
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for RAGs
model_name: "gpt-5-mini" # used only for RAGs
llm_smart: #used only in smart_agent
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for smart agent
model_name: "gpt-5.2" # used only for smart agent
llm_combine: #used only in combine_agent and intro
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for combine agent ("mkchaou/climsight-calm_ft_Q3_13k")
model_name: "gpt-5.2" # used only for combine agent ("mkchaou/climsight-calm_ft_Q3_13k")
llm_dataanalysis: #used only in data_analysis_agent
model_type: "openai"
model_name: "gpt-5.2"
use_filter_step: true # Set to false to skip context filtering LLM call
climatemodel_name: "AWI_CM"
llmModeKey: "agent_llm" #"agent_llm" #"direct_llm"
use_smart_agent: false
use_era5_data: false # Download ERA5 time series from CDS API (requires credentials)
use_powerful_data_analysis: false

# ERA5 Climatology Configuration (pre-computed observational baseline)
era5_climatology:
enabled: true # Always use ERA5 climatology as ground truth baseline
path: "data/era5/era5_climatology_2015_2025.zarr" # Path to pre-computed climatology

# Climate Data Source Configuration
# Options: "nextGEMS", "ICCP", "AWI_CM"
@@ -126,6 +137,50 @@ climate_data_sources:
longitude: "lon"
time: "month"

DestinE:
enabled: true
coordinate_system: "unstructured"
description: "DestinE IFS-FESOM high-resolution climate simulations (SSP3-7.0)"
data_path: "./data/DestinE/"
# Time periods configuration
time_periods:
historical:
pattern: "ifs-fesom_baseline_hist_sfc_high_monthly_1990_2014_mean"
years_of_averaging: "1990-2014"
description: "DestinE IFS-FESOM historical baseline simulation"
is_main: true
source: "Destination Earth Climate DT, IFS-FESOM coupled model"
2015_2019:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2015_2019_mean"
years_of_averaging: "2015-2019"
description: "DestinE IFS-FESOM SSP3-7.0 near-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
2020_2029:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2020_2029_mean"
years_of_averaging: "2020-2029"
description: "DestinE IFS-FESOM SSP3-7.0 mid-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
2040_2049:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2040_2049_mean"
years_of_averaging: "2040-2049"
description: "DestinE IFS-FESOM SSP3-7.0 far-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
# Variable mapping: display_name -> netcdf_variable
variable_mapping:
Temperature: avg_2t
Total Precipitation: avg_tprate
Wind U: avg_10u
Wind V: avg_10v
# Variable file suffixes (to construct full filenames)
variable_suffixes:
avg_2t: "_avg_2t.nc"
avg_tprate: "_avg_tprate.nc"
avg_10u: "_avg_10u.nc"
avg_10v: "_avg_10v.nc"
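# Example (illustrative, assuming the time-period pattern and variable suffix are
# concatenated directly): the historical Temperature file would resolve to
#   ifs-fesom_baseline_hist_sfc_high_monthly_1990_2014_mean_avg_2t.nc under data_path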

# Legacy settings (kept for backwards compatibility, will be migrated automatically)
data_settings:
data_path: "./data/"
Expand Down
6 changes: 6 additions & 0 deletions data_sources.yml
@@ -120,4 +120,10 @@ sources:
url: 'https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/climsight/awi_cm.zip?temp_url_sig=f40cc2f349b24482a6f7247d173ca194fad28950&temp_url_expires=2299-10-02T09:52:13Z'
archive_type: 'zip'
subdir: './'
citation:

- filename: 'DestinE.zip'
url: 'https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/climsight/DestinE.zip?temp_url_sig=f60ad2be0bf65479f489611255c066148dc4741c&temp_url_expires=2053-06-19T11:20:40Z'
archive_type: 'zip'
subdir: './'
citation:
16 changes: 16 additions & 0 deletions download_data.py
@@ -97,6 +97,11 @@ def main():
# Parse command-line argument (--source_files)
parser = argparse.ArgumentParser(description="Download and extract the raw source files of the RAG.")
parser.add_argument('--source_files', type=bool, default=False, help='Whether to download and extract source files (IPCC text reports).')
parser.add_argument(
'datasets',
nargs='*',
help="Optional extra datasets to include (e.g. DestinE).",
)
#parser.add_argument('--CMIP_OIFS', type=bool, default=False, help='Whether to download CMIP6 low resolution AWI model data and ECE4/OIFS data.')
args = parser.parse_args()

@@ -112,6 +117,11 @@ def main():
sources = [d for d in sources if d['filename'] != 'ipcc_text_reports.zip']
#if not args.CMIP_OIFS:
# sources = [d for d in sources if d['filename'] != 'data_climate_foresight.zip']

# Skip DestinE unless explicitly requested (large dataset).
requested = {name.strip().lower() for name in args.datasets}
if 'destine' not in requested:
sources = [d for d in sources if d['filename'] != 'DestinE.zip']

#make subdirs list and clean it
subdirs = []
@@ -136,6 +146,12 @@ def main():
url = entry['url']
subdir = os.path.join(base_path, entry['subdir'])

if not url:
files_skiped.append(file)
urls_skiped.append(url)
subdirs_skiped.append(subdir)
continue

if download_file(url, file):
extract_arch(file, subdir)
files_downloaded.append(file)