Merged
5 changes: 3 additions & 2 deletions .gitignore
@@ -2,7 +2,7 @@ data_climate_foresight.tar
data/
*.ipynb
__pycache__
climsight.log
climsight_evaluation.log
cache/
evaluation/evaluation_report.txt
@@ -12,4 +12,5 @@ rag_articles/
.*
*.log
venv311
venv
tmp/
149 changes: 54 additions & 95 deletions README.md
@@ -2,8 +2,7 @@

ClimSight is an advanced tool that integrates Large Language Models (LLMs) with climate data to provide localized climate insights for decision-making. ClimSight transforms complex climate data into actionable insights for agriculture, urban planning, disaster management, and policy development.

The target audience includes researchers, providers of climate services, policymakers, agricultural planners, urban developers, and other stakeholders who require detailed climate information to support decision-making. ClimSight is designed to democratize access to climate data, empowering users with insights relevant to their specific contexts.

![Image](https://github.com/user-attachments/assets/f9f89735-ef08-4c91-bc03-112c8e4c0896)

@@ -15,61 +14,11 @@ ClimSight distinguishes itself through several key advancements:
- **Real-World Applications**: ClimSight is validated through practical examples, such as assessing climate risks for specific agricultural activities and urban planning scenarios.


## Installation

You can use ClimSight in three ways:
1. Run a pre-built Docker container (simplest approach)
2. Build and run a Docker container from source
3. Install the Python package (via pip or conda/mamba)
### Recommended: Building from source with conda/mamba

ClimSight requires an OpenAI API key unless it is run in the `skipLLMCall` test mode. The key is needed only when running the application, not during installation.

## Batch Processing

For batch processing of climate questions, the `sequential` directory contains specialized tools for generating, validating, and processing questions in bulk. These tools are particularly useful for research and analysis requiring multiple climate queries. See the [sequential/README.md](sequential/README.md) for detailed usage instructions.

## 1. Running with Docker (Pre-built Container)

The simplest way to get started is with our pre-built Docker container:

```bash
# Make sure your OpenAI API key is set as an environment variable
export OPENAI_API_KEY="your-api-key-here"

# Pull and run the container
docker pull koldunovn/climsight:stable
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY koldunovn/climsight:stable
```

Then open `http://localhost:8501/` in your browser.

## 2. Building and Running from Source with Docker

If you prefer to build from the latest source:

```bash
# Clone the repository
git clone https://github.com/CliDyn/climsight.git
cd climsight

# Download required data
python download_data.py

# Build and run the container
docker build -t climsight .
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY climsight
```

Visit `http://localhost:8501/` in your browser once the container is running.

For testing without OpenAI API calls:
```bash
docker run -p 8501:8501 -e STREAMLIT_ARGS="skipLLMCall" climsight
```

## 3. Python Package Installation

### Option A: Building from source with conda/mamba
This is the recommended installation method to get the latest features and updates.

```bash
# Clone the repository
@@ -82,47 +31,53 @@ conda activate climsight

# Download required data
python download_data.py

# Optional: download DestinE data (large ~12 GB, not downloaded by default)
python download_data.py DestinE
```

### Option B: Using pip
### Alternative: Using pip from source

It's recommended to create a virtual environment to avoid dependency conflicts:
```bash
# Option 1: Install from source
# Clone the repository
git clone https://github.com/CliDyn/climsight.git
cd climsight

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install ClimSight
pip install -e .
# Install dependencies
pip install -r requirements.txt

# Download required data
python download_data.py

# Optional: download DestinE data (large ~12 GB, not downloaded by default)
python download_data.py DestinE
```

Or if you prefer to set up without cloning the repository:
### Running with Docker (Stable Release v1.0.0)

The Docker container provides a stable release (v1.0.0) of ClimSight. For the latest features, please install from source as described above.

```bash
# Option 2: Install from PyPI
# Create and activate a virtual environment
python -m venv climsight_env
source climsight_env/bin/activate # On Windows: climsight_env\Scripts\activate
# Make sure your OpenAI API key is set as an environment variable
export OPENAI_API_KEY="your-api-key-here"

# Install the package
pip install climsight
# Pull and run the container
docker pull koldunovn/climsight:stable
docker run -p 8501:8501 -e OPENAI_API_KEY=$OPENAI_API_KEY koldunovn/climsight:stable
```

# Create a directory for data
mkdir -p climsight
cd climsight
Then open `http://localhost:8501/` in your browser.

# Download necessary configuration files
wget https://raw.githubusercontent.com/CliDyn/climsight/main/data_sources.yml
wget https://raw.githubusercontent.com/CliDyn/climsight/main/download_data.py
wget https://raw.githubusercontent.com/CliDyn/climsight/main/config.yml
### Using pip from PyPI (Stable Release v1.0.0)

# Download the required data (about 8 GB)
python download_data.py
The PyPI package provides a stable release (v1.0.0) of ClimSight. For the latest features, please install from source as described above.

```bash
pip install climsight
```

## Configuration
@@ -131,50 +86,54 @@ ClimSight will automatically use a `config.yml` file from the current directory.

```yaml
# Key settings you can modify in config.yml:
# - LLM model (gpt-4, ...)
# - LLM model (gpt-4, gpt-5, ...)
# - Climate data sources
# - RAG database configuration
# - Agent parameters
# - ERA5 data retrieval settings
```
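Since `config.yml` is plain YAML, these settings can also be inspected programmatically. A minimal sketch, assuming PyYAML is available; the key names mirror the examples above, and the fallback defaults are illustrative rather than ClimSight's actual behavior:

```python
# Read config.yml (if present) and report which combine-agent model is configured.
# Assumes PyYAML; the default values below are illustrative only.
import yaml
from pathlib import Path

defaults = {"llm_combine": {"model_type": "openai", "model_name": "gpt-4"}}

cfg_path = Path("config.yml")
if cfg_path.exists():
    with cfg_path.open() as f:
        config = yaml.safe_load(f)
else:
    config = defaults

model_name = config.get("llm_combine", {}).get("model_name", "gpt-4")
print(f"Combine-agent model: {model_name}")
```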
## Running ClimSight

### If installed with conda/mamba from source:
## API Keys

```bash
# Run from the repository root
streamlit run src/climsight/climsight.py
```
### OpenAI API Key

### If installed with pip:
ClimSight requires an OpenAI API key for LLM functionality. You can set it as an environment variable:

```bash
# Make sure you're in the directory with your data and config
climsight
export OPENAI_API_KEY="your-api-key-here"
```

You can optionally set your OpenAI API key as an environment variable:
Alternatively, you can enter your API key directly in the browser interface when prompted.
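The environment-variable-first pattern described above can be sketched in a few lines; `resolve_api_key` is a hypothetical helper for illustration, not part of ClimSight's code:

```python
# Illustrative env-var-first key resolution; not ClimSight's actual implementation.
import os

def resolve_api_key(ui_value=None):
    """Prefer OPENAI_API_KEY from the environment, fall back to a UI-provided value."""
    return os.environ.get("OPENAI_API_KEY") or ui_value

# With the variable unset, a key typed into the browser interface would be used instead.
os.environ.pop("OPENAI_API_KEY", None)
print(resolve_api_key("key-from-browser"))  # → key-from-browser
```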

### Arraylake API Key (Optional - for ERA5 Data)

If you want to use ERA5 time series data retrieval (enabled via the "Enable ERA5 data" toggle in the UI), you need an Arraylake API key from [Earthmover](https://earthmover.io/). This allows downloading ERA5 reanalysis data for detailed historical climate analysis.

```bash
export OPENAI_API_KEY="your-api-key-here"
export ARRAYLAKE_API_KEY="your-arraylake-api-key-here"
```

Otherwise, you can enter your API key directly in the browser interface when prompted.
You can also enter the Arraylake API key in the browser interface when the ERA5 data option is enabled.

### Testing without an OpenAI API key:
## Running ClimSight

```bash
# From source:
streamlit run src/climsight/climsight.py skipLLMCall

# Or if installed with pip:
climsight skipLLMCall
# Run from the repository root
streamlit run src/climsight/climsight.py
```
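If ClimSight was installed from PyPI rather than from source, the package's console command can be run instead, from a directory containing the config and data; passing `skipLLMCall` starts the app without making any OpenAI calls, which is useful for testing:

```shell
# Run from the directory holding config.yml and the downloaded data
climsight

# Test mode: no OpenAI API key required
climsight skipLLMCall
```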

The application will open in your browser automatically. Just type your climate-related questions and press "Generate" to get insights.

<img width="800" alt="ClimSight Interface" src="https://github.com/koldunovn/climsight/assets/3407313/569a4c38-a601-4014-b10d-bd34c59b91bb">

## Batch Processing

For batch processing of climate questions, the `sequential` directory contains specialized tools for generating, validating, and processing questions in bulk. These tools are particularly useful for research and analysis requiring multiple climate queries. See the [sequential/README.md](sequential/README.md) for detailed usage instructions.

## Citation

If you use or refer to ClimSight in your work, please cite:

Kuznetsov, I., Jost, A.A., Pantiukhin, D. et al. Transforming climate services with LLMs and multi-source data integration. _npj Clim. Action_ **4**, 97 (2025). https://doi.org/10.1038/s44168-025-00300-y

Koldunov, N., Jung, T. Local climate services for all, courtesy of large language models. _Commun Earth Environ_ **5**, 13 (2024). https://doi.org/10.1038/s43247-023-01199-1
61 changes: 58 additions & 3 deletions config.yml
@@ -2,16 +2,27 @@
#model_type: "openai"  # options: "openai" / "local" / "aitta"
llm_rag:
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for RAGs
model_name: "gpt-5-mini" # used only for RAGs
llm_smart: #used only in smart_agent
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for smart agent
model_name: "gpt-5.2" # used only for smart agent
llm_combine: #used only in combine_agent and intro
model_type: "openai"
model_name: "gpt-4.1-nano" # used only for combine agent ("mkchaou/climsight-calm_ft_Q3_13k")
model_name: "gpt-5.2" # used only for combine agent ("mkchaou/climsight-calm_ft_Q3_13k")
llm_dataanalysis: #used only in data_analysis_agent
model_type: "openai"
model_name: "gpt-5.2"
use_filter_step: true # Set to false to skip context filtering LLM call
climatemodel_name: "AWI_CM"
llmModeKey: "agent_llm" #"agent_llm" #"direct_llm"
use_smart_agent: false
use_era5_data: false # Download ERA5 time series from CDS API (requires credentials)
use_powerful_data_analysis: false

# ERA5 Climatology Configuration (pre-computed observational baseline)
era5_climatology:
enabled: true # Always use ERA5 climatology as ground truth baseline
path: "data/era5/era5_climatology_2015_2025.zarr" # Path to pre-computed climatology

# Climate Data Source Configuration
# Options: "nextGEMS", "ICCP", "AWI_CM"
@@ -126,6 +137,50 @@ climate_data_sources:
longitude: "lon"
time: "month"

DestinE:
enabled: true
coordinate_system: "unstructured"
description: "DestinE IFS-FESOM high-resolution climate simulations (SSP3-7.0)"
data_path: "./data/DestinE/"
# Time periods configuration
time_periods:
historical:
pattern: "ifs-fesom_baseline_hist_sfc_high_monthly_1990_2014_mean"
years_of_averaging: "1990-2014"
description: "DestinE IFS-FESOM historical baseline simulation"
is_main: true
source: "Destination Earth Climate DT, IFS-FESOM coupled model"
2015_2019:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2015_2019_mean"
years_of_averaging: "2015-2019"
description: "DestinE IFS-FESOM SSP3-7.0 near-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
2020_2029:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2020_2029_mean"
years_of_averaging: "2020-2029"
description: "DestinE IFS-FESOM SSP3-7.0 mid-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
2040_2049:
pattern: "ifs-fesom_projections_ssp3-7.0_sfc_high_monthly_2040_2049_mean"
years_of_averaging: "2040-2049"
description: "DestinE IFS-FESOM SSP3-7.0 far-term projection"
is_main: false
source: "Destination Earth Climate DT, IFS-FESOM coupled model, SSP3-7.0"
# Variable mapping: display_name -> netcdf_variable
variable_mapping:
Temperature: avg_2t
Total Precipitation: avg_tprate
Wind U: avg_10u
Wind V: avg_10v
# Variable file suffixes (to construct full filenames)
variable_suffixes:
avg_2t: "_avg_2t.nc"
avg_tprate: "_avg_tprate.nc"
avg_10u: "_avg_10u.nc"
avg_10v: "_avg_10v.nc"
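# Example (illustrative, assuming the time-period pattern and variable suffix are
# concatenated directly): the historical Temperature file would resolve to
#   ifs-fesom_baseline_hist_sfc_high_monthly_1990_2014_mean_avg_2t.nc under data_path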

# Legacy settings (kept for backwards compatibility, will be migrated automatically)
data_settings:
data_path: "./data/"
Expand Down
6 changes: 6 additions & 0 deletions data_sources.yml
@@ -120,4 +120,10 @@ sources:
url: 'https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/climsight/awi_cm.zip?temp_url_sig=f40cc2f349b24482a6f7247d173ca194fad28950&temp_url_expires=2299-10-02T09:52:13Z'
archive_type: 'zip'
subdir: './'
citation:

- filename: 'DestinE.zip'
url: 'https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/climsight/DestinE.zip?temp_url_sig=f60ad2be0bf65479f489611255c066148dc4741c&temp_url_expires=2053-06-19T11:20:40Z'
archive_type: 'zip'
subdir: './'
citation:
16 changes: 16 additions & 0 deletions download_data.py
@@ -97,6 +97,11 @@ def main():
# Parse command-line argument (--source_files)
parser = argparse.ArgumentParser(description="Download and extract the raw source files of the RAG.")
parser.add_argument('--source_files', type=bool, default=False, help='Whether to download and extract source files (IPCC text reports).')
parser.add_argument(
'datasets',
nargs='*',
help="Optional extra datasets to include (e.g. DestinE).",
)
#parser.add_argument('--CMIP_OIFS', type=bool, default=False, help='Whether to download CMIP6 low resolution AWI model data and ECE4/OIFS data.')
args = parser.parse_args()

@@ -112,6 +117,11 @@ def main():
sources = [d for d in sources if d['filename'] != 'ipcc_text_reports.zip']
#if not args.CMIP_OIFS:
# sources = [d for d in sources if d['filename'] != 'data_climate_foresight.zip']

# Skip DestinE unless explicitly requested (large dataset).
requested = {name.strip().lower() for name in args.datasets}
if 'destine' not in requested:
sources = [d for d in sources if d['filename'] != 'DestinE.zip']

#make subdirs list and clean it
subdirs = []
@@ -136,6 +146,12 @@ def main():
url = entry['url']
subdir = os.path.join(base_path, entry['subdir'])

if not url:
files_skiped.append(file)
urls_skiped.append(url)
subdirs_skiped.append(subdir)
continue

if download_file(url, file):
extract_arch(file, subdir)
files_downloaded.append(file)