102 changes: 102 additions & 0 deletions cron-scripts/README.md
@@ -0,0 +1,102 @@
# OMEGA Cron Scripts

Automated cron job scripts for continuous testing and CDash reporting of OMEGA ocean modeling projects across multiple HPC systems.

## Overview

This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:
Collaborator (suggested change):
Original: This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:
Suggested: This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two types of OMEGA tests:

Contributor Author: updated.

- **Omega** - Next-generation ocean model
Collaborator (suggested change):
Original: - **Omega** - Next-generation ocean model
Suggested: - **Omega CTests**

Contributor Author: updated.

- **Polaris** - MPAS-Ocean model with Omega integration
Collaborator (suggested change):
Original: - **Polaris** - MPAS-Ocean model with Omega integration
Suggested: - **Polaris** - Omega tests on MPAS meshes

Contributor Author: updated.

## Supported Systems

| Machine | Location | Compilers |
|---------|----------|-----------|
| Frontier | ORNL | craygnu, craycray, crayamd (with mphipcc variants) |
| Chrysalis | ANL (LCRC) | gnu, intel |
| pm-gpu | NERSC (Perlmutter GPU) | gnugpu |
| pm-cpu | NERSC (Perlmutter CPU) | gnu |

## Repository Structure

```
cron-scripts/
├── launch_all.sh              # Main entry point
├── machines/                  # Machine-specific configurations
│   ├── config_machine.sh      # Auto-detection dispatcher
│   ├── config_frontier.sh
│   ├── config_chrysalis.sh
│   ├── config_pm-gpu.sh
│   └── config_pm-cpu.sh
└── tasks/                     # Scheduled job definitions
    ├── omega_cdash/           # Omega model CDash testing
    │   ├── launch_omega_cdash.sh
    │   └── job_*.sbatch
    └── polaris_cdash/         # Polaris model CDash testing
        ├── launch_polaris_ctest.sh
        ├── polaris_cdash.py
        └── CTestScript.txt
```

## Usage

### Run on auto-detected machine

```bash
./launch_all.sh
```

### Run on a specific machine

```bash
./launch_all.sh -m frontier
./launch_all.sh -m chrysalis
./launch_all.sh -m pm-gpu
./launch_all.sh -m pm-cpu
```

### Set up in crontab

```bash
# Run daily at 1 AM
0 1 * * * /path/to/cron-scripts/launch_all.sh
```
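
Hostname auto-detection in `machines/config_machine.sh` maps every `*.perlmutter.nersc.gov` host to `pm-gpu`, so a Perlmutter CPU run needs the `-m` flag in its crontab entry. A sketch of such an entry, appending output to a log so each cron invocation leaves a record (the log path is a placeholder):

```bash
# Run daily at 1 AM, forcing the pm-cpu configuration on Perlmutter.
# The log file location is a placeholder; any writable path works.
0 1 * * * /path/to/cron-scripts/launch_all.sh -m pm-cpu >> $HOME/launch_all_cron.log 2>&1
```

If both `pm-cpu` and `pm-gpu` entries run from the same host, staggering their start times helps, since the per-user lock in `launch_all.sh` makes a second instance exit immediately while the first still holds it.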

## How It Works

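1. `launch_all.sh` takes the machine name from the `-m` flag or, if none is given, auto-detects it from the host's FQDN (every `*.perlmutter.nersc.gov` host maps to `pm-gpu`, so use `-m pm-cpu` for Perlmutter CPU runs)
2. Sources the matching machine configuration (compilers, paths, modules)
3. Takes a non-blocking `flock` on `/tmp/${USER}_cronjob.lock` and exits if another instance already holds it
4. Discovers all `launch*.sh` scripts in the immediate subdirectories of `tasks/` and runs them in sorted order
5. Each task clones or updates its repository and submits an SBATCH job, and the results are reported to CDash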
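
A minimal, self-contained sketch of the locking step, mirroring the `exec 9>"$LOCKFILE"` / `flock -n 9` pattern in `launch_all.sh`. It assumes util-linux `flock` is installed, uses a scratch lock file from `mktemp` instead of `/tmp/${USER}_cronjob.lock`, and uses a background subshell to stand in for an instance that is already running:

```bash
#!/usr/bin/env bash
# Sketch of the single-instance guard in launch_all.sh: open a lock file on
# file descriptor 9 and take a non-blocking flock; a second caller gives up.
set -eo pipefail

LOCKFILE="$(mktemp)"   # scratch file; launch_all.sh uses /tmp/${USER}_cronjob.lock

try_lock() {
  # Subshell: fd 9, and therefore the lock, is released when it exits.
  (
    exec 9>"$LOCKFILE"
    if ! flock -n 9; then
      echo "already running"
      exit 0
    fi
    echo "acquired"
  )
}

# Simulate a first instance that holds the lock for 3 seconds.
( exec 9>"$LOCKFILE"; flock -n 9; sleep 3 ) &
sleep 1

second_try="$(try_lock)"   # lock is held by the background instance
wait                       # background instance exits, releasing the lock
third_try="$(try_lock)"    # lock is free again
echo "second: $second_try, third: $third_try"
rm -f "$LOCKFILE"
```

Because an `flock` lock is tied to an open file descriptor, it is released automatically when the holding process exits, even after a crash, so no stale-lock cleanup is needed.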

## Environment Variables

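| Variable | Description |
|----------|-------------|
| `CRONJOB_BASEDIR` | Root directory for job outputs (set in `machines/config_<machine>.sh`) |
| `CRONJOB_MACHINE` | Machine name, from `-m` or hostname auto-detection |
| `CRONJOB_LOGDIR` | Log directory, `${CRONJOB_BASEDIR}/logs` |
| `CRONJOB_DATE` | Day of the month when the run started (`date +%d`) |
| `CRONJOB_TIME` | Time when the run started (`date +%T`) |
| `E3SM_COMPILERS` | Space-separated list of compilers to test |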
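
A sketch of how a task's launch script might use these variables. The values are placeholders (a `mktemp` base directory and the Chrysalis compiler list) so it runs outside cron, and the `build_<machine>_<compiler>` directory layout is hypothetical, not something the existing tasks create:

```bash
#!/usr/bin/env bash
# Sketch: consuming the variables exported by launch_all.sh and
# machines/config_<machine>.sh. Values are placeholders; in a cron run
# they are already set.
set -eo pipefail

CRONJOB_BASEDIR="$(mktemp -d)"                 # placeholder
export CRONJOB_BASEDIR
export CRONJOB_MACHINE="chrysalis"             # placeholder
export CRONJOB_LOGDIR="${CRONJOB_BASEDIR}/logs"
export E3SM_COMPILERS="gnu intel"              # as in config_chrysalis.sh

mkdir -p "$CRONJOB_LOGDIR"

build_dirs=()
# E3SM_COMPILERS is space-separated, so the unquoted expansion is intentional:
# word splitting yields one loop iteration per compiler.
for compiler in $E3SM_COMPILERS; do
  # Hypothetical per-compiler layout under a task's working area
  build_dir="${CRONJOB_BASEDIR}/tasks/example_task/build_${CRONJOB_MACHINE}_${compiler}"
  mkdir -p "$build_dir"
  build_dirs+=("$build_dir")
done

printf '%s\n' "${build_dirs[@]}"
```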

## Adding a New Machine

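1. Create `machines/config_<machine>.sh` that sets:
   - `CRONJOB_BASEDIR` path
   - `E3SM_COMPILERS` list
   - Module loads and any other environment setup
2. Add the machine's hostname (FQDN) pattern to the `case` statement in `machines/config_machine.sh`
3. Add machine-specific SBATCH scripts to the task directories; for example, `tasks/omega_cdash/launch_omega_cdash.sh` submits `job_${CRONJOB_MACHINE}_omega_cdash.sbatch`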
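
A sketch of the first two steps for a hypothetical machine `mymachine` whose hosts are named `*.mymachine.example.gov`. The config file is written to a scratch directory standing in for `machines/`, its `CRONJOB_BASEDIR` is a placeholder under `$TMPDIR`, and `detect_machine` is an illustrative wrapper around the same `case` patterns used in `machines/config_machine.sh`:

```bash
#!/usr/bin/env bash
# Sketch for a hypothetical machine "mymachine" (hosts: *.mymachine.example.gov).
set -eo pipefail

MACHINES_DIR="$(mktemp -d)"   # stands in for cron-scripts/machines/

# Step 1: machines/config_mymachine.sh
cat > "$MACHINES_DIR/config_mymachine.sh" <<'EOF'
#!/usr/bin/env bash
set -eo pipefail
# module load cray-python cmake   # site-specific modules go here
# Placeholder path: point this at the machine's scratch filesystem.
export CRONJOB_BASEDIR="${TMPDIR:-/tmp}/${USER:-$(id -un)}/cronjobs"
export E3SM_COMPILERS="gnu"
mkdir -p "$CRONJOB_BASEDIR"
EOF

# Step 2: the case statement from machines/config_machine.sh, plus one branch.
detect_machine() {
  case "$1" in
    *.frontier.olcf.ornl.gov) echo "frontier" ;;
    *.polaris.alcf.anl.gov)   echo "polaris" ;;
    *.perlmutter.nersc.gov)   echo "pm-gpu" ;;
    *.lcrc.anl.gov)           echo "chrysalis" ;;
    *.mymachine.example.gov)  echo "mymachine" ;;   # new (hypothetical) entry
    *)                        echo "unknown" ;;
  esac
}

# config_machine.sh then sources config_${CRONJOB_MACHINE}.sh
CRONJOB_MACHINE="$(detect_machine "login01.mymachine.example.gov")"
source "$MACHINES_DIR/config_${CRONJOB_MACHINE}.sh"
echo "CRONJOB_MACHINE=$CRONJOB_MACHINE E3SM_COMPILERS=$E3SM_COMPILERS"
```

An FQDN that matches no pattern yields `unknown`, and `config_machine.sh` would then fail trying to source a nonexistent `config_unknown.sh`; passing `-m` avoids relying on detection.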

## Adding a New Task

1. Create a new directory directly under `tasks/`
2. Add a `launch_<taskname>.sh` script to that directory
3. `launch_all.sh` discovers and runs the script automatically (it searches only one level below `tasks/` for `launch*.sh`)
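
A sketch that scaffolds a hypothetical `example_task` in a scratch copy of the `tasks/` tree, then runs the same `find` query as `launch_all.sh` to confirm the script would be picked up. The task body only creates a working directory under `CRONJOB_BASEDIR`; the commented `sbatch` line and the `job_<machine>_example_task.sbatch` name are placeholders following the `omega_cdash` convention:

```bash
#!/usr/bin/env bash
# Sketch: scaffold a hypothetical task and check that launch_all.sh's
# discovery query would find it.
set -eo pipefail

ROOT="$(mktemp -d)"     # stands in for the cron-scripts/ directory
TASK="example_task"     # hypothetical task name
mkdir -p "$ROOT/tasks/$TASK"

cat > "$ROOT/tasks/$TASK/launch_${TASK}.sh" <<'EOF'
#!/usr/bin/env bash
set -eo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
echo "[$(date)] Starting $SCRIPT_NAME"

# Per-task working area, following the omega_cdash layout
TASK_BASEDIR="${CRONJOB_BASEDIR}/tasks/example_task"
mkdir -p "$TASK_BASEDIR"

# Clone/update sources here, then submit a machine-specific job, e.g.
# sbatch --output="$CRONJOB_LOGDIR/example_task_%j.out" \
#   "${HERE}/job_${CRONJOB_MACHINE}_example_task.sbatch"

echo "[$(date)] Finished $SCRIPT_NAME"
EOF

# Same query as launch_all.sh: launch*.sh exactly one level below tasks/
discovered="$(find "$ROOT/tasks" -mindepth 2 -maxdepth 2 -type f -name 'launch*.sh' | sort)"
echo "$discovered"

# Run it the way launch_all.sh does, with a placeholder CRONJOB_BASEDIR
CRONJOB_BASEDIR="$ROOT/base" /bin/bash "$discovered"
```

Because the query uses `-mindepth 2 -maxdepth 2`, a launch script nested deeper (for example `tasks/example_task/scripts/launch_x.sh`) would not be discovered.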

## CDash Integration

Test results are submitted to:
- E3SM project: https://my.cdash.org/submit.php?project=E3SM
- Omega project: https://my.cdash.org/submit.php?project=omega
60 changes: 60 additions & 0 deletions cron-scripts/launch_all.sh
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -eo pipefail

HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"

# --- Parse command-line arguments ---
CLI_MACHINE=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    -m|--machine)
      CLI_MACHINE="$2"
      shift 2
      ;;
    *)
      echo "ERROR: Unknown option '$1'" >&2
      echo "Usage: $SCRIPT_NAME [-m|--machine MACHINE_NAME]"
      exit 1
      ;;
  esac
done

echo "[$(date)] Starting $SCRIPT_NAME"

# set CRONJOB_BASEDIR and machine-specific variables
# pass -m through so config_machine.sh uses CLI override if provided
if [[ -n "$CLI_MACHINE" ]]; then
  source "${HERE}/machines/config_machine.sh" -m "$CLI_MACHINE"
else
  source "${HERE}/machines/config_machine.sh"
fi

export CRONJOB_LOGDIR="${CRONJOB_BASEDIR}/logs"
mkdir -p "$CRONJOB_LOGDIR"

export CRONJOB_DATE=$(date +"%d")
export CRONJOB_TIME=$(date +"%T")

LOCKFILE="/tmp/${USER}_cronjob.lock"
exec 9>"$LOCKFILE"
if ! flock -n 9; then
  echo "[$(date)] launch_all.sh is already running, exiting."
  exit 0
fi
#LOCKFILE="${HERE}/cronjob.lock"
#exec 9>"$LOCKFILE"
#if ! flock -n 9; then
# echo "[$(date)] launch_all.sh is already running, exiting."
# exit 0
#fi

# Run all launch*.sh scripts under immediate subdirectories of $HERE/tasks
while IFS= read -r script; do
  /bin/bash "$script"
done < <(
  find "$HERE/tasks" -mindepth 2 -maxdepth 2 \
    -type f -name 'launch*.sh' | sort
)

echo "[$(date)] Finished $SCRIPT_NAME"
9 changes: 9 additions & 0 deletions cron-scripts/machines/config_chrysalis.sh
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
set -eo pipefail

source /etc/bashrc

export CRONJOB_BASEDIR=/lcrc/globalscratch/${USER}/cronjobs
export E3SM_COMPILERS="gnu intel"

mkdir -p "$CRONJOB_BASEDIR"
16 changes: 16 additions & 0 deletions cron-scripts/machines/config_frontier.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env bash

set -eo pipefail

module load cray-python cmake

export all_proxy=socks://proxy.ccs.ornl.gov:3128/
export ftp_proxy=ftp://proxy.ccs.ornl.gov:3128/
export http_proxy=http://proxy.ccs.ornl.gov:3128/
export https_proxy=http://proxy.ccs.ornl.gov:3128/
export no_proxy='localhost,127.0.0.0/8,*.ccs.ornl.gov'

export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/${USER}/cronjobs
export E3SM_COMPILERS="craygnu-mphipcc craycray-mphipcc crayamd-mphipcc craygnu craycray crayamd"

mkdir -p "$CRONJOB_BASEDIR"
72 changes: 72 additions & 0 deletions cron-scripts/machines/config_machine.sh
@@ -0,0 +1,72 @@
#!/usr/bin/env bash
set -eo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# --- Parse command-line arguments ---
usage() {
  echo "Usage: $(basename "$0") [-m|--machine MACHINE_NAME] [-h|--help]"
  echo "  -m, --machine   Override the auto-detected machine name"
  echo "  -h, --help      Show this help message"
  exit "${1:-0}"
}

CLI_MACHINE=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    -m|--machine)
      CLI_MACHINE="$2"
      shift 2
      ;;
    -h|--help)
      usage 0
      ;;
    *)
      echo "ERROR: Unknown option '$1'" >&2
      usage 1
      ;;
  esac
done

# --- Get a stable hostname / FQDN (try multiple methods) ---
get_fqdn() {
  local fqdn=""
  fqdn="$(hostname -f 2>/dev/null || true)"
  if [[ -z "$fqdn" || "$fqdn" == "(none)" ]]; then
    fqdn="$(hostname --fqdn 2>/dev/null || true)"
  fi
  if [[ -z "$fqdn" || "$fqdn" == "(none)" ]]; then
    fqdn="$(hostname 2>/dev/null || true)"
  fi
  echo "$fqdn"
}

FQDN="$(get_fqdn)"

# --- Determine CRONJOB_MACHINE ---
if [[ -n "$CLI_MACHINE" ]]; then
  # Command-line argument takes highest priority
  CRONJOB_MACHINE="$CLI_MACHINE"
else
  # Fall back to FQDN-based detection
  CRONJOB_MACHINE="unknown"
  case "$FQDN" in
    *.frontier.olcf.ornl.gov)
      CRONJOB_MACHINE="frontier"
      ;;
    *.polaris.alcf.anl.gov)
      CRONJOB_MACHINE="polaris"
      ;;
    *.perlmutter.nersc.gov)
      CRONJOB_MACHINE="pm-gpu"
      ;;
    *.lcrc.anl.gov)
      CRONJOB_MACHINE="chrysalis"
      ;;
  esac
fi

export CRONJOB_MACHINE
echo "FQDN=$FQDN"
echo "CRONJOB_MACHINE=$CRONJOB_MACHINE"

source "${SCRIPT_DIR}/config_${CRONJOB_MACHINE}.sh"
10 changes: 10 additions & 0 deletions cron-scripts/machines/config_pm-cpu.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -eo pipefail

module load cray-python cmake

export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-cpu
export E3SM_COMPILERS="gnu"

mkdir -p "$CRONJOB_BASEDIR"
10 changes: 10 additions & 0 deletions cron-scripts/machines/config_pm-gpu.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -eo pipefail

module load cray-python cmake

export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-gpu
export E3SM_COMPILERS="gnugpu"

mkdir -p "$CRONJOB_BASEDIR"
@@ -0,0 +1,8 @@
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --qos=high
#SBATCH --time 02:00:00

source /etc/bashrc

exec bash $(dirname "$0")/run_omega_cdash.sh
10 changes: 10 additions & 0 deletions cron-scripts/tasks/omega_cdash/job_frontier_omega_cdash.sbatch
@@ -0,0 +1,10 @@
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH -q debug
#SBATCH --account=cli115
Collaborator:
Should we get the account from a place that's common with the one used to generate polaris job scripts, currently the `polaris/machines/*.cfg` option `parallel/account`?

Contributor Author:
@cbegeman, I think it would be better to get the account from a common place. Do any of the `polaris/machines/*.cfg` files have a `parallel/account` entry? I couldn't find any account information in those files. While I do see a `group` entry in the cfg files, some values of `group` do not match the actual account name, for example, `e3sm_g` on PM-GPU.

Collaborator:
It looks like they don't, but they could/should, e.g.,

if config.has_option('parallel', 'account'):

#SBATCH --time 02:00:00


source /etc/bashrc

exec bash $(dirname "$0")/run_omega_cdash.sh
16 changes: 16 additions & 0 deletions cron-scripts/tasks/omega_cdash/job_pm-cpu_omega_cdash.sbatch
@@ -0,0 +1,16 @@
#!/bin/bash -l
#SBATCH --job-name=OmegaSCron
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --output=/global/cfs/cdirs/e3sm/omega/cronjobs_pm-cpu/logs/OmegaSCronCPU_%j.out
#SBATCH --error=/global/cfs/cdirs/e3sm/omega/cronjobs_pm-cpu/logs/OmegaSCronCPU_%j.err
#SBATCH --constraint=cpu
#SBATCH --account=e3sm
#SBATCH --qos regular
#SBATCH --exclusive
#SBATCH --time 01:00:00


source /etc/bashrc

exec bash $(dirname "$0")/run_omega_cdash.sh
17 changes: 17 additions & 0 deletions cron-scripts/tasks/omega_cdash/job_pm-gpu_omega_cdash.sbatch
@@ -0,0 +1,17 @@
#!/bin/bash -l
#SBATCH --job-name=OmegaSCron
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=4
#SBATCH --output=/global/cfs/cdirs/e3sm/omega/cronjobs_pm-gpu/logs/OmegaSCronGPU_%j.out
#SBATCH --error=/global/cfs/cdirs/e3sm/omega/cronjobs_pm-gpu/logs/OmegaSCronGPU_%j.err
#SBATCH --constraint=gpu
#SBATCH --account=e3sm_g
#SBATCH --qos regular
#SBATCH --exclusive
#SBATCH --time 01:00:00


source /etc/bashrc

exec bash $(dirname "$0")/run_omega_cdash.sh
44 changes: 44 additions & 0 deletions cron-scripts/tasks/omega_cdash/launch_omega_cdash.sh
@@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -eo pipefail

HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
echo "[$(date)] Starting $SCRIPT_NAME"

export OMEGA_CDASH_BASEDIR=${CRONJOB_BASEDIR}/tasks/omega_cdash
export TESTROOT="${OMEGA_CDASH_BASEDIR}/tests"
mkdir -p "$OMEGA_CDASH_BASEDIR"
mkdir -p "$TESTROOT"

export OMEGA_HOME="${OMEGA_CDASH_BASEDIR}/Omega"

if [[ ! -d "$OMEGA_HOME" ]]; then
  cd "${OMEGA_CDASH_BASEDIR}"
  git clone https://github.com/E3SM-Project/Omega.git
fi

cd "${OMEGA_HOME}"
git checkout develop
git fetch origin
git reset --hard origin/develop
git submodule update --init --recursive || true

if [[ ! -f "${TESTROOT}/OmegaMesh.nc" ]]; then
  wget -O "${TESTROOT}/OmegaMesh.nc" https://web.lcrc.anl.gov/public/e3sm/inputdata/ocn/mpas-o/oQU240/ocean.QU.240km.151209.nc
fi

if [[ ! -f "${TESTROOT}/OmegaSphereMesh.nc" ]]; then
  wget -O "${TESTROOT}/OmegaSphereMesh.nc" https://web.lcrc.anl.gov/public/e3sm/polaris/ocean/polaris_cache/global_convergence/icos/cosine_bell/Icos480/init/initial_state.230220.nc
fi

if [[ ! -f "${TESTROOT}/OmegaPlanarMesh.nc" ]]; then
  wget -O "${TESTROOT}/OmegaPlanarMesh.nc" https://gist.github.com/mwarusz/f8caf260398dbe140d2102ec46a41268/raw/e3c29afbadc835797604369114321d93fd69886d/PlanarPeriodic48x48.nc
fi

sbatch \
  --job-name=OmegaCdash \
  --output="$CRONJOB_LOGDIR/omega_cdash_%j.out" \
  --error="$CRONJOB_LOGDIR/omega_cdash_%j.err" \
  "${HERE}/job_${CRONJOB_MACHINE}_omega_cdash.sbatch"

echo "[$(date)] Finished $SCRIPT_NAME"