
adds cron scripts for nightly tests #482

Open
grnydawn wants to merge 7 commits into E3SM-Project:main from grnydawn:ykim/cron-scripts


@grnydawn
Contributor

@grnydawn grnydawn commented Mar 5, 2026

This PR adds cron scripts to Polaris for running nightly Omega and Polaris tests and opens a discussion about this feature.

@xylar xylar added the cron Related to cron jobs (nightly testing) label Mar 5, 2026
@xylar xylar self-assigned this Mar 5, 2026
@xylar xylar requested review from cbegeman and xylar March 5, 2026 16:58
Collaborator

@xylar xylar left a comment

@grnydawn, this looks reasonable to me. It's a lot to review, so for now I've only done a fairly quick skim. But I'm happy to be involved in updating and maintaining this infrastructure. I'm sure I'll get to know it better that way.

@xylar
Collaborator

xylar commented Mar 9, 2026

One more thing. Polaris has its own linting tools. See https://docs.e3sm.org/polaris/main/developers_guide/quick_start.html#code-style-for-polaris.

I'd prefer that we lint the files added here. This would involve creating a polaris development environment, making a small edit to each file (e.g., adding whitespace), and then letting the linter do its thing.

If need be, we could leave these files out of the lint checking, but I would prefer not to.

Would you like me to take care of the linting?

@xylar
Collaborator

xylar commented Mar 10, 2026

@grnydawn, here is a commit you could cherry-pick if you want to fix most (maybe all?) of the linting issues:
xylar@65a55ab
If you make me a collaborator on your fork, I can push this to your branch.


## Overview

This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:
Collaborator

Suggested change
This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:
This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two types of OMEGA tests:

Contributor Author

updated.



- **Omega** - Next-generation ocean model
Collaborator

Suggested change
- **Omega** - Next-generation ocean model
- **Omega CTests**

Contributor Author

updated.

- **Polaris** - MPAS-Ocean model with Omega integration
Collaborator

Suggested change
- **Polaris** - MPAS-Ocean model with Omega integration
- **Polaris** - Omega tests on MPAS meshes

Contributor Author

updated.


source /etc/bashrc

export CRONJOB_BASEDIR=/lcrc/globalscratch/ac.kimy/cronjobs
Collaborator

Suggested change
export CRONJOB_BASEDIR=/lcrc/globalscratch/ac.kimy/cronjobs
export CRONJOB_BASEDIR=/lcrc/globalscratch/${USER}/cronjobs

Contributor Author

implemented.

export https_proxy=http://proxy.ccs.ornl.gov:3128/
export no_proxy='localhost,127.0.0.0/8,*.ccs.ornl.gov'

export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/grnydawn/cronjobs
Collaborator

Suggested change
export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/grnydawn/cronjobs
export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/${USER}/cronjobs

Contributor Author

implemented.


module load cray-python cmake

export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-cpu
Collaborator

Suggested change
export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-cpu
export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-cpu

Contributor Author

implemented.


module load cray-python cmake

export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-gpu
Collaborator

Suggested change
export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-gpu
export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-gpu

Contributor Author

implemented.
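The ${USER:0:1} in the two Perlmutter suggestions is bash substring expansion (offset 0, length 1), which yields the first letter of the user name and so reproduces the /pscratch/sd/<initial>/<user> layout of the original hard-coded paths. A minimal sketch of that expansion wrapped in a function; the name pm_cronjob_basedir and its arguments are hypothetical and not part of the PR:

```shell
#!/bin/bash
# Hypothetical helper: build the Perlmutter cron-job base directory for a
# user, using the same ${USER:0:1}/${USER} pattern as the suggested change.
pm_cronjob_basedir() {
  local user="${1:-$USER}"       # defaults to the current user
  local variant="${2:-pm-cpu}"   # pm-cpu or pm-gpu
  # ${user:0:1} is the first character of the user name
  echo "/pscratch/sd/${user:0:1}/${user}/omega/cronjobs_${variant}"
}
```

For example, export CRONJOB_BASEDIR=$(pm_cronjob_basedir "$USER" pm-gpu) would give the pm-gpu directory for whoever runs the job. Because substring expansion is a bash feature rather than POSIX sh, the env files that use it need to be run by bash.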

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH -q debug
#SBATCH --account=cli115
Collaborator

Should we get the account from the same place used to generate Polaris job scripts, currently the parallel/account option in polaris/machines/*.cfg?

Contributor Author

@cbegeman, I think it would be better to get the account from a common place. Do any of the polaris/machines/*.cfg files have a parallel/account entry? I couldn't find any account information in those files. While I do see a group entry in the cfg files, some group values do not match the actual account name (for example, e3sm_g on PM-GPU).

Collaborator

It looks like they don't, but they could (and should), e.g.,

if config.has_option('parallel', 'account'):
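If the machine config files gain an account option in their [parallel] section (which, per the thread, they do not have yet), the sbatch side could read it with something like the sketch below. The function name get_parallel_account is hypothetical, and the awk parser is a simplification of Python's configparser: it ignores interpolation, continuation lines and inline comments.

```shell
#!/bin/bash
# Sketch: print the (hypothetical) "account" option from the [parallel]
# section of an INI-style polaris machine config. Prints nothing if absent.
get_parallel_account() {
  local cfg="$1"
  awk -F'=' '
    # A section header turns matching on only inside [parallel].
    /^[ \t]*\[/ { in_parallel = ($0 ~ /^[ \t]*\[parallel\]/); next }
    in_parallel && $1 ~ /^[ \t]*account[ \t]*$/ {
      val = $2
      sub(/^[ \t]+/, "", val)   # trim leading whitespace
      sub(/[ \t]+$/, "", val)   # trim trailing whitespace
      print val
      exit
    }
  ' "$cfg"
}
```

Note that #SBATCH lines are read by sbatch and are not expanded by the shell, so a value looked up at run time cannot go into the #SBATCH --account directive itself. It would instead be passed on the command line, e.g. sbatch --account="$account" job.sbatch (job.sbatch standing in for one of the job*.sbatch files), and sbatch command-line options take precedence over #SBATCH directives.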

Comment on lines +10 to +26
if [[ "$CRONJOB_MACHINE" == "chrysalis" ]]; then
    module load python cmake
    PARMETIS_TPL="/lcrc/soft/climate/polaris/chrysalis/spack/dev_polaris_0_10_0_COMPILER_openmpi/var/spack/environments/dev_polaris_0_10_0_COMPILER_openmpi/.spack-env/view"

elif [[ "$CRONJOB_MACHINE" == "frontier" ]]; then
    module load cray-python cmake git-lfs
    PARMETIS_TPL="/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_10_0_COMPILER_mpich/var/spack/environments/dev_polaris_0_10_0_COMPILER_mpich/.spack-env/view"

elif [[ "$CRONJOB_MACHINE" == "unknown" ]]; then
    echo "CRONJOB_MACHINE is not set." >&2
    exit 1

else
    echo "It seems that the cron job is not configured with CRONJOB_MACHINE." >&2
    exit 1

fi
Collaborator

Since we already have some logic here for different machines, it would be good to pull the pieces that are common to each job*.sbatch file out into a single file for ease of maintenance.

Contributor Author

implemented.
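One possible shape for that shared file is a single function that each job*.sbatch sources and calls, sketched below with the two machines and paths from the quoted snippet. The function name set_machine_env and the variable CRONJOB_MODULES are hypothetical; the function only sets variables and returns a non-zero status on failure, leaving module loading and exiting to the caller.

```shell
#!/bin/bash
# Hypothetical shared helper: per-machine settings kept in one place.
# Sets CRONJOB_MODULES and PARMETIS_TPL; returns 1 for unknown machines.
set_machine_env() {
  case "$1" in
    chrysalis)
      CRONJOB_MODULES="python cmake"
      PARMETIS_TPL="/lcrc/soft/climate/polaris/chrysalis/spack/dev_polaris_0_10_0_COMPILER_openmpi/var/spack/environments/dev_polaris_0_10_0_COMPILER_openmpi/.spack-env/view"
      ;;
    frontier)
      CRONJOB_MODULES="cray-python cmake git-lfs"
      PARMETIS_TPL="/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_10_0_COMPILER_mpich/var/spack/environments/dev_polaris_0_10_0_COMPILER_mpich/.spack-env/view"
      ;;
    "" | unknown)
      echo "CRONJOB_MACHINE is not set." >&2
      return 1
      ;;
    *)
      echo "It seems that the cron job is not configured with CRONJOB_MACHINE." >&2
      return 1
      ;;
  esac
}
```

A job script would then do set_machine_env "$CRONJOB_MACHINE" || exit 1 followed by module load ${CRONJOB_MODULES}, leaving the expansion unquoted on purpose so each module name becomes a separate argument.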

POLARIS_CDASH_BASEDIR=${CRONJOB_BASEDIR}/tasks/polaris_cdash
POLARIS_CDASH_TESTDIR="${POLARIS_CDASH_BASEDIR}/tests"
OMEGA_HOME="${POLARIS_CDASH_BASEDIR}/polaris/e3sm_submodules/Omega"
MINIFORGE3_HOME="${POLARIS_CDASH_BASEDIR}/miniforge3"
Collaborator

Can we make this a command-line argument to launch_all.sh so we can use an existing miniforge install?

Collaborator

@xylar Does this need to be changed to pixi?

Contributor Author

Is it preferable to test the latest Polaris in a nightly cron job? Originally, I planned to use the Polaris code base that contains the cron-scripts subdirectory, but I realized that this checkout might not be up to date. I probably still need to clone or update it to ensure I am using the latest version.

Collaborator

Sorry, I'm not understanding your question. When we merge this PR, the polaris code base will include the cron-scripts directory. So maybe the answer is yes: clone/update polaris nightly to use the latest main.

Contributor Author

Sorry for the confusion. To run the nightly tests, crontab should execute the launch_all.sh script every day. Since all files under the cron-scripts folder, including launch_all.sh, are part of the Polaris repository, we may need to update the Polaris repository to the latest version before running launch_all.sh.

Collaborator

@cbegeman cbegeman Apr 8, 2026

Ah, I see. I guess that's the downside of having cron-scripts in polaris.
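The request earlier in this thread, to let launch_all.sh take an existing Miniforge install on the command line, could be handled with getopts roughly as sketched below. The -m machine flag appears later in this conversation; the -f flag, the function name parse_launch_args and the fallback behavior are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch of option parsing for launch_all.sh.
#   -m <machine>         machine name (used elsewhere in this PR)
#   -f <miniforge3 dir>  hypothetical: reuse an existing Miniforge install
parse_launch_args() {
  local OPTIND=1 opt   # reset getopts state on every call
  CRONJOB_MACHINE="unknown"
  MINIFORGE3_HOME=""
  while getopts "m:f:" opt; do
    case "$opt" in
      m) CRONJOB_MACHINE="$OPTARG" ;;
      f) MINIFORGE3_HOME="$OPTARG" ;;
      *) echo "Usage: launch_all.sh -m <machine> [-f <miniforge3 dir>]" >&2
         return 1 ;;
    esac
  done
  # Without -f, fall back to a private install under the task directory.
  if [[ -z "$MINIFORGE3_HOME" ]]; then
    MINIFORGE3_HOME="${POLARIS_CDASH_BASEDIR}/miniforge3"
  fi
}
```

Calling it as parse_launch_args "$@" near the top of the script would keep the current behavior when -f is omitted.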

@cbegeman
Collaborator

cbegeman commented Mar 11, 2026

In order to use launch_all.sh, I needed to add cd ${source_path} here:

print('Creating the polaris conda environment\n')

Otherwise, I get the following error, because the command runs from wherever I executed launch_all.sh:

FileNotFoundError: [Errno 2] No such file or directory: '/pscratch/sd/c/cbegeman/polaris-repos/cron-scripts/cron-scripts/deploy/unsupported.txt'
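In shell terms, the fix amounts to running the command from the source directory so that relative paths such as deploy/unsupported.txt resolve against the Polaris checkout. A minimal sketch, assuming a hypothetical helper run_in_dir; the actual change in the Polaris deploy code may look different:

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a directory without changing
# the caller's working directory (the cd happens in a subshell).
run_in_dir() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}
```

For example, run_in_dir "${source_path}" <command> would read deploy/unsupported.txt relative to the checkout, while launch_all.sh itself stays in its original directory.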

@cbegeman
Collaborator

I get the following error when I run launch_all.sh -m pm-cpu:

--------------------------------------------------------------------------------
Building Omega (dev) with gnu in /pscratch/sd/c/cbegeman/omega/cronjobs_pm-cpu/tasks/polaris_cdash/tests/gnu/omega_build
--------------------------------------------------------------------------------
...
-- Caching compiler settings in /pscratch/sd/c/cbegeman/omega/cronjobs_pm-cpu/tasks/polaris_cdash/tests/gnu/omega_build/SCORPIO_CMakeCache.txt...
...
CMake Error at /pscratch/sd/c/cbegeman/omega/cronjobs_pm-cpu/tasks/polaris_cdash/polaris/e3sm_submodules/Omega/externals/scorpio/cmake/SPIOTypeUtils.cmake:223 (message):
  Could not find a type for representing PIO Offsets
Call Stack (most recent call first):
  /pscratch/sd/c/cbegeman/omega/cronjobs_pm-cpu/tasks/polaris_cdash/polaris/e3sm_submodules/Omega/externals/scorpio/src/clib/CMakeLists.txt:287 (get_pio_offset_type)


-- Configuring incomplete, errors occurred!

@xylar xylar force-pushed the ykim/cron-scripts branch from 65a55ab to c12641d March 16, 2026 17:03
@grnydawn
Contributor Author

@cbegeman Thanks for the review. All the suggestions make sense to me. Since the error you noted above appears to be related to Phil's recent PR (E3SM-Project/Omega#362), I'll review Phil's PR and let it be merged into Omega first, then update the branch for this PR.

@grnydawn
Contributor Author

grnydawn commented Apr 1, 2026

@cbegeman, I have run the omega_nightly test suite on Chrysalis and PM-CPU using the latest Polaris, and all tests passed. The test results from PM-GPU include only two test tasks, which I believe is a separate issue related to slow Slurm jobs on PM-GPU. Interestingly, all tests passed on Chrysalis and PM-CPU both with and without Phil's recent PR (E3SM-Project/Omega#362). I'm wondering whether you are still seeing the issue you reported above.

@grnydawn
Contributor Author

grnydawn commented Apr 1, 2026

@cbegeman, @xylar, I was trying to implement Carolyn's suggestion, but I realized that she has pushed several commits to this PR. I'm wondering what the best way is to incorporate those changes using Git. I tried to pull the commits, but it didn't work.

@xylar
Collaborator

xylar commented Apr 1, 2026


It sounds like you have both local changes and those on the remote branch. In such circumstances, you should fetch the remote branch:

git fetch --all -p

and then rebase your local branch onto the remote one, e.g.:

git rebase --interactive grnydawn/ykim/cron-scripts

You may then end up with merge conflicts you need to resolve in the usual way.


3 participants