@poksumdo
Collaborator

Target stack is EESSI/2021.12

@poksumdo poksumdo added the bot:build Instruct bot to build software stack label Feb 23, 2023
@eessi-bot-devel-trz42

This comment was marked as outdated.

@eessi-bot-devel-trz42

This comment was marked as outdated.

@poksumdo

This comment was marked as outdated.

@eessi-bot-devel-trz42

This comment was marked as outdated.

@eessi-bot-devel-trz42

This comment was marked as outdated.

@trz42
Owner

trz42 commented Mar 6, 2023

Testing eessi bot PR#154:

  • rebuild first (jobs above were run in a different directory)
  • ran into an issue because the setting upload_to_s3_script had not been renamed to tarball_upload_script
  • tarball gets uploaded to nessi-2022.11 after renaming the setting
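
For illustration, a minimal sketch of how a config check could fail early with a clear message when app.cfg still uses the old setting name. This is not the bot's actual code: the section name `deploycfg`, the helper `get_upload_script` and the script path are assumptions made for this example; only the two setting names come from the notes above.

```python
import configparser

OLD_KEY = "upload_to_s3_script"    # old setting name mentioned above
NEW_KEY = "tarball_upload_script"  # new setting name mentioned above


def get_upload_script(config, section="deploycfg"):
    """Return the upload script path, or explain what to rename.

    The section name is a hypothetical default, not taken from the bot.
    """
    if config.has_option(section, NEW_KEY):
        return config.get(section, NEW_KEY)
    if config.has_option(section, OLD_KEY):
        raise ValueError(
            f"setting '{OLD_KEY}' in section '{section}' must be renamed to '{NEW_KEY}'"
        )
    raise KeyError(f"setting '{NEW_KEY}' not found in section '{section}'")


# Example: a config that still uses the old name (path is a placeholder).
cfg = configparser.ConfigParser()
cfg.read_string("[deploycfg]\nupload_to_s3_script = /path/to/upload.sh\n")
try:
    get_upload_script(cfg)
except ValueError as err:
    print(err)
```

Raising on the old name rather than silently falling back keeps the rename visible, which is one possible design choice among others.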

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 bot:build Instruct bot to build software stack and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 bot:build Instruct bot to build software stack labels Mar 6, 2023
@eessi-bot-devel-trz42

This comment was marked as outdated.

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 bot:build Instruct bot to build software stack labels Mar 6, 2023
@trz42
Owner

trz42 commented Mar 7, 2023

Testing eessi bot PR#156: default time limit

  • only event_handler.sh is run, so the job manager does not release jobs (releasing is not necessary for these tests)

Test cases

  • 1. run without any time limit set in app.cfg --> scontrol show job JOBID should report a time limit of 24 hours

    [trz42@mgmt PR156]$ scontrol show job 3952 | grep -E '(JobId|TimeLimit)'
    JobId=3952 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
    

    Note: the time limit is reported in the format days-hours:minutes:seconds, so the result TimeLimit=1-00:00:00 equals 24 hours.

  • 2. run with time limit specified via --time=12:00:00 set via slurm_params --> scontrol show job JOBID should report 12:00:00 hours as time limit

    [trz42@mgmt PR156]$ scontrol show job 3953 | grep -E '(JobId|TimeLimit)'
    JobId=3953 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=12:00:00 TimeMin=N/A
    
  • 3. run with time limit specified via --time 09:00:00 set via slurm_params --> scontrol show job JOBID should report 09:00:00 hours as time limit

    [trz42@mgmt PR156]$ scontrol show job 3954 | grep -E '(JobId|TimeLimit)'
    JobId=3954 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=09:00:00 TimeMin=N/A
    
  • 4. run with time limit specified via -t 06:00:00 set via slurm_params --> scontrol show job JOBID should report 06:00:00 hours as time limit

    [trz42@mgmt PR156]$ scontrol show job 3955 | grep -E '(JobId|TimeLimit)'
    JobId=3955 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=06:00:00 TimeMin=N/A
    
  • 5. run with time limit specified via --time=03:00:00 set via arch_target_map --> scontrol show job JOBID should report 03:00:00 hours as time limit

    [trz42@mgmt PR156]$ scontrol show job 3956 | grep -E '(JobId|TimeLimit)'
    JobId=3956 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=03:00:00 TimeMin=N/A
    
  • 6. run with malformed time limit spec, e.g., --TimeLimit=01:00:00 set via slurm_params --> scontrol show job JOBID should report 1-00:00:00 as time limit
    This actually crashed the event handler because --TimeLimit is not a known sbatch option.

  • 7. run with malformed time limit spec, e.g., --Time=01:00:00 set via slurm_params --> scontrol show job JOBID should report 1-00:00:00 as time limit
    Same result as in case 6.

  • 8. one more test to check if the algorithm can be misled: no time limit specified, however, a job name is provided via --job-name=real-test --> scontrol show job JOBID should report 1-00:00:00 as time limit
    FAILED

    [trz42@mgmt PR156]$ scontrol show job 3957 | grep -E '(JobId|TimeLimit)'
    JobId=3957 JobName=real-test
       RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
    
  • 9. yet another test to check if the algorithm works: no time limit specified, however, CPUs per task provided via --cpus-per-task=2 --> scontrol show job JOBID should report 1-00:00:00 as time limit
    FAILED

    [trz42@mgmt PR156]$ scontrol show job 3959 | grep -E '(JobId|TimeLimit)'
    JobId=3959 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
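
To make the format note in case 1 concrete, here is a small sketch that pulls TimeLimit out of scontrol output and converts it to seconds. The helper names are made up for this example, and it only handles the formats seen in this protocol ([days-]hours:minutes:seconds and UNLIMITED).

```python
import re
from typing import Optional

TIME_LIMIT_RE = re.compile(r"TimeLimit=(\S+)")


def extract_time_limit(scontrol_output: str) -> Optional[str]:
    """Return the raw TimeLimit value from scontrol output, or None if absent."""
    match = TIME_LIMIT_RE.search(scontrol_output)
    return match.group(1) if match else None


def time_limit_to_seconds(value: str) -> Optional[int]:
    """Convert [days-]hours:minutes:seconds to seconds; None means UNLIMITED."""
    if value == "UNLIMITED":
        return None
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Output copied from case 1 above.
sample = """JobId=3952 JobName=eessi-bot-build.slurm
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A"""
limit = extract_time_limit(sample)
print(limit, time_limit_to_seconds(limit))  # 1-00:00:00 86400
```

With this, the expected value for the default case can be compared as 24 * 3600 seconds instead of matching the string form.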
    

@trz42 trz42 added the bot:build Instruct bot to build software stack label Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3952

| date | job status | comment |
| --- | --- | --- |
| Mar 07 06:45:25 PM UTC 2023 | submitted | job id 3952 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3953

| date | job status | comment |
| --- | --- | --- |
| Mar 07 06:51:49 PM UTC 2023 | submitted | job id 3953 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3954

| date | job status | comment |
| --- | --- | --- |
| Mar 07 06:56:27 PM UTC 2023 | submitted | job id 3954 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3955

| date | job status | comment |
| --- | --- | --- |
| Mar 07 06:59:07 PM UTC 2023 | submitted | job id 3955 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3957

| date | job status | comment |
| --- | --- | --- |
| Mar 07 07:20:10 PM UTC 2023 | submitted | job id 3957 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3958

| date | job status | comment |
| --- | --- | --- |
| Mar 07 07:26:26 PM UTC 2023 | submitted | job id 3958 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 7, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3959

| date | job status | comment |
| --- | --- | --- |
| Mar 07 07:27:45 PM UTC 2023 | submitted | job id 3959 awaits release by job manager |

@trz42
Owner

trz42 commented Mar 10, 2023

Repeating the same test protocol after code was changed to also cover cases 8 & 9.

Testing eessi bot PR#156: default time limit

  • only event_handler.sh is run, so the job manager does not release jobs (releasing is not necessary for these tests)

Test cases

  • 1. run without any time limit set in app.cfg --> scontrol show job JOBID should report 1-00:00:00 (24 hours) as time limit

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3974 | grep -E '(JobId|TimeLimit)'
    JobId=3974 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
    
  • 2. run with time limit specified via --time=12:00:00 set via slurm_params --> scontrol show job JOBID should report 12:00:00 hours as time limit

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3975 | grep -E '(JobId|TimeLimit)'
    JobId=3975 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=12:00:00 TimeMin=N/A
    
  • 3. run with time limit specified via --time 09:00:00 set via slurm_params --> scontrol show job JOBID should report 09:00:00 hours as time limit
    skipped

  • 4. run with time limit specified via -t 06:00:00 set via slurm_params --> scontrol show job JOBID should report 06:00:00 hours as time limit

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3976 | grep -E '(JobId|TimeLimit)'
    JobId=3976 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=06:00:00 TimeMin=N/A
    
  • 5. run with time limit specified via --time 03:00:00 set via arch_target_map --> scontrol show job JOBID should report 03:00:00 hours as time limit

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3977 | grep -E '(JobId|TimeLimit)'
    JobId=3977 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=03:00:00 TimeMin=N/A
    
  • 6. run with malformed time limit spec, e.g., --TimeLimit=01:00:00 set via slurm_params --> scontrol show job JOBID should report 1-00:00:00 as time limit
    skipped

  • 7. run with malformed time limit spec, e.g., --Time=01:00:00 set via slurm_params --> scontrol show job JOBID should report 1-00:00:00 as time limit
    skipped

  • 8. one more test to check if the algorithm can be misled: no time limit specified, however, a job name is provided via --job-name=real-test --> scontrol show job JOBID should report 1-00:00:00 as time limit
    now, it works as expected

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3973 | grep -E '(JobId|TimeLimit)'
    JobId=3973 JobName=eessi-bot-build.slurm
       RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
    
  • 9. yet another test to check if the algorithm works: no time limit specified, however, CPUs per task provided via --cpus-per-task=2 --> scontrol show job JOBID should report 1-00:00:00 as time limit
    now, it works as expected

    [trz42@mgmt bot-side-4-bot-build]$ scontrol show job 3972 | grep -E '(JobId|TimeLimit)'
    JobId=3972 JobName=real-test
       RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
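
As a rough sketch of the kind of check these cases exercise (not the code from the PR, which is not shown here): decide whether the Slurm parameters already carry a time limit and append a default if they do not. The function names and the `DEFAULT_TIME_LIMIT` constant are assumptions for this example. One plausible way such a check can be misled is a plain substring search for `-t`, which also occurs in `--job-name=real-test` and `--cpus-per-task=2`; whether that was the actual cause of the earlier failures in cases 8 and 9 is a guess. Matching whole tokens avoids it.

```python
import shlex

DEFAULT_TIME_LIMIT = "24:00:00"  # assumption: the 24 hour default tested in case 1


def has_time_limit(slurm_params: str) -> bool:
    """Return True if the parameters contain --time, --time=VALUE, -t or -tVALUE."""
    for token in shlex.split(slurm_params):
        if token == "--time" or token.startswith("--time="):
            return True
        # Short form, with the value separate ("-t 06:00:00") or attached ("-t06:00:00").
        # Bundled short options are not handled in this sketch.
        if token.startswith("-t"):
            return True
    return False


def with_default_time_limit(slurm_params: str, default: str = DEFAULT_TIME_LIMIT) -> str:
    """Append --time=<default> unless a time limit is already given."""
    if has_time_limit(slurm_params):
        return slurm_params
    return f"{slurm_params} --time={default}".strip()


for params in ("", "--time=12:00:00", "-t 06:00:00",
               "--job-name=real-test", "--cpus-per-task=2"):
    print(repr(params), "->", repr(with_default_time_limit(params)))
```

Note that the match is case-sensitive, so a misspelled option such as --Time=01:00:00 (case 7) would still get the default appended and would still be rejected by sbatch; handling that would need separate validation.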
    

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3972

| date | job status | comment |
| --- | --- | --- |
| Mar 10 07:57:26 AM UTC 2023 | submitted | job id 3972 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3973

| date | job status | comment |
| --- | --- | --- |
| Mar 10 08:01:27 AM UTC 2023 | submitted | job id 3973 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3974

| date | job status | comment |
| --- | --- | --- |
| Mar 10 08:04:23 AM UTC 2023 | submitted | job id 3974 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3975

| date | job status | comment |
| --- | --- | --- |
| Mar 10 08:06:39 AM UTC 2023 | submitted | job id 3975 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3976

| date | job status | comment |
| --- | --- | --- |
| Mar 10 08:17:42 AM UTC 2023 | submitted | job id 3976 awaits release by job manager |

@trz42 trz42 added bot:build Instruct bot to build software stack and removed bot:build Instruct bot to build software stack labels Mar 10, 2023
@eessi-bot-devel-trz42

New job on instance CitC-PR156 for architecture x86_64-amd-zen2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR156/jobs/2023.03/pr_58/3977

| date | job status | comment |
| --- | --- | --- |
| Mar 10 08:20:42 AM UTC 2023 | submitted | job id 3977 awaits release by job manager |

trz42 pushed a commit that referenced this pull request Mar 18, 2023
pull in fixes from EESSI/software-layer PR238 and PR239