@casparvl casparvl commented Apr 17, 2025

This PR restructures the arch_target_map configuration item (and any place in the code that used it), so that the keys no longer have meaning. This allows having e.g. a CPU partition based on zen4 and an accelerated partition based on zen4, but have the bot distinguish between them. This then allows the user to target the CPU or GPU partition, based on what is desired for the build.

The functionality implemented here enables:

  • a bot admin to allow cross-compilation for accelerator targets on CPU-only partitions
  • a bot admin to prevent cross-compilation for accelerator targets on CPU-only partitions
  • a bot admin to allow CPU-only compilation on a partition with accelerators
  • a bot admin to disallow CPU-only compilation on a partition with accelerators

It should thus be flexible enough to create a configuration that 'makes sense' on any system. As an example: Snellius has both zen4 CPU-only nodes, and zen4 + H100 nodes. This PR allows me to configure the bot in such a way that e.g. CPU-only zen4 builds, as well as zen4 + nvidia/cc70 and zen4 + nvidia/cc80 builds are done on the CPU-only partition with zen4 nodes, while the zen4 + nvidia/cc90 builds will use the GPU partition (zen4 + H100 nodes) so that they can build that GPU software natively.
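For illustration, a restructured arch_target_map for the Snellius example above could look roughly like this (partition names, Slurm parameters and repo names are placeholders; the field names follow the structure introduced by this PR):

```python
arch_target_map = {
    "cpu_zen4": {
        "os": "linux",
        "cpu_subdir": "x86_64/amd/zen4",
        # CPU-only builds and cross-compiles for cc70/cc80 happen here
        "accel": ["None", "nvidia/cc70", "nvidia/cc80"],
        "slurm_params": "-p genoa ...",
        "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"],
    },
    "gpu_h100": {
        "os": "linux",
        "cpu_subdir": "x86_64/amd/zen4",
        # native builds for cc90 happen on the H100 nodes
        "accel": ["nvidia/cc90"],
        "slurm_params": "-p gpu_h100 ...",
        "repo_targets": ["eessi.io-2023.06-software"],
    },
}
```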

The repo_target_map is no longer needed, as each virtual partition in the arch_target_map now declares for which repositories it is configured to build. This also allows some extra flexibility, as one could e.g. allow building for eessi.io-2023.06-compat only on CPU-only nodes. Not sure if this is super useful, but the flexibility comes for free with the new structure.

Finally, all the places in the code that used repo_target_map have been rewritten. As far as I could tell, that was mainly the handle_pull_request_opened_event function.

Fixes: #294

Caspar van Leeuwen and others added 11 commits April 17, 2025 16:42
…e keys don't have meaning, and all meaning is in the values
…so, make sure to print the same information to the logs, regardless of whether this is about a partition that has accelerators defined or not. Finally, make sure that if we have not hit a match by the end of the loop over all accelerators, we continue to the next iteration of the loop over the repo-targets, so that the job-dir preparation is skipped for the current one
…AND if the build command does NOT, we explicitly check that the defined accelerator is 'None'. This allows skipping CPU-only builds on accelerated partitions.
…, as it is now replaced by the repo list in the arch_target_map. Also, fix hound issues
…o individual partitions when printing the config
@casparvl casparvl marked this pull request as ready for review July 1, 2025 15:18
@trz42 trz42 left a comment


Left a bunch of comments.

This PR actually does more than what was originally "requested" in #294.

It provides more control for bot admins to allow/prevent certain jobs from being triggered, e.g., one might prevent accidental CPU-only builds on GPU nodes. This feature is welcome, I think, although it changes the meaning of the accelerator filter. I don't know if the "prevent GPU builds on CPU-only nodes" part is really necessary though. The main problem could be that it is not so clear why some filters trigger a build and others do not.

While I'll have to give this a test run, I believe it does what it should do.

The move (or removal) of the repo_target_map is not necessary (I think, but I may be wrong). It doesn't hurt either.

I wonder if one should simply rename arch_target_map to make a breaking change...

Also for the bot command filter names, I wonder if one could do better, e.g.,

| current | new | comment |
| --- | --- | --- |
| architecture | node or node_type | arch_target_map could be renamed to node_map or node_type_map |
| ... | cpu | this would allow making the definition of node types or partitions simpler, OR using the information for verification (similar to what is done for accel) |

I wonder if we really need to have os and cpu_subdir in the definition of a partition, and use them to set the architecture (in the context when matching filters). Maybe it would be better to add another filter for cpu (defining which CPU microarchitecture we want to build for). os and cpu_subdir are really odd things that should not matter much to the bot. They should just be passed through to the job.

Similarly, do we really care about the repository information when we want to trigger a job on a zen4 node? It may add some options for an admin to constrain certain combinations ... however it feels a bit odd that this is a bot configuration option.

app.cfg.example Outdated


[architecturetargets]
# defines for which architectures the bot will build and what job submission

Suggested change
# defines for which architectures the bot will build and what job submission
# defines for which architectures (CPU and/or GPU) the bot can build and what job submission

app.cfg.example Outdated
[architecturetargets]
# defines for which architectures the bot will build and what job submission
# parameters shall be used to allocate a compute node with the correct

Suggested change
# parameters shall be used to allocate a compute node with the correct
# parameters shall be used to allocate a compute node with the correct CPU(+GPU) architecture

app.cfg.example Outdated
# defines for which architectures the bot will build and what job submission
# parameters shall be used to allocate a compute node with the correct
# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,

Suggested change
# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,
# The keys of the arch_target_map are just strings. They don't have any meaning in the bot code,

"virtual" and "partition" carry some meaning which could be misleading.

app.cfg.example Outdated
# parameters shall be used to allocate a compute node with the correct
# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,
# and can thus be chosen as desired.

Suggested change
# and can thus be chosen as desired.
# and can thus be chosen as desired (however, they could be standardised across different bot instances run by an organisation, so users easily understand what they mean).

app.cfg.example Outdated
# parameters shall be used to allocate a compute node with the correct
# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,
# and can thus be chosen as desired.
# Note that you are responsible that ANY bot:build command ONLY matches a single virtual partition!

As of now this is the case, but with #310 which only supports exact matches this wouldn't be an issue any longer.

Suggested change
# Note that you are responsible that ANY bot:build command ONLY matches a single virtual partition!
# Note that you are responsible that ANY bot:build command ONLY matches a single key for the architecture!

Actually, this is a little off. Maybe it would be better to explain that (without #310) if one has defined the keys zen4 and zen4+H100, then a bot:build arch:zen4 would result in two jobs. So, at the moment, none of the keys should be a substring of another key.
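The substring pitfall can be illustrated in a few lines of Python (a simplification of the bot's actual matching, for illustration only):

```python
# Hypothetical sketch of the pre-#310 substring matching pitfall:
# with keys "zen4" and "zen4+H100", a command targeting "zen4" matches both,
# so two jobs would be triggered instead of one.
arch_target_map = {"zen4": {}, "zen4+H100": {}}
requested = "zen4"
matches = [key for key in arch_target_map if requested in key]
```

Because "zen4" is a substring of "zen4+H100", both keys end up in `matches`.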

app.cfg.example Outdated
Comment on lines 334 to 336
# the event_filter will NOT mark this virtual partition as a valid match. This is intentional, as this particular
# (example) cluster has a native zen4+cc90 partition (gpu_h100) and we want this command to trigger a native build
# on that partition, rather than cross-compiling on this cpu_zen4 partition.

Is that really the intention? It changes the meaning of accel as we used it until now (defining which accelerator we want to build for): if the zen4 architecture target defines "accel": ["None"], it is interpreted as a means to select (or not select) a build node; if the cpu_zen4 architecture target does not define "accel", it is interpreted as defining which accelerator we want to build for.

Plus the selection of the build node is now sometimes the result of only arch and sometimes the result of both arch and accel?

@casparvl (author) replied:

It's always been a bit strange to me why accel had a different meaning than arch, i.e. arch was used for node selection, but accel wasn't. The only reason that 'worked' was because we always do native builds, and arch thus actually meant both: give me a node that matches this architecture, and build for this architecture (i.e. in this architecture's prefix). Somehow, accel was special in that it did not have any effect on node selection. But it has to, if we ever want to enable a combination of native and cross-compiled builds (how else do I ensure a native build?).

Anyway, I think this:

if the zen4 architecture target defines "accel": ["None"], it is interpreted as a means to select (or not select) a build node; if the cpu_zen4 architecture target does not define "accel", it is interpreted as defining which accelerator we want to build for.

Should be seen differently. If a partition defines "accel": ["None"] that is a declaration that this partition is not suitable for (/should not be used for) compiling for accelerators. If a partition does not define "accel", that is an (implicit) declaration that it may be used to (cross)-compile for any accelerator.

The build command then defines what accelerator we want to build for (e.g. it determines the prefix in which we'll install software). It is the job of the matching logic to then select a partition that can facilitate that request.

This is pretty equivalent to how ReFrame does things with partition features. A ReFrame config can declare "Partition A has a feature 'GPU', Partition B does not". A test can declare "I need a partition with the feature 'GPU' to run". As a result, when running ReFrame on a system with Partition A and B, that test will then only be scheduled on Partition A.
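The declaration semantics described above can be sketched in a few lines (the function name and exact handling of the missing-key case are assumptions, not the PR's actual code):

```python
def partition_can_build_for(partition_info, accelerator):
    """Sketch: can this virtual partition facilitate a build for the
    requested accelerator (None meaning a CPU-only build)?"""
    accel = partition_info.get("accel")
    if accel is None:
        # no 'accel' key: implicit declaration that this partition may be
        # used to (cross-)compile for any accelerator
        return True
    if accelerator is None:
        # CPU-only build: only allowed if the partition declares "None"
        return "None" in accel
    # otherwise the requested accelerator must be declared explicitly
    return accelerator in accel
```

The build command declares what to build for; this predicate is the matching logic that selects a partition able to facilitate that request.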

Comment on lines 417 to 419
# Do not print virtual partition names, a bot admin may not want to share those
# Instead, just number them
comment += f"\n- Partition {partition_num+1}:"

How would one be able to know the names if they are not shown?

@casparvl casparvl Jul 2, 2025


You don't need to know. They could be named foo and bar, it's not relevant to the end user, since the names don't have any meaning (it's just helpful for the bot admin to pick a 'sensible' name :))

tasks/build.py Outdated
Comment on lines 615 to 625
# for a lot of accelerator targets
# arch_target_map = {
# 'virtual_partition_name': {
# 'os': 'linux',
# 'cpu_subdir': 'x86_64/amd/zen4',
# 'accel': ['nvidia/cc90'],
# 'slurm_params': '-p genoa <etc>',
# 'repo_targets': ["eessi.io-2023.06-compat","eessi.io-2023.06-software"],
# },
# 'virtual_partition_name2': {
# ... etc

This could be used at the start of the explanation in app.cfg.example.

@casparvl (author) replied:

Good point, I'll move it.

tasks/build.py Outdated
if 'accel' in partition_info and accelerator is not None:
# Use the accelerator as defined by the action_filter. We check if this is valid for the current
# virtual partition later
arch_dir += accelerator

arch_dir could become x86_64/amd/zen4nvidia/cc90 ?

@casparvl casparvl Jul 2, 2025


You mean: the slash after zen4 is missing? That's indeed a mistake
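For illustration, joining the accelerator with a path separator avoids the concatenation mistake (whether the actual fix uses os.path.join or appends the slash manually is an assumption):

```python
import os

# String concatenation would yield 'x86_64/amd/zen4nvidia/cc90';
# joining as path components inserts the missing separator.
arch_dir = "x86_64/amd/zen4"
accelerator = "nvidia/cc90"
arch_dir = os.path.join(arch_dir, accelerator)
```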

app.cfg.example Outdated
# "accel": ["nvidia/cc70", "nvidia/cc80", "nvidia/cc90"]
# Note that setting:
# "accel": ["None", "nvidia/cc90"]
# is invalid here, since it would lead to both the cpu_zen4 and the gpu_h100 partitions matching the build command

"Invalid" in the sense the bot checks and refuses to even start or "invalid" in the sense it is not recommended because it's ambiguous?

@casparvl (author) replied:

Invalid in the sense that it will lead to an error (the bot will fail when preparing the job dir for the gpu_h100 partition, because that job dir was already prepared for the same job on cpu_zen4).

Reviewer replied:

Which would never happen if partitions have unique names and a user targets a partition by its name. Would also be easier to understand what is going on.

trz42 commented Jul 1, 2025

There's actually one thing that may unnecessarily limit what we could achieve here. With the slurm_params we can not only control to which type of node a job is submitted, but also make use of partial nodes, different memory requirements, ...

For example,

"zen4_small_short" : {
  "slurm_params": "--partition zen4_nodes --nodes=1 --ntasks-per-node=4 --mem=16G --time=0-6"
},
"zen4_big_long" : {
  "slurm_params": "--partition zen4_nodes --nodes=1 --ntasks-per-node=8 --mem=64G --time=1-0"
},
"zen4_quick_shot": {
  "slurm_params": "--partition zen4_nodes --nodes=1 --ntasks-per-node=1 --mem=4G --time=10"
}

but this would require a departure from the current filters (architecture and accel) to, maybe, (nodetype, cpu and accel).

This could be particularly useful when submitting jobs to busy HPC clusters.

However, it also feels like using the same hammer for another problem -- tweaking submission parameters -- maybe that would be better supported with another filter submitopts:arg=value,... and we do some sanity checking on the args and values (could do similar things what we do for exportvariable).
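A possible shape for such a submitopts filter, as a hedged sketch (the option names and the ALLOWED set are made up for illustration; the real sanity checking would need to be decided):

```python
ALLOWED = {"mem", "time", "ntasks-per-node", "nodes"}

def parse_submitopts(arg_string):
    """Hypothetical parser for a 'submitopts:arg=value,...' bot filter,
    with basic sanity checking on the argument names (similar in spirit
    to what is done for exportvariable)."""
    opts = {}
    for item in arg_string.split(","):
        key, sep, value = item.partition("=")
        if not sep or key not in ALLOWED:
            raise ValueError(f"unsupported submit option: {item}")
        opts[key] = value
    return opts
```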

casparvl commented Jul 2, 2025

I don't know if the "prevent GPU builds on CPU-only nodes" is really necessary though. Main problem could be that is not so clear why some filter trigger a build and some other not.

You don't have to prevent GPU builds on CPU-only nodes. But we need the possibility to do this, because otherwise two partitions match, and the bot will try to prep the job twice in the same builddir. See this issue

The alternative way of fixing this issue is to break the loop after the first match. But in that case the order of partitions in the arch_target_list would be important. I think as long as we document it well, that's also fine. But the same is true with the current setup: it requires proper docs, but then it gives the bot admin the freedom to do whatever they want :)
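The 'break after first match' alternative could look roughly like this (a sketch; the matching predicate stands in for whatever matching logic the bot actually uses):

```python
def pick_first_matching_partition(arch_target_map, matches):
    # 'First match wins': iterate over the partitions in config order and
    # return the first one the matching predicate accepts, so only one job
    # dir is ever prepared. Config order then effectively defines priority.
    for name, partition_info in arch_target_map.items():
        if matches(partition_info):
            return name
    return None
```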

casparvl commented Jul 2, 2025

The move (or removal) of the repo_target_map is not necessary (I think, but I may be wrong). It doesn't hurt either.
I wonder if one should simply rename arch_target_map to make a breaking change...

Removing it keeps things clean - it is no longer used, as far as I could tell. But I do like your idea of renaming arch_target_map. And then in the config validation, maybe we can check if those items are still set - and if they are, print an informative message (saying it has been replaced by node_map or whatever we will call it).

casparvl commented Jul 2, 2025

However, it also feels like using the same hammer for another problem -- tweaking submission parameters -- maybe that would be better supported with another filter submitopts:arg=value,... and we do some sanity checking on the args and values (could do similar things what we do for exportvariable).

Hm, yeah, I'm not sure if 'the same hammer' is the solution here. Unless you consider 'max walltime' a requirement for the build job (just like a particular architecture or accelerator is). Then you could say: the person commanding the bot declares what they need, and they get a partition whose max walltime is at least the amount they requested. It would then essentially just be another filter option. And we'd have to make that time limit a separate field in the config (just like "os" and "cpu_subdir" currently are) so that it can easily be compared against. Finally, it would then require adding a -t <that_config_field> to the slurm_params at the end. The same could be true for memory, btw.

But indeed, a submitopts:arg=value may be a better approach here. Whatever we choose, I'd do that in a follow-up PR, I see no reason to combine it with the current one.

trz42 commented Jul 2, 2025

I don't know if the "prevent GPU builds on CPU-only nodes" is really necessary though. Main problem could be that is not so clear why some filter trigger a build and some other not.

You don't have to prevent GPU builds on CPU only nodes. But we need the possibility to do this, because otherwise, two partitions match, and the bot will try to prep the job twice in the same builddir. See this issue

To me this sounds like the concept is not right if we need extra configuration to prevent something.

The alternative way of fixing this issue is to break the loop after the first match. But in that case the order of partitions in the arch_target_list would be important. I think as long as we document it well, that's also fine. But the same is true with the current setup: it requires proper docs, but then it gives the bot admin the freedom to do whatever they want :)

Strange argument. The current code uses the key "OS/CPU_ARCH" for defining part of the installation directory. That needs to be changed.

casparvl commented Jul 8, 2025

Discussed some things with Thomas on chat, some conclusions:

The real issue may not have been that keys have meaning, but that the architecture bot argument is used for two purposes: "Give me a node of CPU arch X", but also "Build for CPU arch X". For CPU, that was ok (at least for EESSI, since we didn't cross-compile). However, this PR now does the same to accel: it used to only mean "Build for GPU arch Y", but now also means "Give me a node with GPU arch Y".

The proposed solution is to split the two concerns by making two separate bot arguments, e.g.:

bot:build on:CPU=x86_64/amd/zen4 for:CPU=x86_64/amd/zen4

or

bot:build on:CPU=x86_64/amd/zen4,GPU=nvidia/cc90 for:CPU=x86_64/amd/zen4,GPU=nvidia/cc90

The bot config could then be:

"cpu_zen4" : {
    "cpu_arch": "x86_64/amd/zen4",
    "slurm_params": "--partition genoa ...other-non-common-submission-params..."
}
"gpu_zen4_h100" : {
    "cpu_arch": "x86_64/amd/zen4",
    "gpu_arch": "nvidia/h200",
    "slurm_params": "--partition gpu_h100 ...other-non-common-submission-params..."
}
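For illustration, the on:/for: arguments shown above could be parsed with something like the following (a sketch; the actual bot command parsing may look quite different):

```python
def parse_target_arg(arg):
    """Hypothetical parser for 'on:CPU=...,GPU=...' / 'for:CPU=...'
    bot:build arguments, returning the prefix and a dict of targets."""
    prefix, _, rest = arg.partition(":")
    pairs = dict(item.split("=", 1) for item in rest.split(","))
    return prefix, pairs
```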

Some questions that were still open in the discussion:

  • How do you make sure the zen4 CPU-only builds don't end up on your h100 partition?
  • How do you deal with requirements that match multiple node types in the bot config?

The suggestion by @trz42 was to add gpu_arch to the context, and then require that all elements of the context be present as a filter (i.e. all these elements should have been passed as an argument for bot:build). This means that for the cpu_zen4 node type, there is no gpu_arch in the context. Since bot:build on:CPU=x86_64/amd/zen4 for:CPU=x86_64/amd/zen4 also doesn't set a filter for gpu_arch, this is a match. However, for the gpu_zen4_h100 node type, the context will have a gpu_arch set. So, if we enforce that all elements of a context need to be present as a filter, bot:build on:CPU=x86_64/amd/zen4 for:CPU=x86_64/amd/zen4 doesn't match, because it doesn't set a gpu_arch. This may require changes from PR310.
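That matching rule can be sketched as follows (a sketch under the assumption that contexts and filters are plain dicts; the real implementation may differ):

```python
def command_matches_node(node_context, build_filters):
    # Proposed rule: every element of the node's context must be explicitly
    # present as a filter in the bot:build command (and vice versa), so a
    # GPU node never matches a CPU-only command, and a CPU-only node never
    # matches a GPU command.
    if set(node_context) != set(build_filters):
        return False
    return all(build_filters[key] == value for key, value in node_context.items())
```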

Note that this doesn't handle the second question. What if a bot config has

"cpu_zen4" : {
    "cpu_arch": "x86_64/amd/zen4",
    "slurm_params": "--partition genoa ...other-non-common-submission-params..."
}
"cpu_zen4_2" : {
    "cpu_arch": "x86_64/amd/zen4",
    "slurm_params": "--partition genoa_partition_2 ...other-non-common-submission-params..."
}

and a builder does:

bot:build on:CPU=x86_64/amd/zen4 for:CPU=x86_64/amd/zen4. Again, with the upstream code, this would trigger two builds, in the same job prefix. I see two solutions:

  1. First match wins, i.e. the order in which partitions are listed in the bot config determines their priority. As soon as one can satisfy the needs declared in the bot:build command, that node type is picked.
  2. Error out. I.e. detect that there are multiple matches, and print an informative error.

With the requirement that all elements of the context should be present as filter, I don't see how a filter could match multiple partitions - the above config doesn't really make sense, as there wouldn't be any bot:build command that only matches the second node type. So, maybe option 2 is the most sensible: maybe we should just print an informative error stating that the bot config is invalid, because there is no (functional) distinction between cpu_zen4 and cpu_zen4_2.
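Option 2 could be sketched like this (comparing only the architecture-defining fields of each node type, which is an assumption; slurm_params would be excluded from the comparison):

```python
def find_matching_node_types(node_contexts, build_filters):
    """Sketch of option 2: if more than one node type matches the same
    filters, treat the bot config as invalid and error out."""
    matches = [name for name, context in node_contexts.items()
               if context == build_filters]
    if len(matches) > 1:
        raise ValueError(
            f"invalid bot config: no functional distinction between {matches}")
    return matches
```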

The final thing that was discussed is that it may be good to give the ability to select a partition directly, without filter matching. I.e. bot:build label:cpu_zen4 would then match the cpu_zen4 partition, and no on: argument would be needed. This also means we need show_config to print the actual keys used in the config, so that a builder knows which labels are valid.

casparvl commented Jul 8, 2025

Just a note: the for arguments should probably not be stored as action filters. They are meant to be used for selection, and thus should not be compared to the context. The only reason to put them in as a filter would be as a means to pass the values from the bot command line on to, eventually, the prepare_job_cfg. That's a bad reason to do it like that. We should really make a separate place to store these from `EESSIBotCommand.__init__()` (i.e. some separate attribute), then pass it on when `submit_build_jobs()` calls `prepare_jobs()` and then `prepare_job_cfg()`.

casparvl commented Jul 10, 2025

TODO's:

  • Change path of job dir so that it represents the 'for:' architectures, instead of the 'on:'. Needed so that we can do different cross-compiles on the same partition at the same time (and still have unique directories).
  • Improve reporting of the bot so that it makes clear in the PR not only which architecture it builds ON but also which one it builds FOR
  • Rename the arch_target_map according to Thomas' suggestion
  • Implement a removal warning for the old config items (i.e. check if someone defines them, if so, error out)
  • Make sure show_config shows the real keys from the arch_target_map, and make clear in the app.cfg.example that these names are meaningless, but public.
  • Currently, the code supports defining multiple accelerators. That was needed previously for allowing cross-compilation, but now that we have the on: ... for: ... syntax, it is no longer needed. Reduce to 1 accelerator? Note that even if a node has multiple different types of accelerators, you could define two entries in the arch_target_map: one for each accelerator type. They would just have the same slurm parameters to allocate one, but that's fine.

Won't do:

  • Node selection based on label. We can do this in a separate PR if we want.
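The first TODO (job dir paths that reflect the 'for:' target) could be sketched as follows (a hypothetical naming scheme, purely for illustration):

```python
import os

def job_dir_for_build(prefix, for_cpu, for_gpu=None):
    # Derive the job directory from the 'for:' target rather than the 'on:'
    # target, so different cross-compiles on the same partition at the same
    # time still get unique directories.
    parts = [prefix, for_cpu.replace("/", "-")]
    if for_gpu:
        parts.append(for_gpu.replace("/", "-"))
    return os.path.join(*parts)
```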

@casparvl

Reporting of which arch it builds for: casparvl/software-layer#1 (comment)

@bedroge bedroge left a comment


Part 1 😉

Clarify descriptions in app.cfg.example

Co-authored-by: Bob Dröge <b.e.droge@rug.nl>
@bedroge bedroge left a comment


Review part 2 (build.py)

Comment on lines +960 to +963
on_arch = '-'.join(job.arch_target.split('/')[1:])

# Obtain the architecture to build for
for_arch = build_params[BUILD_PARAM_ARCH]

I started wondering here if build_params shouldn't be part of the job namedtuple, since they both contain information about the build job?

@casparvl (author) replied:

The build parameters (i.e. anything passed to for: on the command line) need to somehow make their way from the bot_command as it is present in handle_bot_command_build to this point (and to the prepare_jobs() function as well, where they are also needed). Short of passing the raw bot_command in its entirety, the best way to do this is to create a separate data structure for it.

Note that create_pr_comment indeed receives both job and build_params. But prepare_jobs() doesn't: instances of the job NamedTuple only get created in that function. Thus, the only way to get things from handle_bot_command_build to prepare_jobs() is to pack them in a data structure and pass them as arguments through the respective functions.

The only alternative I see is passing the full bot_command down to prepare_jobs, but I would consider that bad practice, as it creates less isolated code by passing more information than is required.
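The pass-through described here could look roughly like this (a sketch; the field names, the namedtuple choice, and the simplified function bodies are all assumptions, not the PR's actual code):

```python
from collections import namedtuple

# A dedicated structure for the 'for:' build parameters, created once from
# the bot command and passed down explicitly, instead of passing the full
# bot_command through the call chain.
BuildParams = namedtuple("BuildParams", ["cpu_arch", "gpu_arch"])

def prepare_jobs(build_params):
    target = build_params.cpu_arch
    if build_params.gpu_arch:
        target += " + " + build_params.gpu_arch
    return f"preparing job for {target}"

def handle_bot_command_build(for_args):
    build_params = BuildParams(cpu_arch=for_args.get("CPU"),
                               gpu_arch=for_args.get("GPU"))
    return prepare_jobs(build_params)
```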

Caspar van Leeuwen added 2 commits July 28, 2025 16:53
…docstring for status command to show the correct return type (string)
…ted on the docstring and comments of the template_to_regex function, since its functionality may be a bit hard (abstract) to understand otherwise.
casparvl and others added 2 commits July 29, 2025 13:48
Co-authored-by: Bob Dröge <b.e.droge@rug.nl>
Co-authored-by: Bob Dröge <b.e.droge@rug.nl>
bedroge commented Aug 4, 2025

Tested in SURF-hpcv/software-layer-scripts#2.

@bedroge bedroge merged commit 7a92892 into EESSI:develop Aug 4, 2025
5 checks passed
boegel commented Aug 4, 2025

@casparvl The README file still explains how to use arch_target_map, but I think this PR removed support for it?

(I would've preferred a more gradual transition, like keeping support for arch_target_map as long as node_type_map is not used, but it seems like that ship has sailed)

casparvl commented Aug 4, 2025

(I would've preferred a more gradual transition, like keeping support for arch_target_map as long as node_type_map is not used, but it seems like that ship has sailed)

I considered this, but it would have been quite complex to keep both (and the PR was already quite complex). Considering we've made plenty of breaking changes to the config before, I consider mine to be reasonably 'nice', since the bot will inform you upon startup that you have a key defined in your config which is no longer supported, and what it was replaced by... So even if you didn't know about this PR, you would have been pointed in the right direction :)

I'll fix the README though, good point

Linked issue: Avoid assigning meaning to arch_target_map keys (#294)