
Slurm support #123

Open

t-ramz wants to merge 4 commits into TRIQS:unstable from t-ramz:slurm-support

Conversation

t-ramz commented Dec 17, 2025

I have a user at ORNL's OLCF who requested to use solid_dmft on one of our resources, and I have had trouble making it run under the Slurm workload manager. I made a few changes here that help alleviate the problem and make it "work," but one issue remains: submitted sub-jobs trigger an MPI error that ends in an MPI_ABORT, which may be beyond the scope of this software.

Overall, I think that if you want to include Slurm-only support out of the box, there may need to be a switch or feature in place to divvy up tasks from the host MPI process and allocate them to VASP/dft_exe accordingly, e.g. acquire all ranks up front for solid_dmft and, when running dft_exe, provide explicit rank mappings in some way.
Take this with a grain of salt; I don't know your codebase.
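
For illustration only, a minimal mpi4py sketch of what splitting ranks off the host communicator could look like; the variable n_dft_ranks and the overall structure are assumptions for this sketch, not solid_dmft code:

    # Hypothetical sketch, not part of this PR: split MPI_COMM_WORLD so a subset of
    # ranks is reserved for dft_exe while the remaining ranks keep running the DMFT side.
    from mpi4py import MPI

    world = MPI.COMM_WORLD
    n_dft_ranks = 4  # assumed number of ranks to hand to dft_exe
    color = 0 if world.rank < n_dft_ranks else 1
    sub_comm = world.Split(color=color, key=world.rank)

    if color == 0:
        pass  # these ranks would drive VASP/dft_exe with an explicit rank mapping
    else:
        pass  # the remaining ranks stay with solid_dmft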

Let me know if you have thoughts or questions and I'll do my best to get back to you!

t-ramz marked this pull request as ready for review December 17, 2025 23:59
the-hampel (Member)

Hi @t-ramz,
thank you for your addition! We can of course add such an option. See the two questions/comments I have below.

Regarding the MPI_ABORT: if you have specific ideas, let me know. We often have trouble with this anyway; for example, if one of the ranks gets a signal 9 / 15 or similar, the other ranks often don't die.

Best,
Alex

t-ramz (Author) commented Dec 18, 2025

Hi Alex,

I believe I ran into the MPI_ABORT on my end because Slurm tried to allocate the same node and re-instantiate the base MPI communicator, which I speculate caused a fault in the parent process.

One way around this, which I discussed with a colleague, may be to create a secondary MPI communicator and use that communicator for the dft_exe process.
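
A rough sketch of that idea using mpi4py's dynamic process management; the executable name and maxprocs are placeholders, and this assumes the MPI/Slurm setup supports MPI_Comm_spawn:

    # Hypothetical sketch only: launch dft_exe on its own intercommunicator
    # instead of forking a shell subprocess from the parent MPI process.
    from mpi4py import MPI

    dft_comm = MPI.COMM_SELF.Spawn('vasp_std', args=[], maxprocs=8)  # placeholder command and size
    # ... wait for the DFT step to finish ...
    dft_comm.Disconnect()  # tear down the intercommunicator when done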

Thanks,
Anthony

the-hampel (Member)

Yes, forking and spawning a subprocess is not super elegant and a bit problematic for some HPC configs. It may be worth exploring spawning another communicator, though maybe not in the scope of this PR? If you can comment on the above-mentioned questions, we can merge this soon.

Best,
Alex


    hostnames = mpi.world.gather(socket.gethostname(), root=0)
    if cluster_name == 'slurm':
        slurm_hostnames = [hostname.split('.')[0] for hostname in hostnames]  # TODO: please find a better solution

the-hampel (Member):

can you enlighten me why this extra part is necessary in this case?
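
One plausible but unconfirmed explanation (an assumption, not an answer from the thread): socket.gethostname() can return a fully qualified domain name while Slurm's node lists typically use short hostnames, so the split keeps only the short part:

    # Illustrative values only, not taken from the PR:
    hostname = 'node0123.example.cluster'   # what socket.gethostname() might return
    short_name = hostname.split('.')[0]     # 'node0123', the form Slurm typically reports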


    if mpi_profile == 'slurm':
        return [
            mpi_exe, '-n', str(number_cores), '--export=PATH',

the-hampel (Member):

should we enforce here that mpi_exe is 'srun'?
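
Purely as an illustration of what such a guard could look like (not part of the PR; mpi_profile and mpi_exe are taken from the quoted context):

    import os

    if mpi_profile == 'slurm' and os.path.basename(mpi_exe) != 'srun':
        raise ValueError(f"mpi_profile 'slurm' expects mpi_exe to be 'srun', got '{mpi_exe}'")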


the-hampel commented Feb 5, 2026

Sorry, I just remembered that this PR was still open. Did you see my last two comments?

1)

    if cluster_name == 'slurm':
        slurm_hostnames = [hostname.split('.')[0] for hostname in hostnames]  # TODO: please find a better solution

can you enlighten me why this extra part is necessary in this case?

2)

    if mpi_profile == 'slurm':
        return [
            mpi_exe, '-n', str(number_cores), '--export=PATH',

should we enforce here that mpi_exe is 'srun'?

If we discuss these two, then we can merge this.
