Replies: 1 comment
Thanks for opening the discussion and bringing this issue to everyone’s attention. To avoid confusion, I just want to add that this is only relevant when your simulation data is written to the scratch (local file system) of a compute node. At IMCS, we don’t do this and instead always write to the shared file system. While this has other downsides (performance can suffer), it spares us from having to ensure cleanup on the nodes.
I am currently struggling with queens’ simulation scheduling: after a queens run, a lot of simulation data was left behind on the compute nodes. In this particular case, the problem was so extreme that the node was almost full. While deleting the data, I also found many folders from other queens users. Since this causes problems for all cluster users, do you have any suggestions on what is going wrong and how to avoid it?
As @maxdinkel explained: if the workers are killed before a 4C simulation job has finished, the simulation outputs in the scratch folder are not deleted. Perhaps excessive killing of queens runs causes the issue. This explains some of the problems, but I also found folders that were very likely not caused by killed queens runs.
Possible solution
Deleting the data from the compute nodes might also be failing for other reasons (e.g. files still held open by a worker). We might retry the deletion several times before giving up, e.g.
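A minimal sketch of such a retry loop (the function name `remove_with_retries` and its parameters are hypothetical, not existing queens API):

```python
import shutil
import time


def remove_with_retries(path, max_attempts=3, delay_s=1.0):
    """Try to delete the directory *path* several times before giving up.

    Deletion on a compute node's scratch can fail transiently, e.g. when
    files are still held open by a dying worker, so a few retries with a
    short pause may succeed where a single call does not.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            if attempt == max_attempts:
                return False
            time.sleep(delay_s)
    return False
```

Logging the final failure (instead of silently returning `False`) would also make leftover folders visible to the user rather than quietly accumulating on the node.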
Thanks @maxdinkel!