Hello,

Thanks for your answers.

> - Does it work if you remove the space in "TaskPlugin=task/affinity,
> task/cgroup"? (Slurm can be quite picky when reading slurm.conf).

That is already the case; the space was a mistake I made when copying and pasting into my earlier message. There is no space in the actual slurm.conf.

> - See in slurmd.log on the node(s) of the job if cgroup actually gets
> activated and starts limiting memory for the job, or if there are any
> errors related to cgroup.

Yes, for example:

Launching batch job 1605839 for UID <num>
[1605839.batch] task/cgroup: /slurm/uid_<num>/job_1605839: alloc=200MB mem.limit=200MB memsw.limit=200MB
[1605839.batch] task/cgroup: /slurm/uid_<num>/job_1605839/step_batch: alloc=200MB mem.limit=200MB memsw.limit=200MB

> - While a job is running, see in the cgroup memory directory (typically
> /sys/fs/cgroup/memory/slurm/uid_<num>/job_<num>) for the job (on the
> compute node). Do the values there, for instance
> memory.limit_in_bytes and memory.max_usage_in_bytes, make sense?

Yes, for the same job:

root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.limit_in_bytes
209715200
root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.max_usage_in_bytes
209715200

But:

root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.usage_in_bytes
209711104

memory.usage_in_bytes always stays just under memory.max_usage_in_bytes. I think this is because of the field ConstrainRAMSpace=yes in cgroup.conf, and the process swapping (compared with ConstrainRAMSpace=no)...
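For reference, the memory-related part of my cgroup.conf currently looks roughly like this. Only ConstrainRAMSpace=yes is certain from the above; the other lines are my reconstruction (ConstrainSwapSpace=yes with the default AllowedSwapSpace=0 would explain why mem.limit and memsw.limit are both 200MB):

# cgroup.conf (sketch: only ConstrainRAMSpace=yes is confirmed,
# the other lines are inferred/assumed from mem.limit == memsw.limit)
CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0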
I tried the configuration Michael Renfro suggested in a previous email, but with ConstrainRAMSpace=no and ConstrainSwapSpace=no, cgroup is not activated for the job at all (nothing appears in slurmd.log or under /sys/fs/cgroup/memory/slurm/uid_<num>/).

Setting MemLimitEnforce to no or yes seems to have no influence either...

Maybe I missed something else...
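For completeness, the cgroup.conf variant I tested (my transcription of Michael's suggestion, so treat it as a sketch rather than his exact file) was essentially:

# cgroup.conf variant tested (sketch; only the two Constrain* values
# were changed; with both set to no, task/cgroup has no memory limit
# left to apply, which would explain the empty memory hierarchy)
ConstrainRAMSpace=no
ConstrainSwapSpace=no

Regards,

Jean-Mathieu

> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo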