Hello, thanks for your answers,

> - Does it work if you remove the space in "TaskPlugin=task/affinity,
>  task/cgroup"? (Slurm can be quite picky when reading slurm.conf).

That was a mistake I made when copying/pasting... There is no space there in the actual file.
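For clarity, the line as it really appears in my slurm.conf (no space after the comma):

TaskPlugin=task/affinity,task/cgroup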

> 
> - See in slurmd.log on the node(s) of the job if cgroup actually gets
>  activated and starts limit memory for the job, or if there are any
>  errors related to cgroup.

Yes, for example:
Launching batch job 1605839 for UID <num>
[1605839.batch] task/cgroup: /slurm/uid_<num>/job_1605839: alloc=200MB mem.limit=200MB memsw.limit=200MB
[1605839.batch] task/cgroup: /slurm/uid_<num>/job_1605839/step_batch: alloc=200MB mem.limit=200MB memsw.limit=200MB

> 
> - While a job is running, see in the cgroup memory directory (typically
>  /sys/fs/cgroup/memory/slurm/uid_<num>/job_<num> for the job (on the
>  compute node).  Does the values there, for instance
>  memory.limit_in_bytes and memory.max_usage_in_bytes, make sense?

Yes, for the same job:
root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.limit_in_bytes
209715200
root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.max_usage_in_bytes
209715200
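As a sanity check on those numbers: 200 MB = 200 * 1024 * 1024 = 209715200 bytes, which matches memory.limit_in_bytes exactly, and memory.max_usage_in_bytes shows the job has reached that limit.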

But:

root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_<num>/job_1605839/memory.usage_in_bytes
209711104

memory.usage_in_bytes always stays below memory.max_usage_in_bytes. I think this is because of ConstrainRAMSpace=yes in cgroup.conf; with ConstrainRAMSpace=no, the process swaps instead... I tried Michael Renfro's configuration from the previous email, but with ConstrainRAMSpace=no and ConstrainSwapSpace=no, the cgroup is not activated for the job at all (nothing appears in slurmd.log or under /sys/fs/cgroup/memory/slurm/uid_<num>/). Setting MemEnforceLimit to no or yes seems to have no influence either...
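For reference, a minimal sketch of the cgroup.conf combination I am testing now; only the two Constrain* lines come from what I describe above, the other lines are assumptions about a fairly standard setup:

# sketch of cgroup.conf -- lines other than ConstrainRAMSpace/ConstrainSwapSpace are assumed
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0

(and the same file with ConstrainRAMSpace=no / ConstrainSwapSpace=no for the other tests.)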

Maybe I missed something else...

Regards,

Jean-Mathieu
 
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
