Add custom OOM killer for Linux containers #653
Conversation
```swift
usleep(1_000_000)
let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
```
`events.max` represents the number of events, so 1_000_000 seems too large a threshold. Would a threshold of zero be appropriate?
```swift
while true {
    usleep(1_000_000)
    let events = try cgroupManager.getMemoryEvents()
```
Can we move `MemoryMonitor` to the Cgroup library and use it, instead of polling memory events at a fixed interval? CC @dcantah
```swift
let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
    try cgroupManager.kill()
```
Should we use `App.writeError(error)` and `exit(code)` when the `try` fails?
I'm sort of confused on what this is trying to solve. If the idea is we have some oom kills (likely child processes) that happen but init keeps running, there exists a cgroup toggle that makes it such that every process in the cgroup gets killed if there was an oom condition. Meaning, if the init process for the container is well within its limits, but some child process(es) keep getting oom killed, the kernel would kill the whole cgroup (and thus the whole container).
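For reference, the toggle being described is likely cgroup v2's `memory.oom.group` file. A minimal sketch of setting it, assuming a cgroup v2 filesystem mount; the path handling and function name are illustrative, not this project's API:

```swift
import Foundation

// Writing "1" to memory.oom.group asks the kernel to kill every task in
// the cgroup (i.e. the whole container) when any one of them is
// OOM-killed, instead of killing just the offending child process.
func enableOOMGroupKill(cgroupPath: String) throws {
    try "1".write(toFile: cgroupPath + "/memory.oom.group",
                  atomically: false,
                  encoding: .utf8)
}
```

This keeps the "kill the whole container on OOM" policy in the kernel, with no user-space monitor process needed.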
When we run out of memory, a container hangs. The goal is to make it crash with an OOM error.
Ok, regardless of what we decide, I don't think we should run a separate forked process to do this. We can try to expose an API on `LinuxContainer` to monitor memory events in a stream-like fashion. If we don't want to do that either, today this whole scheme could be done with the APIs we expose right now. You could call
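A stream-like API on `LinuxContainer` could look roughly like the following sketch. All names here (`MemoryEvents`, `readMemoryEvents()`, `memoryEventStream(pollingEvery:)`) are assumptions for illustration, not the project's actual API:

```swift
import Foundation

// Assumed shape of the counters parsed from the cgroup's memory.events file.
struct MemoryEvents {
    let max: UInt64      // times the cgroup hit its memory limit
    let oomKill: UInt64  // times the kernel OOM-killed a task in the cgroup
}

// Stub standing in for cgroupManager.getMemoryEvents(); a real version
// would read and parse the cgroup's memory.events file.
func readMemoryEvents() throws -> MemoryEvents {
    MemoryEvents(max: 0, oomKill: 0)
}

// Yields a MemoryEvents value whenever the "max" counter changes, so a
// caller can `for await` events instead of running its own poll loop.
func memoryEventStream(pollingEvery interval: Duration) -> AsyncStream<MemoryEvents> {
    AsyncStream { continuation in
        let task = Task {
            var lastMax: UInt64? = nil
            while !Task.isCancelled {
                if let events = try? readMemoryEvents(), events.max != lastMax {
                    continuation.yield(events)
                    lastMax = events.max
                }
                try? await Task.sleep(for: interval)
            }
            continuation.finish()
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

The polling still happens, but it lives behind the library API and is cancelled with the consuming task, rather than running in a separate forked process.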
This PR implements a custom OOM killer that is spawned as a child process of `vmexec`. While the Linux kernel also OOM-kills a process when it hits its cgroup memory limit and the kernel cannot reclaim memory, the kernel often fails to kill the process and leaves the system hung due to memory thrashing. In particular, the process is not OOM-killed because the kernel still manages to reclaim memory, so the condition for an OOM kill is never met (but reclaim takes far longer, which leads to the hang).
Thus, this PR adds a user-space OOM killer as a child process of `vmexec`, which monitors cgroup memory events and kills the process when the `max` event count hits a specified limit. This approach can reliably kill the OOM process, since memory events can be monitored in a small time window. This PR still needs the following work:

- `errorPipe` to catch errors from the (long-running) OOM killer process (or some other way to catch the errors).
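As a sketch of what the monitor reads on each poll, assuming a cgroup v2 `memory.events` file of `key value` lines (`low`, `high`, `max`, `oom`, `oom_kill`); the function name and path handling are illustrative:

```swift
import Foundation

// Parses the cgroup's memory.events file and returns the "max" counter,
// i.e. how many times the cgroup hit its memory.max limit. A user-space
// OOM killer can compare this against its threshold on each poll.
func readMaxEventCount(cgroupPath: String) throws -> UInt64 {
    let text = try String(contentsOfFile: cgroupPath + "/memory.events",
                          encoding: .utf8)
    for line in text.split(separator: "\n") {
        let parts = line.split(separator: " ")
        if parts.count == 2, parts[0] == "max", let count = UInt64(parts[1]) {
            return count
        }
    }
    return 0
}
```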