Add custom OOM killer for Linux containers #653
Conversation
```swift
usleep(1_000_000)
let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
```
`events.max` represents the number of events, so 1_000_000 seems too large a threshold. Would a threshold of zero be appropriate?
```swift
while true {
    usleep(1_000_000)
    let events = try cgroupManager.getMemoryEvents()
```
Can we move `MemoryMonitor` to the Cgroup library and use it, instead of polling memory events at a fixed interval? CC @dcantah
```swift
let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
    try cgroupManager.kill()
```
Should we use `App.writeError(error)` and `exit(code)` when the `try` fails?
I'm sort of confused on what this is trying to solve. If the idea is we have some oom kills (likely child processes) that happen but init keeps running, there exists a cgroup toggle that makes it such that every process in the cgroup gets killed if there was an oom condition. Meaning, if the init process for the container is well within its limits, but some child process(es) keep getting oom killed, the kernel would kill the whole cgroup (and thus the whole container).
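For reference, the toggle being described is likely cgroup v2's `memory.oom.group` file. A minimal sketch of setting it, assuming a cgroup v2 filesystem mount; the path handling and function name are illustrative, not this project's API:

```swift
import Foundation

// Writing "1" to memory.oom.group asks the kernel to kill every task in
// the cgroup (i.e. the whole container) when any one of them is
// OOM-killed, instead of killing just the offending child process.
func enableOOMGroupKill(cgroupPath: String) throws {
    try "1".write(toFile: cgroupPath + "/memory.oom.group",
                  atomically: false,
                  encoding: .utf8)
}
```

This keeps the "kill the whole container on OOM" policy in the kernel, with no user-space monitor process needed.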
When we run out of memory, a container hangs. The goal is to make it crash with an OOM error.
Ok, regardless of what we decide, I don't think we should run a separate forked process to do this. We can try to expose an API on `LinuxContainer` to monitor memory events in a stream-like fashion. If we don't want to do that either, today this whole scheme could be done with the APIs we expose right now. You could call
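A stream-like API on `LinuxContainer` could look roughly like the following sketch. All names here (`MemoryEvents`, `readMemoryEvents()`, `memoryEventStream(pollingEvery:)`) are assumptions for illustration, not the project's actual API:

```swift
import Foundation

// Assumed shape of the counters parsed from the cgroup's memory.events file.
struct MemoryEvents {
    let max: UInt64      // times the cgroup hit its memory limit
    let oomKill: UInt64  // times the kernel OOM-killed a task in the cgroup
}

// Stub standing in for cgroupManager.getMemoryEvents(); a real version
// would read and parse the cgroup's memory.events file.
func readMemoryEvents() throws -> MemoryEvents {
    MemoryEvents(max: 0, oomKill: 0)
}

// Yields a MemoryEvents value whenever the "max" counter changes, so a
// caller can `for await` events instead of running its own poll loop.
func memoryEventStream(pollingEvery interval: Duration) -> AsyncStream<MemoryEvents> {
    AsyncStream { continuation in
        let task = Task {
            var lastMax: UInt64? = nil
            while !Task.isCancelled {
                if let events = try? readMemoryEvents(), events.max != lastMax {
                    continuation.yield(events)
                    lastMax = events.max
                }
                try? await Task.sleep(for: interval)
            }
            continuation.finish()
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

The polling still happens, but it lives behind the library API and is cancelled with the consuming task, rather than running in a separate forked process.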
This PR implements a custom OOM killer that is spawned as a child process of `vmexec`. While the Linux kernel also OOM-kills a process when it hits its cgroup memory limit and the kernel cannot reclaim memory, the kernel often fails to kill the process and leaves the system hung due to memory thrashing. In particular, the process is not OOM-killed because the kernel still manages to reclaim memory, so the condition for an OOM kill is never met (but reclaim takes far longer, which leads to the hang).
Thus, this PR adds a user-space OOM killer as a child process of `vmexec`, which monitors cgroup memory events and kills the process when the `max` event count hits a specified limit. This approach can reliably kill the OOM process, since memory events can be monitored in a small time window. This PR still needs the following work:

- `errorPipe` to catch errors from the (long-running) OOM killer process (or some other way to catch the errors).
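As a sketch of what the monitor reads on each poll, assuming a cgroup v2 `memory.events` file of `key value` lines (`low`, `high`, `max`, `oom`, `oom_kill`); the function name and path handling are illustrative:

```swift
import Foundation

// Parses the cgroup's memory.events file and returns the "max" counter,
// i.e. how many times the cgroup hit its memory.max limit. A user-space
// OOM killer can compare this against its threshold on each poll.
func readMaxEventCount(cgroupPath: String) throws -> UInt64 {
    let text = try String(contentsOfFile: cgroupPath + "/memory.events",
                          encoding: .utf8)
    for line in text.split(separator: "\n") {
        let parts = line.split(separator: " ")
        if parts.count == 2, parts[0] == "max", let count = UInt64(parts[1]) {
            return count
        }
    }
    return 0
}
```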