CI: Emulator OOM kills system_server, causing Mono.Android.NET_Tests-NoAot to fail with no results #11065

@jonathanpeppers

Summary

The Mono.Android.NET_Tests-NoAot test component fails intermittently because the Android emulator runs out of memory, triggering the Low Memory Killer (LMK) to kill system_server. This causes a full system restart and kills the test app before any tests execute, producing no NUnit results.

Failing build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1362050&view=logs&j=b227c637-c968-57be-6c5e-9409267d8fb0&t=a68687ba-0089-5801-b54b-a3b240b66a93&l=6069

Job: macOS > Tests > APKs 2 in the Package Tests stage (dotnet-android pipeline on main)

Timeline from logcat

| Time | Event |
|---|---|
| 16:06:22 | am instrument launches test app (PID 4181) |
| 16:06:24–29 | .NET runtime initializes normally (~5s), loads assemblies, resolves NUnitInstrumentation |
| 16:06:29.151 | Last log from test app: typemap resolution completed successfully |
| 16:06:29.823 | Process 2922 exited due to signal 9 (Killed); LMK begins killing processes |
| 16:06:29–32 | ~80+ system services die (ServiceManager: service 'xxx' died) |
| 16:06:32.302 | lowmemorykiller: lmkd data connection dropped, which confirms OOM |
| 16:06:32.880 | am instrument process shuts down with no test results |
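
The signatures in this timeline could be checked mechanically when triaging a failed run. The following is only a sketch: the patterns are copied from the log fragments quoted above, real logcat lines may be formatted differently, and the function names and the `min_services_died` threshold are hypothetical choices, not anything in the repo.

```python
import re

# Substrings taken from the logcat fragments quoted in the timeline.
# Assumption: real logcat lines contain these fragments verbatim.
OOM_SIGNATURES = {
    "process_killed": re.compile(r"exited due to signal 9 \(Killed\)"),
    "service_died": re.compile(r"ServiceManager: service '[^']*' died"),
    "lmkd_dropped": re.compile(r"lmkd data connection dropped"),
}


def count_oom_signatures(logcat_text):
    """Count how many log lines match each signature."""
    counts = {name: 0 for name in OOM_SIGNATURES}
    for line in logcat_text.splitlines():
        for name, pattern in OOM_SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_emulator_oom(logcat_text, min_services_died=10):
    """Heuristic: lmkd lost its connection and many system services died.

    min_services_died is an arbitrary illustrative threshold.
    """
    counts = count_oom_signatures(logcat_text)
    return counts["lmkd_dropped"] > 0 and counts["service_died"] >= min_services_died
```

A run flagged by `looks_like_emulator_oom` would be a candidate for treating the failure as infrastructure rather than as a test failure.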

Errors in build log

  • warning XAAADB0000: Failure [DELETE_FAILED_INTERNAL_ERROR] — the prior uninstall also failed, suggesting the emulator was already unhealthy
  • error: Could not find NUnit2 results file after running component Mono.Android.NET_Tests/...NUnitInstrumentation: no nunit2-results-path bundle value found
  • warning: Unable to process TestResult-Mono.Android.NET_Tests.xml. Is it empty? (Did a unit test runner SIGSEGV?)

Root cause

The emulator's Low Memory Killer killed system_server, which cascaded into a full system restart. The test app was killed as collateral damage before it could run any tests. This is a transient infrastructure issue, not a code bug.

Suggestions for automatic CI recovery

Option 1: Retry the test component on failure with no results

In TestApks.targets, when the RunInstrumentationTests task detects that no nunit2-results-path was returned (i.e. the app produced zero output), automatically retry the component once before reporting failure. This specifically targets the "emulator died" scenario without masking real test failures.
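
To make the intended behavior concrete, here is a minimal Python sketch of the retry rule. It is not how TestApks.targets or RunInstrumentationTests is implemented; `run_component` is a hypothetical callable that returns the results file path, or `None` when no nunit2-results-path value came back.

```python
def run_with_retry(run_component, max_retries=1):
    """Run a test component, retrying only when it produced no results at all.

    run_component is a hypothetical callable returning the results path,
    or None when the app reported no results.
    Returns (results_path, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        results_path = run_component()
        if results_path is not None:
            # Results exist (passing or failing): report them as-is, never retry.
            return results_path, attempts
        if attempts > max_retries:
            # Still nothing after the allowed retries: surface the failure.
            return None, attempts
```

Because the retry is keyed on "no results" rather than on "tests failed", a run that produces a results file with failing tests is reported immediately.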

Option 2: Add an emulator health check before running tests

Before launching am instrument, run a lightweight health-check (e.g. adb shell getprop sys.boot_completed + adb shell am get-config) and if the emulator is unresponsive or recently rebooted, restart it and re-deploy before running tests. This would catch the "already unhealthy" state indicated by the DELETE_FAILED_INTERNAL_ERROR.
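
A rough sketch of such a check, written in Python for illustration only. It assumes `adb` is on the PATH and that the installed adb propagates the remote command's exit code; the helper names are hypothetical, and detecting a "recently rebooted" emulator is not covered here.

```python
import subprocess


def adb_shell(*args, timeout=30):
    """Run `adb shell <args>`; return stripped stdout, or None on failure or timeout."""
    try:
        proc = subprocess.run(
            ["adb", "shell", *args],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def emulator_is_healthy(shell=adb_shell):
    """Return True if boot has completed and `am` produces some output."""
    if shell("getprop", "sys.boot_completed") != "1":
        return False
    # `am` talks to system_server, so no output here suggests it is down or restarting.
    return bool(shell("am", "get-config"))
```

The `shell` parameter is injectable so the check can be exercised without a device; a caller would restart the emulator and re-deploy when `emulator_is_healthy()` returns `False`.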

Option 3: Use Azure DevOps automatic retry for the job

Configure the test step in the macOS > Tests > APKs 2 job with retryCountOnTaskFailure so that Azure DevOps automatically retries it on failure. This is the lowest-effort option, but also the coarsest: retryCountOnTaskFailure is set per task, so it re-runs the whole failing task rather than just the failing component.

Labels: needs-triage