-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[Android CoreCLR] Log managed callstacks on native crash #123824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Android CoreCLR] Log managed callstacks on native crash #123824
Conversation
|
Tagging subscribers to this area: @agocke |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enables Android CoreCLR to log managed stack traces when a native crash occurs, providing diagnostic information in environments where CreateDump is not available. The logging is wired through the existing crash-dump hook so that, whenever a dump would be created, the crashing managed thread’s stack is emitted to Android’s logging.
Changes:
- Add a HOST_ANDROID helper in
EEPolicythat walks and logs the current managed thread’s call stack using existingLogCallstackForLogWorkerinfrastructure. - Extend
PROCCreateCrashDumpIfEnabledon Android to call this helper via a weak symbol and emit a formatted “Managed Stacktrace” section tominipal_log_write_fatal, followed by the existing abort message.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/coreclr/vm/eepolicy.cpp |
Introduces LogCallstackForAndroidNativeCrash, an extern "C" Android-only helper that obtains the current Thread and logs its managed call stack, to be invoked from the PAL crash-dump path. |
src/coreclr/pal/src/thread/process.cpp |
On Android, adds a weak reference to LogCallstackForAndroidNativeCrash and, when present, logs a managed stacktrace banner and the managed stack in PROCCreateCrashDumpIfEnabled before logging that the process is aborting. |
| minipal_log_write_fatal("\n=================================================================\n"); | ||
| minipal_log_write_fatal("\tManaged Stacktrace:\n"); | ||
| minipal_log_write_fatal("=================================================================\n"); | ||
| LogCallstackForAndroidNativeCrash(); | ||
| minipal_log_write_fatal("=================================================================\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lateralusX I briefly looked into emitting the callstacks in the same manner that CrashInfo.cs does it. It seemed like its format was designed for managed exceptions, and I don't know if it'd be useful to stub in fields like the type, hr, message, etc. until we know what tools reading the adb log would expect.
If we did want to format it a particular way, it looed liked we'd be adding another API to CallStackLogger, for what seems like a temporary solution until Android CoreCLR can create a dump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not have to stub anything in the format produced by CrashInfo.cs.
We have intentionally used json for this format. json makes it straightforward to omit fields that are not relevant or available in the given situation, and also add additional information specific for the given situation. We have number of optional fields like that, for example
runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/CrashInfo.cs
Lines 249 to 253 in 3df8fb2
| if (moduleBase != nint.Zero) | |
| { | |
| if (!WriteHexValue("module"u8, (nuint)moduleBase)) | |
| return false; | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to, or we dont need to adopt the same format?
We were considering using the same format since NativeAOT uses it in some scenarios. If we end up generating a dump on Android CoreCLR, it would be nice if it could automatically work with !crashinfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to, or we dont need to adopt the same format?
Sorry, I meant to "We do not nave to stub..."
We were considering using the same format since NativeAOT uses it in some scenarios.
Yes, it would be great to have on json format for crash reports.
3d0912d to
ffb92cc
Compare
| // TODO: Dump all managed threads callstacks into logcat and/or file? | ||
| if (LogCallstackForAndroidNativeCrash != nullptr) | ||
| { | ||
| minipal_log_write_fatal("\n=================================================================\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the ======================== separators? We do not use a separators like that anywhere else in CoreCLR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they're not needed, I had initially just used the format that Android Mono was using
runtime/src/mono/mono/mini/mini-exceptions.c
Lines 3005 to 3011 in 2772854
| g_async_safe_printf ("\n=================================================================\n"); | |
| g_async_safe_printf ("\tManaged Stacktrace:\n"); | |
| g_async_safe_printf ("=================================================================\n"); | |
| mono_walk_stack_full (print_stack_frame_signal_safe, mctx, jit_tls, mono_get_lmf (), MONO_UNWIND_LOOKUP_IL_OFFSET | MONO_UNWIND_SIGNAL_SAFE, NULL); | |
| g_async_safe_printf ("=================================================================\n"); |
|
What is the experience for unhandled managed exceptions or fail fasts with this fix? We should make sure that the stacktrace is logged just once. |
What was the source code or runtime change for this example? |
| if (LogCallstackForAndroidNativeCrash != nullptr) | ||
| { | ||
| minipal_log_write_fatal("\n=================================================================\n"); | ||
| minipal_log_write_fatal("\tManaged Stacktrace:\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should something more self-explanatory than just "Managed stacktrace:". Maybe something like "Crash in native code".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought PROCCreateCrashDumpIfEnabled also gets hit for unhandled managed exceptions. Would it still be fine to classify that as crash in native code? Maybe also Crash report: would align more given the hope is this would be replaced by real create dump logic.
|
Just curious - would we also consider printing out full mixed callstack (native/managed)? The OS stackwalker will only be able to walk the first native section of the stack, once it hits managed code it will stop. So if there's native code above the managed code which would be interesting, we will have no way to learn about it. We could possibly print out just native addresses for the native portion, and rely on symbolication later. |
I'm modifying the Android.Device_Emulator.JIT.Test Functional test as that was the easiest way to apply runtime changes.
That particular log in the description is the RaiseFailFastException in ep_buffer_write_event. This log is from unhandled managed exception (haven't yet changed the format yet from discussions in above threads) |
Android CoreCLR currently does not ship with CreateDump. When native crashes occur, it is difficult to figure out the underlying cause of the crash, as managed stacks aren't captured anywhere, and android's crash reporter doesn't understand runtime symbols when it generates a tombstone.
Until we have a solution for creating a dump on Android CoreCLR, this PR looks to log managed callstacks for the crashing thread whenever a dump would have been created. That way, there is some hint as to what may have caused the crash, despite not having the native callstack.
These managed callstacks will only be emitted during FailFast. Synchronous faults currently will not hit PROCCreateCrashDumpIfEnabled as Android registers default signal handlers that the runtime defer signal handling to. See #123735
Example ADB log