Skip to content

CASSANDRA-16939: Only remove shutdown hook for fatal heap OOMs#4691

Open
Shanzita wants to merge 1 commit intoapache:trunkfrom
Shanzita:CASSANDRA-16939-trunk
Open

CASSANDRA-16939: Only remove shutdown hook for fatal heap OOMs#4691
Shanzita wants to merge 1 commit intoapache:trunkfrom
Shanzita:CASSANDRA-16939-trunk

Conversation

@Shanzita
Copy link

Description:
CASSANDRA-16939: Only remove shutdown hook for fatal heap
OOMs

Previously, StorageService.removeShutdownHook() was
called unconditionally
for all OutOfMemoryErrors in
JVMStabilityInspector.inspectThrowable().
This removed the drain shutdown hook even for non-fatal
OOMs like
"Direct buffer memory" or "unable to create new native
thread", preventing
graceful shutdown afterward even though the JVM was
otherwise healthy.

The fix guards removeShutdownHook() with a new
isHeapSpaceOom() method
that reuses the existing FORCE_HEAP_OOM_IGNORE_SET to
only remove the hook
for fatal heap OOMs ("Java heap space", "GC Overhead limit
exceeded").

Changes:

  • JVMStabilityInspector.java: Guard
    removeShutdownHook() with isHeapSpaceOom() check
  • JVMStabilityInspectorTest.java: Added tests verifying
    shutdown hook is preserved
    for non-fatal OOMs and removed for fatal OOMs

Tests:

  • All existing tests pass (8/8, 0 failures)
  • Checkstyle clean (0 violations)

patch by Shanizta Siddiqua; reviewed by for
CASSANDRA-16939

drainField.set(StorageService.instance, testHook);

// "Direct buffer memory" is a non-fatal OOM — the shutdown hook must be preserved
try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shanzita please rework this to more idiomatic org.assertj.core.api.Assertions.assertThatThrownBy which is use extensively through the code base and is more succinct / readable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your reviews! I have removed the try/catch/fail block to use assertThatThrownBy as suggested.

}

@Test
public void testShutdownHookNotRemovedForNonFatalOom() throws Exception
Copy link
Contributor

@smiklosovic smiklosovic Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two test methods are 95% same. All it differs on is message in that exception and assertion if it should remove shutdown hook or not. Could you do a helper method which encapsulates it all with appropriate parameters to pass to it so the actual test methods will be effectively one-liners?

You could also make FORCE_HEAP_OOM_IGNORE_SET public and take its first value to test on a real value which will be used in production. If we ever change it in JVMStabilityInspector but not here we will not be testing anything "real". You should ideally iterate over all values, not sure how feasible it is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your reviews! I have extracted a helper method, made FORCE_HEAP_OOM_IGNORE_SET public, and iterating over all values in the fatal OOM test.

@Shanzita Shanzita force-pushed the CASSANDRA-16939-trunk branch from 84be9a8 to e79db48 Compare March 26, 2026 04:28
}

private static final Set<String> FORCE_HEAP_OOM_IGNORE_SET = ImmutableSet.of("Java heap space", "GC Overhead limit exceeded");
public static final Set<String> FORCE_HEAP_OOM_IGNORE_SET = ImmutableSet.of("Java heap space", "GC Overhead limit exceeded");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shanzita annotate this with @VisibleForTesting. You might additionally make it package protected as both test and this class are in the same package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants