From 86b087f550b92ff98b6b295b6199a2e5a361a220 Mon Sep 17 00:00:00 2001
From: Pranav Bhandari
Date: Fri, 20 Mar 2026 15:17:42 -0700
Subject: [PATCH] OSS test fix
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Summary:
Fix NvmCacheTest.EvictToNvmGet failing in OSS CI.

```
/home/runner/work/CacheLib/CacheLib/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp:252: Failure
Expected equality of these values:
  0
  nvm.getNumActiveHandles()
    Which is: 1
I0318 19:45:43.067236 184036 Driver.cpp:84] Driver: finish scheduler
I0318 19:45:43.067307 184036 Driver.cpp:86] Driver: finish scheduler successful
I0318 19:45:43.067613 184036 ThreadPoolJobScheduler.cpp:86] JobScheduler: join threads for reader_pool
I0318 19:45:43.069042 184036 ThreadPoolJobScheduler.cpp:90] JobScheduler: join threads for reader_pool successful
I0318 19:45:43.072589 184036 ThreadPoolJobScheduler.cpp:86] JobScheduler: join threads for writer_pool
I0318 19:45:43.074025 184036 ThreadPoolJobScheduler.cpp:90] JobScheduler: join threads for writer_pool successful
[  FAILED  ] NvmCacheTest.EvictToNvmGet (433 ms)
[----------] 1 test from NvmCacheTest (433 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (433 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NvmCacheTest.EvictToNvmGet

 1 FAILED TEST
```

Two bugs:

1. Unsigned underflow in the loop index. The loop uses the `i-- > 0` idiom,
   which already decrements `i` before the body runs (yielding values
   nKeys+99 down to 0). The extra `index = i - 1` double-decremented, so on
   the last iteration (i = 0), `index = (unsigned)0 - 1 = UINT_MAX`. This
   created a spurious lookup for a nonsensical key and skipped key1123.

   * **Fix**: `index = i`.

2. Race between the async NVM callback and the handle count check. When a
   key is fetched from NVM, the async callback runs onGetComplete →
   removeFromFillMap → ~GetCtx → wakeUpWaiters → baton_.post().
   The baton wakes the test thread, but GetCtx::it (the fill handle) is
   destroyed after the baton post, still inside the same callback. Until
   that destruction completes, the async thread's handle count is +1. If
   the test thread reaches `getNumActiveHandles()` before the callback
   fully returns, it observes the stale +1.

   * **Fix**: call flushNvmCache() before checking handle counts. This
     calls scheduler_->finish(), which busy-waits until all navy jobs
     (including the callback that destroys GetCtx::it) have fully
     completed.

Reviewed By: byahn0996

Differential Revision: D97184057
---
 cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp b/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
index 6ba940a45..774d491b7 100644
--- a/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
+++ b/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
@@ -215,11 +215,11 @@ TEST_F(NvmCacheTest, EvictToNvmGet) {
   const auto nEvictions = this->evictionCount() - evictBefore;
   ASSERT_LT(0, nEvictions);
 
-  // read from ram cache first so that we will not cause evictions
+  // Read from ram cache first so that we will not cause evictions
   // to navy for items that are still in ram-cache until we start
-  // reading items from navy
+  // reading items from navy.
   for (unsigned int i = nKeys + 100; i-- > 0;) {
-    unsigned int index = i - 1;
+    unsigned int index = i;
     auto key = folly::sformat("key{}", index);
     auto hdl = this->fetch(key, false /* ramOnly */);
     hdl.wait();
@@ -248,6 +248,10 @@ TEST_F(NvmCacheTest, EvictToNvmGet) {
     }
   }
 
+  // Flush to ensure all async navy callbacks (including GetCtx destruction
+  // which decrements handle counts) have fully completed.
+  nvm.flushNvmCache();
+
   // Reads are done. We should be at "0" active handle count across all threads.
   ASSERT_EQ(0, nvm.getNumActiveHandles());
   ASSERT_EQ(0, nvm.getHandleCountForThread());