Conversation
Reduces atomic contention on outstanding_work_ by batching work counting per-thread. Fast-path posts now only increment the thread-local private_outstanding_work counter; the global counter is updated in batch when handlers complete. Key changes:
- Fast-path post() only increments private_outstanding_work
- work_cleanup uses the N-1 formula: the handler consumes 1 and produces N
- task_cleanup flushes all private work from the reactor
- timer_service posts a timer_op (like signal_op) to run inside the work_cleanup scope, ensuring proper batching

2-thread scaling improves from ~0.56x to ~1.0x speedup.
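A minimal sketch of the batching idea, using illustrative names rather than the actual corosio members (the real scheduler uses its own queues and context lookup):

```cpp
#include <atomic>

// Hypothetical sketch of per-thread work batching; names are illustrative.
struct scheduler_sketch
{
    std::atomic<long> outstanding_work{0};      // shared, contended counter

    struct thread_context
    {
        long private_outstanding_work = 0;      // thread-local, uncontended
    };

    // Fast-path post: bump only the private counter.
    void post(thread_context& ctx)
    {
        ++ctx.private_outstanding_work;
        // ...push the handler onto the thread-private queue...
    }

    // work_cleanup: the completed handler consumed 1 unit of work and
    // produced N new ones, so the net adjustment to the global counter
    // is N - 1, applied once per handler instead of once per post.
    void work_cleanup(thread_context& ctx)
    {
        long n = ctx.private_outstanding_work;
        ctx.private_outstanding_work = 0;
        if (n != 1)
            outstanding_work.fetch_add(n - 1, std::memory_order_relaxed);
    }
};
```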
Replace idle_thread_count_ with a state_ variable that encodes both the signaled flag (bit 0) and the waiter count (upper bits). This enables:
- Signalers only call notify when waiters exist (state_ > 1)
- Waiters check whether the event is already signaled before blocking (fast path)
- 22% reduction in futex calls in multi-threaded benchmarks

Add helper functions for consistent signaling semantics:
- signal_all: wake all waiters (shutdown/stop)
- maybe_unlock_and_signal_one: conditional unlock with fallback
- unlock_and_signal_one: always unlock, signal if waiters exist
- clear_signal/wait_for_signal: wait coordination
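Roughly, the encoding works like the sketch below. The helper names come from the commit message; the internals are assumptions, not the exact corosio implementation. Waiters add 2 before blocking, so any value above 1 means someone is parked, and bit 0 records whether the event is already signaled.

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative sketch of the state_ encoding: bit 0 = signaled flag,
// each waiter contributes 2. Details are assumptions, not the real code.
class signal_state
{
    std::size_t state_ = 0;             // guarded by the caller's mutex
    std::condition_variable cond_;

public:
    // Signal one waiter; only hit the futex when a waiter actually exists.
    void maybe_unlock_and_signal_one(std::unique_lock<std::mutex>& lock)
    {
        bool had_waiters = state_ > 1;  // any waiter pushes state_ past 1
        state_ |= 1;                    // set the signaled bit
        lock.unlock();
        if (had_waiters)
            cond_.notify_one();
    }

    // Wait for the signal; return immediately if it is already set.
    void wait_for_signal(std::unique_lock<std::mutex>& lock)
    {
        while ((state_ & 1) == 0)
        {
            state_ += 2;                // register as a waiter
            cond_.wait(lock);
            state_ -= 2;
        }
    }

    void clear_signal() { state_ &= ~std::size_t(1); }
};
```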
Convert descriptor_data synchronization from lock-free atomics to a per-descriptor mutex. The mutex protects operation pointers and ready flags while I/O is performed outside the lock to minimize hold time.
Move I/O execution from the reactor to the scheduler thread. The reactor now just sets ready events atomically and enqueues the descriptor_state for processing. Actual I/O happens when the scheduler pops and invokes descriptor_state::operator(). This eliminates per-descriptor mutex locking from the reactor hot path, improving multi-threaded HTTP throughput by 2.5-3.6x:
- 8 threads: 190K -> 475K req/s
- 16 threads: 115K -> 414K req/s

Changes:
- descriptor_state extends scheduler_op with deferred I/O support
- Add is_deferred_io() to distinguish deferred ops from work items
- Add post_deferred_completions() helper for posting completed ops
- Rename descriptor_data -> descriptor_state
- Rename reactor_* -> task_* for clarity
- Clean up do_one() with helper functions for private queue handling
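As a rough sketch of the handoff (hypothetical types; the real descriptor_state, queue, and scheduler APIs differ), the reactor only records readiness and enqueues the descriptor once, while a worker later claims the events and performs the I/O:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only: reactor-side publish vs. scheduler-side deferred I/O.
struct descriptor_sketch
{
    std::atomic<std::uint32_t> ready_events{0};
    std::atomic<bool> is_enqueued{false};

    // Reactor thread: record readiness; returns true if the caller should
    // push this descriptor onto the scheduler's run queue (at most once).
    bool add_ready_events(std::uint32_t ev)
    {
        ready_events.fetch_or(ev, std::memory_order_release);
        bool expected = false;
        return is_enqueued.compare_exchange_strong(
            expected, true, std::memory_order_acq_rel);
    }

    // Worker thread: claim the accumulated events and do the actual
    // non-blocking I/O outside the reactor hot path.
    void run()
    {
        is_enqueued.store(false, std::memory_order_relaxed);
        std::uint32_t ev = ready_events.exchange(0, std::memory_order_acquire);
        (void)ev; // ...perform reads/writes for the ops matching ev, then
                  // hand completed handlers back to the scheduler...
    }
};
```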
Workers no longer wake other workers when more work exists in the queue. Only producers signal workers when posting new work.
work_cleanup now leaves the mutex held after splicing the private queue into completed_ops_, so the next do_one iteration reuses the same lock acquisition instead of unlock+relock. Callers use owns_lock() to avoid redundant locking.
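The pattern is roughly the one below (a sketch with assumed names; the real work_cleanup also performs the work accounting described above):

```cpp
#include <mutex>

// Illustrative sketch: cleanup may leave the mutex held so the next
// iteration of do_one can skip an unlock/relock pair.
struct work_cleanup_sketch
{
    std::unique_lock<std::mutex>& lock;

    ~work_cleanup_sketch()
    {
        if (!lock.owns_lock())
            lock.lock();
        // ...splice the thread-private queue into completed_ops_...
        // The lock is intentionally left held for the caller.
    }
};

void do_one_iteration(std::unique_lock<std::mutex>& lock)
{
    if (!lock.owns_lock())   // reuse the acquisition left by work_cleanup
        lock.lock();
    // ...pop and run the next handler...
}
```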
Move inline completion logic from the scheduler into descriptor_state::operator()(), matching Asio's pattern where the descriptor performs I/O and executes the first handler directly. The scheduler no longer special-cases deferred I/O — descriptor_state flows through the normal work_cleanup path. Add compensating_work_started() for the all-EAGAIN case to offset the -1 that work_cleanup applies. Remove is_deferred_io virtual from scheduler_op.
Hoist find_context() before the for(;;) loop since the thread context is invariant for the lifetime of do_one. Saves a TLS read and linked-list walk on every retry iteration. Convert stopped_ from atomic<bool> to plain bool. All accesses are now under mutex protection. Remove redundant pre-lock stopped_ checks from public entry points — do_one's check under mutex is sufficient.
Pass scheduler_context directly from callers instead of calling find_context() per iteration. Eliminate redundant lock cycle when entering reactor with pending handlers by using unlock_and_signal_one instead of maybe_unlock_and_signal_one followed by re-lock.
…ructs

The develop rebase removed data_ (8 bytes) from scheduler_op, shrinking it from 32 to 24 bytes. This shifted cache line boundaries in every derived struct (descriptor_state, epoll_op, socket_impl), causing a measurable throughput regression at 4+ threads. Restoring the 32-byte base size recovers the original layout and performance.
When the reactor queues a descriptor_state and a handler from a different op later destroys the impl via close()/destroy_impl(), the queued descriptor_state (a member of the freed impl) became a dangling node in completed_ops_. Fix by capturing shared_from_this() into descriptor_state::impl_ref_ in close_socket() when is_enqueued_ is true, keeping the impl alive until the descriptor is popped and processed.
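The keep-alive can be pictured as follows (hypothetical type names; the real impl, queue, and locking differ). The queued node lives inside the impl, so the impl must outlive its time in the scheduler queue:

```cpp
#include <memory>

// Sketch only: the impl pins itself through the queued node until a
// worker pops and processes it.
struct socket_impl_sketch;

struct descriptor_state_sketch
{
    bool is_enqueued = false;
    std::shared_ptr<socket_impl_sketch> impl_ref; // pins the owner while queued
};

struct socket_impl_sketch
    : std::enable_shared_from_this<socket_impl_sketch>
{
    descriptor_state_sketch desc;

    void close_socket()
    {
        // If the reactor already queued desc, keep this impl alive until a
        // worker pops and runs it; otherwise the node would dangle.
        if (desc.is_enqueued)
            desc.impl_ref = shared_from_this();
        // ...close the fd, cancel pending ops...
    }
};

// The worker drops the reference once the descriptor has been processed,
// e.g. auto keep_alive = std::move(desc.impl_ref);
```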
Claim all pending ops in a single lock instead of separate locks per event type, reducing worst-case from 4 lock/unlock cycles to 2 and common-case from 2 to 1. Add cascading thread wakes in do_one so sleeping threads are notified when more work is queued. Remove dead code: set_ready_events, is_registered, update_epoll_events, update_descriptor_events, unused start() overload, and epoll_sockets alias. Use std::exchange for claim-and-null patterns throughout.
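For reference, the claim-and-null idiom mentioned here is just std::exchange over the op slot, e.g. (sketch):

```cpp
#include <utility>

struct op; // opaque operation type for the sketch

// Claim whatever is in the slot and leave nullptr behind, in one step.
// In the real code this happens under the per-descriptor lock.
inline op* claim(op*& slot)
{
    return std::exchange(slot, nullptr);
}

// usage:
//   op* rd = claim(read_op);
//   if (rd) { /* perform the read and complete rd */ }
```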
timer_op used the default scheduler_op constructor which sets func_ to nullptr. The IOCP scheduler dispatches posted ops via func_, causing an ACCESS_VIOLATION on Windows. Provide a do_complete function pointer so both dispatch paths work.
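The shape of the fix is roughly as follows (illustrative types, not the actual corosio scheduler_op): any op that may be dispatched through the stored function pointer needs a trampoline installed.

```cpp
// Sketch only: an op base dispatched either virtually or via func_ must
// fill in func_, otherwise the pointer path dereferences nullptr.
struct op_base_sketch
{
    using func_type = void (*)(op_base_sketch*);
    func_type func_ = nullptr;
};

struct timer_op_sketch : op_base_sketch
{
    timer_op_sketch() { func_ = &timer_op_sketch::do_complete; }

    static void do_complete(op_base_sketch* base)
    {
        auto* self = static_cast<timer_op_sketch*>(base);
        (void)self; // ...run the timer completion...
    }
};
```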
📝 Walkthrough

This pull request replaces atomic per-descriptor state with a mutex‑protected

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant epoll_scheduler
participant Reactor
participant descriptor_state
participant Worker
Client->>epoll_scheduler: submit operation (read/write/connect/timer)
epoll_scheduler->>descriptor_state: register descriptor / set op pointer
Reactor-->>epoll_scheduler: epoll_wait -> events
epoll_scheduler->>descriptor_state: add_ready_events(ev)
descriptor_state->>epoll_scheduler: enqueue descriptor (is_enqueued_ guard)
epoll_scheduler->>Worker: flush private queue -> schedule descriptor_state()
Worker->>descriptor_state: descriptor_state::operator()() (lock, claim ops)
descriptor_state->>Worker: complete ops (post completions to scheduler)
Worker->>epoll_scheduler: post_deferred_completions(op_queue)
epoll_scheduler->>Client: execute completion callbacks / resume coroutines
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Fixes negative count errors from lcov when multi-threaded code paths race on GCC's non-atomic coverage counters.
An automated preview of the documentation is available at https://106.corosio.prtest3.cppalliance.org/index.html If more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-02-06 05:03:14 UTC
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## develop #106 +/- ##
===========================================
- Coverage 80.28% 80.06% -0.22%
===========================================
Files 65 65
Lines 5376 5569 +193
===========================================
+ Hits 4316 4459 +143
- Misses 1060 1110 +50
... and 1 file with indirect coverage changes

Continue to review full report in Codecov by Sentry.
GCOVR code coverage report https://106.corosio.prtest3.cppalliance.org/gcovr/index.html Build time: 2026-02-06 05:06:44 UTC
Actionable comments posted: 6
🧹 Nitpick comments (5)
src/corosio/src/detail/scheduler_op.hpp (1)
120-123: Consider adding a static_assert to enforce the 32-byte size invariant.

The padding relies on assumptions about the base class layout. If intrusive_queue::node or other members change, this could silently break the alignment goal.

💡 Suggested assertion (placed after the class definition)
 };
+static_assert(sizeof(scheduler_op) == 32,
+    "Update reserved_ to maintain 32-byte size for cache alignment");
+
 //------------------------------------------------------------------------------

src/corosio/src/detail/timer_service.cpp (2)
61-77: d and h are accessed as members during d.post(h) — inconsistent with the epoll_op safety pattern.

epoll_op::operator()() (in op.hpp lines 216-218) deliberately moves ex and h to stack locals before dispatch to prevent use-after-free. Here, d.post(h) references member variables directly. This works because timer_op is independently heap-allocated (inline execution of h can't destroy this object), but the inconsistency is a maintenance hazard.

💡 Align with the established safety pattern
 void operator()() override
 {
     if (ec_out)
         *ec_out = ec_value;

-    // Capture before posting (coro may destroy this op)
     auto* service = sched;
     sched = nullptr;

-    d.post(h);
+    capy::executor_ref saved_d(std::move(d));
+    capy::coro saved_h(std::move(h));
+
+    saved_d.post(saved_h);

-    // Balance the on_work_started() from timer_impl::wait()
     if (service)
         service->on_work_finished();

     delete this;
 }
251-261: Duplicated timer_op creation in three locations.

The same field-setting pattern (h, d, ec_out, ec_value, sched + post) appears in update_timer (line 254), cancel_timer (line 296), and process_expired (line 335). A small factory helper would reduce repetition.

💡 Example helper
// In timer_service_impl, private:
timer_op* make_timer_op(
    capy::coro h,
    capy::executor_ref d,
    std::error_code* ec_out,
    std::error_code ec_value)
{
    auto* op = new timer_op;
    op->h = h;
    op->d = std::move(d);
    op->ec_out = ec_out;
    op->ec_value = ec_value;
    op->sched = sched_;
    return op;
}

src/corosio/src/detail/epoll/sockets.cpp (1)
261-290: Duplicated try-ready-or-register pattern across four I/O paths.

The same guarded check-readiness → perform_io() → re-register-on-EAGAIN pattern appears in connect(), do_read_io(), do_write_io(), and accept() (in acceptors.cpp). Each differs only in the ready flag (read_ready/write_ready) and the op slot (read_op/write_op/connect_op).

A helper template or lambda parameterised on the flag and slot pointer could consolidate this, but each call site's context is slightly different (return type, work accounting), so this is at your discretion.
src/corosio/src/detail/epoll/scheduler.cpp (1)
513-524: interrupt_reactor() called while holding the mutex; work_finished() unlocks first.

In work_finished() (lines 687-689), the mutex is explicitly released before calling interrupt_reactor(). Here in stop(), interrupt_reactor() runs with the lock held, blocking other threads during the eventfd write syscall.

Consistent unlock before interrupt
     std::unique_lock lock(mutex_);
     if (!stopped_)
     {
         stopped_ = true;
         signal_all(lock);
+        lock.unlock();
         interrupt_reactor();
     }
/// Add ready events atomically.
void add_ready_events(std::uint32_t ev) noexcept
{
    ready_events_.fetch_or(ev, std::memory_order_relaxed);
}
🧩 Analysis chain
🏁 Script executed:
rg -n -C5 'ready_events_' --type=cpp

Repository: cppalliance/corosio
Length of output: 3334
🏁 Script executed:
rg -n -B10 -A10 'ready_events_\.exchange' --type=cpp

Repository: cppalliance/corosio
Length of output: 1572
Change fetch_or to use memory_order_release to pair with consumer's memory_order_acquire.
The producer-side fetch_or with memory_order_relaxed (line 140) does not provide the release semantics needed to synchronize with the consumer's exchange(..., memory_order_acquire) (scheduler.cpp:190). Acquire semantics require a corresponding release store to establish a happens-before relationship; relaxed operations provide no such guarantee. This can result in I/O side-effects from the reactor thread not being visible to the scheduler thread when it reads ready_events_.
Change to memory_order_release on the producer side:
ready_events_.fetch_or(ev, std::memory_order_release);
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/op.hpp` around lines 137 - 141, The producer
uses ready_events_.fetch_or in add_ready_events with memory_order_relaxed, which
doesn't provide release semantics to match the consumer's exchange(...,
memory_order_acquire) in scheduler.cpp; change the atomic operation in
add_ready_events to use std::memory_order_release (i.e., call
ready_events_.fetch_or(ev, std::memory_order_release)) so the producer
establishes the necessary release-acquire synchronization with the scheduler's
exchange.
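As a self-contained illustration of the release/acquire pairing requested above (generic names, unrelated to the corosio types):

```cpp
#include <atomic>
#include <cstdint>

// Generic demonstration: the producer's writes before fetch_or(release)
// become visible to the consumer after its exchange(acquire) reads the flag.
struct ready_flags
{
    std::atomic<std::uint32_t> bits{0};
    int payload = 0;   // side effect that must be visible to the consumer

    // Producer (reactor thread)
    void publish(std::uint32_t ev, int data)
    {
        payload = data;                                // happens-before...
        bits.fetch_or(ev, std::memory_order_release);  // ...this release
    }

    // Consumer (scheduler thread)
    bool consume(int& out)
    {
        std::uint32_t ev = bits.exchange(0, std::memory_order_acquire);
        if (ev == 0)
            return false;
        out = payload;  // guaranteed to observe the producer's write
        return true;
    }
};
```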
/// Destroy without invoking.
void destroy() override {}
destroy() should clear impl_ref_ to prevent a memory leak during shutdown.
close_socket() sets impl_ref_ (holding a shared_ptr to the owning impl) when is_enqueued_ is true. If the scheduler shuts down without draining its queue, scheduler_op_queue::~scheduler_op_queue() calls destroy() on remaining items. The empty override leaves impl_ref_ set, creating a self-referencing cycle (the impl → descriptor_state → impl_ref_ → impl) that is never broken.
🐛 Proposed fix
/// Destroy without invoking.
- void destroy() override {}
+ void destroy() override { impl_ref_.reset(); }
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/op.hpp` around lines 146 - 147, The override of
destroy() currently does nothing, leaving descriptor_state::impl_ref_ set and
causing a self-referencing cycle on shutdown; update destroy() (the override in
the scheduler op derived class) to clear/reset impl_ref_ (e.g.,
impl_ref_.reset()) so that when scheduler_op_queue::~scheduler_op_queue() calls
destroy() on queued items the shared_ptr cycle is broken — check related symbols
close_socket(), is_enqueued_, and descriptor_state to ensure impl_ref_ is the
member being cleared.
if (rd || wr)
{
    std::lock_guard lock(mutex);
    if (rd)
        read_op = rd;
    if (wr)
        write_op = wr;
}

// Execute first handler inline — the scheduler's work_cleanup
// accounts for this as the "consumed" work item
scheduler_op* first = local_ops.pop();
if (first)
{
    scheduler_->post_deferred_completions(local_ops);
    (*first)();
}
else
{
    scheduler_->compensating_work_started();
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find descriptor_state definition in headers
find . -name "*.hpp" -o -name "*.h" | head -20

Repository: cppalliance/corosio
Length of output: 889
🏁 Script executed:
#!/bin/bash
# Search for descriptor_state definition
fd -e hpp -e h | xargs grep -l "struct descriptor_state\|class descriptor_state" | head -5

Repository: cppalliance/corosio
Length of output: 141
🏁 Script executed:
#!/bin/bash
# Examine descriptor_state definition and read_ready/write_ready members
rg -n "read_ready|write_ready" -A 2 -B 2 --type=cpp --type=hpp -g "!scheduler.cpp" | head -100Repository: cppalliance/corosio
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Check what happens when read_ready/write_ready are set but no handler is enqueued
# Look for any re-enqueue logic triggered by readiness flags
rg -n "read_ready|write_ready" --type=cpp --type=hpp -A 5 | grep -A 5 "enqueue\|re-enqueue\|is_enqueued_" | head -50Repository: cppalliance/corosio
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Read the descriptor_state definition
cat src/corosio/src/detail/epoll/op.hpp

Repository: cppalliance/corosio
Length of output: 11163
🏁 Script executed:
#!/bin/bash
# Search for read_ready/write_ready without type filters
grep -n "read_ready\|write_ready" src/corosio/src/detail/epoll/scheduler.cpp | head -30Repository: cppalliance/corosio
Length of output: 210
🏁 Script executed:
#!/bin/bash
# Check if there's any re-check of readiness flags after re-registering ops
sed -n '280,310p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 681
🏁 Script executed:
#!/bin/bash
# Search for where read_ready and write_ready are CHECKED/USED (not just set)
grep -n "read_ready\|write_ready" src/corosio/src/detail/epoll/scheduler.cpp | grep -v "= true\|= false"Repository: cppalliance/corosio
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Get more context around where read_ready is cleared (line 655)
sed -n '645,670p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 656
🏁 Script executed:
#!/bin/bash
# Search for any logic that re-enqueues or triggers I/O when readiness flags are set
grep -n "need_reenqueue\|read_ready.*write_ready\|if.*read_ready\|if.*write_ready" src/corosio/src/detail/epoll/scheduler.cppRepository: cppalliance/corosio
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check the broader context of lines 280-310 to see if there's any post-registration logic
sed -n '200,320p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 2718
🏁 Script executed:
#!/bin/bash
# Check if there's any mechanism that periodically re-enqueues or re-checks
# descriptors after they're enqueued
grep -n "is_enqueued_\|re.*enqueue\|retry" src/corosio/src/detail/epoll/scheduler.cpp | head -30Repository: cppalliance/corosio
Length of output: 325
🏁 Script executed:
#!/bin/bash
# Check if post_deferred_completions or similar mechanism can handle
# re-registration with ready flags
sed -n '700,750p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 1293
🏁 Script executed:
#!/bin/bash
# Verify: after operator() returns, is the descriptor checked again?
# Look at do_one to see what happens after (*op)() is called
sed -n '1050,1100p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 1228
Race condition: EAGAIN re-registration can leave operations stuck with readiness flag set but unqueued.
After EAGAIN, ops are re-registered under lock at lines 284–288. However, a concurrent operator() invocation may have already set read_ready = true (line 218) or write_ready = true (line 225) because is_enqueued_ was cleared at line 184.
Scenario: Thread A executes operator(), sets is_enqueued_ = false, extracts read_op, and calls perform_io → EAGAIN. The reactor sees new EPOLLIN, successfully enqueues the descriptor again (CAS at line 986 succeeds). Thread B pops and runs operator(), finds read_op == nullptr (Thread A holds it), sets read_ready = true, and returns. Thread A then re-registers read_op under lock. Final state: is_enqueued_ == false, read_ready == true, read_op != nullptr, with no mechanism to process the pending op. With edge-triggered epoll, no new event fires unless more data arrives → indefinite stall.
The re-registration block should check the readiness flags and, if set, re-enqueue the descriptor for another processing round or immediately re-attempt I/O.
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 282 - 302, The
re-registration path that assigns read_op/write_op under the mutex (the block
that sets read_op = rd / write_op = wr) must, while still holding the lock,
check the corresponding readiness flags (read_ready and write_ready) and if
either is true ensure the descriptor is re-enqueued or scheduled for immediate
processing (e.g., push back onto local_ops or call the existing enqueue/CAS
path) so we don't leave is_enqueued_ == false with ready ops; update the logic
inside operator()'s re-registration block to perform this readiness check and
trigger the same enqueue/posting mechanism used by the reactor when it sees
events (reuse scheduler_->post_deferred_completions/local_ops enqueue or the
descriptor enqueue routine) so a ready op will be processed without waiting for
a new EPOLL event.
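A self-contained sketch of the re-enqueue check proposed above (all names are hypothetical, not corosio APIs):

```cpp
#include <atomic>
#include <mutex>
#include <vector>

struct op {};

// Illustrative descriptor: re-register ops after EAGAIN and notice a
// readiness flag set by a concurrent pass, instead of stranding the op.
struct descriptor_sketch
{
    std::mutex mutex;
    op* read_op = nullptr;
    op* write_op = nullptr;
    bool read_ready = false;
    bool write_ready = false;
    std::atomic<bool> is_enqueued{false};

    // Called after perform_io() returned EAGAIN while we still hold rd/wr.
    void reregister(op* rd, op* wr, std::vector<descriptor_sketch*>& run_queue)
    {
        bool reenqueue = false;
        {
            std::lock_guard lock(mutex);
            if (rd) read_op = rd;
            if (wr) write_op = wr;

            // A concurrent pass may have recorded readiness while we held
            // the ops; without this check the descriptor would wait for an
            // edge-triggered event that never re-fires.
            if ((rd && read_ready) || (wr && write_ready))
            {
                bool expected = false;
                reenqueue = is_enqueued.compare_exchange_strong(expected, true);
            }
        }
        if (reenqueue)
            run_queue.push_back(this);  // processed on the next scheduler pass
    }
};
```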
 void
 epoll_scheduler::
 drain_thread_queue(op_queue& queue, long count) const
 {
-    std::lock_guard lock(mutex_);
+    // Note: outstanding_work_ was already incremented when posting
+    std::unique_lock lock(mutex_);
     completed_ops_.splice(queue);
     if (count > 0)
-        wakeup_event_.notify_all();
+        maybe_unlock_and_signal_one(lock);
 }
Misleading comment and missing work-count flush in drain_thread_queue.
The comment "outstanding_work_ was already incremented when posting" is incorrect for the fast path. post() fast-path (lines 454-455) only increments ctx->private_outstanding_work, not outstanding_work_. If drain_thread_queue is ever called with count > 0, the work items spliced into completed_ops_ will have no corresponding global work count, causing outstanding_work_ to underflow when those ops complete.
In practice this is a safety net (the normal flow drains via work_cleanup / task_cleanup), but the accounting is still incorrect if triggered.
Fix: flush work count before draining
void
epoll_scheduler::
drain_thread_queue(op_queue& queue, long count) const
{
- // Note: outstanding_work_ was already incremented when posting
+ if (count > 0)
+ outstanding_work_.fetch_add(count, std::memory_order_relaxed);
+
std::unique_lock lock(mutex_);
completed_ops_.splice(queue);
if (count > 0)
maybe_unlock_and_signal_one(lock);
}

🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 703 - 712, The
comment is wrong and the global work counter must be updated before moving ops
into completed_ops_; modify epoll_scheduler::drain_thread_queue to, when
count>0, increment outstanding_work_ (or transfer ctx->private_outstanding_work)
by count while holding mutex_ before calling completed_ops_.splice(queue), then
proceed to call maybe_unlock_and_signal_one(lock); update the comment to reflect
that the global outstanding_work_ is flushed here to avoid underflow when those
ops complete.
void
epoll_scheduler::
wait_for_signal_for(
    std::unique_lock<std::mutex>& lock,
    long timeout_us) const
{
    if ((state_ & 1) == 0)
    {
        state_ += 2;
        cond_.wait_for(lock, std::chrono::microseconds(timeout_us));
        state_ -= 2;
    }
    else if (reactor_running_ && !reactor_interrupted_)
}
wait_one() can exceed its timeout due to spurious wakeups.
wait_for_signal_for calls cond_.wait_for with the full timeout_us on every invocation. When do_one loops back after a spurious wakeup (line 1086-1090), it passes the original timeout_us again, not the remaining time. This means wait_one(usec) could block for up to N × usec where N is the number of spurious wakeups plus one.
Consider using an absolute deadline (wait_until) to cap the total wait time:
Sketch: use absolute deadline
One approach is to compute the deadline once in do_one (or wait_one) and pass it through, switching wait_for_signal_for to use cond_.wait_until:
void
epoll_scheduler::
wait_for_signal_for(
std::unique_lock<std::mutex>& lock,
- long timeout_us) const
+ std::chrono::steady_clock::time_point deadline) const
{
if ((state_ & 1) == 0)
{
state_ += 2;
- cond_.wait_for(lock, std::chrono::microseconds(timeout_us));
+ cond_.wait_until(lock, deadline);
state_ -= 2;
}
}

Callers would compute the deadline before entering the loop.
Also applies to: 1086-1091
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 800 - 812, The
current wait_for_signal_for uses cond_.wait_for with the full timeout on each
call, so spurious wakeups in do_one/wait_one can cause total sleep to exceed the
intended timeout; change the design to compute an absolute deadline once (using
std::chrono::steady_clock::now() + std::chrono::microseconds(timeout_us)) in the
caller (e.g., wait_one or do_one) and replace
epoll_scheduler::wait_for_signal_for(std::unique_lock<std::mutex>&, long) with
an overload that accepts a std::chrono::steady_clock::time_point deadline and
uses cond_.wait_until(lock, deadline) (or wait_until in a loop checking state_
and whether now >= deadline) so the total wait is capped by the deadline; update
all callers (do_one, wait_one) to pass the precomputed deadline and to stop
looping when the deadline is reached.
    tracks pending read/write/connect operations.

    @param fd The file descriptor to register.
    @param desc Pointer to descriptor data (stored in epoll_event.data.ptr).
Stale @param description references old type name.
The @param desc text says "Pointer to descriptor data" but the parameter type is now descriptor_state*. Update to match.
- `@param` desc Pointer to descriptor data (stored in epoll_event.data.ptr).
+ `@param` desc Pointer to descriptor state (stored in epoll_event.data.ptr).
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.hpp` at line 111, Update the Doxygen
`@param` for the parameter named desc to reference the current type
descriptor_state* (instead of the old "descriptor data") and briefly describe
its purpose (e.g., pointer to descriptor_state used for tracking epoll
descriptor state, stored in epoll_event.data.ptr); locate the comment in
scheduler.hpp where `@param` desc is declared and replace the stale wording to
mention descriptor_state* and its role.