Conversation
Reduces atomic contention on outstanding_work_ by batching work counting per-thread. Fast-path posts now only increment the thread-local private_outstanding_work counter; the global counter is updated in batch when handlers complete. Key changes:
- Fast-path post() only increments private_outstanding_work
- work_cleanup uses the N-1 formula: the handler consumes 1 and produces N
- task_cleanup flushes all private work from the reactor
- timer_service posts a timer_op (like signal_op) to run inside the work_cleanup scope, ensuring proper batching

2-thread scaling improves from ~0.56x to ~1.0x speedup.
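A minimal sketch of the batching idea, using illustrative names rather than the actual corosio members (the real scheduler uses its own queues and context lookup):

```cpp
#include <atomic>

// Hypothetical sketch of per-thread work batching; names are illustrative.
struct scheduler_sketch
{
    std::atomic<long> outstanding_work{0};      // shared, contended counter

    struct thread_context
    {
        long private_outstanding_work = 0;      // thread-local, uncontended
    };

    // Fast-path post: bump only the private counter.
    void post(thread_context& ctx)
    {
        ++ctx.private_outstanding_work;
        // ...push the handler onto the thread-private queue...
    }

    // work_cleanup: the completed handler consumed 1 unit of work and
    // produced N new ones, so the net adjustment to the global counter
    // is N - 1, applied once per handler instead of once per post.
    void work_cleanup(thread_context& ctx)
    {
        long n = ctx.private_outstanding_work;
        ctx.private_outstanding_work = 0;
        if (n != 1)
            outstanding_work.fetch_add(n - 1, std::memory_order_relaxed);
    }
};
```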
Replace idle_thread_count_ with a state_ variable that encodes both the signaled flag (bit 0) and the waiter count (upper bits). This enables:
- Signalers only call notify when waiters exist (state_ > 1)
- Waiters check whether the event is already signaled before blocking (fast path)
- 22% reduction in futex calls in multi-threaded benchmarks

Add helper functions for consistent signaling semantics:
- signal_all: wake all waiters (shutdown/stop)
- maybe_unlock_and_signal_one: conditional unlock with fallback
- unlock_and_signal_one: always unlock, signal if waiters exist
- clear_signal/wait_for_signal: wait coordination
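Roughly, the encoding works like the sketch below. The helper names come from the commit message; the internals are assumptions, not the exact corosio implementation. Waiters add 2 before blocking, so any value above 1 means someone is parked, and bit 0 records whether the event is already signaled.

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative sketch of the state_ encoding: bit 0 = signaled flag,
// each waiter contributes 2. Details are assumptions, not the real code.
class signal_state
{
    std::size_t state_ = 0;             // guarded by the caller's mutex
    std::condition_variable cond_;

public:
    // Signal one waiter; only hit the futex when a waiter actually exists.
    void maybe_unlock_and_signal_one(std::unique_lock<std::mutex>& lock)
    {
        bool had_waiters = state_ > 1;  // any waiter pushes state_ past 1
        state_ |= 1;                    // set the signaled bit
        lock.unlock();
        if (had_waiters)
            cond_.notify_one();
    }

    // Wait for the signal; return immediately if it is already set.
    void wait_for_signal(std::unique_lock<std::mutex>& lock)
    {
        while ((state_ & 1) == 0)
        {
            state_ += 2;                // register as a waiter
            cond_.wait(lock);
            state_ -= 2;
        }
    }

    void clear_signal() { state_ &= ~std::size_t(1); }
};
```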
Convert descriptor_data synchronization from lock-free atomics to a per-descriptor mutex. The mutex protects operation pointers and ready flags while I/O is performed outside the lock to minimize hold time.
Move I/O execution from the reactor to the scheduler thread. The reactor now just sets ready events atomically and enqueues the descriptor_state for processing. Actual I/O happens when the scheduler pops and invokes descriptor_state::operator(). This eliminates per-descriptor mutex locking from the reactor hot path, improving multi-threaded HTTP throughput by 2.5-3.6x:
- 8 threads: 190K -> 475K req/s
- 16 threads: 115K -> 414K req/s

Changes:
- descriptor_state extends scheduler_op with deferred I/O support
- Add is_deferred_io() to distinguish deferred ops from work items
- Add post_deferred_completions() helper for posting completed ops
- Rename descriptor_data -> descriptor_state
- Rename reactor_* -> task_* for clarity
- Clean up do_one() with helper functions for private queue handling
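As a rough sketch of the handoff (hypothetical types; the real descriptor_state, queue, and scheduler APIs differ), the reactor only records readiness and enqueues the descriptor once, while a worker later claims the events and performs the I/O:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only: reactor-side publish vs. scheduler-side deferred I/O.
struct descriptor_sketch
{
    std::atomic<std::uint32_t> ready_events{0};
    std::atomic<bool> is_enqueued{false};

    // Reactor thread: record readiness; returns true if the caller should
    // push this descriptor onto the scheduler's run queue (at most once).
    bool add_ready_events(std::uint32_t ev)
    {
        ready_events.fetch_or(ev, std::memory_order_release);
        bool expected = false;
        return is_enqueued.compare_exchange_strong(
            expected, true, std::memory_order_acq_rel);
    }

    // Worker thread: claim the accumulated events and do the actual
    // non-blocking I/O outside the reactor hot path.
    void run()
    {
        is_enqueued.store(false, std::memory_order_relaxed);
        std::uint32_t ev = ready_events.exchange(0, std::memory_order_acquire);
        (void)ev; // ...perform reads/writes for the ops matching ev, then
                  // hand completed handlers back to the scheduler...
    }
};
```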
Workers no longer wake other workers when more work exists in the queue. Only producers signal workers when posting new work.
work_cleanup now leaves the mutex held after splicing the private queue into completed_ops_, so the next do_one iteration reuses the same lock acquisition instead of unlock+relock. Callers use owns_lock() to avoid redundant locking.
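The pattern is roughly the one below (a sketch with assumed names; the real work_cleanup also performs the work accounting described above):

```cpp
#include <mutex>

// Illustrative sketch: cleanup may leave the mutex held so the next
// iteration of do_one can skip an unlock/relock pair.
struct work_cleanup_sketch
{
    std::unique_lock<std::mutex>& lock;

    ~work_cleanup_sketch()
    {
        if (!lock.owns_lock())
            lock.lock();
        // ...splice the thread-private queue into completed_ops_...
        // The lock is intentionally left held for the caller.
    }
};

void do_one_iteration(std::unique_lock<std::mutex>& lock)
{
    if (!lock.owns_lock())   // reuse the acquisition left by work_cleanup
        lock.lock();
    // ...pop and run the next handler...
}
```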
Move inline completion logic from the scheduler into descriptor_state::operator()(), matching Asio's pattern where the descriptor performs I/O and executes the first handler directly. The scheduler no longer special-cases deferred I/O — descriptor_state flows through the normal work_cleanup path. Add compensating_work_started() for the all-EAGAIN case to offset the -1 that work_cleanup applies. Remove is_deferred_io virtual from scheduler_op.
Hoist find_context() before the for(;;) loop since the thread context is invariant for the lifetime of do_one. Saves a TLS read and linked-list walk on every retry iteration. Convert stopped_ from atomic<bool> to plain bool. All accesses are now under mutex protection. Remove redundant pre-lock stopped_ checks from public entry points — do_one's check under mutex is sufficient.
Pass scheduler_context directly from callers instead of calling find_context() per iteration. Eliminate redundant lock cycle when entering reactor with pending handlers by using unlock_and_signal_one instead of maybe_unlock_and_signal_one followed by re-lock.
…ructs

The develop rebase removed data_ (8 bytes) from scheduler_op, shrinking it from 32 to 24 bytes. This shifted cache line boundaries in every derived struct (descriptor_state, epoll_op, socket_impl), causing a measurable throughput regression at 4+ threads. Restoring the 32-byte base size recovers the original layout and performance.
When the reactor queues a descriptor_state and a handler from a different op later destroys the impl via close()/destroy_impl(), the queued descriptor_state (a member of the freed impl) became a dangling node in completed_ops_. Fix by capturing shared_from_this() into descriptor_state::impl_ref_ in close_socket() when is_enqueued_ is true, keeping the impl alive until the descriptor is popped and processed.
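The keep-alive can be pictured as follows (hypothetical type names; the real impl, queue, and locking differ). The queued node lives inside the impl, so the impl must outlive its time in the scheduler queue:

```cpp
#include <memory>

// Sketch only: the impl pins itself through the queued node until a
// worker pops and processes it.
struct socket_impl_sketch;

struct descriptor_state_sketch
{
    bool is_enqueued = false;
    std::shared_ptr<socket_impl_sketch> impl_ref; // pins the owner while queued
};

struct socket_impl_sketch
    : std::enable_shared_from_this<socket_impl_sketch>
{
    descriptor_state_sketch desc;

    void close_socket()
    {
        // If the reactor already queued desc, keep this impl alive until a
        // worker pops and runs it; otherwise the node would dangle.
        if (desc.is_enqueued)
            desc.impl_ref = shared_from_this();
        // ...close the fd, cancel pending ops...
    }
};

// The worker drops the reference once the descriptor has been processed,
// e.g. auto keep_alive = std::move(desc.impl_ref);
```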
Claim all pending ops in a single lock instead of separate locks per event type, reducing worst-case from 4 lock/unlock cycles to 2 and common-case from 2 to 1. Add cascading thread wakes in do_one so sleeping threads are notified when more work is queued. Remove dead code: set_ready_events, is_registered, update_epoll_events, update_descriptor_events, unused start() overload, and epoll_sockets alias. Use std::exchange for claim-and-null patterns throughout.
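For reference, the claim-and-null idiom mentioned here is just std::exchange over the op slot, e.g. (sketch):

```cpp
#include <utility>

struct op; // opaque operation type for the sketch

// Claim whatever is in the slot and leave nullptr behind, in one step.
// In the real code this happens under the per-descriptor lock.
inline op* claim(op*& slot)
{
    return std::exchange(slot, nullptr);
}

// usage:
//   op* rd = claim(read_op);
//   if (rd) { /* perform the read and complete rd */ }
```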
timer_op used the default scheduler_op constructor which sets func_ to nullptr. The IOCP scheduler dispatches posted ops via func_, causing an ACCESS_VIOLATION on Windows. Provide a do_complete function pointer so both dispatch paths work.
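The shape of the fix is roughly as follows (illustrative types, not the actual corosio scheduler_op): any op that may be dispatched through the stored function pointer needs a trampoline installed.

```cpp
// Sketch only: an op base dispatched either virtually or via func_ must
// fill in func_, otherwise the pointer path dereferences nullptr.
struct op_base_sketch
{
    using func_type = void (*)(op_base_sketch*);
    func_type func_ = nullptr;
};

struct timer_op_sketch : op_base_sketch
{
    timer_op_sketch() { func_ = &timer_op_sketch::do_complete; }

    static void do_complete(op_base_sketch* base)
    {
        auto* self = static_cast<timer_op_sketch*>(base);
        (void)self; // ...run the timer completion...
    }
};
```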
📝 Walkthrough

This pull request replaces atomic per-descriptor state with a mutex‑protected

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant epoll_scheduler
participant Reactor
participant descriptor_state
participant Worker
Client->>epoll_scheduler: submit operation (read/write/connect/timer)
epoll_scheduler->>descriptor_state: register descriptor / set op pointer
Reactor-->>epoll_scheduler: epoll_wait -> events
epoll_scheduler->>descriptor_state: add_ready_events(ev)
descriptor_state->>epoll_scheduler: enqueue descriptor (is_enqueued_ guard)
epoll_scheduler->>Worker: flush private queue -> schedule descriptor_state()
Worker->>descriptor_state: descriptor_state::operator()() (lock, claim ops)
descriptor_state->>Worker: complete ops (post completions to scheduler)
Worker->>epoll_scheduler: post_deferred_completions(op_queue)
epoll_scheduler->>Client: execute completion callbacks / resume coroutines
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Fixes negative count errors from lcov when multi-threaded code paths race on GCC's non-atomic coverage counters.
An automated preview of the documentation is available at https://106.corosio.prtest3.cppalliance.org/index.html If more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-02-06 05:03:14 UTC
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## develop #106 +/- ##
===========================================
- Coverage 80.28% 80.06% -0.22%
===========================================
Files 65 65
Lines 5376 5569 +193
===========================================
+ Hits 4316 4459 +143
- Misses 1060 1110 +50
... and 1 file with indirect coverage changes

Continue to review full report in Codecov by Sentry.
GCOVR code coverage report https://106.corosio.prtest3.cppalliance.org/gcovr/index.html Build time: 2026-02-06 05:06:44 UTC
Actionable comments posted: 6
🧹 Nitpick comments (5)
src/corosio/src/detail/scheduler_op.hpp (1)
120-123: Consider adding a static_assert to enforce the 32-byte size invariant.

The padding relies on assumptions about the base class layout. If intrusive_queue::node or other members change, this could silently break the alignment goal.

💡 Suggested assertion (placed after the class definition)
 };
+static_assert(sizeof(scheduler_op) == 32,
+    "Update reserved_ to maintain 32-byte size for cache alignment");
+
 //------------------------------------------------------------------------------

src/corosio/src/detail/timer_service.cpp (2)
61-77: d and h are accessed as members during d.post(h) — inconsistent with the epoll_op safety pattern.

epoll_op::operator()() (in op.hpp lines 216-218) deliberately moves ex and h to stack locals before dispatch to prevent use-after-free. Here, d.post(h) references member variables directly. This works because timer_op is independently heap-allocated (inline execution of h can't destroy this object), but the inconsistency is a maintenance hazard.

💡 Align with the established safety pattern
 void operator()() override
 {
     if (ec_out)
         *ec_out = ec_value;

-    // Capture before posting (coro may destroy this op)
     auto* service = sched;
     sched = nullptr;

-    d.post(h);
+    capy::executor_ref saved_d(std::move(d));
+    capy::coro saved_h(std::move(h));
+
+    saved_d.post(saved_h);

-    // Balance the on_work_started() from timer_impl::wait()
     if (service)
         service->on_work_finished();

     delete this;
 }
251-261: Duplicated timer_op creation in three locations.

The same field-setting pattern (h, d, ec_out, ec_value, sched + post) appears in update_timer (line 254), cancel_timer (line 296), and process_expired (line 335). A small factory helper would reduce repetition.

💡 Example helper
// In timer_service_impl, private:
timer_op* make_timer_op(
    capy::coro h,
    capy::executor_ref d,
    std::error_code* ec_out,
    std::error_code ec_value)
{
    auto* op = new timer_op;
    op->h = h;
    op->d = std::move(d);
    op->ec_out = ec_out;
    op->ec_value = ec_value;
    op->sched = sched_;
    return op;
}

src/corosio/src/detail/epoll/sockets.cpp (1)
261-290: Duplicated try-ready-or-register pattern across four I/O paths.

The same guarded check-readiness → perform_io() → re-register-on-EAGAIN pattern appears in connect(), do_read_io(), do_write_io(), and accept() (in acceptors.cpp). Each differs only in the ready flag (read_ready/write_ready) and the op slot (read_op/write_op/connect_op).

A helper template or lambda parameterised on the flag and slot pointer could consolidate this, but each call site's context is slightly different (return type, work accounting), so this is at your discretion.
src/corosio/src/detail/epoll/scheduler.cpp (1)
513-524: interrupt_reactor() called while holding the mutex; work_finished() unlocks first.

In work_finished() (lines 687-689), the mutex is explicitly released before calling interrupt_reactor(). Here in stop(), interrupt_reactor() runs with the lock held, blocking other threads during the eventfd write syscall.

Consistent unlock before interrupt
     std::unique_lock lock(mutex_);
     if (!stopped_)
     {
         stopped_ = true;
         signal_all(lock);
+        lock.unlock();
         interrupt_reactor();
     }
/// Add ready events atomically.
void add_ready_events(std::uint32_t ev) noexcept
{
    ready_events_.fetch_or(ev, std::memory_order_relaxed);
}
🧩 Analysis chain
🏁 Script executed:
rg -n -C5 'ready_events_' --type=cpp

Repository: cppalliance/corosio
Length of output: 3334
🏁 Script executed:
rg -n -B10 -A10 'ready_events_\.exchange' --type=cpp

Repository: cppalliance/corosio
Length of output: 1572
Change fetch_or to use memory_order_release to pair with consumer's memory_order_acquire.
The producer-side fetch_or with memory_order_relaxed (line 140) does not provide the release semantics needed to synchronize with the consumer's exchange(..., memory_order_acquire) (scheduler.cpp:190). Acquire semantics require a corresponding release store to establish a happens-before relationship; relaxed operations provide no such guarantee. This can result in I/O side-effects from the reactor thread not being visible to the scheduler thread when it reads ready_events_.
Change to memory_order_release on the producer side:
ready_events_.fetch_or(ev, std::memory_order_release);
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/op.hpp` around lines 137 - 141, The producer
uses ready_events_.fetch_or in add_ready_events with memory_order_relaxed, which
doesn't provide release semantics to match the consumer's exchange(...,
memory_order_acquire) in scheduler.cpp; change the atomic operation in
add_ready_events to use std::memory_order_release (i.e., call
ready_events_.fetch_or(ev, std::memory_order_release)) so the producer
establishes the necessary release-acquire synchronization with the scheduler's
exchange.
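As a self-contained illustration of the release/acquire pairing requested above (generic names, unrelated to the corosio types):

```cpp
#include <atomic>
#include <cstdint>

// Generic demonstration: the producer's writes before fetch_or(release)
// become visible to the consumer after its exchange(acquire) reads the flag.
struct ready_flags
{
    std::atomic<std::uint32_t> bits{0};
    int payload = 0;   // side effect that must be visible to the consumer

    // Producer (reactor thread)
    void publish(std::uint32_t ev, int data)
    {
        payload = data;                                // happens-before...
        bits.fetch_or(ev, std::memory_order_release);  // ...this release
    }

    // Consumer (scheduler thread)
    bool consume(int& out)
    {
        std::uint32_t ev = bits.exchange(0, std::memory_order_acquire);
        if (ev == 0)
            return false;
        out = payload;  // guaranteed to observe the producer's write
        return true;
    }
};
```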
/// Destroy without invoking.
void destroy() override {}
destroy() should clear impl_ref_ to prevent a memory leak during shutdown.
close_socket() sets impl_ref_ (holding a shared_ptr to the owning impl) when is_enqueued_ is true. If the scheduler shuts down without draining its queue, scheduler_op_queue::~scheduler_op_queue() calls destroy() on remaining items. The empty override leaves impl_ref_ set, creating a self-referencing cycle (the impl → descriptor_state → impl_ref_ → impl) that is never broken.
🐛 Proposed fix
/// Destroy without invoking.
- void destroy() override {}
+ void destroy() override { impl_ref_.reset(); }
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/op.hpp` around lines 146 - 147, The override of
destroy() currently does nothing, leaving descriptor_state::impl_ref_ set and
causing a self-referencing cycle on shutdown; update destroy() (the override in
the scheduler op derived class) to clear/reset impl_ref_ (e.g.,
impl_ref_.reset()) so that when scheduler_op_queue::~scheduler_op_queue() calls
destroy() on queued items the shared_ptr cycle is broken — check related symbols
close_socket(), is_enqueued_, and descriptor_state to ensure impl_ref_ is the
member being cleared.
if (rd || wr)
{
    std::lock_guard lock(mutex);
    if (rd)
        read_op = rd;
    if (wr)
        write_op = wr;
}

// Execute first handler inline — the scheduler's work_cleanup
// accounts for this as the "consumed" work item
scheduler_op* first = local_ops.pop();
if (first)
{
    scheduler_->post_deferred_completions(local_ops);
    (*first)();
}
else
{
    scheduler_->compensating_work_started();
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find descriptor_state definition in headers
find . -name "*.hpp" -o -name "*.h" | head -20

Repository: cppalliance/corosio
Length of output: 889
🏁 Script executed:
#!/bin/bash
# Search for descriptor_state definition
fd -e hpp -e h | xargs grep -l "struct descriptor_state\|class descriptor_state" | head -5

Repository: cppalliance/corosio
Length of output: 141
🏁 Script executed:
#!/bin/bash
# Examine descriptor_state definition and read_ready/write_ready members
rg -n "read_ready|write_ready" -A 2 -B 2 --type=cpp --type=hpp -g "!scheduler.cpp" | head -100Repository: cppalliance/corosio
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Check what happens when read_ready/write_ready are set but no handler is enqueued
# Look for any re-enqueue logic triggered by readiness flags
rg -n "read_ready|write_ready" --type=cpp --type=hpp -A 5 | grep -A 5 "enqueue\|re-enqueue\|is_enqueued_" | head -50Repository: cppalliance/corosio
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Read the descriptor_state definition
cat src/corosio/src/detail/epoll/op.hpp

Repository: cppalliance/corosio
Length of output: 11163
🏁 Script executed:
#!/bin/bash
# Search for read_ready/write_ready without type filters
grep -n "read_ready\|write_ready" src/corosio/src/detail/epoll/scheduler.cpp | head -30Repository: cppalliance/corosio
Length of output: 210
🏁 Script executed:
#!/bin/bash
# Check if there's any re-check of readiness flags after re-registering ops
sed -n '280,310p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 681
🏁 Script executed:
#!/bin/bash
# Search for where read_ready and write_ready are CHECKED/USED (not just set)
grep -n "read_ready\|write_ready" src/corosio/src/detail/epoll/scheduler.cpp | grep -v "= true\|= false"Repository: cppalliance/corosio
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Get more context around where read_ready is cleared (line 655)
sed -n '645,670p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 656
🏁 Script executed:
#!/bin/bash
# Search for any logic that re-enqueues or triggers I/O when readiness flags are set
grep -n "need_reenqueue\|read_ready.*write_ready\|if.*read_ready\|if.*write_ready" src/corosio/src/detail/epoll/scheduler.cppRepository: cppalliance/corosio
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check the broader context of lines 280-310 to see if there's any post-registration logic
sed -n '200,320p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 2718
🏁 Script executed:
#!/bin/bash
# Check if there's any mechanism that periodically re-enqueues or re-checks
# descriptors after they're enqueued
grep -n "is_enqueued_\|re.*enqueue\|retry" src/corosio/src/detail/epoll/scheduler.cpp | head -30Repository: cppalliance/corosio
Length of output: 325
🏁 Script executed:
#!/bin/bash
# Check if post_deferred_completions or similar mechanism can handle
# re-registration with ready flags
sed -n '700,750p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 1293
🏁 Script executed:
#!/bin/bash
# Verify: after operator() returns, is the descriptor checked again?
# Look at do_one to see what happens after (*op)() is called
sed -n '1050,1100p' src/corosio/src/detail/epoll/scheduler.cpp

Repository: cppalliance/corosio
Length of output: 1228
Race condition: EAGAIN re-registration can leave operations stuck with readiness flag set but unqueued.
After EAGAIN, ops are re-registered under lock at lines 284–288. However, a concurrent operator() invocation may have already set read_ready = true (line 218) or write_ready = true (line 225) because is_enqueued_ was cleared at line 184.
Scenario: Thread A executes operator(), sets is_enqueued_ = false, extracts read_op, and calls perform_io → EAGAIN. The reactor sees new EPOLLIN, successfully enqueues the descriptor again (CAS at line 986 succeeds). Thread B pops and runs operator(), finds read_op == nullptr (Thread A holds it), sets read_ready = true, and returns. Thread A then re-registers read_op under lock. Final state: is_enqueued_ == false, read_ready == true, read_op != nullptr, with no mechanism to process the pending op. With edge-triggered epoll, no new event fires unless more data arrives → indefinite stall.
The re-registration block should check the readiness flags and, if set, re-enqueue the descriptor for another processing round or immediately re-attempt I/O.
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 282 - 302, The
re-registration path that assigns read_op/write_op under the mutex (the block
that sets read_op = rd / write_op = wr) must, while still holding the lock,
check the corresponding readiness flags (read_ready and write_ready) and if
either is true ensure the descriptor is re-enqueued or scheduled for immediate
processing (e.g., push back onto local_ops or call the existing enqueue/CAS
path) so we don't leave is_enqueued_ == false with ready ops; update the logic
inside operator()'s re-registration block to perform this readiness check and
trigger the same enqueue/posting mechanism used by the reactor when it sees
events (reuse scheduler_->post_deferred_completions/local_ops enqueue or the
descriptor enqueue routine) so a ready op will be processed without waiting for
a new EPOLL event.
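A self-contained sketch of the re-enqueue check proposed above (all names are hypothetical, not corosio APIs):

```cpp
#include <atomic>
#include <mutex>
#include <vector>

struct op {};

// Illustrative descriptor: re-register ops after EAGAIN and notice a
// readiness flag set by a concurrent pass, instead of stranding the op.
struct descriptor_sketch
{
    std::mutex mutex;
    op* read_op = nullptr;
    op* write_op = nullptr;
    bool read_ready = false;
    bool write_ready = false;
    std::atomic<bool> is_enqueued{false};

    // Called after perform_io() returned EAGAIN while we still hold rd/wr.
    void reregister(op* rd, op* wr, std::vector<descriptor_sketch*>& run_queue)
    {
        bool reenqueue = false;
        {
            std::lock_guard lock(mutex);
            if (rd) read_op = rd;
            if (wr) write_op = wr;

            // A concurrent pass may have recorded readiness while we held
            // the ops; without this check the descriptor would wait for an
            // edge-triggered event that never re-fires.
            if ((rd && read_ready) || (wr && write_ready))
            {
                bool expected = false;
                reenqueue = is_enqueued.compare_exchange_strong(expected, true);
            }
        }
        if (reenqueue)
            run_queue.push_back(this);  // processed on the next scheduler pass
    }
};
```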
 void
 epoll_scheduler::
 drain_thread_queue(op_queue& queue, long count) const
 {
-    std::lock_guard lock(mutex_);
+    // Note: outstanding_work_ was already incremented when posting
+    std::unique_lock lock(mutex_);
     completed_ops_.splice(queue);
     if (count > 0)
-        wakeup_event_.notify_all();
+        maybe_unlock_and_signal_one(lock);
 }
Misleading comment and missing work-count flush in drain_thread_queue.
The comment "outstanding_work_ was already incremented when posting" is incorrect for the fast path. post() fast-path (lines 454-455) only increments ctx->private_outstanding_work, not outstanding_work_. If drain_thread_queue is ever called with count > 0, the work items spliced into completed_ops_ will have no corresponding global work count, causing outstanding_work_ to underflow when those ops complete.
In practice this is a safety net (the normal flow drains via work_cleanup / task_cleanup), but the accounting is still incorrect if triggered.
Fix: flush work count before draining
void
epoll_scheduler::
drain_thread_queue(op_queue& queue, long count) const
{
- // Note: outstanding_work_ was already incremented when posting
+ if (count > 0)
+ outstanding_work_.fetch_add(count, std::memory_order_relaxed);
+
std::unique_lock lock(mutex_);
completed_ops_.splice(queue);
if (count > 0)
maybe_unlock_and_signal_one(lock);
}

🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 703 - 712, The
comment is wrong and the global work counter must be updated before moving ops
into completed_ops_; modify epoll_scheduler::drain_thread_queue to, when
count>0, increment outstanding_work_ (or transfer ctx->private_outstanding_work)
by count while holding mutex_ before calling completed_ops_.splice(queue), then
proceed to call maybe_unlock_and_signal_one(lock); update the comment to reflect
that the global outstanding_work_ is flushed here to avoid underflow when those
ops complete.
void
epoll_scheduler::
wait_for_signal_for(
    std::unique_lock<std::mutex>& lock,
    long timeout_us) const
{
    if ((state_ & 1) == 0)
    {
        state_ += 2;
        cond_.wait_for(lock, std::chrono::microseconds(timeout_us));
        state_ -= 2;
    }
    else if (reactor_running_ && !reactor_interrupted_)
}
wait_one() can exceed its timeout due to spurious wakeups.
wait_for_signal_for calls cond_.wait_for with the full timeout_us on every invocation. When do_one loops back after a spurious wakeup (line 1086-1090), it passes the original timeout_us again, not the remaining time. This means wait_one(usec) could block for up to N × usec where N is the number of spurious wakeups plus one.
Consider using an absolute deadline (wait_until) to cap the total wait time:
Sketch: use absolute deadline
One approach is to compute the deadline once in do_one (or wait_one) and pass it through, switching wait_for_signal_for to use cond_.wait_until:
void
epoll_scheduler::
wait_for_signal_for(
std::unique_lock<std::mutex>& lock,
- long timeout_us) const
+ std::chrono::steady_clock::time_point deadline) const
{
if ((state_ & 1) == 0)
{
state_ += 2;
- cond_.wait_for(lock, std::chrono::microseconds(timeout_us));
+ cond_.wait_until(lock, deadline);
state_ -= 2;
}
}

Callers would compute the deadline before entering the loop.
Also applies to: 1086-1091
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.cpp` around lines 800 - 812, The
current wait_for_signal_for uses cond_.wait_for with the full timeout on each
call, so spurious wakeups in do_one/wait_one can cause total sleep to exceed the
intended timeout; change the design to compute an absolute deadline once (using
std::chrono::steady_clock::now() + std::chrono::microseconds(timeout_us)) in the
caller (e.g., wait_one or do_one) and replace
epoll_scheduler::wait_for_signal_for(std::unique_lock<std::mutex>&, long) with
an overload that accepts a std::chrono::steady_clock::time_point deadline and
uses cond_.wait_until(lock, deadline) (or wait_until in a loop checking state_
and whether now >= deadline) so the total wait is capped by the deadline; update
all callers (do_one, wait_one) to pass the precomputed deadline and to stop
looping when the deadline is reached.
    tracks pending read/write/connect operations.

    @param fd The file descriptor to register.
    @param desc Pointer to descriptor data (stored in epoll_event.data.ptr).
Stale @param description references old type name.
The @param desc text says "Pointer to descriptor data" but the parameter type is now descriptor_state*. Update to match.
- `@param` desc Pointer to descriptor data (stored in epoll_event.data.ptr).
+ `@param` desc Pointer to descriptor state (stored in epoll_event.data.ptr).
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/scheduler.hpp` at line 111, Update the Doxygen
`@param` for the parameter named desc to reference the current type
descriptor_state* (instead of the old "descriptor data") and briefly describe
its purpose (e.g., pointer to descriptor_state used for tracking epoll
descriptor state, stored in epoll_event.data.ptr); locate the comment in
scheduler.hpp where `@param` desc is declared and replace the stale wording to
mention descriptor_state* and its role.