
Avoid errors from asynchronous (non-c++) clients#240

Merged
ryanofsky merged 3 commits into bitcoin-core:master from ryanofsky:pr/promise
Feb 20, 2026

Conversation

@ryanofsky
Collaborator

The PR avoids errors from non-C++ rust & python clients when they make asynchronous requests (bitcoin/bitcoin#33923) or disconnect uncleanly (bitcoin/bitcoin#34250). Neither of these errors is easy to trigger from libmultiprocess C++ clients, because of the library's blocking interface and RAII semantics, but both are fairly easy to trigger from rust and python, and there is a test triggering both of them in bitcoin/bitcoin#34284

@DrahtBot

DrahtBot commented Jan 21, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type         Reviewers
ACK          Sjors
Concept ACK  ismaelsadeeq

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #218 (Better error and log messages by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • call.) -> call. [Extra stray parentheses make the sentence fragment confusing; remove the trailing ".)" to read correctly.]
  • the the -> the [Duplicate word "the" hinders comprehension; remove the extra "the".]

2026-02-20 16:59:02

@ryanofsky
Collaborator Author

Updated ba865d9 -> 92a2c5f (pr/promise.1 -> pr/promise.2, compare) fixing various CI errors (IWYU, compiler errors): https://github.com/bitcoin-core/libmultiprocess/actions/runs/21196838571

refactor: Add ProxyServer<Thread>::post() method

The change moves code responsible for posting work to the execution thread and
notifying the event loop thread after execution is finished, out of
type-context.h and into a new ProxyServer<Thread>::post() method.

This commit does not change behavior other than changing log messages, but
improvements are made in upcoming commits which extend the new method.
Allow simultaneous calls on same Context.thread

If multiple IPC requests happen at the same time specifying the same
Context.thread to run the requests on, queue the requests to execute in the
order they are received instead of raising a "thread busy" exception.

This change has no effect on C++ clients using libmultiprocess as a client
library, since the libmultiprocess client only makes blocking calls and creates
a server thread for every client thread, so it's not possible for there to be
multiple calls on the same server thread.

But this change may be useful for rust and python clients, as discussed in
bitcoin/bitcoin#33923
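
A minimal sketch of the queueing idea (hypothetical, simplified names; the real method also posts fn to a worker thread and routes results back through the event loop):

```cpp
#include <kj/async.h>
#include <kj/function.h>

struct ThreadQueue {
    // Resolves when the previously queued request has started; new requests
    // chain onto it instead of raising a "thread busy" exception.
    kj::Promise<void> m_thread_ready = kj::READY_NOW;

    kj::Promise<void> post(kj::Function<void()> fn)
    {
        auto ready = kj::newPromiseAndFulfiller<void>();
        auto result = m_thread_ready.then(
            [fn = kj::mv(fn), fulfiller = kj::mv(ready.fulfiller)]() mutable {
                fulfiller->fulfill(); // unblock the next queued request first
                fn();                 // then execute this request
            });
        m_thread_ready = kj::mv(ready.promise); // later calls wait on this one
        return result;
    }
};
```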
Prevent crash on unclean disconnect if abandoned IPC call returns interface pointer

This is a fix for bitcoin/bitcoin#34250, which reports that the bitcoin node
crashes if a rust stratumv2 mining client calls BlockTemplate.waitNext() and
disconnects without waiting for a response from the call, and if mempool fees
increase so the call returns a non-null BlockTemplate interface pointer.

The node would crash in this case while trying to call MakeProxyServer on the
returned BlockTemplate pointer, which would fail because MakeProxyServer would
try to use a reference to the Connection object that had been deleted as a
result of the disconnect.

The fix works by:

- Adding a Connection::m_canceler member variable and using it to cancel any
  IPC response promises that are pending when the connection is destroyed.

- Updating ProxyServer<Thread>::post to use promise.attach() as described at
  https://capnproto.org/cxxrpc.html#cancellation to detect cancellation and
  set a ServerContext::request_canceled variable.

- Updating ServerCall to check the ServerContext::request_canceled status after
  any C++ server method returns, and throw an exception if it is set.

- Updating type-context.h PassField() function to deal with the exception
  by catching and logging it.
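
For illustration, a minimal sketch of the promise.attach() pattern from the Cap'n Proto docs; the shared_ptr flag here is a simplified stand-in for the ServerContext::request_canceled plumbing, not the actual fix:

```cpp
#include <kj/async.h>
#include <kj/common.h>
#include <memory>

kj::Promise<void> withCancelDetection(kj::Promise<void> work,
                                      std::shared_ptr<bool> request_canceled)
{
    // kj::defer runs its lambda when the attached object is destroyed. If the
    // client disconnects, Cap'n Proto drops the outstanding promise, which
    // destroys the attachment and records the cancellation so later result
    // handling can check the flag and bail out.
    return work.attach(kj::defer([request_canceled] { *request_canceled = true; }));
}
```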
@ismaelsadeeq
Member

Concept ACK

@ryanofsky
Collaborator Author

Updated 92a2c5f -> 1f307fb (pr/promise.2 -> pr/promise.3, compare) fixing a TSAN error caused by deleting the ready_fulfiller promise fulfiller outside the event loop thread: https://github.com/bitcoin-core/libmultiprocess/actions/runs/21233968144/job/61097851891?pr=240

@ryanofsky
Collaborator Author

Have been debugging a TSAN race condition (https://github.com/bitcoin/bitcoin/actions/runs/21421194207/job/61680781122?pr=34422#step:11:4303) that the functional test in bitcoin/bitcoin#34422 detects in the new code; it is not triggered here because this PR does not currently include a test that triggers IPC request cancellation.

The race seems real and fixable, but also probably unlikely to cause problems in practice. In the control flow when an IPC call gets cancelled, TSAN sees the worker thread doing a CallContext::getParams call before the cancellation, triggering a read of the RpcCallContext::request pointer. Then after the cancellation occurs, the I/O thread wakes up and destroys the CallContext object, triggering a write of the RpcCallContext::request pointer. This is not actually dangerous because the read always happens before the write, but TSAN is unable to detect any "happens-before" relationship between the read and write events, so it considers this to be a race.

I think this should be fixable by having the worker thread acquire a lock in the brief period when it reads request parameters, before executing the IPC call, and also acquire the lock after executing the call when it is writing request results. Then the cancellation callback, if triggered, can acquire the same lock, so TSAN can be aware of the ordering between the two threads.

@Sjors
Member

Sjors commented Feb 10, 2026

should be fixable by having the worker thread acquire a lock in the brief period ...

@ryanofsky is that something you want to add here?

@ryanofsky
Collaborator Author

re: #240 (comment)

should be fixable by having the worker thread acquire a lock in the brief period ...

@ryanofsky is that something you want to add here?

Yes thanks, added a new commit which does this and fixes the TSAN error locally for me. Will update bitcoin/bitcoin#34422 and confirm it fixes the failure there too.


Rebased 1f307fb -> 1d7debf (pr/promise.3 -> pr/promise.4, compare) adding a result_value.reset() call to try to fix CI error https://github.com/bitcoin/bitcoin/actions/runs/21412348994/job/61652239914?pr=34422

Updated 1d7debf -> 4f42fc4 (pr/promise.4 -> pr/promise.5, compare) to fix CI error https://github.com/bitcoin/bitcoin/actions/runs/21421194207/job/61680781122?pr=34422

@theuni
Contributor

theuni commented Feb 12, 2026

@ryanofsky I'm having a bit of trouble understanding what's happening on which thread, but it seems like a mutex is not the most straightforward solution here? It's not immediately obvious to me what it's guarding, anyway. Perhaps a simple std::atomic_flag would be better to communicate "I've reached this point"? You could set/notify when the params are safe to delete, then in m_on_cancel you could just wait for it?

@ryanofsky
Collaborator Author

ryanofsky commented Feb 13, 2026

@ryanofsky I'm having a bit of trouble understanding what's happening on which thread, but it seems like a mutex is not the most straightforward solution here? It's not immediately obvious to me what it's guarding, anyway. Perhaps a simple std::atomic_flag would be better to communicate "I've reached this point"? You could set/notify when the params are safe to delete, then in m_on_cancel you could just wait for it?

Thanks for looking at this! I assume you are asking about the last commit 4f42fc4, which fixes the race condition detected by TSAN.

The issue being fixed in that commit is a race between two threads both potentially accessing capnproto request parameters in an incoming request.

  • One thread is a worker thread running the big invoke lambda, which is responsible for converting the capnproto request parameters into C++ arguments, calling a C++ method with those arguments, and then converting the C++ method's return value & output arguments (or exception) into a capnproto response.
  • The other thread is the capnproto event loop thread running the on_cancel lambda when there is a disconnect or cancellation. After the on_cancel lambda returns, capnproto deletes the request parameters.

The point of the mutex is to prevent the event loop thread from deleting the request parameters while the worker thread is still reading them.

The control flow is a little complicated because of the way parameters are unpacked recursively, but the idea of guarding access to the parameters with a mutex is pretty simple. The params_mutex is locked by the worker thread before it accesses the capnproto parameters, and unlocked by that thread after all the parameters have been unpacked and it is ready to call the C++ method that is supposed to be executed. Then on the event loop thread inside the on_cancel lambda, the same mutex is locked before it returns, so if there is a cancellation while the worker thread is unpacking parameters, the event loop thread is blocked and does not delete the parameters until after the worker thread has finished reading them.

What is odd about this fix is that the event loop thread is releasing the mutex before it actually deletes the parameters. This is suspicious, because if the mutex is not locked while they are deleted it does nothing to solve the reverse race condition in the case where the cancellation happens before the parameters start being read. But this case is handled by a separate cancel_monitor.m_cancelled check which raises an exception to prevent the worker thread from reading the parameters.

So the mutex should solve the delete-while-reading race condition in a fairly straightforward way, even if the control flow is complicated by a lot of recursion and callbacks, and even if it doesn't solve the related delete-before-reading race condition.
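
To make the protocol concrete, here is a minimal standalone sketch; the function names are stand-ins, not the actual libmultiprocess code, which spreads this logic across the invoke and on_cancel lambdas:

```cpp
#include <mutex>

std::mutex params_mutex;

void read_and_unpack_params() {} // stand-in: convert capnp params to C++ args
void call_cpp_method() {}        // stand-in: the possibly long-running method

// Worker thread: hold the mutex only for the brief parameter-reading window.
void invoke()
{
    std::unique_lock<std::mutex> params_lock{params_mutex};
    read_and_unpack_params();
    params_lock.unlock(); // release before the potentially slow method call
    call_cpp_method();
}

// Event loop thread: taking the same mutex blocks cancellation until the
// worker has finished reading, so the parameters are never deleted mid-read.
// It also gives TSAN the happens-before edge it could not otherwise see.
void on_cancel()
{
    std::unique_lock<std::mutex> params_lock{params_mutex};
    // after this returns, Cap'n Proto deletes the request parameters
}
```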

Tangent: One observation looking at the code is that while the delete-before-reading reverse case is being handled, it isn't being handled very well, because the exception raised is thrown inside the sync block instead of outside of it, and it should probably be an InterruptException instead of an MP_LOG call so it can be caught higher in the stack and the thread can be cleaned up. I'm guessing right now that code will cause the worker thread to hang and be leaked, never getting cleaned up. Ideally I could write a unit test trying to trigger this, but so far I've been struggling to figure out how to test different possible races and make sure they are handled well. It's not obvious how to arrange for the request to be cancelled before parameters are read. Anyway, this is only indirectly related to the TSAN race addressed in the commit, which is also unlikely to happen in practice and probably hasn't happened, even though TSAN was right to warn that the request parameters were previously being accessed from different threads without synchronization between them. (EDIT: This behavior is improved in the latest push pr/promise.6, so if a cancellation does happen before parameters are read it will now be handled cleanly and just result in an info message being logged.)

@theuni
Contributor

theuni commented Feb 13, 2026

@ryanofsky Thanks for the explanation, that was very helpful.

I'm still struggling to understand how this mutex is working though. It seems that since it's already been unlocked in invoke(), the mutex will already be unlocked on the worker thread when params_lock goes out of scope, which would cause a double unlock (undefined), no?

@Sjors
Member

Sjors commented Feb 13, 2026

I created a branch and had an agent splinter 558ae11 (refactor: Add ProxyServer::post() method) into multiple tiny commits: sjors/2026/02/558ae11-split

No need to use it here, but it might aid others in reviewing (with dimmed-zebra). Noticed a small behavior change in that it removes MP_LOG for thread busy. It's harmless since 92fed35 drops this exception anyway.

Also drew a little diagram to orient myself about where this function lives (IIUC):

[Screenshot 2026-02-13: diagram of where this function lives]

Review placeholder: the first two commits, up to 92fed35, mostly make sense to me now.

@theuni
Contributor

theuni commented Feb 13, 2026

I'm still struggling to understand how this mutex is working though. It seems that since it's already been unlocked in invoke(), the mutex will already be unlocked on the worker thread when params_lock goes out of scope, which would cause a double unlock (undefined), no?

Never mind, I see my mistake now. I was thinking it was the bare mutex that was being unlocked in invoke(), but it's the lock itself. Seems sane I suppose, just not exactly straightforward.
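
For reference, a tiny standalone example of why the early unlock is safe: std::unique_lock tracks ownership, so its destructor only unlocks a mutex it still owns.

```cpp
#include <mutex>

int main()
{
    std::mutex m;
    std::unique_lock<std::mutex> lock{m};
    lock.unlock(); // releases the mutex and clears the lock's ownership flag
    return 0;
}                  // destructor sees owns_lock() == false and does nothing
```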

auto result = kj::newPromiseAndFulfiller<T>(); // Signaled when fn() is called, with its return value.
bool posted = m_thread_context.waiter->post([this, fn = std::forward<Fn>(fn), result_fulfiller = kj::mv(result.fulfiller)]() mutable {
bool posted = m_thread_context.waiter->post([this, fn = std::forward<Fn>(fn), ready_fulfiller = kj::mv(ready_fulfiller), result_fulfiller = kj::mv(result.fulfiller)]() mutable {
m_loop->sync([ready_fulfiller = kj::mv(ready_fulfiller)]() mutable {
Member

In 92fed35 Allow simultaneous calls on same Context.thread: maybe add a comment that the goal is to add requested fn() calls to the promise chain as quickly as possible, so we call ready_fulfiller->fulfill(); before executing it.

Collaborator Author

re: #240 (comment)

maybe add a comment that the goal is to add requested fn() calls to the promise chain as quickly as possible, so we call ready_fulfiller->fulfill(); before executing it.

Thanks, added comment

Member

@Sjors Sjors left a comment

The third commit looks good to me as well.

// https://github.com/bitcoin/bitcoin/issues/34250 and there would be a
// crash if execution continued.
// TODO: Note this detection is racy because cancellation could happen
// after this check. However, fixing this would require changing the
Member

In 0a22425 Prevent crash on unclean disconnect if abandoned IPC call returns interface pointer: for waitNext() the time spent in this race-vulnerable state is minuscule compared to the multi-second wait?

In that case, let's not deal with it in this PR.

Collaborator Author

re: #240 (comment)

for waitNext() the time spent in this race-vulnerable state is minuscule compared to the multi second wait?

In that case, let's not deal with it in this PR.

Yes, this race, like the TSAN race fixed in 4f42fc4, would be pretty difficult to trigger unintentionally (or even intentionally). But it turns out the lock added in 4f42fc4 can be used to fix this race as well, so I extended it to cover both cases (the event loop deleting params on cancellation while they might still be being read, and the event loop deleting the response on cancellation while it might still be being written).

// client object.
server.m_context.loop->sync([&] {
auto self_dispose{kj::mv(self)};
if (erase_thread) {
Member

@Sjors Sjors Feb 17, 2026

In 0a22425 Prevent crash on unclean disconnect if abandoned IPC call returns interface pointer: can you add a comment here to explain why it's important to call sync() without doing anything in particular (where !erase_thread)?

IIUC it's to ensure InterruptException gets thrown as needed, so we can catch it below.

No test breaks (for this commit) if I do:

diff --git a/include/mp/type-context.h b/include/mp/type-context.h
index 134542e..440a3ef 100644
--- a/include/mp/type-context.h
+++ b/include/mp/type-context.h
@@ -113,5 +113,5 @@ auto PassField(Priority<1>, TypeList<>, ServerContext& server_context, const Fn&
                     // makes another IPC call), so avoid modifying the map.
                     const bool erase_thread{inserted};
-                    KJ_DEFER(
+                    KJ_DEFER(if (erase_thread) {
                         // Erase the request_threads entry on the event loop
                         // thread with loop->sync(), so if the connection is
@@ -121,5 +121,4 @@ auto PassField(Priority<1>, TypeList<>, ServerContext& server_context, const Fn&
                         server.m_context.loop->sync([&] {
                             auto self_dispose{kj::mv(self)};
-                            if (erase_thread) {
                             // Look up the thread again without using existing
                             // iterator since entry may no longer be there after
@@ -133,7 +132,6 @@ auto PassField(Priority<1>, TypeList<>, ServerContext& server_context, const Fn&
                                 removed = request_threads.extract(server.m_context.connection);
                             }
-                            }
                         });
-                    );
+                    });
                     KJ_IF_MAYBE(exception, kj::runCatchingExceptions([&]{
                         try {

(but I guess that makes sense because the commit doesn't touch tests in general)

Collaborator Author

re: #240 (comment)

No test breaks (for this commit) if I do:

Interesting find, and I was surprised that diff works, but I think it only works because erase_thread is normally true. The only time erase_thread should be false is when a client calls a server method, and the server method calls back into the client (for example through a std::function parameter), and then the client makes another call back into the server on the same thread. This doesn't happen in any mining code, but IIRC there are some GUI interactions in bitcoin/bitcoin#10102 where it does.

The reason it's important to call loop->sync unconditionally here is to run the auto self_dispose{kj::mv(self)}; line, so the proxy server object is freed on the event loop thread rather than the worker thread, which would not be safe because the proxy server object is managed by capnproto. I added a comment about the self variable to clarify.

Member

@ismaelsadeeq ismaelsadeeq left a comment

Reviewed the first two commits, up to 92fed35.

Thanks @Sjors for your split of the first commit and the diagram; it was sweet to use those in the review.

kj::Promise<T> ProxyServer<Thread>::post(Fn&& fn)
{
{
auto ready = kj::newPromiseAndFulfiller<void>(); // Signaled when waiter is idle again.
Member

@ismaelsadeeq ismaelsadeeq Feb 17, 2026

In "Allow simultaneous calls on same Context.thread" 92fed35

Before this commit, ProxyServer<Thread>::post() threw immediately when the thread was busy.

After 92fed35, post() queues the call by chaining onto m_thread_ready instead of throwing, which enables pipelining but removes the only backpressure mechanism. There is now no bound on chain depth. Every call from a fast producer appends a new node to the chain, each holding a captured lambda with serialized arguments plus two kj::Promise objects; the chain grows without limit.

I'm curious whether it's something we should worry about in practice as a potential for OOM of the server, either by bug or deliberately?

I modified the existing test to simulate a slow server (10s sleep per request) with a large number of queued calls:

 setup.server->m_impl->m_int_fn = [&](int n) {
+    std::this_thread::sleep_for(std::chrono::seconds{10});
     assert(n == expected);
     expected += 100;
     return n;
 };

-std::atomic<size_t> running{3};
+std::atomic<size_t> running{1000001};

The mptest process consumed 17 GB RSS in under 30 seconds.

With while(true), the mptest was killed by OOM:

Out of memory: Killed process 2461509 (mptest)
total-vm:53,439,684kB anon-rss:53,617,500kB

Worth noting that in the mptest process the client and server share the same address space, so the 17 GB figure includes client-side memory (unsent/pending capnp requests) as well as server-side promise chain growth?

A potential naive solution, if this is deemed worth fixing, is a depth counter with a limit: increment before queuing, throw if the limit is exceeded, and decrement after the result is dispatched.

+    uint64_t m_maximum_promise_depth{1000};
+    uint64_t m_pending_tasks{0};

 template<typename T, typename Fn>
 kj::Promise<T> ProxyServer<Thread>::post(Fn&& fn)
 {
+    if (m_pending_tasks >= m_maximum_promise_depth) {
+        throw std::runtime_error("maximum promise depth reached");
+    }
+    m_pending_tasks += 1;
+
     auto ready = kj::newPromiseAndFulfiller<void>();
     auto ret = m_thread_ready.then([this, fn = std::forward<Fn>(fn),
                                     ready_fulfiller = kj::mv(ready.fulfiller)]() mutable {
         ...
-        m_loop->sync([&result_value, &exception, result_fulfiller = kj::mv(result_fulfiller)]() mutable {
+        m_loop->sync([this, &result_value, &exception, result_fulfiller = kj::mv(result_fulfiller)]() mutable {
             KJ_IF_MAYBE(e, exception) {
                 result_fulfiller->reject(kj::mv(*e));
             } else {
                 result_fulfiller->fulfill(kj::mv(*result_value));
                 result_value.reset();
             }
+            m_pending_tasks--;
             result_fulfiller = nullptr;
         });
     });

With the fix applied, running mptest with a modified running=1001 and a 10s server sleep holds RSS at a stable 17 MB and throws cleanly at depth 1000:

remote exception: std::exception: maximum promise depth reached
VmRSS: 17,484 kB

At running=1000001 with the fix, RSS reached ~9.5 GB from client memory alone.

This implies that in a real deployment where client and server are separate processes, the server would be protected by this fix, but the client could still exhaust its own memory independently. This fix addresses the server side only.

Maybe there is even a more interesting approach with kj?

Collaborator Author

re: #240 (comment)

removes the only backpressure mechanism

Thankfully, this shouldn't be the case. Cap'n Proto's API is nonblocking, so there is nothing the server can do to directly slow down the rate of client requests. The way flow control is implemented is by having the server send a response to each request after it is processed, and as long as the client waits for these responses, it can avoid sending requests faster than the server processes them, and can limit the number of requests that are queued up.
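
A hypothetical client-side sketch of that flow control (sendRequest is a stand-in for firing one capnp request, not a real libmultiprocess API): cap the number of in-flight requests by waiting for the oldest response before sending more.

```cpp
#include <kj/async.h>
#include <kj/function.h>
#include <cstddef>
#include <vector>

void pacedRequests(kj::Function<kj::Promise<void>()> sendRequest,
                   kj::WaitScope& waitScope, int total)
{
    constexpr std::size_t kMaxInFlight = 16;
    std::vector<kj::Promise<void>> in_flight;
    for (int i = 0; i < total; ++i) {
        if (in_flight.size() >= kMaxInFlight) {
            in_flight.front().wait(waitScope); // wait for the oldest response
            in_flight.erase(in_flight.begin());
        }
        in_flight.push_back(sendRequest());
    }
    for (auto& p : in_flight) p.wait(waitScope); // drain remaining responses
}
```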

I'm curious whether it's something we should worry about in practice as a potential for OOM of the server, either by bug or deliberate?

I do think it's something we may want to worry about, and we may want to add counters and hard caps on things like number of objects, connections, threads, and pending requests in the future.

I think your m_pending_tasks idea is basically the right approach, although we would probably want the counter to be more global, like one counter per UNIX socket path instead of one counter per thread object. This way badly behaved clients would not be able to circumvent the limit by creating many connections or many threads.

Initially in bitcoin/bitcoin#33923, I resisted the idea that clients should be able to send arbitrary numbers of requests and have them queued up on the server. But I changed my mind after realizing this is already how Cap'n Proto handles requests that run on the event loop thread, so this would just be making requests that run on worker threads behave consistently. Also, this behavior is potentially useful if clients need multiple pieces of data from the server, so they can just send all the requests at once instead of staggering them.

~ProxyServer();
kj::Promise<void> getName(GetNameContext context) override;

//! Run a callback function returning T on this thread.
Member

In "refactor: Add ProxyServer::post() method" 558ae11

nit: clarify that we return a T promise?

Collaborator Author

re: #240 (comment)

nit: clarify that we return a T promise?

Thanks, expanded the comment to describe the behavior and what's returned.


ProxyServer<Thread>::ProxyServer(ThreadContext& thread_context, std::thread&& thread)
: m_thread_context(thread_context), m_thread(std::move(thread))
ProxyServer<Thread>::ProxyServer(Connection& connection, ThreadContext& thread_context, std::thread&& thread)
Member

In "refactor: Add ProxyServer::post() method" 558ae11

Are we passing the whole connection here for future proofing, i.e. we need more things apart from m_loop later?

Collaborator Author

re: #240 (comment)

Are we passing the whole connection here for future profing, i.e we need more things apart from m_loop later?

That's one reason, and the other reason is that ProxyServer<Thread> is a specialization of the generic ProxyServer struct, which uses a connection pointer to be able to run cleanup code when the connection is destroyed, and I thought it would be good if the different ProxyServer specializations were more similar. The choice is admittedly pretty arbitrary, though.

Collaborator Author

@ryanofsky ryanofsky left a comment

Thanks for the detailed reviews!

Updated 4f42fc4 -> c436ae4 (pr/promise.5 -> pr/promise.6, compare) making suggested changes, and merging the 3rd and 4th commits and improving them. Instead of just handling the cancellation-while-reading-params race case detected by TSAN, the code now handles the cancellation-while-writing-response race case which was previously a TODO, and better handles the cancellation-before-reading-params race case, which was previously handled, but not very well, resulting in an uncaught exception. All of these cases would be very unlikely to happen, since params and response structs are only accessed very briefly before and after method execution, so conflicting accesses by the event loop thread would be unlikely. But they are good to handle. Other than these, the remaining changes were comment, logging, and naming improvements.


@ryanofsky ryanofsky force-pushed the pr/promise branch 2 times, most recently from 55e9eac to 186a6d3 Compare February 18, 2026 14:54
@ryanofsky ryanofsky marked this pull request as draft February 18, 2026 16:36
@ryanofsky ryanofsky marked this pull request as ready for review February 19, 2026 12:15
@ryanofsky
Collaborator Author

ryanofsky commented Feb 19, 2026

Updated 186a6d3 -> 4324218 (pr/promise.10 -> pr/promise.11, compare) to avoid a memory leak detected by ASAN https://github.com/bitcoin/bitcoin/actions/runs/22145238258/job/64020426758?pr=34422 and caused by the clang bug described in bitcoin/bitcoin#34422 (comment). This should be ready for review again with the fix.

Updated 4324218 -> 337fa9c (pr/promise.11 -> pr/promise.12, compare) fixing an uninitialized variable pointed out in #240 (comment)

@Sjors
Member

Sjors commented Feb 19, 2026

The new CallThen helper looks good to me, but I'll wait for the sanitizer issue to be resolved: bitcoin/bitcoin#34422 (comment)

ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 19, 2026
@ryanofsky
Collaborator Author

The new CallThen helper looks good to me, but I'll wait for the sanitizer issue to be resolved: bitcoin/bitcoin#34422 (comment)

Should be resolved now! (With help from Marco identifying the issue and probably saving me a lot of time debugging.)

@Sjors
Member

Sjors commented Feb 20, 2026

ACK 337fa9c

@ryanofsky
Collaborator Author

ryanofsky commented Feb 20, 2026

Updated 337fa9c -> ec788f1 (pr/promise.12 -> pr/promise.13, compare) just making small tweaks: renames, improved comments, updated commit messages, and improved exception logging

Updated ec788f1 -> 0174450 (pr/promise.13 -> pr/promise.14, compare) to fix IWYU errors https://github.com/bitcoin-core/libmultiprocess/actions/runs/22231928222/job/64313628070?pr=240

@Sjors
Member

Sjors commented Feb 20, 2026

utACK 0174450 if CI is happy. The additional "uncaught exception ([actual message])" seems useful.

ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 20, 2026
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 20, 2026
@ryanofsky
Collaborator Author

Planning to merge this soon to unblock bitcoin/bitcoin#34422 and #218, but would still welcome any more review and happy to make followup changes!

@ryanofsky ryanofsky merged commit 290702c into bitcoin-core:master Feb 20, 2026
10 checks passed
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 20, 2026
…74ce

290702c74ce Merge bitcoin-core/libmultiprocess#240: Avoid errors from asynchronous (non-c++) clients
3a69d4755af Merge bitcoin-core/libmultiprocess#241: doc: Bump version number v7 -> v8
0174450ca2e Prevent crash on unclean disconnect if abandoned IPC call returns interface pointer
ddb5f74196f Allow simultaneous calls on same Context.thread
c4762c7b513 refactor: Add ProxyServer<Thread>::post() method
0ade1b40ac5 doc: Bump version number

git-subtree-dir: src/ipc/libmultiprocess
git-subtree-split: 290702c74cebde683aad5ca9c587e5bdb6873f66
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 20, 2026
Upstream PR bitcoin-core/libmultiprocess#240 fixed various issues which require
updates to python IPC tests. Those changes are made in this commit.
Sjors added a commit to Sjors/bitcoin that referenced this pull request Feb 21, 2026
413f915979 ipc: Serialize null CTransactionRef as empty Data
586fa578be refactor: add missing includes to mp/type-data.h
290702c74c Merge bitcoin-core/libmultiprocess#240: Avoid errors from asynchronous (non-c++) clients
3a69d4755a Merge bitcoin-core/libmultiprocess#241: doc: Bump version number v7 -> v8
0174450ca2 Prevent crash on unclean disconnect if abandoned IPC call returns interface pointer
ddb5f74196 Allow simultaneous calls on same Context.thread
c4762c7b51 refactor: Add ProxyServer<Thread>::post() method
0ade1b40ac doc: Bump version number

git-subtree-dir: src/ipc/libmultiprocess
git-subtree-split: 413f91597932921176904492ca09673549750fd1
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 21, 2026
Upstream PR bitcoin-core/libmultiprocess#240 fixed various issues which require
updates to python IPC tests. Those changes are made in this commit.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 23, 2026
Upstream PR bitcoin-core/libmultiprocess#240 fixed various issues which require
updates to python IPC tests. Those changes are made in this commit.
ryanofsky added a commit to ryanofsky/libmultiprocess that referenced this pull request Feb 24, 2026
As pointed out by janb84 in bitcoin/bitcoin#34422 (comment), it makes sense
for the on_cancel callback to lock cancel_mutex while it is assigning
request_canceled = true.

The lock and assignment were introduced in bitcoin-core#240, and in an earlier
version of that PR request_canceled was a std::atomic and the assignment
happened before the lock was acquired instead of after, so it was ok for the
lock to be unnamed and immediately released after being acquired.

But in the final version of bitcoin-core#240 request_canceled is an ordinary
non-atomic bool, and it should be assigned true with the lock held to prevent
a theoretical race condition where the Cap'n Proto event loop cancels the
request before the execution thread runs, and the execution thread sees the
old request_canceled = false value and then unsafely accesses deleted
parameters. A request being canceled that quickly, parameters being accessed
that slowly, and a stale request_canceled value being read even after the
execution thread holds the cancel_mutex lock should be very unlikely in
practice, but it could happen in theory and is good to fix.
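
A sketch of the ordering the commit message describes (names from the commit message; everything around them simplified):

```cpp
#include <mutex>

std::mutex cancel_mutex;
bool request_canceled = false; // ordinary bool, so all access must be locked

void on_cancel() // event loop thread
{
    const std::lock_guard<std::mutex> lock{cancel_mutex};
    request_canceled = true; // now assigned with the lock held
}

bool check_canceled() // execution thread, before touching the parameters
{
    const std::lock_guard<std::mutex> lock{cancel_mutex};
    return request_canceled;
}
```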
ryanofsky added a commit to ryanofsky/libmultiprocess that referenced this pull request Feb 24, 2026
