Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors parallelism from using rayon's global thread pool to a local, per-EGraph thread pool. Previously, debug/test builds used random coin flips to decide whether to parallelize, and the global pool was configured once. Now, the EGraph owns an Arc<rayon::ThreadPool>, parallel heuristics use deterministic size thresholds, and tests run each case both single-threaded and with 4 threads.
Changes:
- Added a
pool: Arc<rayon::ThreadPool>field toEGraphwithwith_num_threads()/num_threads()API; public entry points (run_program,resolve_program) wrap execution inpool.install(). - Replaced all
#[cfg(test)]/#[cfg(debug_assertions)]random parallelism decisions with deterministic heuristics based on data size andrayon::current_num_threads(). - Updated test harness to run each test case with both 1 and 4 threads (instead of random), and removed global rayon pool configuration from benchmarks and CLI.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/lib.rs |
Added pool field, with_num_threads/num_threads methods, wrapped run_program/resolve_program in pool.install() |
src/cli.rs |
Replaced global rayon pool setup with egraph.with_num_threads() |
core-relations/src/parallel_heuristics.rs |
Removed random coin-flip logic, now always uses size+thread-count heuristics |
core-relations/src/table/rebuild.rs |
Removed #[cfg]-gated random logic in incremental_rebuild |
core-relations/src/containers/mod.rs |
Removed #[cfg]-gated random logic in incremental_rebuild |
egglog-bridge/src/lib.rs |
Simplified do_parallel() to always check rayon::current_num_threads() |
tests/files.rs |
Added threads field to Run, tests run with 1 and 4 threads, snapshot skipping for parallel |
benches/common.rs |
Removed global rayon pool configuration (EGraph default is 1 thread) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Merging this PR will degrade performance by 5.91%
Performance Changes
Comparing Footnotes
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #826 +/- ##
==========================================
+ Coverage 81.53% 81.87% +0.33%
==========================================
Files 88 88
Lines 24157 24868 +711
==========================================
+ Hits 19697 20361 +664
- Misses 4460 4507 +47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'm seeing a noticeable slowdown for the parallel AC benchmark on my laptop, let me know if you see the same thing: |
|
|
||
| /// Return the number of threads in this EGraph's thread pool. | ||
| pub fn num_threads(&self) -> usize { | ||
| #[cfg(not(target_family = "wasm"))] |
There was a problem hiding this comment.
Let's move the WASM-gating into its own module, making our own ThreadPool type that exposes the common stuff (like num threads and install).
There was a problem hiding this comment.
Let's wait on this since we might make our own threadpool soon
|
Weird I don't see a slowdown... @ezrosent I'm on Linux with a server CPU, though. Maybe issues with macOS? |
|
I'll make another PR that merges into this one and moves the thread pool to the backend database, since it seems that is where the pool should be |
| } | ||
| #[cfg(not(target_family = "wasm"))] | ||
| { | ||
| // This will fail silently if the global pool has already been configured. |
There was a problem hiding this comment.
Print a log indicating the thread pool has been properly initialized? Currently we don't know if it's initialized (or a failure due to repeated initialization happened).
Also, with_num_threads no longer seems to be the right interface, since it just returns self. Maybe the driver (cli.rs) should config this explicitly.
There was a problem hiding this comment.
Changed to set_num_threads. We'll fix this when we refactor to use a local pool.
Now parallelism can be controlled with new endpoint
with_num_threadsAlso, parallelism in tests is separate- we test with parallelism on and off instead of randomly choosing.