Skip to content

Add new multiset primitives#783

Merged
saulshanabrook merged 36 commits intoegraphs-good:mainfrom
saulshanabrook:multiset-changes
Mar 16, 2026
Merged

Add new multiset primitives#783
saulshanabrook merged 36 commits intoegraphs-good:mainfrom
saulshanabrook:multiset-changes

Conversation

@saulshanabrook
Copy link
Copy Markdown
Member

@saulshanabrook saulshanabrook commented Jan 6, 2026

Adds new multiset primitives as used in https://uwplse.org/2026/02/24/egglog-containers.html and egraphs-good/egglog-python#397 to try avoiding A/C blowup.

This also:

  • Changes how higher order primitive functions are registered, so that either the container sort or the function sort can be registered first.
  • Exposes some additional structures on the egraph to support "freezing" it, which is exposed in Python: https://github.com/egraphs-good/egglog-python/blob/bf94c2ec4d95ec9835be8826cb9d2e8d6e50765f/src/freeze.rs This is used for debugging to see what e-classes are together. Alternatively, I could keep these private and expose a frozen e-graph directly here.
  • Add the uf table to be a merge function dep so that we can say use vec-union as a merge function (fix recommended by @ezrosent)
  • Adds ability to lookup if a function is a let binding so that the Python bindings can expose this.
  • Allow nested container types

@saulshanabrook saulshanabrook added the status:needs decision Needs feedback from a project meeting to know whether to move forward label Jan 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

This PR expands the public API of EGraph and Function types, adds numerous new primitives for multiset and vector operations, removes restrictive validation for nested container sorts, and updates internal dependency tracking for merge operations with union-find state.

Changes

Cohort / File(s) Summary
Build Configuration
Makefile
Added nits-fix target to run Rust lints and formatting (clippy fix and rustfmt), updated PHONY declaration.
Public API Expansion
src/lib.rs
Made EGraph fields (backend, functions, fact_directory, type_info) and Function.backend_id public; added Function::is_let_binding() and EGraph::new_union_action() methods; updated imports for UnionAction.
Core Primitive Registration
egglog-bridge/src/lib.rs
Updated Primitive merge function dependency tracking to register uf_table as write dependency alongside argument merge function trees.
Sort Type System
src/sort/mod.rs
Updated documentation for is_eq_container_sort to clarify support for nested container sorts containing ids.
Container Sort Relaxation
src/sort/set.rs, src/sort/map.rs
Removed runtime validation preventing EqSort containers in SetSort/MapSort; expanded is_eq_container_sort to consider nested Eq containers recursively; adjusted rebuild logic for set-empty, set-of, and map-empty primitives.
Primitive Enhancements
src/sort/fn.rs, src/sort/i64.rs
Made FunctionContainer fields public; updated FunctionSort.is_eq_container_sort to check nested containers; enhanced register_primitives for vec/multiset; added i64 "abs" primitive.
Multiset Operations
src/sort/multiset.rs
Added extensive multiset primitives (multiset-single, multiset-intersection, multiset-remove, multiset-subtract, multiset-sum, multiset-pick-max, multiset-count, multiset-union-values, etc.); introduced SumMultisets, UnionValues, Fold, FlatMap, and Filter internal primitives; expanded Map struct with output_multiset field; updated is_eq_container_sort and rebuild logic; added helper functions for registration.
Vector Operations
src/sort/vec.rs
Added vec-union, vec-range, and unstable-vec-map primitives; introduced internal VecMap and Union structs for vector mapping and element-wise union; expanded is_eq_container_sort for nested containers; removed validation preventing nested Vec with EqSort containers; added register_vec_primitives_for_function helper.
Error Handling
src/typechecking.rs
Removed DisallowedSort variant from TypeError enum.
Internal Utilities
src/util.rs
Changed FreshGen name hint key formatting from Debug ({name_hint:?}) to Display ({name_hint}) in SymbolGen implementation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add new multiset primitives' directly matches the primary change: introducing new multiset primitives and related operations throughout the codebase.
Description check ✅ Passed The PR description clearly explains the changes: adding multiset primitives to avoid A/C blowup, changing higher-order function registration semantics, exposing egraph structures for freezing, adding uf table as merge dependency, and allowing nested container types. All aspects align with the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Mar 6, 2026

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 190 skipped benchmarks1


Comparing saulshanabrook:multiset-changes (625f24d) with main (afd5c07)

Open in CodSpeed

Footnotes

  1. 190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@saulshanabrook saulshanabrook marked this pull request as ready for review March 9, 2026 20:35
@saulshanabrook saulshanabrook requested a review from a team as a code owner March 9, 2026 20:35
@saulshanabrook saulshanabrook requested review from Copilot and oflatt and removed request for a team March 9, 2026 20:35
@saulshanabrook
Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 9, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the container primitive surface (notably for MultiSet and Vec) to support new operations and higher-order container combinators, and adjusts registration so that higher-order primitives can be registered regardless of whether the function sort or container sort is defined first. It also exposes additional EGraph internals to support “freezing” for debugging via downstream bindings.

Changes:

  • Add many new MultiSet primitives (single/count/intersection/subtract, higher-order map/filter/fold/flat-map, index fill/clear, union-values, etc.) and extend equality-container handling to nested containers.
  • Add new Vec primitives (vec-union, vec-range, unstable-vec-map) and adjust higher-order primitive registration behavior.
  • Expose additional EGraph/Function internals and update merge dependency tracking to include the UF table for primitive merge functions.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/web-demo/multiset.egg Expands multiset test coverage for new primitives and higher-order operations.
tests/vec.egg Adds regression tests plus new vec-union, vec-range, and unstable-vec-map checks.
tests/snapshots/files__shared_snapshot_vec.snap Updates snapshot outputs due to new/changed test content.
tests/snapshots/files__shared_snapshot_multiset.snap Updates snapshot outputs due to new/changed test content.
tests/math-microbenchmark.egg Formatting-only adjustments.
src/util.rs Changes fresh name hint counting keying for ResolvedCall.
src/typechecking.rs Removes DisallowedSort error variant (nesting restrictions removed elsewhere).
src/sort/vec.rs Adds vec-union, vec-range, unstable-vec-map, and registration logic for higher-order vec primitives.
src/sort/set.rs Allows nested eq-container sorts and updates rebuild behavior accordingly.
src/sort/multiset.rs Adds multiset primitive suite + higher-order primitives and nested eq-container handling.
src/sort/mod.rs Updates comment to reflect nested eq-container canonicalization behavior.
src/sort/map.rs Removes nesting restrictions and updates rebuild-key/value behavior for nested eq-containers.
src/sort/i64.rs Adds abs primitive.
src/sort/fn.rs Registers vec/multiset higher-order primitives when function sorts are registered.
src/lib.rs Exposes backend/functions/type_info and adds union-action helper for external users.
egglog-bridge/src/lib.rs Adds UF table dependency for primitive merge fns.
Makefile Adds nits-fix target and updates .PHONY.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/sort/multiset.rs
Comment thread tests/web-demo/multiset.egg
Comment thread egglog-bridge/src/lib.rs
Comment thread src/lib.rs Outdated
Comment thread src/sort/multiset.rs
Comment thread src/sort/fn.rs
Comment thread src/sort/multiset.rs Outdated
Comment thread src/sort/vec.rs
Comment thread src/lib.rs Outdated
Comment thread src/lib.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
Makefile (1)

28-37: Consider consolidating nits-fix and fixnits targets.

The new nits-fix target (lines 28-32) largely duplicates the existing fixnits target (lines 34-37). The main differences are:

  • nits-fix runs clippy before fmt; fixnits does the opposite
  • nits-fix includes --tests flag for clippy

If both behaviors are intentional, consider documenting the distinction. Otherwise, consolidating into a single target would reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 28 - 37, The Makefile has duplicated targets nits-fix
and fixnits with only minor differences; consolidate them by removing one and
unifying behavior: choose a single target name (e.g., fixnits), combine the
necessary rustup/component installs, run cargo clippy once with the desired
flags (include or exclude --tests as required) and then run cargo fmt (or
document why clippy must run before fmt), and update any docs/comments to
explain the chosen order/flags; ensure references to the removed target
(nits-fix or fixnits) are updated and keep the combined target idempotent.
src/lib.rs (1)

241-252: Public field exposure for external egraph manipulation.

Making backend, functions, and type_info public is a significant API surface change. Per the PR objectives, this supports "freezing" the egraph for Python-facing implementations.

Consider adding doc comments to clarify the intended use cases and any invariants that external code should maintain when accessing these fields directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` around lines 241 - 252, The EGraph struct exposes backend,
functions, and type_info publicly which expands the API surface; add concise doc
comments on the EGraph type and on each public field (backend, functions,
type_info) that state intended use (e.g., "read-only except via provided APIs",
"used for Python freezing"), any invariants callers must preserve (e.g., do not
mutate internal invariants or swap types), and guidance for future maintainers
about safe access patterns; place these docs above the pub struct EGraph and
above each pub field (backend, functions, type_info) so external users and docs
generators clearly see the contract.
src/sort/vec.rs (1)

146-157: Negative range silently produces empty vector.

When end is negative, try_into().unwrap_or(0) converts it to 0, resulting in an empty vector. This silent fallback may mask user errors.

Consider returning None for negative inputs to surface the error, consistent with how other primitives handle invalid inputs.

Proposed fix
         if self.element.name() == "i64" {
-            add_primitive!(eg, "vec-range" = {self.clone(): VecSort} |end: i64| -> `@VecContainer` (arc) { VecContainer {
+            add_primitive!(eg, "vec-range" = {self.clone(): VecSort} |end: i64| -?> `@VecContainer` (arc) {
+                let end: usize = end.try_into().ok()?;
+                Some(VecContainer {
                 do_rebuild: self.ctx.is_eq_container_sort(),
                 data: {
-                    let end: usize = end.try_into().unwrap_or(0);
                     (0..end)
                         .map(|i| exec_state.base_values().get::<i64>(i as i64))
                         .collect()
                 }
-            } });
+            }) });
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sort/vec.rs` around lines 146 - 157, The current "vec-range" primitive in
VecSort silently converts negative end to 0 via try_into().unwrap_or(0),
producing an empty vector; change the closure passed to add_primitive! for
"vec-range" so it checks the i64 end parameter and returns None for end < 0
instead of converting to 0, otherwise safely convert end to usize and build the
VecContainer as before (preserving do_rebuild and data generation via
exec_state.base_values()). Locate the closure associated with "vec-range" in the
VecSort implementation and replace the try_into().unwrap_or(0) logic with an
explicit negative-check and early None return to match other primitives'
invalid-input handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sort/i64.rs`:
- Around line 44-45: The current literal primitive registration
add_literal_prim!(eg, "abs" = |a: i64| -> i64 { a.abs() }) can panic on
i64::MIN; change the implementation to use a.checked_abs() and handle the None
case with the same fallible pattern used elsewhere (e.g., the
"-?>"/checked-operation convention in this module) so that overflow returns the
appropriate error/None instead of invoking i64::abs(); update the "abs" closure
to call checked_abs() and propagate the failure consistently with other checked
ops.

In `@src/sort/multiset.rs`:
- Around line 135-138: The primitive "multiset-single" currently calls
i.try_into().unwrap() which will panic for negative i; change the conversion to
handle negatives (e.g., compute let count = if i < 0 { 0 } else { i as usize }
and use count with std::iter::repeat_n) or make the primitive fallible and
return an Err for negative counts; update the closure in the add_primitive!
invocation that builds MultiSetContainer (reference symbols: "multiset-single",
MultiSetContainer, self.element(), self.ctx.is_eq_container_sort()) to use the
safe/clamped count conversion instead of unwrap().
- Around line 417-431: The loop in multiset fill logic uses a break when
action.lookup(exec_state, &row).is_some(), which prematurely stops processing
remaining elements (symbols: ResolvedFunctionId::Lookup, action.lookup,
multiset.data.iter_counts(), action.insert), so change the control flow to skip
only the already-indexed element (replace break with continue) so other elements
still get indexed; ensure the rest of the loop still pushes the count and calls
action.insert for non-indexed rows.
- Around line 693-717: The fold currently ignores the provided `initial` when
the multiset is non-empty; update the `apply` implementation so `initial` is
always used as the starting accumulator: in the `apply` function (where you
obtain `fc` from `FunctionContainer`, `initial`, and
`multiset`/`multiset.data`), set `let mut acc = initial;` (keep the existing
early-return behavior for empty multiset if desired) and then iterate over all
`values` calling `acc = fc.apply(exec_state, &[acc, v])?;` for each element,
finally returning `Some(acc)`. Ensure you remove the logic that pops/uses the
first element as the initial accumulator.

---

Nitpick comments:
In `@Makefile`:
- Around line 28-37: The Makefile has duplicated targets nits-fix and fixnits
with only minor differences; consolidate them by removing one and unifying
behavior: choose a single target name (e.g., fixnits), combine the necessary
rustup/component installs, run cargo clippy once with the desired flags (include
or exclude --tests as required) and then run cargo fmt (or document why clippy
must run before fmt), and update any docs/comments to explain the chosen
order/flags; ensure references to the removed target (nits-fix or fixnits) are
updated and keep the combined target idempotent.

In `@src/lib.rs`:
- Around line 241-252: The EGraph struct exposes backend, functions, and
type_info publicly which expands the API surface; add concise doc comments on
the EGraph type and on each public field (backend, functions, type_info) that
state intended use (e.g., "read-only except via provided APIs", "used for Python
freezing"), any invariants callers must preserve (e.g., do not mutate internal
invariants or swap types), and guidance for future maintainers about safe access
patterns; place these docs above the pub struct EGraph and above each pub field
(backend, functions, type_info) so external users and docs generators clearly
see the contract.

In `@src/sort/vec.rs`:
- Around line 146-157: The current "vec-range" primitive in VecSort silently
converts negative end to 0 via try_into().unwrap_or(0), producing an empty
vector; change the closure passed to add_primitive! for "vec-range" so it checks
the i64 end parameter and returns None for end < 0 instead of converting to 0,
otherwise safely convert end to usize and build the VecContainer as before
(preserving do_rebuild and data generation via exec_state.base_values()). Locate
the closure associated with "vec-range" in the VecSort implementation and
replace the try_into().unwrap_or(0) logic with an explicit negative-check and
early None return to match other primitives' invalid-input handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4c2cad1-bb84-4029-91bb-859b05249795

📥 Commits

Reviewing files that changed from the base of the PR and between 912663b and bd57829.

⛔ Files ignored due to path filters (5)
  • tests/math-microbenchmark.egg is excluded by !**/*.egg
  • tests/snapshots/files__shared_snapshot_multiset.snap is excluded by !**/*.snap
  • tests/snapshots/files__shared_snapshot_vec.snap is excluded by !**/*.snap
  • tests/vec.egg is excluded by !**/*.egg
  • tests/web-demo/multiset.egg is excluded by !**/*.egg
📒 Files selected for processing (12)
  • Makefile
  • egglog-bridge/src/lib.rs
  • src/lib.rs
  • src/sort/fn.rs
  • src/sort/i64.rs
  • src/sort/map.rs
  • src/sort/mod.rs
  • src/sort/multiset.rs
  • src/sort/set.rs
  • src/sort/vec.rs
  • src/typechecking.rs
  • src/util.rs
💤 Files with no reviewable changes (1)
  • src/typechecking.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: Agent
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: CodSpeed Macro Runners
  • GitHub Check: benchmark (ubuntu-latest, math-microbenchmark)
  • GitHub Check: benchmark (ubuntu-latest, taylor51)
  • GitHub Check: benchmark (ubuntu-latest, python_array_optimize)
  • GitHub Check: benchmark (codspeed-macro, math-microbenchmark)
  • GitHub Check: benchmark (codspeed-macro, typeinfer)
  • GitHub Check: benchmark (codspeed-macro, stresstest_large_expr)
  • GitHub Check: benchmark (codspeed-macro, herbie)
  • GitHub Check: benchmark (codspeed-macro, eggcc-extraction)
  • GitHub Check: benchmark (codspeed-macro, python_array_optimize)
  • GitHub Check: benchmark (codspeed-macro, extract-vec-bench)
  • GitHub Check: CodSpeed Macro Runners
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 1-1: Missing required phony target "clean"

(minphony)

🔇 Additional comments (21)
egglog-bridge/src/lib.rs (1)

1219-1219: LGTM! Correctly adds uf_table as a write dependency for primitive merge functions.

This enables container merge functions like vec-union to properly write to the union-find table during merge operations. Per the DependencyGraph design (context snippet 1), write dependencies don't increase execution levels, so tables with mutual write dependencies can still run merge operations in parallel.

src/sort/mod.rs (1)

79-83: LGTM! Documentation clarification aligns with recursive implementation.

The updated comment accurately reflects that is_eq_container_sort now considers nested container sorts that transitively contain IDs, matching the recursive implementations in SetSort, VecSort, MapSort, and FunctionSort.

src/sort/map.rs (2)

117-122: LGTM! Recursive is_eq_container_sort correctly propagates through nested containers.

The implementation properly checks both key and value sorts for either is_eq_sort() or is_eq_container_sort(), enabling correct handling of nested containers like Map<Vec<EqSort>, T>.


142-146: LGTM! Rebuild flags correctly account for nested eq-container sorts.

Setting do_rebuild_keys and do_rebuild_vals based on is_eq_sort() || is_eq_container_sort() ensures that values within nested containers are properly canonicalized during rebuild.

src/sort/set.rs (2)

91-93: LGTM! Correctly extends eq-container detection to nested containers.

The recursive check enables proper handling of nested structures like Set<Vec<EqSort>>.


113-120: LGTM! Rebuild behavior correctly propagates through container nesting.

Using is_eq_container_sort() instead of just is_eq_sort() ensures that sets containing containers (e.g., Set<Vec<EqSort>>) are properly rebuilt when their nested IDs change.

src/sort/fn.rs (3)

18-22: Verify: All FunctionContainer tuple fields are now public.

Previously the first field (ResolvedFunctionId) was private. Making all fields public expands the public API surface. Confirm this is intentional for the egglog-python integration mentioned in the PR objectives.


166-170: LGTM! Correctly checks input sorts for nested eq-containers.

The implementation properly only checks inputs (not output) because FunctionContainer only stores partial arguments, which are what need rebuilding. This aligns with rebuild_contents (lines 49-58) which only iterates over self.1 (the partial args).


203-206: LGTM! Registering vec/multiset primitives for function sorts.

This enables operations like vec-map and multiset-map to work with function values, aligning with the PR's goal of adding new container primitives.

src/util.rs (1)

84-101: The concern about Display format collisions is mitigated by validation upstream.

Switching from Debug ({name_hint:?}) to Display ({name_hint}) formatting does create identical HashMap keys for ResolvedCall::Func and ResolvedCall::Primitive if they share the same name. However, the type system prevents this collision from occurring: function definitions are validated in typechecking.rs (line 524) to reject any function named with a primitive name via the is_primitive() check, which considers both self.primitives and self.reserved_primitives. Since users cannot define functions with primitive names, the collision scenario cannot occur in practice, making this change safe.

src/lib.rs (2)

40-54: LGTM!

Import reorganization is clean. The UnionAction import supports the new union primitives added in the vec and multiset sorts.


303-307: LGTM!

The is_let_binding() getter and new_union_action() factory method are straightforward additions that expose existing functionality in a controlled manner. The UnionAction creation is used by the new vec-union and multiset-union-values primitives.

Also applies to: 1807-1811

src/sort/vec.rs (4)

1-4: LGTM!

Clean import additions for the new functionality.


94-96: LGTM!

The updated is_eq_container_sort logic now correctly handles nested containers (e.g., Vec[Vec[SomeEqSort]]) by checking both is_eq_sort() and recursively is_eq_container_sort().


248-275: LGTM!

The VecMap primitive correctly:

  • Retrieves the function and vector containers
  • Applies the function to each element, dropping undefined results
  • Sets do_rebuild based on the output vector's sort
  • Registers the result with the container values

302-321: LGTM!

The Union primitive implementation is correct:

  • Validates that both vectors have the same length
  • Uses UnionAction::union() which stages inserts via stage_insert() (per context snippet 1), making it safe to call during apply()
  • Returns the first vector argument as the result
src/sort/multiset.rs (5)

1-4: LGTM!

Clean import additions for the new functionality.


108-110: LGTM!

Same nested container handling improvement as in vec.rs.


745-760: LGTM!

The UnionValues primitive correctly:

  • Uses iter_counts() to get unique values (ignoring multiplicities)
  • Handles empty multisets by returning None
  • Stages unions via UnionAction::union() which is safe during apply()
  • Returns the first value as the representative

830-847: LGTM!

The subtract method correctly handles:

  • Returning None if the minuend lacks sufficient counts
  • In-place count updates with proper cleanup of zero-count entries
  • Length cache maintenance

868-880: LGTM!

The intersection method correctly computes element-wise minimum counts.

Comment thread src/sort/i64.rs Outdated
Comment thread src/sort/multiset.rs Outdated
Comment thread src/sort/multiset.rs
Comment thread src/sort/multiset.rs
@oflatt
Copy link
Copy Markdown
Member

oflatt commented Mar 10, 2026

Had a chat with Saul on API safety, we'll probably want a different PR for supporting reading from tables in a more convenient way.

Also chatted about multiset and if all these new primitives belong in this repo

@saulshanabrook
Copy link
Copy Markdown
Member Author

Had a chat with Saul on API safety, we'll probably want a different PR for supporting reading from tables in a more convenient way.

I refactored this to remove the things I made public and instead added a few high level functions.

Copy link
Copy Markdown
Member

@oflatt oflatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern on the API

Comment thread src/lib.rs
Copy link
Copy Markdown
Collaborator

@yihozhang yihozhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason it is okay to have nested containers currently is because we are rebuilding the containers in the order they are defined:

for (_, env) in self.data.iter_mut() {
So naturally we rebuild children container before rebuilding parent.

However, I think this is no longer true in parallel mode, where all containers are built in parallel. Could you check if the tests are still passing when it's run in parallel mode and parallelize_inter_container_op is set to always be true?

Comment thread Makefile
Comment thread src/lib.rs
@saulshanabrook
Copy link
Copy Markdown
Member Author

I think the reason it is okay to have nested containers currently is because we are rebuilding the containers in the order they are defined:

for (_, env) in self.data.iter_mut() {

So naturally we rebuild children container before rebuilding parent.
However, I think this is no longer true in parallel mode, where all containers are built in parallel. Could you check if the tests are still passing when it's run in parallel mode and parallelize_inter_container_op is set to always be true?

I re-ran the multiset tests with RAYON_NUM_THREADS=4 and parallelize_inter_container_op always set to true and they pass.

Codex had this to say about overall correctness but I didn't verify it:

From the code, I think the PR comment is probably okay as-is, but for a slightly different reason than “definition order”. The parallel branch is real in core-relations/src/containers/mod.rs:131, so parent and child container envs can rebuild concurrently. What seems to preserve correctness is the fixed-point rebuild loop in egglog-bridge/src/lib.rs:830: if an inner container changes, container rebuild returns true, and the system runs another rebuild iteration. Since MultiSetContainer rebuild just canonicalizes its immediate element
IDs via rebuild_slice in src/sort/multiset.rs:11, an outer multiset that misses a child’s new canonical ID in one parallel pass should get picked up on the next pass. So I don’t currently see evidence that nested multisets are broken in parallel mode.

@yihozhang
Copy link
Copy Markdown
Collaborator

I see! I think Codex's point makes sense.

@saulshanabrook saulshanabrook requested a review from oflatt March 16, 2026 20:45
Copy link
Copy Markdown
Member

@oflatt oflatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@saulshanabrook saulshanabrook merged commit 8972a09 into egraphs-good:main Mar 16, 2026
33 checks passed
@saulshanabrook saulshanabrook deleted the multiset-changes branch March 16, 2026 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status:needs decision Needs feedback from a project meeting to know whether to move forward

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants