[WIP] Quantifier eleminiation work by tautschnig · Pull Request #8904 · diffblue/cbmc

tautschnig · 2026-03-28T19:36:21Z

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

conjunction(...) and disjunction(...) helper functions produce the appropriate result for empty operand sequences. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Use E-matching on index expressions to instantiate quantifiers when eager instantiation (aka enumerative instantiation) cannot be applied. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Replace the simplistic E-matching in instantiate_one_quantifier with a complete instantiation approach based on: Yeting Ge and Leonardo de Moura, "Complete instantiation for quantified formulas in Satisfiability Modulo Theories", CAV 2009. The paper shows that for essentially uninterpreted formulas (where quantified variables only appear as arguments of uninterpreted functions), the quantifier can be eliminated by instantiating with a finite set of ground terms computed as the least fixed point of a system of set constraints (ΔF). In CBMC's context, array accesses (index_exprt) play the role of uninterpreted functions. Key changes: - find_index_contexts: Properly decomposes array index expressions into bound_var + offset form using pre-order traversal. Handles patterns var, cast(var), var+r, r+var, var-r. This implements the offset extension from Section 4 of the paper, where for f(x_i + r) we generate constraints S_{k,i} + r ⊆ A_{f,j} and A_{f,j} - r ⊆ S_{k,i}. - collect_ground_indices: Collects ground index terms from both index_exprt (array reads) and with_exprt (array writes) in the bitvector cache, seeding the sets A_{f,j} from the paper. - compute_instantiation_set: Computes the least fixed point of the set constraint system ΔF. Uses integer arithmetic when all offsets and ground indices are numeric constants (avoiding expression growth), with a symbolic fallback using simplify_expr. Bounded by max_iterations and max_terms to handle non-stratified (non-FEU) cases. - arrays_match: SSA-aware array comparison using L1 object identifiers to match across SSA versions. For nested array accesses (e.g., multi-dimensional arrays like a[i][j]), recursively matches the base array and compares simplified indices, since the pattern may contain unsimplified arithmetic (e.g., cast(0 % 2)) that is semantically equal to a constant in the cache. Instantiation terms are sorted (via irept::operator<) before generating the conjunction/disjunction to ensure deterministic SAT clause ordering. Without this, MiniSat performance varies from sub-second to minutes on the same UNSAT instance depending on hash-dependent iteration order, which changes with file paths embedded in source locations. This enables verification of properties that depend on relationships between adjacent array elements, such as: forall j: arr[j+1] == arr[j] + 1 which the previous E-matching could not handle because it did not generate the ground terms needed for the offset positions. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add two regression tests exercising the complete instantiation approach: - Quantifiers-offset-instantiation: Tests the offset pattern arr[j+1] from Section 4 of Ge & de Moura (2009). The quantifier forall j < n-1: arr[j+1] == arr[j] + 1 requires the set constraint fixed-point to generate ground terms for both arr[j] and arr[j+1] positions. Without the offset handling, the instantiation misses the terms needed to chain the equalities. - Quantifiers-fixpoint-instantiation: Tests that the fixed-point computation correctly propagates through multiple iterations. The property arr[2] > 0 requires two steps of reasoning: j=0: arr[0]>0 => arr[1]>0 j=1: arr[1]>0 => arr[2]>0 This exercises the iterative nature of the ΔF fixed-point from Section 3 of the paper, where instantiating with one set of terms generates new ground terms that feed back into the next iteration. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add TIMEOUT 1200 to the set_tests_properties call in both copies of the add_test_pl_profile macro (regression/CMakeLists.txt and regression/libcprover-cpp/CMakeLists.txt). This ensures CTest will kill and report as failed any regression test that exceeds 20 minutes, preventing CI jobs from hanging indefinitely on tests that time out. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a -t <secs> option (and TESTPL_TIMEOUT environment variable) to test.pl for enforcing a wall-clock timeout on each individual test. The implementation replaces the system() call in run() with fork/exec, placing the child in its own process group (setpgrp). When the timeout fires via Perl's alarm(), the entire process group is killed with SIGKILL, ensuring all child processes (bash, cbmc, solvers) are cleaned up. On timeout the test is marked as failed and a synthetic output file is written so that the pattern-matching logic does not crash. The timeout defaults to 0 (disabled), preserving existing behaviour unless -t is explicitly passed or TESTPL_TIMEOUT is set. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a KNOWNBUG regression test for the invariant violation in arrays.cpp when --arrays-uf-always is used with code containing array-of-structs where struct members are arrays accessed through a nondet index. The arrays theory (collect_arrays, add_array_constraints) does not handle member expressions where the struct operand is an index expression, causing CBMC to crash with SIGABRT. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a regression test for quantifier instantiation with a non-constant upper bound. The test uses assume(forall r. (r < k) ==> t[r] < 10) where k is a symbolic variable later constrained to 1, then asserts t[0] < 10. This currently fails on the SAT backend because the complete instantiation algorithm does not find ground index terms when the array in the quantifier body is an SSA array literal that does not match the SSA symbol used in other array accesses. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add detailed analysis of two issues found during PQC proof benchmarking: 1. invariant_violation_analysis.md: --arrays-uf-always crash on array-of-structs patterns (pre-existing CBMC bug in arrays.cpp) 2. sat_failure_analysis.md: SAT/SMT disagreement caused by incomplete quantifier instantiation with variable bounds (bug in the complete instantiation algorithm on this branch) Both include standalone minimal C reproducers in scripts/reproducers/. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When a forall expression has a variable (non-constant) upper bound, the eager quantifier instantiation fails and falls through to the complete instantiation algorithm. The complete instantiation collects ground index terms from the bv_cache by matching array expressions. After SSA renaming, the array in the quantifier body becomes an array literal (array_exprt, e.g., {t#1[0], t#1[1]}), while array accesses elsewhere use SSA symbols (e.g., t#1). The arrays_match function cannot match these structurally different representations, causing collect_ground_indices to return an empty set. With no ground indices, the forall is instantiated with zero terms — effectively becoming true. When this occurs in an assume, the assumption is vacuous, leading to spurious counterexamples. Fix: when a context's array is an array_exprt (fixed-size array literal), add all valid indices 0..size-1 as ground indices directly. This is sound because these are exactly the valid indices for the array, and instantiating with all of them is complete. A cap of 256 elements prevents blowup on large arrays. This fixes SAT/SMT disagreements on PQC proofs (mlkem-native and mldsa-native) that use DFCC loop contracts with forall invariants where the loop counter is the quantifier bound. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Re-run the full mlkem-native (153 proofs) and mldsa-native (175 proofs) benchmark with the quantifier instantiation fix (2ac1dee), 30GB per-proof memory limit, and 2 parallel processes. Key result: ZERO FAILURES across all backends. The quantifier fix eliminated all 16 SAT/SMT disagreements from the previous run (4 on both SAT solvers + 12 MiniSat-only). The benchmark script now supports --parallel N for running N proofs concurrently, and --memlimit MB for per-proof memory limits. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

collect_arrays() and add_array_constraints() restricted member expressions to only those whose struct operand was a symbol or nondet_symbol. With --arrays-uf-always, member expressions can have more complex struct operands (index, nested member, struct literal) when arrays are embedded in structs. These are base array expressions that need no special constraint generation, just like symbols. The overly strict invariant caused crashes on tests like Array_operations2, array-bug-6230, byte_update18, and bounds_check1 when run with --arrays-uf-always. From: Michael Tautschnig <tautschn@amazon.com> Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

The arrays.cpp fix now handles member expressions with arbitrary struct operands, so the test no longer crashes and can be a CORE test. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

The script only added tools/bin to PATH when --install-deps was explicitly passed. Tools installed in a previous run (Z3, Bitwuzla) were not found when using --skip-clone --skip-build without --install-deps. Fix by always checking for and adding the tools/bin directory to PATH if it exists. Also adds --parallel N option for running N proofs concurrently. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Re-run the full benchmark (153 mlkem + 175 mldsa proofs) with: - Arrays fix: zero invariant violations (was 3-7 crashes) - Bitwuzla: all 54 previously-skipped mlkem proofs now run (153/153 SMT success) - 30 GB memory limit, 2 parallel processes Results: - mlkem: 153 SMT success, 142 CaDiCaL success, 147 MiniSat success - mldsa: 174 SMT success, 144 CaDiCaL success, 154 MiniSat success - 2 remaining SAT failures (polyveck_add, polyvec_matrix_pointwise_montgomery) are --arrays-uf-always soundness issues, not crashes - Memory: SMT avg 119MB, CaDiCaL avg 826MB, MiniSat avg 539MB - Peak: 21.2 GB (kem_dec/CaDiCaL), well within 30GB limit Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

With --arrays-uf-always, indexing an array of structs containing array members through a nondeterministic index produces a spurious counterexample on the SAT backend. SMT + --arrays-uf-always and SAT without it both verify successfully. Minimal reproducer: 9-line C program with struct S { int d[1]; }, a[2] initialized to 1, nondet index, assertion a[i].d[0] == 1. Trigger requires all three: --arrays-uf-always, struct with array member, nondet outer index. Plain int members do not trigger. Affects mldsa-native proofs polyveck_add and polyvec_matrix_pointwise_montgomery (both SAT solvers). Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Apply arrays2.patch: in convert_index, when the array operand is a member expression whose struct operand is not a symbol/nondet_symbol and the array type has a known finite size, bypass the array decision procedure and fall through to the bounded-array encoding. This fixes the spurious counterexample for small structs (array members with <=64 elements). Upgrade the regression test from KNOWNBUG to CORE. A separate KNOWNBUG test is added for large structs (array members with 65+ elements) where the fix is incomplete: the outer array-of-structs access still goes through the array theory and fails to properly constrain the struct bitvector. This is the root cause of the remaining SAT failures on mldsa-native proofs polyveck_add and polyvec_matrix_pointwise_montgomery. Co-authored-by: Michael Tautschnig <tautschn@amazon.com> Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When --arrays-uf-always is used, symbol arrays have two disconnected representations: map literals (used in struct contexts) and element-wise free variables (from the array theory). The array theory constrains the element-wise variables but not the map literals, causing spurious counterexamples when a symbol array appears inside a struct that is an element of another array. Fix: in boolbv_set_equality_to_true, for symbol arrays with known finite size up to MAX_FLATTENED_ARRAY_SIZE elements, connect the map literals to the element-wise bitvectors via convert_bv on index expressions for each element. This ensures array theory constraints propagate to struct contexts. Upgrade arrays-uf-always-large-struct-soundness from KNOWNBUG to CORE. Co-authored-by: Michael Tautschnig <tautschn@amazon.com> Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Re-run the full benchmark (153 mlkem + 175 mldsa proofs) with all fixes: - Quantifier instantiation for array literals in SSA - Accept member expressions with arbitrary struct operands - Bypass array theory for member-of-index expressions - Connect array symbol map literals to element-wise constraints Results: 0 FAILURE across all 328 proofs x 3 backends. - mlkem: 153 SMT success, 142 CaDiCaL success, 147 MiniSat success - mldsa: 174 SMT success, 144 CaDiCaL success, 154 MiniSat success - Previously-failing polyveck_add and polyvec_matrix_pointwise_montgomery now TIMEOUT instead of FAILURE (no more spurious counterexamples) - Memory: SMT avg 119MB, CaDiCaL avg 773MB, MiniSat avg 524MB Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

timeout_oom_analysis.md: Root cause analysis of 46 SAT timeout/OOM proofs. Three categories: Keccak+no-AFS array theory blowup (5), quantifier instantiation expansion (41), large matrix operations (1). 84% of timeouts are memory-bound; 9 low-memory MiniSat timeouts are CPU-bound. pqc_benchmark_context.md: Compacted context document for resuming this work in future sessions. Contains branch state, all bugs found/fixed, final benchmark results, file locations, and commit guidelines. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Proposes a CEGAR-based lazy quantifier instantiation loop analogous to --refine-arrays. Key findings from the deep-dive: - Memory is overwhelmingly in the SAT solver clause database (178x blowup for poly_reduce: 27MB conversion vs 4.8GB solving) - 41 quantifier-heavy PQC proofs generate 10K+ ground instances from ~40 forall occurrences × 256-element arrays - Most instances are irrelevant; the solver needs only O(1) per quantifier in typical cases The design integrates into bv_refinementt's existing CEGAR loop, adding ~150 lines of new code.

Add a CEGAR-based lazy quantifier instantiation option analogous to --refine-arrays. When enabled, quantified formulas are not eagerly expanded into conjunctions/disjunctions over all ground indices. Instead, each quantifier gets a placeholder literal, and the CEGAR loop checks whether the SAT model satisfies each quantifier. Only violated instances are added incrementally. Architecture: - boolbv.h: add lazy_quantifiers flag and make finish_eager_conversion_quantifiers virtual - boolbv_quantifier.cpp: skip eager instantiation when lazy - bv_refinement.h: add refine_quantifiers config and overrides - refine_quantifiers.cpp: implement finish_eager_conversion_quantifiers (freeze variables, set placeholder literals) and quantifiers_overapproximated (evaluate model, add violated instances) - solver_factory: wire --refine-quantifiers option through Evaluation uses a hybrid strategy: first try get()+simplify_expr() which is cheap and works for field-sensitive arrays, then fall back to convert()+l_get() for array-theory arrays. Measured impact on PQC proofs: - poly_reduce: 4.8GB eager → 206MB refine (23x memory reduction) - poly_chknorm: 133MB eager → 45MB refine (3x memory reduction) Regression tests cover: basic forall, symbolic index, expected failure, multiple quantifiers, no-quantifier no-op, signed bounds.

Quantifiers with variable bounds (e.g., forall k. k < i => body(k) where i is symbolic) cannot be lazily refined because the refinement loop cannot enumerate their instances. Previously, all quantifiers were deferred, causing soundness failures on DFCC proofs where loop invariants use variable-bound quantifiers. Fix: partition quantifiers into two groups: - Constant bounds (k < 256): deferred for lazy CEGAR refinement - Variable bounds (k < i): eagerly instantiated via base class This fixes VERIFICATION FAILED on mldsa poly_reduce which has both constant-bound precondition quantifiers and variable-bound loop invariant quantifiers. Also reorder the file to place static helpers before class methods.

Add measured impact table showing --refine-quantifiers helps for large proofs (poly_reduce: 4.8GB→2.2GB) but adds overhead for smaller proofs. Document the finding that all PQC quantifiers are in VCC assumptions, and the partition strategy for variable-bound quantifiers.

…nd CaDiCaL Four fixes for crashes discovered during PQC benchmark runs: 1. Guard against quantifiers with non-symbol bound variables (e.g., typecast expressions). These are sent to eager instantiation instead of crashing with an invariant violation in to_quantifier_expr(). 2. Defer all convert() calls in quantifiers_overapproximated() to the end, after all get() evaluations complete. This prevents convert() from adding clauses that invalidate CaDiCaL's satisfied state, which caused a fatal error in val(). 3. Skip simplify_expr() on expressions containing nested quantifiers (which may have non-symbol variables), avoiding an invariant violation in the simplifier. 4. Move quantifiers_overapproximated() before arrays_overapproximated() in the CEGAR loop so that get() calls see a valid SAT state.

…arge arrays When a quantifier has constant bounds and the bound variable is used as an array index, skip the eager full-range instantiation (which generates one instance per value in [lb, ub]) and let the quantifier fall through to the Ye/de Moura complete instantiation path. This path only generates instances for the ground index terms actually present in the formula, which for loop-invariant proofs is typically much smaller than the full range. The optimization only applies when ALL arrays indexed by the bound variable are large (> 64 elements, i.e., not field-sensitive). For field-sensitive arrays, the bv_cache contains individual element symbols rather than index_exprt entries, so the Ye/de Moura approach cannot find ground terms. In that case, eager instantiation is used as before. As a safety net, finish_eager_conversion_quantifiers falls back to eager instantiation when the Ye/de Moura path returns no result, and collect_ground_indices adds all indices for arrays with known constant size when the cache scan finds nothing. Measured impact on PQC proofs (CaDiCaL SAT backend): poly_reduce: 4816 MB / 131s -> 557 MB / 31s (8.6x mem, 4.2x speed) poly_chknorm: 133 MB / 102s -> 102 MB / 0.8s (128x speed) mlkem/poly_add: 7099 MB / timeout -> 367 MB / 25s (was TIMEOUT) ntt_layer: 17862 MB / timeout -> 718 MB / 31s (was TIMEOUT) invntt_layer: 28204 MB / OOM -> 2148 MB / 137s (was OOM) mlkem/ntt_b_block: 7201 MB / timeout -> 477 MB / 120s (was TIMEOUT) mlkem/ntt_layer: 9352 MB / timeout -> 406 MB / 15s (was TIMEOUT)

… check Remove the duplicate field-sensitive array fallback block in collect_ground_indices (was added twice by mistake). Also remove the SSA symbol check from eager_quantifier_instantiation that was added to prevent under-instantiation for array literals, as the array literal check in collect_ground_indices already handles this case correctly. The core optimization remains: when a quantifier's bound variable indexes large (>64 element) arrays and the range is >=16, skip eager full-range instantiation and defer to the Ye/de Moura complete instantiation path.

…tantiation The Ye/de Moura complete instantiation optimization collects ground index terms from the bitvector cache to instantiate quantifiers. Two sources of non-ground terms can pollute this collection: 1. boolbvt::fresh_binding() creates boolbvt::scope::N::id symbols when converting quantifier bodies. These represent bound variables and are not ground terms. 2. goto_symext::rewrite_quantifiers() creates free SSA variables when rewriting universal quantifiers in assertions. These free variables appear in the cache as index terms but are not ground — using them for instantiation leads to incomplete instantiation and spurious VERIFICATION FAILED results. This commit addresses both sources: - Filters cache entries whose index contains a boolbvt::scope:: fresh binding symbol (source 1). - For each non-literal bounded-array context (size <= 256), always adds constant indices 0..size-1 to the ground index set. This ensures complete instantiation even when the cache contains only non-ground symbolic terms from rewrite_quantifiers (source 2). Array literal contexts already have their full index range added separately.

tautschnig and others added 28 commits February 25, 2026 20:10

Eager quantifier elimination: support empty ranges

69e12ec

conjunction(...) and disjunction(...) helper functions produce the appropriate result for empty operand sequences. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Quantifier instantiation via simplistic E-matching

48f4060

Use E-matching on index expressions to instantiate quantifiers when eager instantiation (aka enumerative instantiation) cannot be applied. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Upgrade arrays-uf-always test from KNOWNBUG to CORE

19ed621

The arrays.cpp fix now handles member expressions with arbitrary struct operands, so the test no longer crashes and can be a CORE test. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

tautschnig self-assigned this Mar 28, 2026

tautschnig changed the title ~~[WIP] Quantifiers eleminiation work~~ [WIP] Quantifier eleminiation work Mar 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Quantifier eleminiation work#8904

[WIP] Quantifier eleminiation work#8904
tautschnig wants to merge 28 commits intodiffblue:developfrom
tautschnig:quantifiers-backup

tautschnig commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tautschnig commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants