Skip to content

[WIP] Quantifier eleminiation work#8904

Draft
tautschnig wants to merge 28 commits intodiffblue:developfrom
tautschnig:quantifiers-backup
Draft

[WIP] Quantifier eleminiation work#8904
tautschnig wants to merge 28 commits intodiffblue:developfrom
tautschnig:quantifiers-backup

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

tautschnig and others added 28 commits February 25, 2026 20:10
conjunction(...) and disjunction(...) helper functions produce the
appropriate result for empty operand sequences.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Use E-matching on index expressions to instantiate quantifiers when
eager instantiation (aka enumerative instantiation) cannot be applied.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Replace the simplistic E-matching in instantiate_one_quantifier with a
complete instantiation approach based on:

  Yeting Ge and Leonardo de Moura, "Complete instantiation for quantified
  formulas in Satisfiability Modulo Theories", CAV 2009.

The paper shows that for essentially uninterpreted formulas (where
quantified variables only appear as arguments of uninterpreted functions),
the quantifier can be eliminated by instantiating with a finite set of
ground terms computed as the least fixed point of a system of set
constraints (ΔF). In CBMC's context, array accesses (index_exprt) play
the role of uninterpreted functions.

Key changes:

- find_index_contexts: Properly decomposes array index expressions into
  bound_var + offset form using pre-order traversal. Handles patterns
  var, cast(var), var+r, r+var, var-r. This implements the offset
  extension from Section 4 of the paper, where for f(x_i + r) we
  generate constraints S_{k,i} + r ⊆ A_{f,j} and A_{f,j} - r ⊆ S_{k,i}.

- collect_ground_indices: Collects ground index terms from both
  index_exprt (array reads) and with_exprt (array writes) in the
  bitvector cache, seeding the sets A_{f,j} from the paper.

- compute_instantiation_set: Computes the least fixed point of the set
  constraint system ΔF. Uses integer arithmetic when all offsets and
  ground indices are numeric constants (avoiding expression growth), with
  a symbolic fallback using simplify_expr. Bounded by max_iterations and
  max_terms to handle non-stratified (non-FEU) cases.

- arrays_match: SSA-aware array comparison using L1 object identifiers
  to match across SSA versions. For nested array accesses (e.g.,
  multi-dimensional arrays like a[i][j]), recursively matches the base
  array and compares simplified indices, since the pattern may contain
  unsimplified arithmetic (e.g., cast(0 % 2)) that is semantically
  equal to a constant in the cache.

Instantiation terms are sorted (via irept::operator<) before generating
the conjunction/disjunction to ensure deterministic SAT clause ordering.
Without this, MiniSat performance varies from sub-second to minutes on
the same UNSAT instance depending on hash-dependent iteration order,
which changes with file paths embedded in source locations.

This enables verification of properties that depend on relationships
between adjacent array elements, such as:
  forall j: arr[j+1] == arr[j] + 1
which the previous E-matching could not handle because it did not
generate the ground terms needed for the offset positions.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add two regression tests exercising the complete instantiation approach:

- Quantifiers-offset-instantiation: Tests the offset pattern arr[j+1]
  from Section 4 of Ge & de Moura (2009). The quantifier
    forall j < n-1: arr[j+1] == arr[j] + 1
  requires the set constraint fixed-point to generate ground terms for
  both arr[j] and arr[j+1] positions. Without the offset handling, the
  instantiation misses the terms needed to chain the equalities.

- Quantifiers-fixpoint-instantiation: Tests that the fixed-point
  computation correctly propagates through multiple iterations. The
  property arr[2] > 0 requires two steps of reasoning:
    j=0: arr[0]>0 => arr[1]>0
    j=1: arr[1]>0 => arr[2]>0
  This exercises the iterative nature of the ΔF fixed-point from
  Section 3 of the paper, where instantiating with one set of terms
  generates new ground terms that feed back into the next iteration.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add TIMEOUT 1200 to the set_tests_properties call in both copies of the
add_test_pl_profile macro (regression/CMakeLists.txt and
regression/libcprover-cpp/CMakeLists.txt). This ensures CTest will kill
and report as failed any regression test that exceeds 20 minutes,
preventing CI jobs from hanging indefinitely on tests that time out.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a -t <secs> option (and TESTPL_TIMEOUT environment variable) to
test.pl for enforcing a wall-clock timeout on each individual test.

The implementation replaces the system() call in run() with fork/exec,
placing the child in its own process group (setpgrp). When the timeout
fires via Perl's alarm(), the entire process group is killed with
SIGKILL, ensuring all child processes (bash, cbmc, solvers) are cleaned
up. On timeout the test is marked as failed and a synthetic output file
is written so that the pattern-matching logic does not crash.

The timeout defaults to 0 (disabled), preserving existing behaviour
unless -t is explicitly passed or TESTPL_TIMEOUT is set.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a KNOWNBUG regression test for the invariant violation in arrays.cpp
when --arrays-uf-always is used with code containing array-of-structs
where struct members are arrays accessed through a nondet index.

The arrays theory (collect_arrays, add_array_constraints) does not handle
member expressions where the struct operand is an index expression,
causing CBMC to crash with SIGABRT.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a regression test for quantifier instantiation with a non-constant
upper bound. The test uses assume(forall r. (r < k) ==> t[r] < 10)
where k is a symbolic variable later constrained to 1, then asserts
t[0] < 10.

This currently fails on the SAT backend because the complete
instantiation algorithm does not find ground index terms when the
array in the quantifier body is an SSA array literal that does not
match the SSA symbol used in other array accesses.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add detailed analysis of two issues found during PQC proof benchmarking:

1. invariant_violation_analysis.md: --arrays-uf-always crash on
   array-of-structs patterns (pre-existing CBMC bug in arrays.cpp)

2. sat_failure_analysis.md: SAT/SMT disagreement caused by incomplete
   quantifier instantiation with variable bounds (bug in the complete
   instantiation algorithm on this branch)

Both include standalone minimal C reproducers in scripts/reproducers/.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When a forall expression has a variable (non-constant) upper bound,
the eager quantifier instantiation fails and falls through to the
complete instantiation algorithm. The complete instantiation collects
ground index terms from the bv_cache by matching array expressions.

After SSA renaming, the array in the quantifier body becomes an array
literal (array_exprt, e.g., {t#1[0], t#1[1]}), while array accesses
elsewhere use SSA symbols (e.g., t#1). The arrays_match function
cannot match these structurally different representations, causing
collect_ground_indices to return an empty set. With no ground indices,
the forall is instantiated with zero terms — effectively becoming
true. When this occurs in an assume, the assumption is vacuous,
leading to spurious counterexamples.

Fix: when a context's array is an array_exprt (fixed-size array
literal), add all valid indices 0..size-1 as ground indices directly.
This is sound because these are exactly the valid indices for the
array, and instantiating with all of them is complete. A cap of 256
elements prevents blowup on large arrays.

This fixes SAT/SMT disagreements on PQC proofs (mlkem-native and
mldsa-native) that use DFCC loop contracts with forall invariants
where the loop counter is the quantifier bound.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Re-run the full mlkem-native (153 proofs) and mldsa-native (175 proofs)
benchmark with the quantifier instantiation fix (2ac1dee), 30GB
per-proof memory limit, and 2 parallel processes.

Key result: ZERO FAILURES across all backends. The quantifier fix
eliminated all 16 SAT/SMT disagreements from the previous run (4 on
both SAT solvers + 12 MiniSat-only).

The benchmark script now supports --parallel N for running N proofs
concurrently, and --memlimit MB for per-proof memory limits.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
collect_arrays() and add_array_constraints() restricted member
expressions to only those whose struct operand was a symbol or
nondet_symbol. With --arrays-uf-always, member expressions can have
more complex struct operands (index, nested member, struct literal)
when arrays are embedded in structs. These are base array expressions
that need no special constraint generation, just like symbols.

The overly strict invariant caused crashes on tests like
Array_operations2, array-bug-6230, byte_update18, and bounds_check1
when run with --arrays-uf-always.

From: Michael Tautschnig <tautschn@amazon.com>
Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
The arrays.cpp fix now handles member expressions with arbitrary struct
operands, so the test no longer crashes and can be a CORE test.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
The script only added tools/bin to PATH when --install-deps was
explicitly passed. Tools installed in a previous run (Z3, Bitwuzla)
were not found when using --skip-clone --skip-build without
--install-deps. Fix by always checking for and adding the tools/bin
directory to PATH if it exists.

Also adds --parallel N option for running N proofs concurrently.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Re-run the full benchmark (153 mlkem + 175 mldsa proofs) with:
- Arrays fix: zero invariant violations (was 3-7 crashes)
- Bitwuzla: all 54 previously-skipped mlkem proofs now run (153/153 SMT success)
- 30 GB memory limit, 2 parallel processes

Results:
- mlkem: 153 SMT success, 142 CaDiCaL success, 147 MiniSat success
- mldsa: 174 SMT success, 144 CaDiCaL success, 154 MiniSat success
- 2 remaining SAT failures (polyveck_add, polyvec_matrix_pointwise_montgomery)
  are --arrays-uf-always soundness issues, not crashes
- Memory: SMT avg 119MB, CaDiCaL avg 826MB, MiniSat avg 539MB
- Peak: 21.2 GB (kem_dec/CaDiCaL), well within 30GB limit

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
With --arrays-uf-always, indexing an array of structs containing array
members through a nondeterministic index produces a spurious counterexample
on the SAT backend. SMT + --arrays-uf-always and SAT without it both
verify successfully.

Minimal reproducer: 9-line C program with struct S { int d[1]; },
a[2] initialized to 1, nondet index, assertion a[i].d[0] == 1.

Trigger requires all three: --arrays-uf-always, struct with array member,
nondet outer index. Plain int members do not trigger.

Affects mldsa-native proofs polyveck_add and
polyvec_matrix_pointwise_montgomery (both SAT solvers).

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Apply arrays2.patch: in convert_index, when the array operand is a member
expression whose struct operand is not a symbol/nondet_symbol and the array
type has a known finite size, bypass the array decision procedure and fall
through to the bounded-array encoding.

This fixes the spurious counterexample for small structs (array members
with <=64 elements). Upgrade the regression test from KNOWNBUG to CORE.

A separate KNOWNBUG test is added for large structs (array members with
65+ elements) where the fix is incomplete: the outer array-of-structs
access still goes through the array theory and fails to properly constrain
the struct bitvector. This is the root cause of the remaining SAT failures
on mldsa-native proofs polyveck_add and polyvec_matrix_pointwise_montgomery.

Co-authored-by: Michael Tautschnig <tautschn@amazon.com>
Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When --arrays-uf-always is used, symbol arrays have two disconnected
representations: map literals (used in struct contexts) and element-wise
free variables (from the array theory). The array theory constrains the
element-wise variables but not the map literals, causing spurious
counterexamples when a symbol array appears inside a struct that is an
element of another array.

Fix: in boolbv_set_equality_to_true, for symbol arrays with known finite
size up to MAX_FLATTENED_ARRAY_SIZE elements, connect the map literals to
the element-wise bitvectors via convert_bv on index expressions for each
element. This ensures array theory constraints propagate to struct contexts.

Upgrade arrays-uf-always-large-struct-soundness from KNOWNBUG to CORE.

Co-authored-by: Michael Tautschnig <tautschn@amazon.com>
Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Re-run the full benchmark (153 mlkem + 175 mldsa proofs) with all fixes:
- Quantifier instantiation for array literals in SSA
- Accept member expressions with arbitrary struct operands
- Bypass array theory for member-of-index expressions
- Connect array symbol map literals to element-wise constraints

Results: 0 FAILURE across all 328 proofs x 3 backends.
- mlkem: 153 SMT success, 142 CaDiCaL success, 147 MiniSat success
- mldsa: 174 SMT success, 144 CaDiCaL success, 154 MiniSat success
- Previously-failing polyveck_add and polyvec_matrix_pointwise_montgomery
  now TIMEOUT instead of FAILURE (no more spurious counterexamples)
- Memory: SMT avg 119MB, CaDiCaL avg 773MB, MiniSat avg 524MB

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
timeout_oom_analysis.md: Root cause analysis of 46 SAT timeout/OOM proofs.
Three categories: Keccak+no-AFS array theory blowup (5), quantifier
instantiation expansion (41), large matrix operations (1). 84% of timeouts
are memory-bound; 9 low-memory MiniSat timeouts are CPU-bound.

pqc_benchmark_context.md: Compacted context document for resuming this
work in future sessions. Contains branch state, all bugs found/fixed,
final benchmark results, file locations, and commit guidelines.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Proposes a CEGAR-based lazy quantifier instantiation loop analogous
to --refine-arrays. Key findings from the deep-dive:

- Memory is overwhelmingly in the SAT solver clause database (178x
  blowup for poly_reduce: 27MB conversion vs 4.8GB solving)
- 41 quantifier-heavy PQC proofs generate 10K+ ground instances
  from ~40 forall occurrences × 256-element arrays
- Most instances are irrelevant; the solver needs only O(1) per
  quantifier in typical cases

The design integrates into bv_refinementt's existing CEGAR loop,
adding ~150 lines of new code.
Add a CEGAR-based lazy quantifier instantiation option analogous to
--refine-arrays. When enabled, quantified formulas are not eagerly
expanded into conjunctions/disjunctions over all ground indices.
Instead, each quantifier gets a placeholder literal, and the CEGAR
loop checks whether the SAT model satisfies each quantifier. Only
violated instances are added incrementally.

Architecture:
- boolbv.h: add lazy_quantifiers flag and make
  finish_eager_conversion_quantifiers virtual
- boolbv_quantifier.cpp: skip eager instantiation when lazy
- bv_refinement.h: add refine_quantifiers config and overrides
- refine_quantifiers.cpp: implement finish_eager_conversion_quantifiers
  (freeze variables, set placeholder literals) and
  quantifiers_overapproximated (evaluate model, add violated instances)
- solver_factory: wire --refine-quantifiers option through

Evaluation uses a hybrid strategy: first try get()+simplify_expr()
which is cheap and works for field-sensitive arrays, then fall back
to convert()+l_get() for array-theory arrays.

Measured impact on PQC proofs:
- poly_reduce: 4.8GB eager → 206MB refine (23x memory reduction)
- poly_chknorm: 133MB eager → 45MB refine (3x memory reduction)

Regression tests cover: basic forall, symbolic index, expected
failure, multiple quantifiers, no-quantifier no-op, signed bounds.
Quantifiers with variable bounds (e.g., forall k. k < i => body(k)
where i is symbolic) cannot be lazily refined because the refinement
loop cannot enumerate their instances. Previously, all quantifiers
were deferred, causing soundness failures on DFCC proofs where loop
invariants use variable-bound quantifiers.

Fix: partition quantifiers into two groups:
- Constant bounds (k < 256): deferred for lazy CEGAR refinement
- Variable bounds (k < i): eagerly instantiated via base class

This fixes VERIFICATION FAILED on mldsa poly_reduce which has both
constant-bound precondition quantifiers and variable-bound loop
invariant quantifiers.

Also reorder the file to place static helpers before class methods.
Add measured impact table showing --refine-quantifiers helps for
large proofs (poly_reduce: 4.8GB→2.2GB) but adds overhead for
smaller proofs. Document the finding that all PQC quantifiers are
in VCC assumptions, and the partition strategy for variable-bound
quantifiers.
…nd CaDiCaL

Four fixes for crashes discovered during PQC benchmark runs:

1. Guard against quantifiers with non-symbol bound variables (e.g.,
   typecast expressions). These are sent to eager instantiation instead
   of crashing with an invariant violation in to_quantifier_expr().

2. Defer all convert() calls in quantifiers_overapproximated() to the
   end, after all get() evaluations complete. This prevents convert()
   from adding clauses that invalidate CaDiCaL's satisfied state,
   which caused a fatal error in val().

3. Skip simplify_expr() on expressions containing nested quantifiers
   (which may have non-symbol variables), avoiding an invariant
   violation in the simplifier.

4. Move quantifiers_overapproximated() before arrays_overapproximated()
   in the CEGAR loop so that get() calls see a valid SAT state.
…arge arrays

When a quantifier has constant bounds and the bound variable is used as
an array index, skip the eager full-range instantiation (which generates
one instance per value in [lb, ub]) and let the quantifier fall through
to the Ye/de Moura complete instantiation path. This path only generates
instances for the ground index terms actually present in the formula,
which for loop-invariant proofs is typically much smaller than the full
range.

The optimization only applies when ALL arrays indexed by the bound
variable are large (> 64 elements, i.e., not field-sensitive). For
field-sensitive arrays, the bv_cache contains individual element symbols
rather than index_exprt entries, so the Ye/de Moura approach cannot find
ground terms. In that case, eager instantiation is used as before.

As a safety net, finish_eager_conversion_quantifiers falls back to eager
instantiation when the Ye/de Moura path returns no result, and
collect_ground_indices adds all indices for arrays with known constant
size when the cache scan finds nothing.

Measured impact on PQC proofs (CaDiCaL SAT backend):
  poly_reduce:       4816 MB / 131s -> 557 MB / 31s  (8.6x mem, 4.2x speed)
  poly_chknorm:       133 MB / 102s -> 102 MB / 0.8s (128x speed)
  mlkem/poly_add:    7099 MB / timeout -> 367 MB / 25s (was TIMEOUT)
  ntt_layer:        17862 MB / timeout -> 718 MB / 31s (was TIMEOUT)
  invntt_layer:     28204 MB / OOM -> 2148 MB / 137s (was OOM)
  mlkem/ntt_b_block: 7201 MB / timeout -> 477 MB / 120s (was TIMEOUT)
  mlkem/ntt_layer:   9352 MB / timeout -> 406 MB / 15s (was TIMEOUT)
… check

Remove the duplicate field-sensitive array fallback block in
collect_ground_indices (was added twice by mistake). Also remove
the SSA symbol check from eager_quantifier_instantiation that was
added to prevent under-instantiation for array literals, as the
array literal check in collect_ground_indices already handles this
case correctly.

The core optimization remains: when a quantifier's bound variable
indexes large (>64 element) arrays and the range is >=16, skip
eager full-range instantiation and defer to the Ye/de Moura
complete instantiation path.
…tantiation

The Ye/de Moura complete instantiation optimization collects ground index
terms from the bitvector cache to instantiate quantifiers. Two sources of
non-ground terms can pollute this collection:

1. boolbvt::fresh_binding() creates boolbvt::scope::N::id symbols when
   converting quantifier bodies. These represent bound variables and are
   not ground terms.

2. goto_symext::rewrite_quantifiers() creates free SSA variables when
   rewriting universal quantifiers in assertions. These free variables
   appear in the cache as index terms but are not ground — using them
   for instantiation leads to incomplete instantiation and spurious
   VERIFICATION FAILED results.

This commit addresses both sources:

- Filters cache entries whose index contains a boolbvt::scope:: fresh
  binding symbol (source 1).

- For each non-literal bounded-array context (size <= 256), always adds
  constant indices 0..size-1 to the ground index set. This ensures
  complete instantiation even when the cache contains only non-ground
  symbolic terms from rewrite_quantifiers (source 2). Array literal
  contexts already have their full index range added separately.
@tautschnig tautschnig self-assigned this Mar 28, 2026
@tautschnig tautschnig changed the title [WIP] Quantifiers eleminiation work [WIP] Quantifier eleminiation work Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants