Skip to content

Merge upstream v1.0.0 beta#3

Open
gilles-peskine-arm wants to merge 86 commits intoMbed-TLS:developmentfrom
gilles-peskine-arm:merge-upstream-v1.0.0-beta
Open

Merge upstream v1.0.0 beta#3
gilles-peskine-arm wants to merge 86 commits intoMbed-TLS:developmentfrom
gilles-peskine-arm:merge-upstream-v1.0.0-beta

Conversation

@gilles-peskine-arm
Copy link
Copy Markdown

@gilles-peskine-arm gilles-peskine-arm commented Apr 13, 2026

Merge version 1.0.0-beta, released mid-March. We were previously one commit before 1.1.0-alpha2 released in early January.

There are no changes to mldsa/mldsa_native.h or mldsa/mldsa_native_config.h, so I don't expect any change to be needed in our code.

PR checklist

  • This pull request is must be on the Mbed-TLS fork, not upstream, because: updating our fork
  • TF-PSA-Crypto PR TODO

mkannwischer and others added 30 commits January 20, 2026 04:27
This commit adds the missing bmi2 flag to the ML-DSA-87 yml.
bmi2 is currently not strictly needed as asm is not using any bmi2
instructions, but in practice there are no CPUs that have AVX2, but not bmi2
so we might as well keep it as we may use later (as in rejection sampling
in mlkem-native) or the compiler may use it.

This commit also cleans up ordering of Darwin and Linux in the ymls.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
The test_rng_fail binaries were not linking against EXTRA_SOURCES
because RNG_FAIL_TESTS was not included in ALL_TESTS. This caused
failures on baremetal platforms that require platform-specific
source files defined in EXTRA_SOURCES.

Add RNG_FAIL_TESTS to ALL_TESTS so the existing dependency rules
automatically include EXTRA_SOURCES for rng_fail test binaries.
Skip inclusion of notrandombytes.c since rng_fail uses a custom
rng wrapper and `#include`s notrandombytes.c directly.

Finally, run `rng_fail` in the baremetal CI.

Fixes pq-code-package#899

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Previously, the RNG failure test relied on a custom configuration
and thus had to be skipped in the custom config action.

Now, the RNG failure test no longer relies on a custom config
and can hence be included in the custom config action.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
- This commit adds the MLD_ namespace prefix to all macros in
  KeccakP_1600_times4_SIMD256.c.

- This commit also adds the mld_ prefix to the following static symbols
  in KeccakP_1600_times4_SIMD256.c, similar with:
  pq-code-package/mlkem-native#1407

  - keccakf1600RoundConstants
  - rho8
  - rho56

- Without these changes, building mlkem-native together with
  mldsa-native that defines the same symbols in a
  single compilation unit can lead to symbol clashes.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
open-quantum-safe/liboqs#2284 has been merged.
This commit switches the CI tests to the upstream integration to ensure we
do not break it in the future.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
AWS-LC merged aws/aws-lc#2902 which integrated the
C backend of mldsa-native - optimized backends will follow later.
liboqs merged open-quantum-safe/liboqs#2284 which
integrates the C, x86_64, and AArch64 backend into liboqs (replacing the
pqcrystals implementation).

This commit updates the README accordingly.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
MLD_MUST_CHECK_RETURN_VALUE was misplaced at randombytes.
This commit moves it to mlk_randombytes.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Liam Fisher <liam.louis.fisher@gmail.com>
Add linting for shell scripts in the scipts directory as well as any
*.sh file accross the project using shellcheck.

Signed-off-by: Andreas Hatziiliou <andreas.hatziiliou@savoirfairelinux.com>
Fix errors brought up by the linter.

Signed-off-by: Andreas Hatziiliou <andreas.hatziiliou@savoirfairelinux.com>
Make the benchmark parameters (NWARMUP, NITERATIONS, NTESTS)
configurable via CFLAGS by wrapping them in #ifndef guards and
renaming to MLD_BENCHMARK_NWARMUP, MLD_BENCHMARK_NITERATIONS,
and MLD_BENCHMARK_NTESTS.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add PMU-based cycle counting support for Armv8.1-M Cortex-M processors.
This uses the CMSIS PMU APIs for portable cycle counter access.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
The cycle counts will be zero, but it still tests the PMU code builds.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add unit tests for keccakf1600 x1 and x4 permutation functions that
compare native implementations against the C reference implementation.

- test_keccakf1600_permute: tests x1 permutation when
  MLD_USE_FIPS202_X1_NATIVE
- test_keccakf1600x4_permute: tests x4 permutation when
  MLD_USE_FIPS202_X4_NATIVE
- Expose mld_keccakf1600_permute_c via MLD_STATIC_TESTABLE for testing

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add the first HOL Light functional correctness proof for an aarch64
ML-DSA function: poly_caddq, which performs conditional addition of q
to reduce polynomial coefficients from (-q, q) to [0, q).

This commit includes:
- HOL Light proof infrastructure for aarch64 (Makefile, utilities)
- Functional correctness proof showing the assembly computes
  `ival(x i) rem &8380417` for each coefficient
- autogen support for generating aarch64 HOL Light assembly
- Updates of both HOL-Light and s2n-bignum
- HOL-Light CI is extended and closely aligned with mlkem-native

- Resolves pq-code-package#493

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- The floor() in floor((f + 127) >> 7) was somewhat unecessary as
  the usual semantic for the right-shift operator (>>) has integer
  output anyway. Seeing as the right-shift operator is not used in other
  explanation comments, we decided to rewrite it as division by 2^7 for
  better consistency.
- The bound of f1'' is correct but the proof was misleading. The new
  proof should be clearer.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Based on Hanno Becker's proposal, the new explanation explains how
round-(f1' / B) can be replaced with rounding-mulhi, regardless of the
type of rounding used in the mulhi. In addition, by bounding the
approximation error to be strictly less than 1 / B, the exactness of the
Barrett division is also justified.

To avoid excessive repetition, we prove the GAMMA2 = (Q-1)/88 case in
the C implementation, remark how the same proof can be adapted to the
GAMMA2 = (Q-1)/32 case, and finally refer to them when explaining the
AVX2 implementation.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Adapt the new explanation to the NEON implementation.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Formalize the mathematical argument that Barrett division correctly
computes round-half-down division for the parameters used in ML-DSA's
Decompose routine (both C/AVX2 and AArch64 variants).

Note: Proofs operate on unbounded integers, not fixed-width words.
Connecting to actual implementations requires additional word-level
and source-level reasoning.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Bumps [actions/checkout](https://github.com/actions/checkout) from 5.0.0 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@08c6903...de0fac2)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

- Also update the actions/checkout's version in
  .github/actions/cbmc/action.yml manually.

Signed-off-by: dependabot[bot] <support@github.com>
Core differences to mlkem-native:
 - We only test against a fixed version, not against most recent main (the test
   against main does not seem to reveal interesting problems so far anyway)
 - We fix a typo integration-awslc.yml matrix.flags -> matrix.test.flags

I have opened pq-code-package/mlkem-native#1535 to
consolidate those.

- Resolves pq-code-package#902

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add CI workflow to verify that mldsa-native builds correctly
within the OpenTitan (expo) build system. Only builds the test
binary without running the Verilator simulation, as ML-DSA is
too slow for simulation in CI.

- Resolves pq-code-package#886

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
… bytes)

The ACVP specification (https://pages.nist.gov/ACVP/draft-celi-acvp-ml-dsa.html)
says that the message length is at most 65536 bits (8192 bytes).
We incorrectly used 65536 bytes which results in excessive memory allocation
in acvp_mldsa.c  which can be problematic on devices with limited stack/memory.
This commit corrects the maximum length.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- bench.yml: Remove non-existent matrix.opt.name from job name
- ci.yml: Remove ec2_ami_id pass-through (no matrix entry defines it)

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Rename custom-shell to custom_shell to match the input name defined
in the bench action.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Signed-off-by: willieyz <willie.zhao@chelpis.com>
Register custom runner labels (macos-15-intel, pqcp-arm64,
pqcp-x64) so actionlint recognizes them as valid self-hosted runners.
(note: pqcp-ppc64 did not use in mldsa-native yet, remove here)

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Quote shell variables to prevent globbing and word splitting (SC2086),
quote command substitution (SC2046), and fix redirect ordering (SC2069).

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Pass gh_token to setup-aws-lc, and
setup-oqs action calls that were missing the required input.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
mkannwischer and others added 18 commits March 3, 2026 14:36
Due to relocation, we have to temporarily disable CI that is running on most
of our self-hosted runners:
 - Arm Cortex-A72 (Raspberry Pi 4)
 - Arm Cortex-A55 (Snapdragon 888)
 - SpacemiT K1 8 (Banana Pi F3)
 -  Mac Mini (M1, 2020)

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Abhi S <saxena_abhinav@icloud.com>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- Fix ensures clause to account for all three return values (-1, 0, 1)
  and make the stronger postcondition conditional on non-fallback.
- Use mld_ct_cmask_nonzero_u32() in mld_poly_chknorm() caller.
- Rename REDUCE32_RANGE_MAX to MLD_REDUCE32_RANGE_MAX.
- Fix stale comments referencing ntt.h instead of poly.h.

Signed-off-by: Rod Chapman <rodchap@amazon.com>
polyz_unpack_{17,19}_asm are the only AArch64 assembly routines that
load from potentially unaligned user-provided buffers (the signature
passed to verify). All other assembly (NTT, rej_uniform, etc.) operates
on aligned internal buffers.

The ld1 with .4s/.2d element sizes and the ldr s/d instructions used
here require 4/8-byte alignment on Device memory (bare-metal AArch64
without MMU). Replace with .16b element sizes and ld1 {v.8b}, which
do not require alignment.

With this fixed, remove the MLD_TEST_NO_UNALIGNED workaround from the
aarch64-virt baremetal platform so the unaligned-buffer functional test
runs on baremetal as well.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add decreases clauses to all loops in keccakf1600.c, fips202.c, and
fips202x4.c so that CBMC proves termination.
Add requires(r > 0) preconditions to keccak_absorb and
mld_keccak_absorb_once_x4, since termination of their rate-based loops
depends on a positive rate.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add decreases clauses to all loops in poly.c, poly_kl.c, polyvec.c,
packing.c, sign.c, and ct.h so that CBMC proves termination.

The rejection sampling while loops in poly.c (mld_poly_uniform,
mld_poly_uniform_4x), poly_kl.c (mld_poly_uniform_eta_4x,
mld_poly_uniform_eta, mld_poly_challenge) are intentionally left without
decreases clauses: their termination depends on the XOF producing enough
valid samples, which is a probabilistic property.

The inner loop of mld_unpack_hints in packing.c required an additional
invariant (j <= new_hint_count && new_hint_count <= MLDSA_OMEGA) to help
the SMT solver verify the decreases clause, since the loop bound is
derived from untrusted input rather than a compile-time constant.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add a check to scripts/check-contracts that every __loop__ annotation
contains a decreases clause for termination proofs. Rejection sampling
loops are excepted as they only terminate probabilistically.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
poly_challenge has three loops, two of which have a trivially bounded
loop condition. This commit adds corresponding `decreases` clauses,
leaving only the probablistically terminating loop without a clause.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Previously, check-contracts would only allow listing entire functions
as exceptions to the loop termination checker, thereby allowing _all_
loops in a function to remain without termination clauses.

This commit allows to specify how many loops in a function may be
left without a termination clause. This would have caught the case
of poly_challenge missing termination clauses for two of its three
loops, which the previous commit fixed.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Add a HOL Light functional correctness proof for the aarch64 ML-DSA
function poly_chknorm, which checks whether any polynomial coefficient
has absolute value >= a given bound.

This commit includes:
- Functional correctness proof showing the assembly computes
  `bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)`
- autogen support for generating aarch64 HOL Light assembly
- Update of s2n-bignum

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
The 3-register ld1 {v0, v1, v2} loads 48 bytes but only 36 (polyz_unpack_17)
or 40 (polyz_unpack_19) are needed, causing a 12/8-byte overread past the
input buffer on the last iteration. Split into a 2-register ld1 for
the first 32 bytes plus an overlapping 1-register ld1 for the remainder.
The TBL indices for v2 are adjusted to account for v2's shifted load
offset

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Port the ML-DSA Forward NTT implementation and its HOL Light proof of
correctness from s2n-bignum to mldsa-native. The proof verifies the
AArch64 NEON NTT implementation at the object-code level, showing that
the output coefficients are congruent to the forward NTT of the input
modulo 8380417, with bounded output coefficients. A constant-time and
memory safety proof is also included.

New files:
- aarch64/mldsa/mldsa_ntt.S: Assembly derived from dev/aarch64_opt/src/ntt.S
- aarch64/proofs/mldsa_ntt.ml: HOL Light proof (MLDSA_NTT_CORRECT,
MLDSA_NTT_SUBROUTINE_CORRECT, and MLDSA_NTT_SUBROUTINE_SAFE theorems)
- aarch64/proofs/mldsa_specs.ml: Self-contained ML-DSA specifications and
congruence/bounds propagation infrastructure, ported from s2n-bignum's
common/mlkem_mldsa.ml with only ARM ML-DSA relevant definitions
- aarch64/proofs/subroutine_signatures.ml: ML-DSA subroutine signatures
for the safety proof infrastructure

Modified files:
- aarch64/proofs/aarch64_utils.ml: Add MEMORY_128_FROM_32_TAC
- aarch64/Makefile: Add mldsa_ntt to build targets

Signed-off-by: dkostic <dkostic@amazon.com>
Add CBMC contract for mld_ntt_asm matching the bounds from the
AArch64 NTT HOL Light proof (input: abs <= 8380416, output: abs <= 75423752).
Add corresponding CBMC proof for mld_ntt_native using the contract.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Consolidate the shared ML-DSA specifications and congruence/bounds
propagation infrastructure into a single file at
proofs/hol_light/common/mldsa_specs.ml, used by both AArch64 and
x86_64 proofs.

The merged file contains:
- Shared: bitreverse8, reorder, BITREVERSE8_CLAUSES, congruence/bounds
  infrastructure (CONGBOUND_WORD_*, ASM_CONGBOUND_RULE, etc.), SIMD
  simplification tactics
- x86_64-specific: AVX2 NTT ordering, mldsa_montred/barred/montmul
- AArch64-specific: arm_mldsa_forward_ntt, arm_mldsa_barmul

ASM_CONGBOUND_RULE now handles both arm_mldsa_barmul and
mldsa_montred/barred/montmul cases.

Signed-off-by: dkostic <dkostic@amazon.com>
Conflicts:
* `README.md`: keep our version, which is completely different.
@gilles-peskine-arm gilles-peskine-arm added the needs-review Every commit must be reviewed by at least two team members. label Apr 13, 2026
@gilles-peskine-arm gilles-peskine-arm added needs-reviewer This PR needs someone to pick it up for review priority-high High priority - will be reviewed soon size-xs Estimated task size: extra small (a few hours at most) labels Apr 13, 2026
@gilles-peskine-arm gilles-peskine-arm changed the base branch from main to development April 13, 2026 16:19
@gilles-peskine-arm gilles-peskine-arm removed the needs-reviewer This PR needs someone to pick it up for review label Apr 13, 2026
Copy link
Copy Markdown

@bjwtaylor bjwtaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a can of worms we want to keep closed, however a couple of things worth being aware of.

Comment thread mldsa/src/sign.c
}
ret = mld_sign_signature(sm, smlen, sm + MLDSA_CRYPTO_BYTES, mlen, ctx,
ctxlen, sk, context);
*smlen += mlen;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return a length if mld_sign_signature errors?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, ideally mld_sign should not set *smlen if it hasn't written that many bytes to sm. Also, the behavior is inconsistent among different functions, e.g. mld_sign_signature_extmu does set *siglen = 0 on RNG failure.

But this is very minor, in a function that we won't even use. So we aren't going to patch the code here. We will only patch mldsa-native if we have a critical security or business requirement. (And for security I would hope that we wouldn't disagree with upstream, and the only time we would make our own security patch is if we're backporting a fix or making our own quicker than upstream publishes theirs.)

So this should be a suggestion to the upstream project, not something we'll consider implementing.

Comment thread mldsa/src/sign.c
/* Common prefix: 0x00/0x01 || ctxlen || ctx */
prefix[0] = (hashalg == MLD_PREHASH_NONE) ? 0 : 1;
prefix[1] = (uint8_t)ctxlen;
if (ctxlen > 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check not NULL also, it's compliant to the API contract though still leaves the possibility to crash the code as supposed to a clean error?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean checking ctx != NULL? As you note, the API contract does not allow ctx == NULL unless ctxlen == 0, and the case where ctxlen == 0 is handled correctly (skipping the memcpy call, to avoid undefined behavior).

A C function cannot defend itself against invalid pointers. Handling invalid pointers that happen to be null is rarely useful since it's usually an unrecoverable anyway. It's better to let the platform's invalid pointer dereference mechanism handle it, e.g. a segfault. So the code is following best practices here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-review Every commit must be reviewed by at least two team members. priority-high High priority - will be reviewed soon size-xs Estimated task size: extra small (a few hours at most)

Projects

Development

Successfully merging this pull request may close these issues.