Merge upstream v1.0.0 beta#3
Merge upstream v1.0.0 beta#3gilles-peskine-arm wants to merge 86 commits intoMbed-TLS:developmentfrom
Conversation
This commit adds the missing bmi2 flag to the ML-DSA-87 yml. bmi2 is currently not strictly needed as asm is not using any bmi2 instructions, but in practice there are no CPUs that have AVX2, but not bmi2 so we might as well keep it as we may use later (as in rejection sampling in mlkem-native) or the compiler may use it. This commit also cleans up ordering of Darwin and Linux in the ymls. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
The test_rng_fail binaries were not linking against EXTRA_SOURCES because RNG_FAIL_TESTS was not included in ALL_TESTS. This caused failures on baremetal platforms that require platform-specific source files defined in EXTRA_SOURCES. Add RNG_FAIL_TESTS to ALL_TESTS so the existing dependency rules automatically include EXTRA_SOURCES for rng_fail test binaries. Skip inclusion of notrandombytes.c since rng_fail uses a custom rng wrapper and `#include`s notrandombytes.c directly. Finally, run `rng_fail` in the baremetal CI. Fixes pq-code-package#899 Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Previously, the RNG failure test relied on a custom configuration and thus had to be skipped in the custom config action. Now, the RNG failure test no longer relies on a custom config and can hence be included in the custom config action. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
- This commit adds the MLD_ namespace prefix to all macros in KeccakP_1600_times4_SIMD256.c. - This commit also adds the mld_ prefix to the following static symbols in KeccakP_1600_times4_SIMD256.c, similar with: pq-code-package/mlkem-native#1407 - keccakf1600RoundConstants - rho8 - rho56 - Without these changes, building mlkem-native together with mldsa-native that defines the same symbols in a single compilation unit can lead to symbol clashes. Signed-off-by: willieyz <willie.zhao@chelpis.com>
open-quantum-safe/liboqs#2284 has been merged. This commit switches the CI tests to the upstream integration to ensure we do not break it in the future. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
AWS-LC merged aws/aws-lc#2902 which integrated the C backend of mldsa-native - optimized backends will follow later. liboqs merged open-quantum-safe/liboqs#2284 which integrates the C, x86_64, and AArch64 backend into liboqs (replacing the pqcrystals implementation). This commit updates the README accordingly. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
MLD_MUST_CHECK_RETURN_VALUE was misplaced at randombytes. This commit moves it to mlk_randombytes. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Liam Fisher <liam.louis.fisher@gmail.com>
Add linting for shell scripts in the scipts directory as well as any *.sh file accross the project using shellcheck. Signed-off-by: Andreas Hatziiliou <andreas.hatziiliou@savoirfairelinux.com>
Fix errors brought up by the linter. Signed-off-by: Andreas Hatziiliou <andreas.hatziiliou@savoirfairelinux.com>
Make the benchmark parameters (NWARMUP, NITERATIONS, NTESTS) configurable via CFLAGS by wrapping them in #ifndef guards and renaming to MLD_BENCHMARK_NWARMUP, MLD_BENCHMARK_NITERATIONS, and MLD_BENCHMARK_NTESTS. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add PMU-based cycle counting support for Armv8.1-M Cortex-M processors. This uses the CMSIS PMU APIs for portable cycle counter access. Signed-off-by: willieyz <willie.zhao@chelpis.com>
The cycle counts will be zero, but it still tests the PMU code builds. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add unit tests for keccakf1600 x1 and x4 permutation functions that compare native implementations against the C reference implementation. - test_keccakf1600_permute: tests x1 permutation when MLD_USE_FIPS202_X1_NATIVE - test_keccakf1600x4_permute: tests x4 permutation when MLD_USE_FIPS202_X4_NATIVE - Expose mld_keccakf1600_permute_c via MLD_STATIC_TESTABLE for testing Signed-off-by: willieyz <willie.zhao@chelpis.com>
Add the first HOL Light functional correctness proof for an aarch64 ML-DSA function: poly_caddq, which performs conditional addition of q to reduce polynomial coefficients from (-q, q) to [0, q). This commit includes: - HOL Light proof infrastructure for aarch64 (Makefile, utilities) - Functional correctness proof showing the assembly computes `ival(x i) rem &8380417` for each coefficient - autogen support for generating aarch64 HOL Light assembly - Updates of both HOL-Light and s2n-bignum - HOL-Light CI is extended and closely aligned with mlkem-native - Resolves pq-code-package#493 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency. - The bound of f1'' is correct but the proof was misleading. The new proof should be clearer. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Based on Hanno Becker's proposal, the new explanation explains how round-(f1' / B) can be replaced with rounding-mulhi, regardless of the type of rounding used in the mulhi. In addition, by bounding the approximation error to be strictly less than 1 / B, the exactness of the Barrett division is also justified. To avoid excessive repetition, we prove the GAMMA2 = (Q-1)/88 case in the C implementation, remark how the same proof can be adapted to the GAMMA2 = (Q-1)/32 case, and finally refer to them when explaining the AVX2 implementation. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Adapt the new explanation to the NEON implementation. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Formalize the mathematical argument that Barrett division correctly computes round-half-down division for the parameters used in ML-DSA's Decompose routine (both C/AVX2 and AArch64 variants). Note: Proofs operate on unbounded integers, not fixed-width words. Connecting to actual implementations requires additional word-level and source-level reasoning. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Bumps [actions/checkout](https://github.com/actions/checkout) from 5.0.0 to 6.0.2. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@08c6903...de0fac2) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: 6.0.2 dependency-type: direct:production update-type: version-update:semver-major ... - Also update the actions/checkout's version in .github/actions/cbmc/action.yml manually. Signed-off-by: dependabot[bot] <support@github.com>
Core differences to mlkem-native: - We only test against a fixed version, not against most recent main (the test against main does not seem to reveal interesting problems so far anyway) - We fix a typo integration-awslc.yml matrix.flags -> matrix.test.flags I have opened pq-code-package/mlkem-native#1535 to consolidate those. - Resolves pq-code-package#902 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add CI workflow to verify that mldsa-native builds correctly within the OpenTitan (expo) build system. Only builds the test binary without running the Verilator simulation, as ML-DSA is too slow for simulation in CI. - Resolves pq-code-package#886 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
… bytes) The ACVP specification (https://pages.nist.gov/ACVP/draft-celi-acvp-ml-dsa.html) says that the message length is at most 65536 bits (8192 bytes). We incorrectly used 65536 bytes which results in excessive memory allocation in acvp_mldsa.c which can be problematic on devices with limited stack/memory. This commit corrects the maximum length. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- bench.yml: Remove non-existent matrix.opt.name from job name - ci.yml: Remove ec2_ami_id pass-through (no matrix entry defines it) Signed-off-by: willieyz <willie.zhao@chelpis.com>
Rename custom-shell to custom_shell to match the input name defined in the bench action. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Signed-off-by: willieyz <willie.zhao@chelpis.com>
Register custom runner labels (macos-15-intel, pqcp-arm64, pqcp-x64) so actionlint recognizes them as valid self-hosted runners. (note: pqcp-ppc64 did not use in mldsa-native yet, remove here) Signed-off-by: willieyz <willie.zhao@chelpis.com>
Quote shell variables to prevent globbing and word splitting (SC2086), quote command substitution (SC2046), and fix redirect ordering (SC2069). Signed-off-by: willieyz <willie.zhao@chelpis.com>
Pass gh_token to setup-aws-lc, and setup-oqs action calls that were missing the required input. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Due to relocation, we have to temporarily disable CI that is running on most of our self-hosted runners: - Arm Cortex-A72 (Raspberry Pi 4) - Arm Cortex-A55 (Snapdragon 888) - SpacemiT K1 8 (Banana Pi F3) - Mac Mini (M1, 2020) Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Abhi S <saxena_abhinav@icloud.com>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
- Fix ensures clause to account for all three return values (-1, 0, 1) and make the stronger postcondition conditional on non-fallback. - Use mld_ct_cmask_nonzero_u32() in mld_poly_chknorm() caller. - Rename REDUCE32_RANGE_MAX to MLD_REDUCE32_RANGE_MAX. - Fix stale comments referencing ntt.h instead of poly.h. Signed-off-by: Rod Chapman <rodchap@amazon.com>
polyz_unpack_{17,19}_asm are the only AArch64 assembly routines that
load from potentially unaligned user-provided buffers (the signature
passed to verify). All other assembly (NTT, rej_uniform, etc.) operates
on aligned internal buffers.
The ld1 with .4s/.2d element sizes and the ldr s/d instructions used
here require 4/8-byte alignment on Device memory (bare-metal AArch64
without MMU). Replace with .16b element sizes and ld1 {v.8b}, which
do not require alignment.
With this fixed, remove the MLD_TEST_NO_UNALIGNED workaround from the
aarch64-virt baremetal platform so the unaligned-buffer functional test
runs on baremetal as well.
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add decreases clauses to all loops in keccakf1600.c, fips202.c, and fips202x4.c so that CBMC proves termination. Add requires(r > 0) preconditions to keccak_absorb and mld_keccak_absorb_once_x4, since termination of their rate-based loops depends on a positive rate. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add decreases clauses to all loops in poly.c, poly_kl.c, polyvec.c, packing.c, sign.c, and ct.h so that CBMC proves termination. The rejection sampling while loops in poly.c (mld_poly_uniform, mld_poly_uniform_4x), poly_kl.c (mld_poly_uniform_eta_4x, mld_poly_uniform_eta, mld_poly_challenge) are intentionally left without decreases clauses: their termination depends on the XOF producing enough valid samples, which is a probabilistic property. The inner loop of mld_unpack_hints in packing.c required an additional invariant (j <= new_hint_count && new_hint_count <= MLDSA_OMEGA) to help the SMT solver verify the decreases clause, since the loop bound is derived from untrusted input rather than a compile-time constant. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add a check to scripts/check-contracts that every __loop__ annotation contains a decreases clause for termination proofs. Rejection sampling loops are excepted as they only terminate probabilistically. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
poly_challenge has three loops, two of which have a trivially bounded loop condition. This commit adds corresponding `decreases` clauses, leaving only the probablistically terminating loop without a clause. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Previously, check-contracts would only allow listing entire functions as exceptions to the loop termination checker, thereby allowing _all_ loops in a function to remain without termination clauses. This commit allows to specify how many loops in a function may be left without a termination clause. This would have caught the case of poly_challenge missing termination clauses for two of its three loops, which the previous commit fixed. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Add a HOL Light functional correctness proof for the aarch64 ML-DSA function poly_chknorm, which checks whether any polynomial coefficient has absolute value >= a given bound. This commit includes: - Functional correctness proof showing the assembly computes `bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)` - autogen support for generating aarch64 HOL Light assembly - Update of s2n-bignum Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
The 3-register ld1 {v0, v1, v2} loads 48 bytes but only 36 (polyz_unpack_17)
or 40 (polyz_unpack_19) are needed, causing a 12/8-byte overread past the
input buffer on the last iteration. Split into a 2-register ld1 for
the first 32 bytes plus an overlapping 1-register ld1 for the remainder.
The TBL indices for v2 are adjusted to account for v2's shifted load
offset
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Port the ML-DSA Forward NTT implementation and its HOL Light proof of correctness from s2n-bignum to mldsa-native. The proof verifies the AArch64 NEON NTT implementation at the object-code level, showing that the output coefficients are congruent to the forward NTT of the input modulo 8380417, with bounded output coefficients. A constant-time and memory safety proof is also included. New files: - aarch64/mldsa/mldsa_ntt.S: Assembly derived from dev/aarch64_opt/src/ntt.S - aarch64/proofs/mldsa_ntt.ml: HOL Light proof (MLDSA_NTT_CORRECT, MLDSA_NTT_SUBROUTINE_CORRECT, and MLDSA_NTT_SUBROUTINE_SAFE theorems) - aarch64/proofs/mldsa_specs.ml: Self-contained ML-DSA specifications and congruence/bounds propagation infrastructure, ported from s2n-bignum's common/mlkem_mldsa.ml with only ARM ML-DSA relevant definitions - aarch64/proofs/subroutine_signatures.ml: ML-DSA subroutine signatures for the safety proof infrastructure Modified files: - aarch64/proofs/aarch64_utils.ml: Add MEMORY_128_FROM_32_TAC - aarch64/Makefile: Add mldsa_ntt to build targets Signed-off-by: dkostic <dkostic@amazon.com>
Add CBMC contract for mld_ntt_asm matching the bounds from the AArch64 NTT HOL Light proof (input: abs <= 8380416, output: abs <= 75423752). Add corresponding CBMC proof for mld_ntt_native using the contract. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Consolidate the shared ML-DSA specifications and congruence/bounds propagation infrastructure into a single file at proofs/hol_light/common/mldsa_specs.ml, used by both AArch64 and x86_64 proofs. The merged file contains: - Shared: bitreverse8, reorder, BITREVERSE8_CLAUSES, congruence/bounds infrastructure (CONGBOUND_WORD_*, ASM_CONGBOUND_RULE, etc.), SIMD simplification tactics - x86_64-specific: AVX2 NTT ordering, mldsa_montred/barred/montmul - AArch64-specific: arm_mldsa_forward_ntt, arm_mldsa_barmul ASM_CONGBOUND_RULE now handles both arm_mldsa_barmul and mldsa_montred/barred/montmul cases. Signed-off-by: dkostic <dkostic@amazon.com>
Conflicts: * `README.md`: keep our version, which is completely different.
bjwtaylor
left a comment
There was a problem hiding this comment.
Maybe a can of worms we want to keep closed, however a couple of things worth being aware of.
| } | ||
| ret = mld_sign_signature(sm, smlen, sm + MLDSA_CRYPTO_BYTES, mlen, ctx, | ||
| ctxlen, sk, context); | ||
| *smlen += mlen; |
There was a problem hiding this comment.
Should this return a length if mld_sign_signature errors?
There was a problem hiding this comment.
I agree, ideally mld_sign should not set *smlen if it hasn't written that many bytes to sm. Also, the behavior is inconsistent among different functions, e.g. mld_sign_signature_extmu does set *siglen = 0 on RNG failure.
But this is very minor, in a function that we won't even use. So we aren't going to patch the code here. We will only patch mldsa-native if we have a critical security or business requirement. (And for security I would hope that we wouldn't disagree with upstream, and the only time we would make our own security patch is if we're backporting a fix or making our own quicker than upstream publishes theirs.)
So this should be a suggestion to the upstream project, not something we'll consider implementing.
| /* Common prefix: 0x00/0x01 || ctxlen || ctx */ | ||
| prefix[0] = (hashalg == MLD_PREHASH_NONE) ? 0 : 1; | ||
| prefix[1] = (uint8_t)ctxlen; | ||
| if (ctxlen > 0) |
There was a problem hiding this comment.
Should this check not NULL also, it's compliant to the API contract though still leaves the possibility to crash the code as supposed to a clean error?
There was a problem hiding this comment.
I guess you mean checking ctx != NULL? As you note, the API contract does not allow ctx == NULL unless ctxlen == 0, and the case where ctxlen == 0 is handled correctly (skipping the memcpy call, to avoid undefined behavior).
A C function cannot defend itself against invalid pointers. Handling invalid pointers that happen to be null is rarely useful since it's usually an unrecoverable anyway. It's better to let the platform's invalid pointer dereference mechanism handle it, e.g. a segfault. So the code is following best practices here.
Merge version 1.0.0-beta, released mid-March. We were previously one commit before 1.1.0-alpha2 released in early January.
There are no changes to
mldsa/mldsa_native.hormldsa/mldsa_native_config.h, so I don't expect any change to be needed in our code.PR checklist