Neon MVP by hildebrandmw · Pull Request #777 · microsoft/DiskANN

hildebrandmw · 2026-02-16T05:49:26Z

Adds a (mostly) complete AArch64 Neon backend to diskann-wide and wires it through diskann-vector, diskann-quantization, and diskann-benchmark-simd.

This PR has existed in a largely complete state for quite a while now, but as usual the last 10% took a considerable amount of work. So here it is.

`diskann-wide` — Neon backend

Neon implementations for all SIMD types matching the existing x86_64 (V3/V4) backends:

16 register types across 64-bit and 128-bit widths, including u8x8, i8x8, f32x2, u8x16, i8x16, u16x8, i16x8, u32x4, i32x4, f32x4, u64x2, i64x2, f16x4, f16x8.
Doubled types (f32x8, f32x16, u8x32, i8x32, i32x8, etc.) via the existing Doubled machinery.
Masks: move_mask, from_mask, and optimized keep_first for all 8 mask widths.
Arithmetic: Add, Sub, Mul, FMA, Abs, MinMax.
Comparisons: Full SIMDPartialEq and SIMDPartialOrd.
Bit operations: Not, And, Or, Xor, Shr, Shl (with Miri fallbacks for variable shifts).
Dot products: i16×i16→i32, u8×i8→i32, i8×u8→i32 using vdotq_s32 (requires +dotprod).
Reductions: sum_tree via pairwise addition (vpaddq).
Conversions: f16↔f32 (lossless and cast), u8→i16, i8→i16, i32→f32, split/join for all appropriate types.
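The pairwise reduction used by `sum_tree` can be modeled in portable scalar Rust. This sketch assumes the lane semantics of Arm's `ADDP` instruction as exposed by the `vpaddq_f32` intrinsic, where adjacent lanes of the concatenated inputs are summed. The helpers `paddq` and `sum_tree_f32x4` are hypothetical and are not the diskann-wide API.

```rust
/// Scalar model of `vpaddq_f32(a, b)`: adjacent pairs of the concatenation
/// `a ++ b` are added, giving `[a0 + a1, a2 + a3, b0 + b1, b2 + b3]`.
fn paddq(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}

/// Horizontal sum of a 4-lane vector using two rounds of pairwise addition.
/// The result is associated as `(a0 + a1) + (a2 + a3)`, a tree rather than
/// a left-to-right fold.
fn sum_tree_f32x4(v: [f32; 4]) -> f32 {
    let once = paddq(v, v); // [a0+a1, a2+a3, a0+a1, a2+a3]
    let twice = paddq(once, once); // every lane now holds the full sum
    twice[0]
}

fn main() {
    let v = [1.0_f32, 2.0, 3.0, 4.0];
    println!("{}", sum_tree_f32x4(v)); // prints 10
}
```

Because the reduction is a tree rather than a sequential fold, its rounding can differ slightly from a scalar loop over the same values.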

Optimized load_simd_first (algorithms/load_first.rs):

Rather than falling back to scalar Emulated element-by-element loads, partial loads use Neon-native primitives:

≤8 bytes: GPR-only overlapping reads — no SIMD instructions needed.
8–16 bytes: Two overlapping vld1_u8 loads combined with vqtbl1q_u8 (TBL shuffle). Includes a Miri shim since Miri does not support vqtbl1q_u8.
32-bit / 64-bit element types: Simple if-else chains using vld1_lane / vcombine.
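The two byte-level strategies can be modeled in portable scalar Rust to show why overlapping reads never touch memory past the end of the input. This is a sketch under stated assumptions: it models a zero-padded 16-lane `u8` partial load, represents the ≤8-byte GPR path with integer reads merged into a `u64`, and emulates `vqtbl1q_u8` (out-of-range indices produce zero, as `TBL` does). All function names are hypothetical, and the real `algorithms/load_first.rs` may divide the cases differently.

```rust
/// Little-endian u16 read from `b[at..at + 2]`, widened to u64.
fn read_u16(b: &[u8], at: usize) -> u64 {
    u16::from_le_bytes([b[at], b[at + 1]]) as u64
}

/// Little-endian u32 read from `b[at..at + 4]`, widened to u64.
fn read_u32(b: &[u8], at: usize) -> u64 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]]) as u64
}

/// GPR-only path for `b.len() <= 8`: two overlapping reads that together
/// cover exactly bytes [0, n), merged into one little-endian u64.
/// Overlapping bytes are identical in both reads, so OR-ing them is harmless.
fn load_first_le8(b: &[u8]) -> u64 {
    let n = b.len();
    match n {
        0 => 0,
        1 => b[0] as u64,
        2..=3 => read_u16(b, 0) | (read_u16(b, n - 2) << (8 * (n - 2))),
        4..=8 => read_u32(b, 0) | (read_u32(b, n - 4) << (8 * (n - 4))),
        _ => panic!("GPR path handles at most 8 bytes"),
    }
}

/// Scalar model of `vqtbl1q_u8`: each output lane selects `table[idx]`,
/// and any index >= 16 yields 0.
fn tbl16(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table.get(i as usize).copied().unwrap_or(0);
    }
    out
}

/// 8..=16 byte path: two overlapping 8-byte loads form a 16-byte table,
/// and a TBL shuffle moves each byte to its lane and zeroes lanes >= n.
fn load_first_8_to_16(b: &[u8]) -> [u8; 16] {
    let n = b.len();
    assert!((8..=16).contains(&n), "TBL path handles 8 to 16 bytes");
    let mut table = [0u8; 16];
    table[..8].copy_from_slice(&b[..8]); // models vld1_u8(ptr)
    table[8..].copy_from_slice(&b[n - 8..]); // models vld1_u8(ptr + n - 8)
    let mut idx = [0xFFu8; 16]; // 0xFF is out of range, so TBL writes 0
    for (i, slot) in idx.iter_mut().enumerate().take(n) {
        // Bytes 8..n live in the upper half of the table, offset by (16 - n).
        *slot = if i < 8 { i as u8 } else { (i + 16 - n) as u8 };
    }
    tbl16(table, idx)
}

/// Zero-padded partial load of the first `b.len()` (at most 16) bytes.
fn load_first_u8x16(b: &[u8]) -> [u8; 16] {
    if b.len() <= 8 {
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&load_first_le8(b).to_le_bytes());
        out
    } else {
        load_first_8_to_16(b)
    }
}

fn main() {
    let data: Vec<u8> = (1..=16).collect();
    for n in 0..=16 {
        let v = load_first_u8x16(&data[..n]);
        // Lanes [0, n) hold the input and lanes [n, 16) are zero.
        assert_eq!(&v[..n], &data[..n]);
        assert!(v[n..].iter().all(|&x| x == 0));
    }
    println!("all lengths 0..=16 ok");
}
```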

The aarch64_define_loadstore! macro accepts a $load_first function, and f16x4/f16x8 delegate to the u16x4/u16x8 primitives respectively.

Doubled types implement load_simd_first / store_simd_first branchlessly by passing the full count to the first half and first.saturating_sub(HALF) to the second.
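The branchless split can be illustrated with a scalar model. This sketch assumes a doubled vector is a pair of halves and that a half-width partial load clamps its count to the half width and zero-fills the remaining lanes. `HALF`, `load_first_half`, and `load_first_doubled` are hypothetical stand-ins, not the diskann-wide API.

```rust
const HALF: usize = 4;

/// Hypothetical half-width partial load: lanes [0, first) come from `src`,
/// the rest are zero. A `first` larger than HALF loads the whole half.
fn load_first_half(src: &[f32], first: usize) -> [f32; HALF] {
    let mut out = [0.0; HALF];
    let n = first.min(HALF);
    out[..n].copy_from_slice(&src[..n]);
    out
}

/// Doubled partial load of `src.len()` elements with no branch on the count:
/// the low half receives the full count (and clamps it), while the high half
/// receives `first - HALF`, saturating to 0 when the input is short.
fn load_first_doubled(src: &[f32]) -> [f32; 2 * HALF] {
    let first = src.len();
    let lo = load_first_half(src, first);
    let hi = load_first_half(&src[first.min(HALF)..], first.saturating_sub(HALF));
    let mut out = [0.0; 2 * HALF];
    out[..HALF].copy_from_slice(&lo);
    out[HALF..].copy_from_slice(&hi);
    out
}

fn main() {
    let data = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.0, 0.0]
    println!("{:?}", load_first_doubled(&data[..6]));
    // prints [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    println!("{:?}", load_first_doubled(&data[..3]));
}
```

The design choice is that each half already knows how to handle an out-of-range count, so the doubled type never needs to ask which half the boundary falls in.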

Test infrastructure:

test_neon() helper with WIDE_TEST_MIN_ARCH env-var support, matching the x86_64 test_arch_number() pattern. Supports "all" / "neon" (panics if unavailable) and "scalar" (skips).
All tests use if let Some(arch) = test_neon() { ... } — graceful skip when Neon is unavailable, hard failure when explicitly requested.
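A minimal sketch of the `WIDE_TEST_MIN_ARCH` handling described above, reading that description literally. The decision logic is a pure function so it can be exercised without touching the process environment. The `Neon` token type and `test_neon_from` are hypothetical, and the treatment of unrecognized values is an assumption; the real helper may differ.

```rust
/// Hypothetical capability token returned when Neon tests should run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Neon;

/// Decide whether Neon tests run, given `WIDE_TEST_MIN_ARCH` (if set) and
/// whether Neon is available on this machine.
///
/// - unset: run if available, otherwise skip quietly;
/// - "all" or "neon": Neon was explicitly requested, so panic if missing;
/// - "scalar": skip Neon tests.
fn test_neon_from(min_arch: Option<&str>, neon_available: bool) -> Option<Neon> {
    match min_arch {
        None => neon_available.then_some(Neon),
        Some("all") | Some("neon") => {
            assert!(neon_available, "WIDE_TEST_MIN_ARCH requests Neon, but it is unavailable");
            Some(Neon)
        }
        Some("scalar") => None,
        // Assumption: unrecognized values are treated as a configuration error.
        Some(other) => panic!("unrecognized WIDE_TEST_MIN_ARCH value: {other}"),
    }
}

/// Read the real environment variable and forward to the decision function.
/// Availability is approximated by the target architecture, since Neon is
/// mandatory on AArch64.
fn test_neon() -> Option<Neon> {
    let var = std::env::var("WIDE_TEST_MIN_ARCH").ok();
    test_neon_from(var.as_deref(), cfg!(target_arch = "aarch64"))
}

fn main() {
    if let Some(arch) = test_neon() {
        println!("running Neon tests with {arch:?}");
    } else {
        println!("skipping Neon tests");
    }
}
```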

`diskann-vector` — Neon distance kernels

14 SIMDSchema implementations covering:

L2, InnerProduct, Cosine for f32, f16, u8, i8.
L1Norm for f32 and f16.
All use scalar epilogues (SIMD epilogues deferred pending Arm64 benchmarking).
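To illustrate what a scalar epilogue means here, the sketch below computes squared L2 distance over `f32`. A main loop processes four lanes at a time, standing in for an `f32x4` accumulator, and a plain scalar loop handles the leftover `len % 4` elements. This is a portable model, not the `SIMDSchema` trait or the actual `diskann-vector` kernel, and `l2_squared` is a hypothetical name.

```rust
/// Squared L2 distance with a 4-lane main loop and a scalar epilogue.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    const LANES: usize = 4;
    let main = a.len() - a.len() % LANES;

    // Main loop: four independent accumulators model one f32x4 register.
    let mut acc = [0.0_f32; LANES];
    for (ca, cb) in a[..main].chunks_exact(LANES).zip(b[..main].chunks_exact(LANES)) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] = d.mul_add(d, acc[i]); // one fused multiply-add per lane
        }
    }
    // Horizontal reduction of the accumulator (pairwise, like sum_tree).
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    // Scalar epilogue: the remaining 0..=3 elements, one at a time.
    for (x, y) in a[main..].iter().zip(&b[main..]) {
        let d = x - y;
        sum += d * d;
    }
    sum
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [0.0_f32; 6];
    println!("{}", l2_squared(&a, &b)); // prints 91
}
```

A SIMD epilogue would replace the final loop with one masked or partial vector step (for example via `load_simd_first`), which is the trade-off the PR defers until Arm64 benchmarks are available.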

`diskann-quantization`

Neon Hadamard transform impl (delegates to scalar via retarget()).
Bit distances almost universally target the scalar architecture as well via retarget().
Neon test paths for bit-slice distances (1–8 bit), bit-transpose distances, and full distances.

`diskann-benchmark-simd`

Neon kernel registrations for f32, f16, u8, and i8.
Refactored per-architecture DispatchRule impls into a match_arch! macro.
Improved dispatch scoring for better mismatch diagnostics.
Added test-aarch64.json and architecture-aware integration test selection.

Other changes

.cargo/config.toml: Enables +neon,+dotprod for aarch64 targets.
.github/workflows/ci.yml: Added aarch64-unknown-linux-gnu to cross-compilation targets.
diskann-providers: Relaxed a PQ distance test tolerance (6e-7 → 6.3e-7) to account for the different floating-point association order used by the Neon implementations.
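Why a change in association can require a looser tolerance: `f32` addition is not associative, so a left-to-right sum and a pairwise tree sum (the shape produced by pairwise-add reductions) can round differently. The inputs below are contrived to make the effect large; the differences in real distance kernels are far smaller, but not zero. Both function names are illustrative only.

```rust
/// Left-to-right accumulation, as a simple scalar loop would do it.
fn sum_sequential(v: &[f32; 4]) -> f32 {
    ((v[0] + v[1]) + v[2]) + v[3]
}

/// Pairwise tree: `(v0 + v1) + (v2 + v3)`.
fn sum_pairwise(v: &[f32; 4]) -> f32 {
    (v[0] + v[1]) + (v[2] + v[3])
}

fn main() {
    // 1e8 is exactly representable in f32, but adjacent f32 values near 1e8
    // are 8 apart, so adding or subtracting 1 is rounded away.
    let v = [1.0e8_f32, 1.0, -1.0e8, 1.0];
    println!("sequential = {}", sum_sequential(&v)); // prints 1
    println!("pairwise   = {}", sum_pairwise(&v)); // prints 0
}
```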

Design decisions

Compile-time architecture gating. The Neon backend uses a compile-time token rather than runtime feature detection, since Neon is mandatory on AArch64. Runtime dispatch can be added later if needed.
+dotprod required. Needed for vdotq in the dot-product kernels. This excludes pre-2018 cores but should cover mainstream server and desktop targets (Graviton 2+, Apple M1+, Ampere Altra). ARMv8.4+ mandates it.
Scalar epilogues in diskann-vector. The SIMD epilogues could use load_simd_first for a potential win on i8/u8 cosine where the masked load cost is amortized across multiple operations, but real Arm64 benchmarking is needed first.
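For context on the `+dotprod` requirement: `vdotq_s32(acc, a, b)` treats each 128-bit input as sixteen `i8` lanes and, for each of the four `i32` accumulator lanes, adds the dot product of the corresponding group of four byte pairs. The scalar model below reproduces that lane grouping. It illustrates the instruction's semantics only, not the kernels in this PR, and `vdotq_s32_model` is a hypothetical name.

```rust
/// Scalar model of Arm's `vdotq_s32`: lane `i` of the result is
/// `acc[i] + sum over j in 0..4 of a[4*i + j] * b[4*i + j]`,
/// with each i8 product widened to i32 before accumulation.
fn vdotq_s32_model(acc: [i32; 4], a: [i8; 16], b: [i8; 16]) -> [i32; 4] {
    let mut out = acc;
    for lane in 0..4 {
        for j in 0..4 {
            let k = 4 * lane + j;
            out[lane] += i32::from(a[k]) * i32::from(b[k]);
        }
    }
    out
}

fn main() {
    let a: [i8; 16] = [1, 2, 3, 4, -1, -2, -3, -4, 127, 127, 127, 127, -128, -128, -128, -128];
    let b: [i8; 16] = [1, 1, 1, 1, 2, 2, 2, 2, 127, 127, 127, 127, -128, -128, -128, -128];
    println!("{:?}", vdotq_s32_model([0; 4], a, b)); // prints [10, -20, 64516, 65536]
}
```

Without the extension, the same per-lane result would take several widening-multiply and pairwise-add instructions, which is the cost that requiring `+dotprod` avoids.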

Suggested reviewing order

diskann-wide/src/arch/aarch64/mod.rs — Architecture definition, Neon token, dispatch, test_neon().
diskann-wide/src/arch/aarch64/macros.rs — The macro infrastructure that all type files build on.
diskann-wide/src/arch/aarch64/masks.rs — Mask representations and operations (move_mask, from_mask, keep_first).
diskann-wide/src/arch/aarch64/algorithms/load_first.rs — Optimized partial load primitives. Read bottom-up: impl functions first, then wrappers.
One representative type file (e.g., f32x4_.rs for 128-bit float, or i32x4_.rs for dot products) — the rest are structurally identical.
diskann-wide/src/arch/aarch64/double.rs and diskann-wide/src/doubled.rs — Doubled types and branchless partial load/store.
diskann-vector/src/distance/simd.rs — Neon distance kernels.
diskann-benchmark-simd/src/lib.rs — match_arch! refactor and Neon registration.
diskann-quantization/ — Neon test paths (mechanical).

codecov-commenter · 2026-02-16T06:37:50Z

Codecov Report

❌ Patch coverage is 84.11215% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.99%. Comparing base (7cd231a) to head (c18da52).

Files with missing lines	Patch %	Lines
diskann-benchmark-simd/src/lib.rs	47.05%	18 Missing ⚠️
diskann-vector/src/distance/simd.rs	83.33%	4 Missing ⚠️
diskann-wide/src/test_utils/dot_product.rs	94.93%	4 Missing ⚠️
diskann-benchmark-simd/src/bin.rs	50.00%	3 Missing ⚠️
diskann-vector/src/distance/implementations.rs	0.00%	3 Missing ⚠️
diskann-vector/src/conversion.rs	66.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #777      +/-   ##
==========================================
- Coverage   89.00%   88.99%   -0.02%     
==========================================
  Files         428      428              
  Lines       78417    78565     +148     
==========================================
+ Hits        69795    69917     +122     
- Misses       8622     8648      +26

Flag	Coverage Δ
miri	`88.99% <84.11%> (-0.02%)`	⬇️
unittests	`88.99% <84.11%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
diskann-providers/src/model/pq/distance/dynamic.rs	`86.40% <ø> (ø)`
diskann-quantization/src/algorithms/hadamard.rs	`97.94% <ø> (ø)`
diskann-quantization/src/bits/distances.rs	`91.49% <100.00%> (+0.05%)`	⬆️
diskann-quantization/src/spherical/iface.rs	`92.90% <ø> (+0.32%)`	⬆️
diskann-vector/src/distance/distance_provider.rs	`100.00% <ø> (ø)`
diskann-wide/src/arch/mod.rs	`83.79% <ø> (ø)`
diskann-wide/src/doubled.rs	`86.72% <100.00%> (+0.02%)`	⬆️
diskann-wide/src/emulated.rs	`95.20% <100.00%> (+0.59%)`	⬆️
diskann-wide/src/helpers.rs	`100.00% <ø> (ø)`
diskann-wide/src/lib.rs	`86.66% <ø> (ø)`
... and 7 more

... and 1 file with indirect coverage changes


hildebrandmw · 2026-02-16T21:05:31Z

I can see that coverage on non-x86-64 architectures is going to be fun to deal with ...

Mark Hildebrand added 4 commits February 15, 2026 21:47

Add a Neon backend.

c636a90

Address Clippy.

e4cd335

Enable neon and dotprod

5e8b87a

Fix benchmark-simd

ff0c5a4

Mark Hildebrand added 5 commits February 16, 2026 10:36

Checkpoint.

6a26bd6

Wrapping up!.

8f3aedc

Fix typo.

076a501

Here we gooo!

b762f79

Disable inclusion of x86_64 module when building rust-doc.

c18da52

hildebrandmw changed the title ~~Neon MVP.~~ Neon MVP Feb 16, 2026

hildebrandmw wants to merge 9 commits into main from mhildebr/neon
