Improve HashSet<T> performance by enabling JIT bounds check elimination by danmoseley · Pull Request #125893 · dotnet/runtime

danmoseley · 2026-03-21T20:42:47Z

Improve HashSet performance by enabling JIT bounds check elimination

Change while (i >= 0) to while ((uint)i < (uint)entries.Length) in all hash-chain traversal loops in HashSet<T>, matching the pattern already used in Dictionary<TKey,TValue>.

Rationale

Dictionary<TKey,TValue> uses while ((uint)i < (uint)entries.Length) for its hash-chain loops (see FindValue, TryInsert, Remove). This unsigned comparison serves as both the loop exit condition and an implicit bounds check on entries[i], allowing the JIT to eliminate the redundant range check.

HashSet<T> uses while (i >= 0) for the same purpose. While functionally equivalent (chain indices are always non-negative, with -1 as sentinel), this signed comparison only tells the JIT that i is non-negative — not that it's within array bounds. The JIT must therefore emit a separate bounds check on every entries[i] access.

Note: HashSet<T>.AlternateEqualityComparer.FindValue already uses the unsigned pattern (with a do/while + (uint)i >= (uint)entries.Length guard); this PR brings the remaining 7 loops into alignment.
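The two loop shapes described above can be sketched side by side. This is a minimal illustration, not the actual HashSet<T> source: the `Entry` layout, the `ChainWalk` class, and the sample chain are hypothetical stand-ins chosen to show that both conditions stop at the -1 sentinel and return the same result.

```csharp
using System;

// Illustrative stand-in for a hash-chain table; NOT the real HashSet<T> internals.
internal struct Entry
{
    public int Next;   // index of the next entry in the chain, or -1 at the end
    public int Value;
}

internal static class ChainWalk
{
    // Old shape: the signed test only proves i >= 0,
    // so entries[i] still needs its own range check.
    public static int FindSigned(Entry[] entries, int start, int value)
    {
        int i = start;
        while (i >= 0)
        {
            ref Entry entry = ref entries[i];
            if (entry.Value == value) return i;
            i = entry.Next;
        }
        return -1;
    }

    // New shape: a single unsigned compare rejects both -1 (which becomes
    // uint.MaxValue) and any index >= Length, so it can double as the
    // range check for entries[i].
    public static int FindUnsigned(Entry[] entries, int start, int value)
    {
        int i = start;
        while ((uint)i < (uint)entries.Length)
        {
            ref Entry entry = ref entries[i];
            if (entry.Value == value) return i;
            i = entry.Next;
        }
        return -1;
    }

    // Hypothetical chain: 0 -> 2 -> end. Entry 1 is not reachable from 0.
    public static Entry[] SampleChain() => new[]
    {
        new Entry { Value = 10, Next = 2 },
        new Entry { Value = 99, Next = -1 },
        new Entry { Value = 20, Next = -1 },
    };

    public static void Main()
    {
        Entry[] entries = SampleChain();
        Console.WriteLine(FindSigned(entries, 0, 20));   // 2
        Console.WriteLine(FindUnsigned(entries, 0, 20)); // 2
        Console.WriteLine(FindUnsigned(entries, 0, 99)); // -1
    }
}
```

Whether the range check is actually removed depends on the JIT; the sketch only demonstrates that the two conditions are behaviorally equivalent for well-formed chains.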

Changes

All changes are in HashSet.cs, one-line loop condition substitutions:

FindItemIndex — 2 loops (value-type and comparer branches)
AddIfNotPresent — 2 loops (value-type and comparer branches)
Remove — 1 loop
AlternateEqualityComparer.Add — 1 loop
AlternateEqualityComparer.Remove — 1 loop

JIT codegen

FindItemIndex<int> under FullOpts (x64):

Before (385 bytes): signed loop + separate bounds check

; loop top
cmp ecx, r13d
jae RNGCHKFAIL          ; <-- bounds check
...
; loop bottom
test ecx, ecx
jge LOOP_TOP             ; signed: i >= 0

After (379 bytes): unsigned loop, bounds check eliminated

; loop bottom
cmp r13d, ecx
ja LOOP_TOP              ; unsigned: Length > i (doubles as bounds check)
; no RNGCHKFAIL

Benchmark results

BenchmarkDotNet v0.16.0, Intel Core i9-14900K, .NET 11.0.0-dev, --affinity 1 (pinned to P-core).
Benchmark harness: --coreRun comparing baseline vs optimized CoreLib. Results confirmed stable across multiple runs; suspicious values were re-run with swapped --coreRun order to rule out positional bias.

Int32 (value type, default comparer devirtualized + inlined)

Benchmark	Size	Ratio	Notes
ContainsTrue	512	0.90	10% faster
ContainsTrueComparer	512	0.50	2x faster (see note below)
Remove_Hit	16	0.97
Remove_Hit	512	0.96
Remove_Hit	4096	0.94–0.98
Remove_Miss	all	1.00	neutral
ContainsFalse	512	1.00	neutral
AddGivenSize	512	1.00	neutral
CreateAddAndRemove	512	1.00	neutral
CreateAddAndClear	512	1.00	neutral
CtorFromCollection	512	1.00	neutral
IterateForEach	512	1.00	neutral

ContainsTrueComparer 0.50: This benchmark uses a custom IEqualityComparer<int> wrapping the default comparer, so it exercises FindItemIndex's comparer branch. Confirmed across 3 separate runs (0.50, 0.48, 0.52).

Miss paths unaffected: ContainsFalse and Remove_Miss are neutral as expected — on a miss with a good hash function, the bucket chain is typically empty or has a single entry, so the loop body barely executes and the per-iteration bounds check saving has minimal impact.

Add paths neutral: AddGivenSize and CreateAddAndClear are neutral because Add benchmarks are dominated by memory allocation and resize, not the duplicate-check chain walk.

String (reference type)

Benchmark	Size	Ratio	Notes
All benchmarks	all	1.00	neutral

The bounds check is still eliminated for string (FindItemIndex: 345→335 bytes), but string hash and equality comparison costs dominate per-element work, making the saved instruction negligible.

Summary

The improvement is concentrated on value types with the default comparer, where EqualityComparer<T>.Default.Equals is devirtualized and inlined to a simple comparison. In that case the bounds check is a meaningful fraction of per-element work in the inner loop.

AlternateEqualityComparer paths: Not exercised by existing benchmarks, but changed for consistency — AlternateEqualityComparer.FindValue already uses the unsigned pattern in the same file, so leaving Add/Remove with while (i >= 0) would create an inconsistency within the same inner class.

No regressions observed.

Alternatives considered

Only 3 of the 7 changed loops have benchmarks that show measurable improvement (FindItemIndex x2, Remove). The remaining 4 (AddIfNotPresent x2, AlternateEqualityComparer Add/Remove) could be left unchanged to minimize the diff. However, that would increase inconsistency: AlternateEqualityComparer.FindValue already uses the unsigned pattern, and having a mix of while (i >= 0) and while ((uint)i < (uint)entries.Length) across hash-chain loops in the same file would be harder to reason about than a uniform pattern. Each change is a single mechanical token substitution with no behavioral difference.

Change while (i >= 0) to while ((uint)i < (uint)entries.Length) in all 7 hash-chain traversal loops, matching the pattern already used in Dictionary<TKey,TValue>. This lets the JIT eliminate the separate bounds check on entries[i], as the unsigned loop condition serves as both loop exit and implicit range check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-03-21T20:43:58Z

@EgorBot -linux_amd -osx_arm64

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class HashSetBench
{
    private sealed class WrapComparer : IEqualityComparer<int>
    {
        public bool Equals(int x, int y) => EqualityComparer<int>.Default.Equals(x, y);
        public int GetHashCode(int obj) => EqualityComparer<int>.Default.GetHashCode(obj);
    }

    private HashSet<int> _setInt;
    private HashSet<int> _setIntComparer;
    private HashSet<string> _setString;
    private int[] _foundInt;
    private int[] _missingInt;
    private string[] _foundString;

    [GlobalSetup]
    public void Setup()
    {
        _foundInt = Enumerable.Range(0, 512).ToArray();
        _missingInt = Enumerable.Range(10000, 512).ToArray();
        _setInt = new HashSet<int>(_foundInt);
        _setIntComparer = new HashSet<int>(_foundInt, new WrapComparer());
        _foundString = _foundInt.Select(i => i.ToString()).ToArray();
        _setString = new HashSet<string>(_foundString);
    }

    [Benchmark]
    public bool ContainsTrue_Int()
    {
        bool r = false;
        var set = _setInt;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsTrueComparer_Int()
    {
        bool r = false;
        var set = _setIntComparer;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsFalse_Int()
    {
        bool r = false;
        var set = _setInt;
        var keys = _missingInt;
        for (int i = 0; i < keys.Length; i++)
            r ^= set.Contains(keys[i]);
        return r;
    }

    [Benchmark]
    public bool Remove_Hit_Int()
    {
        var set = _setInt;
        var keys = _foundInt;
        bool r = false;
        for (int i = 0; i < keys.Length; i++)
        {
            r = set.Remove(keys[i]);
            set.Add(keys[i]);
        }
        return r;
    }

    [Benchmark]
    public bool ContainsTrue_String()
    {
        bool r = false;
        var set = _setString;
        var found = _foundString;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }
}

Copilot

Pull request overview

This PR updates HashSet<T>’s hash-chain traversal loop conditions to use an unsigned index-vs-length comparison, aligning with the established Dictionary<TKey,TValue> pattern so the JIT can eliminate redundant bounds checks in the hot inner loops.

Changes:

Replaced while (i >= 0) with while ((uint)i < (uint)entries.Length) in FindItemIndex (2 loops) and AddIfNotPresent (2 loops).
Replaced while (i >= 0) with while ((uint)i < (uint)entries.Length) in Remove.
Applied the same pattern to AlternateLookup<TAlternate>’s Add and Remove loops for consistency with existing unsigned-guarded traversal in FindValue.

dotnet-policy-service · 2026-03-21T20:45:02Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

danmoseley · 2026-03-21T20:47:30Z

Two other changes were evaluated and dropped:

Remove branch splitting: adding a typeof(T).IsValueType && comparer == null guard to Remove (matching the pattern in FindItemIndex/AddIfNotPresent and Dictionary.Remove). This enables devirtualization of EqualityComparer<T>.Default.Equals for value types, but Remove<int> codegen grew from 365 to 376 bytes, and benchmarks showed no measurable difference (ratios 0.97-1.01 across sizes 16/512/4096 for both hit and miss).

Entry.HashCode int to uint: changing Entry.HashCode from int to uint to match Dictionary.Entry.hashCode. Benchmarks showed no benefit on Contains or Remove, and a possible ~8% regression on Add (ratio 1.08 on re-run, consistently above 1.0). This is being investigated separately from this PR.

This analysis was performed by Copilot.

danmoseley · 2026-03-21T22:07:26Z

Investigation: Entry.HashCode int→uint (matching Dictionary)

I investigated whether changing Entry.HashCode from int to uint (as Dictionary uses) would provide additional benefit on top of the loop condition changes in this PR.

Result: codegen-neutral. The JIT produces byte-for-byte identical instructions for AddIfNotPresent<int> (624 bytes) regardless of whether HashCode is int or uint. This makes sense — cmp eax, ecx is the same instruction for signed and unsigned equality, and the (uint) cast on GetHashCode() is a no-op at machine level.

The ~8% Add regression I initially measured was benchmarking noise (confirmed by the identical codegen). Not worth the churn for zero codegen difference.
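The point that equality does not depend on signedness can be checked at the C# level. The sketch below is only an illustration under stated assumptions: `HashCodeEquality` and its methods are hypothetical names, and the sample values are arbitrary edge cases rather than anything measured in this PR.

```csharp
using System;

internal static class HashCodeEquality
{
    // Compare two stored hash codes as signed values.
    public static bool EqualsAsInt(int a, int b) => a == b;

    // Compare the same bit patterns reinterpreted as unsigned values.
    public static bool EqualsAsUInt(int a, int b) => (uint)a == (uint)b;

    // Returns true if both comparisons agree for every pair of samples.
    public static bool AgreeOnSamples()
    {
        int[] samples = { int.MinValue, -1, 0, 1, 42, int.MaxValue };
        foreach (int a in samples)
            foreach (int b in samples)
                if (EqualsAsInt(a, b) != EqualsAsUInt(a, b))
                    return false;
        return true;
    }

    public static void Main() => Console.WriteLine(AgreeOnSamples()); // True
}
```

The cast only reinterprets the same 32 bits, so two values are equal as int exactly when they are equal as uint.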

This investigation was performed with GitHub Copilot assistance.

danmoseley · 2026-03-21T22:09:02Z

@EgorBot -linux_amd

danmoseley · 2026-03-22T01:05:19Z

@EgorBot -linux_amd

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class HashSetBench
{
    private sealed class WrapComparer : IEqualityComparer<int>
    {
        public bool Equals(int x, int y) => EqualityComparer<int>.Default.Equals(x, y);
        public int GetHashCode(int obj) => EqualityComparer<int>.Default.GetHashCode(obj);
    }

    private HashSet<int> _setInt;
    private HashSet<int> _setIntComparer;
    private HashSet<string> _setString;
    private int[] _foundInt;
    private int[] _missingInt;
    private string[] _foundString;

    [GlobalSetup]
    public void Setup()
    {
        _foundInt = Enumerable.Range(0, 512).ToArray();
        _missingInt = Enumerable.Range(10000, 512).ToArray();
        _setInt = new HashSet<int>(_foundInt);
        _setIntComparer = new HashSet<int>(_foundInt, new WrapComparer());
        _foundString = _foundInt.Select(i => i.ToString()).ToArray();
        _setString = new HashSet<string>(_foundString);
    }

    [Benchmark]
    public bool ContainsTrue_Int()
    {
        bool r = false;
        var set = _setInt;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsTrueComparer_Int()
    {
        bool r = false;
        var set = _setIntComparer;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsFalse_Int()
    {
        bool r = false;
        var set = _setInt;
        var keys = _missingInt;
        for (int i = 0; i < keys.Length; i++)
            r ^= set.Contains(keys[i]);
        return r;
    }

    [Benchmark]
    public bool Remove_Hit_Int()
    {
        var set = _setInt;
        var keys = _foundInt;
        bool r = false;
        for (int i = 0; i < keys.Length; i++)
        {
            r = set.Remove(keys[i]);
            set.Add(keys[i]);
        }
        return r;
    }

    [Benchmark]
    public bool ContainsTrue_String()
    {
        bool r = false;
        var set = _setString;
        var found = _foundString;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }
}

danmoseley · 2026-03-22T01:05:40Z

my bad, I forgot to include benchmark code so 2026-03-21 22:14:17.015 ❌ Too many benchmarks discovered: 4262.

let's try again

EgorBo · 2026-03-22T01:31:59Z

my bad, I forgot to include benchmark code so 2026-03-21 22:14:17.015 ❌ Too many benchmarks discovered: 4262.

let's try again

Yeah, when no code snippet is provided, it assumes you want dotnet/performance benchmarks. Typically, it expects BDN's --filter to tell it which benchmarks to run, but the bot has a hard limit (around 50 or so).

EgorBo · 2026-03-22T01:32:17Z

@MihuBot

danmoseley · 2026-03-22T02:11:31Z

ContainsFalse regression on Turin — codegen analysis

The egorbot Turin results show a reproducible 11–15% regression on ContainsFalse_Int (main/PR ratios of 0.85 and 0.89 across two runs, i.e. PR/main ratios of 1.17 and 1.12). ARM64 and local Intel are neutral (0.99). Here's why.

Root cause: The loop condition change adds entries.Length to the critical path at loop entry.

Baseline: dec ecx; js — the js (jump-if-sign) reuses flags from dec, so it can decide whether to enter the loop with zero extra work.

PR: dec ecx; mov r13d,[entries.Length]; cmp r13d,ecx; jbe — must wait for the entries.Length memory load to complete before the comparison can execute.

For ContainsFalse, every lookup misses. With 512 items in a 521-bucket table (98% load factor), nearly every miss hits an occupied bucket, traverses one entry (hash mismatch), then exits via entry.Next == -1. The loop body is entered exactly once, so the eliminated bounds check (saving 2 instructions inside the loop) doesn't accumulate enough to offset the added entry-path latency.
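The "nearly every miss hits an occupied bucket" estimate can be reproduced with a small count. This is a sketch under assumptions that are not stated in the PR: that an int hashes to its own value and that the bucket index is the hash modulo the bucket count. The key ranges come from the benchmark's Setup method and the 521-bucket figure from the paragraph above; `MissPathOccupancy` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

internal static class MissPathOccupancy
{
    // ASSUMPTIONS (illustrative only): hash(int) == the int itself,
    // and bucket index == hash % bucketCount.
    public static int CountMissesInOccupiedBuckets(int[] present, int[] missing, int bucketCount)
    {
        var occupied = new bool[bucketCount];
        foreach (int key in present)
            occupied[(uint)key % (uint)bucketCount] = true;

        // A miss "hits an occupied bucket" when some present key maps to the same bucket.
        return missing.Count(key => occupied[(uint)key % (uint)bucketCount]);
    }

    public static void Main()
    {
        int[] present = Enumerable.Range(0, 512).ToArray();     // _foundInt
        int[] missing = Enumerable.Range(10000, 512).ToArray(); // _missingInt
        int hits = CountMissesInOccupiedBuckets(present, missing, 521);
        Console.WriteLine($"{hits} of {missing.Length} misses land in an occupied bucket");
    }
}
```

Under these assumptions, 503 of the 512 missing keys (about 98%) map to an occupied bucket, so almost every miss pays the loop-entry cost and walks one entry before exiting.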

Why only Turin (Zen 5)? Intel Golden Cove and Apple M2 both showed 0.99 — their more aggressive out-of-order execution likely hides the entries.Length load latency via speculative execution. Zen 5 appears more sensitive to this specific dependency chain.

Tradeoff assessment: The wins clearly dominate:

ContainsTrue_Int: +5–7% on all platforms
ContainsTrueComparer_Int: +71–75% on all platforms
ContainsFalse_Int: -11–15% on Turin only (neutral elsewhere)
Remove, String: neutral everywhere

Real workloads rarely consist of 100% misses, so any mix of hits and misses will net positive.

This analysis was performed with GitHub Copilot assistance.

danmoseley · 2026-03-22T02:13:49Z

Benchmark summary (egorbot)

All benchmarks use 512-element HashSet<int> or HashSet<string>. Ratio = PR/main (lower is faster).

AMD EPYC 9V45 (Zen 5, Turin) -- two runs:

Benchmark	Run 1	Run 2	Verdict
ContainsTrue_Int	0.95	0.94	Faster
ContainsTrueComparer_Int	0.57	0.57	Faster
ContainsFalse_Int	1.17	1.12	Slower
Remove_Hit_Int	1.00	1.00	Same
ContainsTrue_String	1.01	0.99	Same

Apple M2 (ARM64):

Benchmark	Ratio	Verdict
ContainsTrue_Int	0.93	Faster
ContainsTrueComparer_Int	0.58	Faster
ContainsFalse_Int	1.01	Same
Remove_Hit_Int	0.99	Same
ContainsTrue_String	1.00	Same

ContainsTrue and ContainsTrueComparer improve on all platforms. ContainsFalse regresses on AMD Turin (Zen 5) only -- not on Intel x64 or Apple ARM64 (see codegen analysis above -- extra entries.Length load on the miss-path critical path, hidden by OOO execution on other microarchitectures). Remove and String are neutral everywhere.

danmoseley · 2026-03-22T02:17:47Z

OK, I think all the evidence is in and this is good. Ready for review.

danmoseley · 2026-03-22T02:19:54Z

Literally all validation legs passed? 🤯🎉

Copilot AI review requested due to automatic review settings March 21, 2026 20:42

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 21, 2026

dotnet-policy-service bot assigned danmoseley Mar 21, 2026

Copilot started reviewing on behalf of danmoseley March 21, 2026 20:43 View session

danmoseley added area-System.Collections and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Mar 21, 2026

EgorBot mentioned this pull request Mar 21, 2026

Benchmarks for dotnet/runtime#125893 (for @danmoseley) EgorBot/Benchmarks#59

Open

Copilot AI reviewed Mar 21, 2026


EgorBot mentioned this pull request Mar 21, 2026

Benchmarks for dotnet/runtime#125893 (for @danmoseley) EgorBot/Benchmarks#60

Open

EgorBot mentioned this pull request Mar 22, 2026

Benchmarks for dotnet/runtime#125893 (for @danmoseley) EgorBot/Benchmarks#61

Open

MihuBot mentioned this pull request Mar 22, 2026

[JitDiff X64] [danmoseley] Improve HashSet<T> performance by enabling JIT bo ... MihuBot/runtime-utils#1831

Open

danmoseley requested a review from stephentoub March 22, 2026 02:17

stephentoub approved these changes Mar 22, 2026


stephentoub merged commit b2bba6d into dotnet:main Mar 22, 2026
151 checks passed

danmoseley deleted the hashset-perf-opt branch March 22, 2026 06:39
