Improve HashSet<T> performance by enabling JIT bounds check elimination#125893
stephentoub merged 1 commit into dotnet:main
Change while (i >= 0) to while ((uint)i < (uint)entries.Length) in all 7 hash-chain traversal loops, matching the pattern already used in Dictionary<TKey,TValue>. This lets the JIT eliminate the separate bounds check on entries[i], as the unsigned loop condition serves as both loop exit and implicit range check.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@EgorBot -linux_amd -osx_arm64
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class HashSetBench
{
    private sealed class WrapComparer : IEqualityComparer<int>
    {
        public bool Equals(int x, int y) => EqualityComparer<int>.Default.Equals(x, y);
        public int GetHashCode(int obj) => EqualityComparer<int>.Default.GetHashCode(obj);
    }

    private HashSet<int> _setInt;
    private HashSet<int> _setIntComparer;
    private HashSet<string> _setString;
    private int[] _foundInt;
    private int[] _missingInt;
    private string[] _foundString;

    [GlobalSetup]
    public void Setup()
    {
        _foundInt = Enumerable.Range(0, 512).ToArray();
        _missingInt = Enumerable.Range(10000, 512).ToArray();
        _setInt = new HashSet<int>(_foundInt);
        _setIntComparer = new HashSet<int>(_foundInt, new WrapComparer());
        _foundString = _foundInt.Select(i => i.ToString()).ToArray();
        _setString = new HashSet<string>(_foundString);
    }

    [Benchmark]
    public bool ContainsTrue_Int()
    {
        bool r = false;
        var set = _setInt;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsTrueComparer_Int()
    {
        bool r = false;
        var set = _setIntComparer;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsFalse_Int()
    {
        bool r = false;
        var set = _setInt;
        var keys = _missingInt;
        for (int i = 0; i < keys.Length; i++)
            r ^= set.Contains(keys[i]);
        return r;
    }

    [Benchmark]
    public bool Remove_Hit_Int()
    {
        var set = _setInt;
        var keys = _foundInt;
        bool r = false;
        for (int i = 0; i < keys.Length; i++)
        {
            r = set.Remove(keys[i]);
            set.Add(keys[i]);
        }
        return r;
    }

    [Benchmark]
    public bool ContainsTrue_String()
    {
        bool r = false;
        var set = _setString;
        var found = _foundString;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }
}
Pull request overview
This PR updates HashSet<T>’s hash-chain traversal loop conditions to use an unsigned index-vs-length comparison, aligning with the established Dictionary<TKey,TValue> pattern so the JIT can eliminate redundant bounds checks in the hot inner loops.
Changes:
- Replaced `while (i >= 0)` with `while ((uint)i < (uint)entries.Length)` in `FindItemIndex` (2 loops) and `AddIfNotPresent` (2 loops).
- Replaced `while (i >= 0)` with `while ((uint)i < (uint)entries.Length)` in `Remove`.
- Applied the same pattern to `AlternateLookup<TAlternate>`'s `Add` and `Remove` loops for consistency with the existing unsigned-guarded traversal in `FindValue`.
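The unsigned comparison named in the changes above folds two signed checks into one. The following is a minimal sketch of that equivalence; `UnsignedRangeCheck` and its method names are hypothetical helpers written for illustration, not code from `HashSet.cs`, and the casts assume the default unchecked arithmetic context.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the PR.
public static class UnsignedRangeCheck
{
    // Signed form: needs two comparisons to prove 0 <= i < length.
    public static bool InRangeSigned(int i, int length) => i >= 0 && i < length;

    // Unsigned form: one comparison. In an unchecked context a negative i
    // reinterpreted as uint is at least 2^31, which is larger than any
    // valid array length, so the -1 chain sentinel fails the test.
    public static bool InRangeUnsigned(int i, int length) => (uint)i < (uint)length;
}
```

For any non-negative `length`, both methods return the same result for every `int` index, which is why the loop condition can be swapped without changing behavior.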
|
Tagging subscribers to this area: @dotnet/area-system-collections |
|
Two other changes were evaluated and dropped: Remove branch splitting — adding Entry.HashCode int to uint — changing This analysis was performed by Copilot. |
|
Investigation: I investigated whether changing Result: codegen-neutral. The JIT produces byte-for-byte identical instructions for The ~8% Add regression I initially measured was benchmarking noise (confirmed by the identical codegen). Not worth the churn for zero codegen difference. This investigation was performed with GitHub Copilot assistance. |
|
@EgorBot -linux_amd |
|
@EgorBot -linux_amd
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class HashSetBench
{
    private sealed class WrapComparer : IEqualityComparer<int>
    {
        public bool Equals(int x, int y) => EqualityComparer<int>.Default.Equals(x, y);
        public int GetHashCode(int obj) => EqualityComparer<int>.Default.GetHashCode(obj);
    }

    private HashSet<int> _setInt;
    private HashSet<int> _setIntComparer;
    private HashSet<string> _setString;
    private int[] _foundInt;
    private int[] _missingInt;
    private string[] _foundString;

    [GlobalSetup]
    public void Setup()
    {
        _foundInt = Enumerable.Range(0, 512).ToArray();
        _missingInt = Enumerable.Range(10000, 512).ToArray();
        _setInt = new HashSet<int>(_foundInt);
        _setIntComparer = new HashSet<int>(_foundInt, new WrapComparer());
        _foundString = _foundInt.Select(i => i.ToString()).ToArray();
        _setString = new HashSet<string>(_foundString);
    }

    [Benchmark]
    public bool ContainsTrue_Int()
    {
        bool r = false;
        var set = _setInt;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsTrueComparer_Int()
    {
        bool r = false;
        var set = _setIntComparer;
        var found = _foundInt;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }

    [Benchmark]
    public bool ContainsFalse_Int()
    {
        bool r = false;
        var set = _setInt;
        var keys = _missingInt;
        for (int i = 0; i < keys.Length; i++)
            r ^= set.Contains(keys[i]);
        return r;
    }

    [Benchmark]
    public bool Remove_Hit_Int()
    {
        var set = _setInt;
        var keys = _foundInt;
        bool r = false;
        for (int i = 0; i < keys.Length; i++)
        {
            r = set.Remove(keys[i]);
            set.Add(keys[i]);
        }
        return r;
    }

    [Benchmark]
    public bool ContainsTrue_String()
    {
        bool r = false;
        var set = _setString;
        var found = _foundString;
        for (int i = 0; i < found.Length; i++)
            r ^= set.Contains(found[i]);
        return r;
    }
}
|
my bad, I forgot to include benchmark code so let's try again |
Yeah, when no code snippet is provided, it assumes you want dotnet/performance benchmarks. typically, it expects BDN's |
|
ContainsFalse regression on Turin: codegen analysis
The egorbot Turin results show a reproducible ~11% regression on Root cause: The loop condition change adds Baseline: PR: For Why only Turin (Zen 5)? Intel Golden Cove and Apple M2 both showed 0.99; their more aggressive out-of-order execution likely hides the Tradeoff assessment: The wins clearly dominate:
Real workloads rarely consist of 100% misses, so any mix of hits and misses will net positive. This analysis was performed with GitHub Copilot assistance. |
Benchmark summary (egorbot)
All benchmarks use 512-element
AMD EPYC 9V45 (Zen 5, Turin) -- two runs:
Apple M2 (ARM64):
|
|
OK, I think all the evidence is in and this is good. Ready for review. |
|
Literally all validation legs passed? 🤯🎉 |
Improve HashSet performance by enabling JIT bounds check elimination
Change `while (i >= 0)` to `while ((uint)i < (uint)entries.Length)` in all hash-chain traversal loops in `HashSet<T>`, matching the pattern already used in `Dictionary<TKey,TValue>`.
Rationale
`Dictionary<TKey,TValue>` uses `while ((uint)i < (uint)entries.Length)` for its hash-chain loops (see `FindValue`, `TryInsert`, `Remove`). This unsigned comparison serves as both the loop exit condition and an implicit bounds check on `entries[i]`, allowing the JIT to eliminate the redundant range check.
`HashSet<T>` uses `while (i >= 0)` for the same purpose. While functionally equivalent (chain indices are always non-negative, with -1 as sentinel), this signed comparison only tells the JIT that `i` is non-negative, not that it is within array bounds. The JIT must therefore emit a separate bounds check on every `entries[i]` access.
Note: `HashSet<T>.AlternateLookup<TAlternate>.FindValue` already uses the unsigned pattern (with a `do`/`while` + `(uint)i >= (uint)entries.Length` guard); this PR brings the remaining 7 loops into alignment.
Changes
All changes are in `HashSet.cs`, one-line loop condition substitutions:
- `FindItemIndex`: 2 loops (value-type and comparer branches)
- `AddIfNotPresent`: 2 loops (value-type and comparer branches)
- `Remove`: 1 loop
- `AlternateLookup<TAlternate>.Add`: 1 loop
- `AlternateLookup<TAlternate>.Remove`: 1 loop
JIT codegen
`FindItemIndex<int>` under FullOpts (x64):
Before (385 bytes): signed loop + separate bounds check
After (379 bytes): unsigned loop, bounds check eliminated
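The two loop shapes compared above can be sketched on a toy chained set. `TinyIntSet` is a hypothetical, heavily simplified stand-in for the bucket/entry layout (fixed capacity, no resize, no removal) and is not the real `HashSet<T>` code; it only shows where the loop condition sits relative to the `entries[i]` access.

```csharp
using System;

// Hypothetical, simplified stand-in for a chained hash set; not the real implementation.
public sealed class TinyIntSet
{
    private struct Entry
    {
        public int Value;
        public int Next; // index of the next entry in the chain, or -1 at the end
    }

    private readonly int[] _buckets;   // 1-based index into _entries; 0 means empty
    private readonly Entry[] _entries;
    private int _count;

    public TinyIntSet(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
    }

    // Returns false if the value is already present or the set is full.
    public bool Add(int value)
    {
        if (ContainsUnsigned(value) || _count == _entries.Length) return false;
        ref int bucket = ref _buckets[(uint)value % (uint)_buckets.Length];
        _entries[_count] = new Entry { Value = value, Next = bucket - 1 };
        bucket = ++_count; // store the new entry's 1-based index as the chain head
        return true;
    }

    // Before: the signed test proves only i >= 0, so entries[i] may still
    // need its own range check.
    public bool ContainsSigned(int value)
    {
        Entry[] entries = _entries;
        int i = _buckets[(uint)value % (uint)_buckets.Length] - 1;
        while (i >= 0)
        {
            ref Entry entry = ref entries[i];
            if (entry.Value == value) return true;
            i = entry.Next;
        }
        return false;
    }

    // After: the unsigned test is both the loop exit and the range check.
    public bool ContainsUnsigned(int value)
    {
        Entry[] entries = _entries;
        int i = _buckets[(uint)value % (uint)_buckets.Length] - 1;
        while ((uint)i < (uint)entries.Length)
        {
            ref Entry entry = ref entries[i];
            if (entry.Value == value) return true;
            i = entry.Next;
        }
        return false;
    }
}
```

Both lookups return the same answers; whether the JIT actually drops the range check in the second form depends on the runtime version and should be confirmed from a disassembly rather than assumed from this sketch.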
Benchmark results
BenchmarkDotNet v0.16.0, Intel Core i9-14900K, .NET 11.0.0-dev, `--affinity 1` (pinned to P-core).
Benchmark harness: `--coreRun` comparing baseline vs optimized CoreLib. Results confirmed stable across multiple runs; suspicious values were re-run with swapped `--coreRun` order to rule out positional bias.
Int32 (value type, default comparer devirtualized + inlined)
ContainsTrueComparer 0.50: This benchmark uses a custom `IEqualityComparer<int>` wrapping the default comparer, so it exercises `FindItemIndex`'s comparer branch. Confirmed across 3 separate runs (0.50, 0.48, 0.52).
Miss paths unaffected: `ContainsFalse` and `Remove_Miss` are neutral as expected. On a miss with a good hash function, the bucket chain is typically empty or has a single entry, so the loop body barely executes and the per-iteration bounds check saving has minimal impact.
Add paths neutral: `AddGivenSize` and `CreateAddAndClear` are neutral because Add benchmarks are dominated by memory allocation and resize, not the duplicate-check chain walk.
String (reference type)
The bounds check is still eliminated for `string` (`FindItemIndex`: 345→335 bytes), but string hash and equality comparison costs dominate per-element work, making the saved instruction negligible.
Summary
The improvement is concentrated on value types with the default comparer, where `EqualityComparer<T>.Default.Equals` is devirtualized and inlined to a simple comparison. In that case the bounds check is a meaningful fraction of per-element work in the inner loop.
AlternateLookup paths: Not exercised by existing benchmarks, but changed for consistency. `AlternateLookup<TAlternate>.FindValue` already uses the unsigned pattern in the same file, so leaving `Add`/`Remove` with `while (i >= 0)` would create an inconsistency within the same nested type.
No regressions observed.
Alternatives considered
Only 3 of the 7 changed loops have benchmarks that show measurable improvement (`FindItemIndex` x2, `Remove`). The remaining 4 (`AddIfNotPresent` x2, `AlternateLookup<TAlternate>` `Add`/`Remove`) could be left unchanged to minimize the diff. However, that would increase inconsistency: `AlternateLookup<TAlternate>.FindValue` already uses the unsigned pattern, and having a mix of `while (i >= 0)` and `while ((uint)i < (uint)entries.Length)` across hash-chain loops in the same file would be harder to reason about than a uniform pattern. Each change is a single mechanical token substitution with no behavioral difference.
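Since the alternate-lookup `Add`/`Remove` loops are not covered by the existing benchmarks, the following is a minimal sketch of code that would reach them. `AlternateLookupSketch` and `AddThenRemove` are hypothetical names, and the sketch assumes .NET 9 or later and a string set whose comparer supports `ReadOnlySpan<char>` lookups (the default string comparer does); it is not a benchmark from the PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; assumes .NET 9+ alternate lookup support.
public static class AlternateLookupSketch
{
    // Adds each key through the span-based lookup, then removes it again.
    // Returns how many removals succeeded.
    public static int AddThenRemove(HashSet<string> set, string[] keys)
    {
        HashSet<string>.AlternateLookup<ReadOnlySpan<char>> lookup =
            set.GetAlternateLookup<ReadOnlySpan<char>>();

        int removed = 0;
        foreach (string key in keys)
        {
            ReadOnlySpan<char> span = key.AsSpan();
            lookup.Add(span);                    // walks the chain to check for a duplicate
            if (lookup.Remove(span)) removed++;  // walks the chain to find the entry
        }
        return removed;
    }
}
```

Wrapping a method like this in a `[Benchmark]` would be one way to measure the two loops that were changed for consistency only.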