Skip to content

Fix FileSystemWatcher startup race on Linux#125740

Closed
lewing wants to merge 1 commit intodotnet:mainfrom
lewing:fix/fsw-inotify-startup-race
Closed

Fix FileSystemWatcher startup race on Linux#125740
lewing wants to merge 1 commit intodotnet:mainfrom
lewing:fix/fsw-inotify-startup-race

Conversation

@lewing
Copy link
Member

@lewing lewing commented Mar 18, 2026

Summary

Fix a race condition in FileSystemWatcher on Linux where events can be missed if they occur immediately after Start() returns, before the ProcessEvents background thread enters its read() loop.

Root Cause

PR #117148 refactored the Linux FileSystemWatcher to use a single shared inotify instance. The refactor introduced a race in Start(): after StartThread() spawns the background thread and CreateRootWatch() registers the inotify watch, Start() returns without ensuring the processing thread is actually reading events. If a filesystem operation triggers an event in this window, the event is missed.

This manifests as "Created event did not occur as expected" failures in SymbolicLink_Changed_Tests tests, particularly under jitstress on slower hardware (arm32) where the race window is wider.

Fix

Add a ManualResetEventSlim that:

  1. ProcessEvents() signals just before entering its TryReadEvent loop
  2. Start() waits on after releasing the watchers lock

For shared inotify instances (when a subsequent watcher reuses an already-running instance), the event is already set so WaitForReadReady() returns immediately.

Testing

The fix should resolve:

Related

/cc @tmds @adamsitnik @jozkee

@lewing lewing requested a review from akoeplinger as a code owner March 18, 2026 18:56
Copilot AI review requested due to automatic review settings March 18, 2026 18:56
@lewing lewing marked this pull request as draft March 18, 2026 18:57
@lewing lewing force-pushed the fix/fsw-inotify-startup-race branch from 47a5000 to a5bf384 Compare March 18, 2026 18:58
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@lewing lewing requested review from adamsitnik and jozkee March 18, 2026 18:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Linux FileSystemWatcher startup race by adding a synchronization point so Start() can wait until the inotify processing thread is ready to read events.

Changes:

  • Add a ManualResetEventSlim readiness signal in the Linux inotify processing thread and wait for it in Watcher.Start().
  • Extend ConvertDllsToWebcil / WASM Browser targets to classify “pass-through” candidates separately and materialize shared framework assets per project.
  • Update global.json SDK/tooling versions.

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Linux.cs Adds a read-readiness signal and waits in Start() to avoid missed inotify events.
src/tasks/Microsoft.NET.Sdk.WebAssembly.Pack.Tasks/ConvertDllsToWebCil.cs Adds a new MSBuild task output (PassThroughCandidates) to classify non-converted inputs.
src/mono/nuget/Microsoft.NET.Sdk.WebAssembly.Pack/build/Microsoft.NET.Sdk.WebAssembly.Browser.targets Uses PassThroughCandidates to materialize shared framework assets and avoid identity collisions.
global.json Changes SDK/tools and MSBuild SDK versions (not described in PR summary).

You can also share your feedback on Copilot code review. Take the survey.

Ensure the inotify processing thread is ready to read events before
Start() returns. Without this synchronization, filesystem events that
occur immediately after Start() can be missed if they happen before
ProcessEvents() enters its read loop.

The race was introduced by dotnet#117148 (single shared inotify instance
refactor) and manifests as missed Created events in SymbolicLink tests,
particularly under jitstress on slower hardware (arm32).

Add a ManualResetEventSlim that ProcessEvents() signals before entering
the TryReadEvent loop, and that Start() waits on after releasing the
watchers lock. For shared inotify instances (subsequent watchers reusing
an existing instance), the event is already set so the wait is a no-op.

Fixes dotnet#125737

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lewing lewing force-pushed the fix/fsw-inotify-startup-race branch from a5bf384 to c28805a Compare March 18, 2026 19:02
@lewing lewing requested a review from Copilot March 18, 2026 19:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Linux FileSystemWatcher race introduced by the shared-inotify refactor, where Start() could return before the background processing thread is reliably able to consume inotify events, leading to missed/late events in stress environments.

Changes:

  • Add a ManualResetEventSlim to synchronize Watcher.Start() with the inotify processing thread’s readiness.
  • Signal readiness from ProcessEvents() before entering the TryReadEvent loop and again on exit to ensure waiters are released.
  • Block Start() on WaitForReadReady() after releasing the shared watchers lock.

You can also share your feedback on Copilot code review. Take the survey.

@lewing lewing requested a review from tmds March 18, 2026 19:13
@lewing lewing marked this pull request as ready for review March 18, 2026 19:19
@tmds
Copy link
Member

tmds commented Mar 18, 2026

Start() returns without ensuring the processing thread is actually reading events. If a filesystem operation triggers an event in this window, the event is missed.

When the Start method returns events should be queued up against the inotify and _emitEvents is true.
It doesn't matter that ProcessEvents may not be reading the inotify yet.

#117148 affects the timing of inotify stop/start. I suggest we first increase the timeouts on these tests to have some more certainty that a failed test indicates a functional issue rather than missing an event due to a timeout.

@lewing
Copy link
Member Author

lewing commented Mar 18, 2026

Start() returns without ensuring the processing thread is actually reading events. If a filesystem operation triggers an event in this window, the event is missed.

When the Start method returns events should be queued up against the inotify and _emitEvents is true. It doesn't matter that ProcessEvents may not be reading the inotify yet.

#117148 affects the timing of inotify stop/start. I suggest we first increase the timeouts on these tests to have some more certainty that a failed test indicates a functional issue rather than missing an event due to a timeout.

I'm happy to hand this off, I just looked at all the failures in https://github.com/dotnet/runtime/issue-03630 and decided something wasn't working as designed. If the issue is with the tests then we should fix or disable all of them rather than just letting the KBE be a catch-all

@lewing
Copy link
Member Author

lewing commented Mar 18, 2026

#125744

@tmds
Copy link
Member

tmds commented Mar 18, 2026

#125744

Thanks. I'll close this one because I don't think the described race exists.

@tmds tmds closed this Mar 18, 2026
lewing added a commit that referenced this pull request Mar 19, 2026
Increase `WaitForExpectedEventTimeout` from 500ms to 1000ms per @tmds
suggestion in
#125740 (comment).

The inotify refactor (#117148) changed stop/start timing for watches,
which can delay event delivery under stress conditions. The 500ms
timeout is too tight for jitstress runs on slower hardware (arm32),
causing false failures in SymbolicLink tests that are being incorrectly
attributed to #103630.

This is a single constant change that affects 94 tests using the default
`ExpectEvent` timeout. It should help distinguish timing issues from
functional bugs.

Ref: #125737, #125740

/cc @tmds @adamsitnik @jozkee

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FileSystemWatcher SymbolicLink tests failing after inotify refactor (#117148)

3 participants