Fix FileSystemWatcher startup race on Linux#125740
Fix FileSystemWatcher startup race on Linux#125740lewing wants to merge 1 commit intodotnet:mainfrom
Conversation
47a5000 to
a5bf384
Compare
|
Tagging subscribers to this area: @dotnet/area-system-io |
There was a problem hiding this comment.
Pull request overview
Fixes a Linux FileSystemWatcher startup race by adding a synchronization point so Start() can wait until the inotify processing thread is ready to read events.
Changes:
- Add a
ManualResetEventSlimreadiness signal in the Linux inotify processing thread and wait for it inWatcher.Start(). - Extend
ConvertDllsToWebcil/ WASM Browser targets to classify “pass-through” candidates separately and materialize shared framework assets per project. - Update
global.jsonSDK/tooling versions.
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Linux.cs | Adds a read-readiness signal and waits in Start() to avoid missed inotify events. |
| src/tasks/Microsoft.NET.Sdk.WebAssembly.Pack.Tasks/ConvertDllsToWebCil.cs | Adds a new MSBuild task output (PassThroughCandidates) to classify non-converted inputs. |
| src/mono/nuget/Microsoft.NET.Sdk.WebAssembly.Pack/build/Microsoft.NET.Sdk.WebAssembly.Browser.targets | Uses PassThroughCandidates to materialize shared framework assets and avoid identity collisions. |
| global.json | Changes SDK/tools and MSBuild SDK versions (not described in PR summary). |
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Linux.cs
Show resolved
Hide resolved
Ensure the inotify processing thread is ready to read events before Start() returns. Without this synchronization, filesystem events that occur immediately after Start() can be missed if they happen before ProcessEvents() enters its read loop. The race was introduced by dotnet#117148 (single shared inotify instance refactor) and manifests as missed Created events in SymbolicLink tests, particularly under jitstress on slower hardware (arm32). Add a ManualResetEventSlim that ProcessEvents() signals before entering the TryReadEvent loop, and that Start() waits on after releasing the watchers lock. For shared inotify instances (subsequent watchers reusing an existing instance), the event is already set so the wait is a no-op. Fixes dotnet#125737 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
a5bf384 to
c28805a
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a Linux FileSystemWatcher race introduced by the shared-inotify refactor, where Start() could return before the background processing thread is reliably able to consume inotify events, leading to missed/late events in stress environments.
Changes:
- Add a
ManualResetEventSlimto synchronizeWatcher.Start()with the inotify processing thread’s readiness. - Signal readiness from
ProcessEvents()before entering theTryReadEventloop and again on exit to ensure waiters are released. - Block
Start()onWaitForReadReady()after releasing the shared watchers lock.
You can also share your feedback on Copilot code review. Take the survey.
When the #117148 affects the timing of inotify stop/start. I suggest we first increase the timeouts on these tests to have some more certainty that a failed test indicates a functional issue rather than missing an event due to a timeout. |
I'm happy to hand this off, I just looked at all the failures in https://github.com/dotnet/runtime/issue-03630 and decided something wasn't working as designed. If the issue is with the tests then we should fix or disable all of them rather than just letting the KBE be a catch-all |
|
Thanks. I'll close this one because I don't think the described race exists. |
Increase `WaitForExpectedEventTimeout` from 500ms to 1000ms per @tmds suggestion in #125740 (comment). The inotify refactor (#117148) changed stop/start timing for watches, which can delay event delivery under stress conditions. The 500ms timeout is too tight for jitstress runs on slower hardware (arm32), causing false failures in SymbolicLink tests that are being incorrectly attributed to #103630. This is a single constant change that affects 94 tests using the default `ExpectEvent` timeout. It should help distinguish timing issues from functional bugs. Ref: #125737, #125740 /cc @tmds @adamsitnik @jozkee Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fix a race condition in
FileSystemWatcheron Linux where events can be missed if they occur immediately afterStart()returns, before theProcessEventsbackground thread enters itsread()loop.Root Cause
PR #117148 refactored the Linux
FileSystemWatcherto use a single shared inotify instance. The refactor introduced a race inStart(): afterStartThread()spawns the background thread andCreateRootWatch()registers the inotify watch,Start()returns without ensuring the processing thread is actually reading events. If a filesystem operation triggers an event in this window, the event is missed.This manifests as
"Created event did not occur as expected"failures inSymbolicLink_Changed_Teststests, particularly under jitstress on slower hardware (arm32) where the race window is wider.Fix
Add a
ManualResetEventSlimthat:ProcessEvents()signals just before entering itsTryReadEventloopStart()waits on after releasing the watchers lockFor shared inotify instances (when a subsequent watcher reuses an already-running instance), the event is already set so
WaitForReadReady()returns immediately.Testing
The fix should resolve:
SymbolicLink_Changed_Tests.FileSystemWatcher_SymbolicLink_TargetsDirectory_Create(currently failing on linux arm32 jitstress)SymbolicLink_Changed_Tests.FileSystemWatcher_SymbolicLink_TargetsDirectory_Create_IncludeSubdirectories(currently disabled via Disable FileSystemWatcher_SymbolicLink_TargetsDirectory_Create_IncludeSubdirectories #124849)Related
/cc @tmds @adamsitnik @jozkee