
Windows wheels [B]: fix XNNPACK test hang by forcing single-threaded threadpool#18373

Draft
manuelcandales wants to merge 4 commits into main from manuel/windows-wheels-fix-B



@manuelcandales
Contributor

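pthreadpool's condvar-based synchronization on Windows can deadlock when multiple threads are used. The cause is a lost-wakeup bug: `signal_num_recruited_threads` calls `cnd_signal`, which wakes only one waiter, on a condition variable shared by two different wait conditions. The wakeup can therefore go to a thread waiting on the other condition, and the intended waiter never wakes. Force `num_threads=1` on Windows to avoid the issue entirely.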
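A minimal Python sketch of the general hazard and its usual remedy, for illustration only: it is not pthreadpool's code, and `run_pool` and its state names are hypothetical. Workers and a coordinator wait on two different predicates guarded by one `threading.Condition`. Like `cnd_signal`, `Condition.notify()` wakes at most one waiter by default, so a worker's "recruited" signal could reach another worker, which rechecks its own predicate and goes back to sleep. Calling `notify_all()` (or giving each predicate its own condition variable) makes every waiter recheck, so the coordinator cannot miss the signal.

```python
import threading


def run_pool(num_workers: int, timeout: float = 5.0) -> bool:
    """Recruit num_workers threads, hand them work, and report whether all finished.

    Hypothetical illustration: one Condition guards two predicates,
    "all workers recruited" (coordinator waits) and "work ready" (workers wait).
    """
    cond = threading.Condition()
    state = {"recruited": 0, "work_ready": False, "finished": 0}

    def worker() -> None:
        with cond:
            state["recruited"] += 1
            # notify() would wake only ONE waiter, possibly another worker that is
            # waiting on "work_ready"; it would recheck, sleep again, and the
            # coordinator's wakeup would be lost. notify_all() wakes everyone.
            cond.notify_all()
            cond.wait_for(lambda: state["work_ready"])
            state["finished"] += 1
            cond.notify_all()

    # Daemon threads so a timed-out run cannot keep the interpreter alive.
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()

    with cond:
        # Coordinator predicate: every worker has checked in.
        if not cond.wait_for(lambda: state["recruited"] == num_workers, timeout):
            return False
        state["work_ready"] = True
        cond.notify_all()
        # Wait until every worker has run its share of the work.
        done = cond.wait_for(lambda: state["finished"] == num_workers, timeout)

    for t in threads:
        t.join(timeout)
    return done


print(run_pool(4))  # True
```

The PR does not patch the signaling itself; forcing `num_threads=1` presumably sidesteps it because a single-threaded pool has no worker threads competing for the same wakeup.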

The previous fix (setting `sslBackend` in `pre_build_script.sh`) only applied to nested tokenizer submodules. The top-level submodule checkout still used schannel via the reusable workflow's `submodules: true`, causing `SEC_E_ILLEGAL_MESSAGE` errors when cloning from git.gitlab.arm.com.

Move all submodule initialization into the pre-build script, where we can control the SSL backend, and disable submodule checkout in the workflow.
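A minimal Python sketch of the ordering this commit describes, assuming the pre-build step sets Git's `http.sslBackend` option to `openssl` before any submodule is cloned. The helper name `submodule_init_commands` is hypothetical, and the exact commands in `pre_build_script.sh` are not shown in this PR; using `git config --global` is one assumed way to make the setting apply to the top-level checkout and nested submodules alike. The sketch only prints the commands (a dry run) rather than executing them.

```python
from __future__ import annotations

import shlex
import sys


def submodule_init_commands(on_windows: bool) -> list[list[str]]:
    """Build the git commands a pre-build step could run (hypothetical helper)."""
    commands: list[list[str]] = []
    if on_windows:
        # Assumption: select OpenSSL instead of schannel for every later clone,
        # including the top-level submodules and nested ones.
        commands.append(["git", "config", "--global", "http.sslBackend", "openssl"])
    # Initialize all submodules (top-level and nested) after the backend is set.
    commands.append(["git", "submodule", "update", "--init", "--recursive"])
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for argv in submodule_init_commands(sys.platform == "win32"):
        print(shlex.join(argv))
```

The matching workflow change (turning off the reusable workflow's submodule checkout) is YAML and is not reproduced here.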

Move submodule initialization above the aarch64 `sed` workaround so that the file it edits is guaranteed to exist even if the caller disables submodule checkout. Also remove the redundant `UNAME_S` assignment later in the script.

The default 60-minute timeout from pytorch/test-infra is too short for the Windows wheel build and smoke test, causing jobs to be cancelled.
@pytorch-bot

pytorch-bot bot commented Mar 20, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18373

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 04fb95d with merge base 94e9ca6:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 20, 2026
manuelcandales force-pushed the manuel/build-windows-wheels-fix-2 branch from 8033f32 to 77989a2 on March 20, 2026 20:15
Base automatically changed from manuel/build-windows-wheels-fix-2 to main March 20, 2026 20:19

Labels

ciflow/binaries, CLA Signed


1 participant