
fix: There is no Stream(gpu, {}) in current thread #3391

Open
kyr0 wants to merge 1 commit into ml-explore:main from jsheaven:fix/metal-there-is-no-stream

Conversation


@kyr0 kyr0 commented Apr 10, 2026

Proposed changes

This is for quickly discussing the following idea:

CommandEncoder& get_command_encoder(Stream s) {
  auto& encoders = get_command_encoders();
  auto it = encoders.find(s.index);
  if (it == encoders.end()) {
    // Auto-register the stream on this thread.  MLX streams are thread-local,
    // but arrays may carry references to streams created on other threads
    // (e.g. module-level streams created at import time).  Rather than
    // throwing, lazily create a CommandEncoder so the current thread can
    // dispatch work for that stream.
    auto& d = device(s.device);
    it = encoders.try_emplace(s.index, d, s.index, d.residency_set()).first;
  }
  return it->second;
}

Instead of:

CommandEncoder& get_command_encoder(Stream s) {
  auto& encoders = get_command_encoders();
  auto it = encoders.find(s.index);
  if (it == encoders.end()) {
    throw std::runtime_error(
        fmt::format("There is no Stream(gpu, {}) in current thread.", s.index));
  }
  return it->second;
}

When MLX operations in a client application are serialized through a single executor, this is safe. If people use it without serialization, things might break anyway, but for consumers who build sane, serialized multi-threaded code this will simply work instead of giving them unnecessary headaches.

I'm wondering what I'm overlooking.

(I tested this with oMLX, which had issues with any mlx version above 0.31.1 in my custom fork, and it works flawlessly.)

#3078

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@Harperbot

+1 from a downstream user. Our scoring pipeline lazily imports mlx_lm from concurrent.futures.ThreadPoolExecutor workers, so arrays in those threads frequently end up referring to streams that were created on a different thread (e.g. module-level setup on the main thread before the executor was spawned). This is exactly the path that hits the "There is no Stream(gpu, {}) in current thread" runtime error today, even after #3348 made CommandEncoder thread-local.

The auto-register fallback you propose is exactly what would let us drop our threading.RLock workaround and run three MAGI debaters in parallel again. Happy to test against a build of this PR if the maintainers want a downstream validation data point — we have a production-shape benchmark (a daily multi-model batch, 3-model ensemble, M1 Ultra 48 GPU cores).

@zcbenz zcbenz (Collaborator) commented Apr 10, 2026

Can you try the new ThreadLocalStream class introduced in #3355? It achieves exactly the same thing as this PR.

We cannot accept the proposed change because it gives users the false impression that they are committing to the same queue, which we might want to support in the future.

