
fix: There is no Stream(gpu, {}) in current thread #3391

Open
kyr0 wants to merge 1 commit into ml-explore:main from jsheaven:fix/metal-there-is-no-stream

Conversation


@kyr0 kyr0 commented Apr 10, 2026

Proposed changes

This is for quickly discussing the following idea:

CommandEncoder& get_command_encoder(Stream s) {
  auto& encoders = get_command_encoders();
  auto it = encoders.find(s.index);
  if (it == encoders.end()) {
    // Auto-register the stream on this thread.  MLX streams are thread-local,
    // but arrays may carry references to streams created on other threads
    // (e.g. module-level streams created at import time).  Rather than
    // throwing, lazily create a CommandEncoder so the current thread can
    // dispatch work for that stream.
    auto& d = device(s.device);
    it = encoders.try_emplace(s.index, d, s.index, d.residency_set()).first;
  }
  return it->second;
}

Instead of:

CommandEncoder& get_command_encoder(Stream s) {
  auto& encoders = get_command_encoders();
  auto it = encoders.find(s.index);
  if (it == encoders.end()) {
    throw std::runtime_error(
        fmt::format("There is no Stream(gpu, {}) in current thread.", s.index));
  }
  return it->second;
}

When MLX operations in a client application are serialized through a single executor, this is safe. If people use it without serialization, things might break anyway, but for consumers who build sane, serialized multi-threaded code this will simply work instead of giving them unnecessary headaches.

I'm wondering what I'm overlooking.

(I tested this with oMLX, which had issues with any mlx version above 0.31.1 in my custom fork, and it works flawlessly.)

#3078

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@Harperbot

+1 from a downstream user. Our scoring pipeline lazily imports mlx_lm from concurrent.futures.ThreadPoolExecutor workers, so arrays in those threads frequently end up referring to streams that were created on a different thread (e.g. module-level setup on the main thread before the executor was spawned). This is exactly the path that hits the "There is no Stream(gpu, {}) in current thread" runtime error today, even after #3348 made CommandEncoder thread-local.

The auto-register fallback you propose is exactly what would let us drop our threading.RLock workaround and run three MAGI debaters in parallel again. Happy to test against a build of this PR if the maintainers want a downstream validation data point — we have a production-shape benchmark (a daily multi-model batch, 3-model ensemble, M1 Ultra 48 GPU cores).

@zcbenz zcbenz (Collaborator) commented Apr 10, 2026

Can you try the new ThreadLocalStream class introduced in #3355? It achieves exactly the same thing as this PR.

We cannot accept the proposed change because it gives users the false impression that they are committing to the same queue, which we might want to support in the future.

