GPU TTS - Remaining Bugs & Issues

GPU CLI Issues

1. Workspace sync DNS resolution failure (intermittent)

  • Symptom: workspace sync failed: Failed to sync workspace to pod: Transfer protocol error: stamp fetch failed: SSH connection failed: TCP connection failed: failed to lookup address information: nodename nor servname provided, or not known
  • Cause: When the relay connects but the SSH upgrade hasn't completed yet, the sync tries to resolve a hostname that isn't yet available. This seems to happen when the relay connection is slow or when the pod finishes provisioning before SSH is ready.
  • Workaround: Retry — sometimes works on second/third attempt after daemon restart
  • Severity: Intermittent blocking — ~50% of provisioning attempts fail

2. gpu.jsonc schema documentation mismatch

  • Fields in reference docs that don't match actual schema:
    • hooks.readiness.command (string) → actual schema requires run (array of strings)
    • inputs requires key + label fields, not name
    • inputs.options requires objects with label + value, not plain strings
  • Reference file: /references/config.md shows simplified examples that don't pass schema validation
  • Severity: Medium — causes config validation errors on first attempt
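Putting the three corrections above together, a gpu.jsonc fragment that should pass schema validation might look like this. Only the field names and shapes listed above come from these notes; the surrounding structure, the health-check command, and the example input are illustrative guesses (it's also unverified whether each run entry is a full command line or a single argv token):

```jsonc
{
  "hooks": {
    "readiness": {
      // "run" must be an array of strings, not a "command" string
      "run": ["curl -sf http://localhost:8000/health"]
    }
  },
  "inputs": [
    {
      // "key" + "label", not "name"
      "key": "voice",
      "label": "Voice preset",
      // options are { label, value } objects, not plain strings
      "options": [
        { "label": "Default", "value": "default" }
      ]
    }
  ]
}
```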

3. uv not installed on pod images (intermittent)

  • Symptom: bash: line 1: uv: command not found during pip install phase, causing all dependency installation to fail
  • Cause: GPU CLI tries to use uv as the package manager but some pod images don't have uv installed
  • Note: On later attempts, the CLI auto-installed uv ("Installing uv package manager... downloading uv 0.10.4"). Inconsistent behavior — sometimes it auto-installs, sometimes it doesn't.
  • Workaround: Remove environment.python entirely and use environment.shell.steps with { "run": "pip install -r requirements.txt" } instead.
  • Severity: Intermittent blocking
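The workaround above as a gpu.jsonc fragment; only the shell.steps step shape is taken from these notes, the surrounding nesting is assumed:

```jsonc
{
  // environment.python removed entirely — install deps via shell steps instead
  "environment": {
    "shell": {
      "steps": [
        { "run": "pip install -r requirements.txt" }
      ]
    }
  }
}
```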

4. apt-get lock contention on reused pods

  • Symptom: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 748 (apt-get)
  • Cause: When a pod is reused from a previous run, a previous apt-get process may still be holding the dpkg lock
  • Workaround: No reliable one — the apt package install simply fails silently on reused pods
  • Severity: Medium — apt packages may not install on reused pods
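One possible mitigation, run from a shell step rather than anything the GPU CLI provides: wait for the stale apt-get process to release the lock before installing. A minimal sketch — the lock path is the standard Debian/Ubuntu one, and fuser (from the psmisc package) may itself need installing on minimal images:

```shell
# Wait until no process holds the dpkg frontend lock, up to a timeout.
wait_for_dpkg_lock() {
  local lock="${1:-/var/lib/dpkg/lock-frontend}" timeout="${2:-120}" waited=0
  while fuser "$lock" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "dpkg lock on $lock still held after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}

# usage: wait_for_dpkg_lock && sudo apt-get install -y <packages>
```

Note fuser is the right probe here (apt uses fcntl locks, which flock -n would not detect).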

5. SSH metadata not available during workspace sync

  • Symptom: workspace sync failed: Failed to sync workspace to pod: Remote helper bootstrap failed: No SSH metadata for pod after 10s
  • Cause: The SSH upgrade from relay to direct SSH fails or is too slow, leaving the sync without SSH metadata
  • Related to: Bug #1 (DNS resolution failure) — both are sync-phase connectivity issues
  • Workaround: Retry after daemon restart
  • Severity: Intermittent blocking
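Since the workaround for both this bug and Bug #1 is "retry", a blanket retry wrapper can paper over the sync-phase flakiness. A sketch — the daemon restart between attempts is left as a comment because the exact restart command isn't documented in these notes:

```shell
# Retry a command up to N times with a short pause between attempts.
retry() {
  local attempts="$1"; shift
  local i
  for i in $(seq 1 "$attempts"); do
    "$@" && return 0
    echo "attempt $i/$attempts failed" >&2
    # restart the GPU daemon here before retrying, if needed
    sleep 5
  done
  return 1
}

# usage: retry 3 gpu run
```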

6. GPU CLI auto-installs from requirements.txt even without environment.python config

  • Symptom: Even with no environment section in gpu.jsonc, GPU CLI auto-detects requirements.txt and runs uv pip install during the install phase
  • Impact: Helpful but undocumented. It also makes the environment.python config redundant and confusing.
  • Note: The auto-install uses uv (auto-downloaded) and caches packages on the global volume — subsequent runs show Python dependencies already installed (hash: ...)
  • Severity: Not a bug per se, but confusing behavior that conflicts with docs

7. torchvision circular import on pod images with pre-installed PyTorch

  • Symptom: RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)
  • Cause: Pod images come with pre-installed PyTorch (e.g. torch==2.4.1+cu124, torchvision==0.19.1+cu124). When uv pip install upgrades torch/torchaudio from requirements.txt (to 2.6.0), the old torchvision (0.19.1) becomes incompatible. The transformers library triggers the circular import when loading models.
  • Workaround: Add torchvision explicitly to requirements.txt so it gets upgraded alongside torch/torchaudio to a compatible version
  • Severity: Blocking — any package that imports transformers models will fail on first deploy
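Concretely, the workaround means pinning a mutually compatible trio in requirements.txt. The versions below assume the upgrade target named above (torch 2.6.0); torchvision 0.21.0 and torchaudio 2.6.0 are the releases published alongside that torch version — double-check against the official PyTorch compatibility matrix:

```
torch==2.6.0
torchaudio==2.6.0
torchvision==0.21.0
```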

8. gpu run output doesn't stream pip install completion

  • Symptom: gpu run shows pip download and uninstall progress but stops streaming output before showing the "Installed X packages" line. The health check loop runs blind while pip finishes installing in the background.
  • Cause: The relay connection or log streaming truncates long pip install output. The install actually completes (visible in gpu logs) but gpu run doesn't stream it.
  • Severity: Low — cosmetic, but makes debugging difficult during deploys

9. Reverse sync overwrites local file edits

  • Symptom: After editing files locally (e.g. tts_server.py, gpu.jsonc), gpu run syncs OLD versions from the pod back to the local machine, reverting all local changes
  • Cause: GPU CLI performs a bidirectional sync — it syncs workspace TO the pod before running, but also syncs FROM the pod back to local after the run (or when the relay dies). The outputs field in gpu.jsonc (["output/"]) should limit what syncs back, but the entire workspace appears to sync bidirectionally.
  • Impact: Any local edits made between gpu run calls get silently overwritten. This is especially destructive when iterating on server code — you fix a bug, the sync reverts the fix.
  • Workaround: Stop the pod before editing files, or re-apply edits after each gpu run
  • Severity: Blocking — makes iterative development extremely difficult
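Until the reverse sync is fixed, local edits can be guarded by snapshotting with git before each run and restoring afterwards. A sketch, assuming the workspace is (or can be made) a git repo — the temp directory and file contents below just simulate the revert for illustration:

```shell
# Simulate the workspace in a temp dir (stand-in for the real project).
cd "$(mktemp -d)"
git init -q . && git config user.email you@example.com && git config user.name you

echo "fixed bug" > tts_server.py          # a local edit you want to keep
git add -A && git commit -qm "pre-run snapshot"

# ... gpu run would go here; suppose the reverse sync reverts the file:
echo "stale pod copy" > tts_server.py

git checkout -- tts_server.py             # restore the snapshotted edit
cat tts_server.py                          # → fixed bug
```

This doesn't stop the overwrite, but it makes the revert a one-command recovery instead of lost work.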