Summary
When TensorBoard's Reloader thread encounters an unhandled exception during a reload cycle (e.g., a transient network error while reading from a remote filesystem like GCS), the thread terminates permanently. TensorBoard's web server continues running, but no new data is ever loaded — the dashboard silently serves stale data with no indication to the user.
Steps to reproduce
- Start TensorBoard pointing to a GCS logdir:
    tensorboard --logdir gs://bucket/path --bind_all --load_fast=false
- Temporarily lose network connectivity (e.g., Wi-Fi disconnect, VPN timeout).
- Restore connectivity.
Expected behavior
TensorBoard logs the error and retries on the next reload cycle. Data loading resumes once connectivity is restored.
Actual behavior
The Reloader thread dies with an unhandled exception:
Exception in thread Reloader:
...
google.auth.exceptions.TransportError: ... Failed to resolve 'oauth2.googleapis.com' ...
After this, TensorBoard never reloads data again, even after the network is restored. The only recovery is to restart TensorBoard.
Root cause
The _reload function in data_ingester.py has no exception handling around the reload loop body:
def _reload():
    while True:
        # ... reload logic with no try/except ...
        time.sleep(self._reload_interval)
Any exception propagates out of the loop, killing the thread/process.
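For illustration, here is a minimal sketch of a guard that would give the expected behavior, assuming the loop body can be factored into a single callable. The names `run_reload_loop`, `reload_once`, and `stop_event` are hypothetical and are not part of TensorBoard's code. The sketch only shows the catch, log, and retry pattern.

```python
import logging
import threading

logger = logging.getLogger("reloader_sketch")  # hypothetical logger name


def run_reload_loop(reload_once, reload_interval, stop_event=None):
    """Call reload_once repeatedly; log failures instead of letting them end the loop.

    reload_once: hypothetical callable standing in for one reload cycle.
    reload_interval: seconds to wait between cycles.
    stop_event: optional threading.Event used to end the loop (handy in tests).
    """
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        try:
            reload_once()
        except Exception:
            # Log the full traceback and fall through to the next cycle.
            logger.exception(
                "Reload cycle failed; retrying in %s seconds", reload_interval
            )
        # Sleep between cycles, waking early if a stop was requested.
        stop_event.wait(reload_interval)
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` alone, so shutdown still works while transient errors such as the `TransportError` above are logged and retried on the next cycle.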
Environment
- TensorBoard 2.20.0
- macOS (Apple Silicon)
- Python 3.12
- Using gcsfs for GCS filesystem support (no TensorFlow installed)