Description
Hopefully solving several points raised in #2223:
1. Containers not removed
- 11/02/2026: submission containers staying up forever
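One way to avoid leftover containers is to always pass `--rm` when starting the submission container, so Docker removes it as soon as it exits. A minimal sketch; the helper name and argument layout are assumptions, not the current compute_worker code:

```python
def submission_run_cmd(image, args):
    """Build the docker run command for a submission container.

    Passing --rm makes Docker remove the container as soon as it
    exits, so submission containers cannot stay up forever.
    (Hypothetical helper, not the current compute_worker code.)
    """
    return ["docker", "run", "--rm", image] + list(args)
```

For example, `submission_run_cmd("codalab/codalab-legacy:py39", ["python", "program.py"])` yields a command list that starts with `docker run --rm`.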
2. Wrong log when storage is full
When `docker pull` fails because the storage is full, we have no clear logs, and the submission then gets stuck in the "Running" state.
- Have the right error logs, and show them on the platform's UI
- Detect errors in `_get_container_image()`
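Error detection could amount to inspecting the `docker pull` stderr and mapping known failures to clear, user-facing messages. A sketch, assuming a hypothetical helper that `_get_container_image()` could call (the function name and message wording are not existing code):

```python
def classify_pull_error(stderr):
    """Map raw `docker pull` stderr to a clear, user-facing message.

    Hypothetical helper: detection like this could live near
    _get_container_image() so the platform UI shows the real cause
    (e.g. a full disk) instead of a generic failure.
    """
    if "no space left on device" in stderr.lower():
        return "Worker disk is full: docker pull failed with 'no space left on device'"
    return "docker pull failed: " + stderr.strip()
```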
3. Progress bar
Related: `show_progress()` and the progress bar add to the mess:
- Make `show_progress()` more robust (not treating missing keys as errors)
- Avoid printing multiple error lines like these (Compute worker - Improve status update and logs #2223):

```
2026-02-28 02:38:37.854 | ERROR | compute_worker:show_progress:137 - There was an error showing the progress bar
2026-02-28 02:38:37.854 | ERROR | compute_worker:show_progress:138 - 6
2026-02-28 02:38:37.955 | ERROR | compute_worker:show_progress:137 - There was an error showing the progress bar
2026-02-28 02:38:37.955 | ERROR | compute_worker:show_progress:138 - 1
```

4. Logs
- Sometimes no submission logs (Compute worker - Improve status update and logs #2223)
- Add logs at the start of submission container with metadata of the competition and submission
- Add a clear log in the compute worker container with the competition title when receiving a submission
- Similarly to other problems reported, sometimes we only have "Time limit exceeded" and no other logs (e.g. Problem with BEA 2019 Shared Task submissions #1994) (Compute worker - Improve status update and logs #2223)
- The docker pull progress bar should be shown during preparation
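Logging submission metadata at the start could look like the sketch below. The function name and message format are assumptions, not existing compute_worker code:

```python
import logging

logger = logging.getLogger("compute_worker")


def submission_banner(competition_title, submission_id, submitter):
    """Emit a clear banner with competition and submission metadata.

    Hypothetical helper: called when the worker receives a submission,
    and again at the top of the submission container's log, so every
    log stream starts with identifying metadata.
    """
    line = (
        f"Received submission {submission_id} for competition "
        f"'{competition_title}' (submitted by {submitter})"
    )
    logger.info(line)
    return line
```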
5. No space left
How should we manage the disks? Should we limit Docker image sizes?
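A simple guard could check free disk space before pulling a new image. A sketch; the path and threshold are assumptions, not current compute_worker behavior:

```python
import shutil


def disk_has_room(path="/var/lib/docker", min_free_gb=10.0):
    """Return True when the filesystem holding `path` has enough free space.

    Sketch: the worker could refuse new pulls (and emit a clear error)
    when free space drops below a threshold, instead of failing later
    with an opaque 'no space left on device'.
    """
    usage = shutil.disk_usage(path)
    return usage.free / 1e9 >= min_free_gb
```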
6. Submissions not marked as Failed
Submissions stuck in "Running" or "Scoring" status
- Submissions stuck in "Scoring" state instead of "Failed" when the compute worker crashes (Worker status to FAILED instead of SCORING or FINISHED in case of failure #2030, Compute worker - Improve status update and logs #2223)
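One way to avoid submissions stranded in a non-terminal state is to wrap the run in a try/except that always reports "Failed" before re-raising. A sketch with hypothetical callables (`run_submission` and `update_status` are assumptions, not the actual worker API):

```python
def run_with_terminal_status(run_submission, update_status):
    """Guarantee the submission ends in a terminal state.

    Sketch: whatever happens in-process, the submission is left in
    "Finished" or "Failed" rather than stuck in "Running"/"Scoring".
    """
    try:
        run_submission()
    except Exception as exc:
        update_status("Failed", detail=str(exc))
        raise
    update_status("Finished")
```

Note this only covers in-process exceptions; a hard crash of the whole worker still needs a server-side timeout or heartbeat to move the submission to "Failed".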
Related issues:
- Submissions stuck in "Running" or failing on compute worker ("non-zero return code") #2258 (grouped issue)
- submission in "Scoring" status for multiple hours on default queue #1184
- Similarly, it looks like the status gets stuck at "Preparing" when failing during this process.
Example failure during "Preparing":
```
[2025-09-18 11:25:05,234: ERROR/ForkPoolWorker-2] Task compute_worker_run[fd956bf5-3e2d-4168-ab48-f0896dc80993] raised unexpected: OSError(28, 'No space left on device')
Traceback (most recent call last):
[...]
OSError: [Errno 28] No space left on device
```

7. Duplication of submission files
8. To check
The log level is defined this way in `compute_worker.py`:

```python
configure_logging(
    os.environ.get("LOG_LEVEL", "INFO"), os.environ.get("SERIALIZED", "false")
)
```

Generally we want as many logs as possible, so we may want to default to the "DEBUG" log level.
8. Directory structure problem
9. Docker pull failing
- `docker pull` failing with:

```
Pull for image: codalab/codalab-legacy:py39 returned a non-zero exit code! Check if the docker image exists on docker hub.
```

Related issues:
- submission in "Scoring" status for multiple hours on default queue #1184
- Solution is always running #1278
- Submission stuck on scoring status (Twice) #1263

Solution:
- To have more logs, we need to update `compute_worker.py` so we print more logs in the logger (More logs when docker pull fails in compute_worker.py #1283).
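Printing more logs could mean capturing the pull's stderr and emitting it on failure instead of only the generic "non-zero exit code" line. A sketch; `pull_image` is a hypothetical helper (the `docker_cmd` parameter exists only so the failure path is easy to exercise), not the current `compute_worker.py` code:

```python
import logging
import subprocess

logger = logging.getLogger("compute_worker")


def pull_image(image, docker_cmd="docker"):
    """Run `docker pull` and log its stderr when it fails.

    Hypothetical helper: surfacing the captured stderr tells users
    whether the image is missing, the disk is full, etc., instead of
    only reporting a non-zero exit code.
    """
    proc = subprocess.run(
        [docker_cmd, "pull", image], capture_output=True, text=True
    )
    if proc.returncode != 0:
        logger.error(
            "Pull for image %s returned exit code %d: %s",
            image, proc.returncode, proc.stderr.strip(),
        )
        return False
    return True
```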
10. Logs at the wrong place
- Docker pull errors during scoring are written to the ingestion stderr instead of the scoring stderr (Docker pull in scoring #1204)
Solved by: Show error in scoring std_err #1214
11. No hostname in server status when status is "Preparing"
- The "Preparing" status means that the worker is downloading the necessary data and programs to run the submission. We should already have a hostname in the server status page during this phase, but it is not the case. (fixed in Worker status to FAILED instead of SCORING or FINISHED in case of failure #2030)
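Including the hostname from the very first status update could look like this sketch (the payload shape and function name are assumptions, not the actual status-update API):

```python
import socket


def status_payload(status):
    """Build a status-update payload that always carries the hostname.

    Hypothetical payload: including socket.gethostname() in every
    update means the server status page can show the worker hostname
    even during the early "Preparing" phase.
    """
    return {"status": status, "hostname": socket.gethostname()}
```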