
include verbose output alongside errors#772

Open
ericcurtin wants to merge 1 commit into main from fix-issue

Conversation

@ericcurtin
Contributor

Summary

  • When ExtractLlamaCppError matches a known error pattern (e.g. CUDA OOM, Metal OOM, model loading failure), it now includes both the user-friendly hint and the original verbose llama.cpp output
  • Previously the verbose output was discarded, making it harder to diagnose edge cases

Example output before:

llama.cpp failed: not enough GPU memory to load the model (CUDA)

Example output after:

llama.cpp failed: not enough GPU memory to load the model (CUDA)

Verbose output:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10723.15 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA0 buffer of size 11244037120
...

Fixes #733

Test plan

  • Updated existing unit tests in errors_test.go to expect the new format
  • All tests pass: go test ./pkg/inference/backends/llamacpp/...

🤖 Generated with Claude Code


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • Consider whether including the full raw llama.cpp output in user-facing errors could leak sensitive paths or configuration details, and if so, restrict or sanitize what is surfaced depending on context.
  • For very large llama.cpp logs, returning the entire verbose output in the error string might become unwieldy; you may want to truncate to a reasonable size or include only the most relevant lines while keeping the rest available in debug logs.
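The truncation idea in the second point could look something like the sketch below; the helper name and cutoff strategy are hypothetical, not part of this PR:

```go
package main

import "fmt"

// Hypothetical helper: cap the verbose section at max bytes, keeping
// the tail of the log, where allocation failures typically appear.
func truncateVerbose(out string, max int) string {
	if len(out) <= max {
		return out
	}
	return "...(truncated)...\n" + out[len(out)-max:]
}

func main() {
	log := "line1\nline2\nalloc_tensor_range: failed to allocate CUDA0 buffer"
	fmt.Println(truncateVerbose(log, 48))
}
```

Keeping the tail rather than the head is a judgment call: llama.cpp usually emits the fatal allocation line last, while the full untruncated log can still go to debug logging.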
## Individual Comments

### Comment 1
<location path="pkg/inference/backends/llamacpp/errors_test.go" line_range="16-17" />
<code_context>
-			expected: "not enough GPU memory to load the model (Metal)",
+			name:  "Metal buffer allocation failure",
+			input: "ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
+			expected: "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
+				"ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
 		},
 		{
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for inputs with leading/trailing whitespace to verify the TrimSpace + formatting behavior.

Since `ExtractLlamaCppError` now wraps the friendly message with the TrimSpace’d verbose output, please add a case where the input has leading/trailing newlines or spaces (like a typical llama.cpp log ending in a newline). This will confirm we don’t introduce extra blank lines, don’t strip meaningful content, and that the `\n\nVerbose output:\n` layout remains stable with such inputs.
</issue_to_address>
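The suggested whitespace case could be added as another table entry along these lines. The extraction logic below is a simplified stand-in for the real function, included only so the test is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-in for the real extraction function, modeling only
// the TrimSpace + "\n\nVerbose output:\n" layout discussed above.
func extractLlamaCppError(output string) string {
	if strings.Contains(output, "ggml_metal_buffer_init: error") {
		return "not enough GPU memory to load the model (Metal)" +
			"\n\nVerbose output:\n" + strings.TrimSpace(output)
	}
	return strings.TrimSpace(output)
}

func main() {
	// A typical llama.cpp log ends in a newline; trailing whitespace
	// should not introduce extra blank lines in the formatted error.
	cases := []struct{ name, input, expected string }{
		{
			name:  "Metal failure with leading/trailing whitespace",
			input: "\nggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB\n",
			expected: "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
				"ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
		},
	}
	for _, c := range cases {
		if got := extractLlamaCppError(c.input); got != c.expected {
			fmt.Printf("%s: mismatch\n%q\n", c.name, got)
		} else {
			fmt.Printf("%s: ok\n", c.name)
		}
	}
}
```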


@ericcurtin ericcurtin changed the title fix: include verbose output alongside friendly error message in llama.cpp errors include verbose output alongside errors Mar 21, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances error reporting for the llama.cpp backend by including the original verbose output alongside a user-friendly error message when a known error pattern is detected. This is a useful improvement for debugging. The implementation in ExtractLlamaCppError is straightforward, and the unit tests have been correctly updated to reflect the new error format. The changes are correct and achieve the stated goal.

When a known error pattern is matched, preserve the original verbose
output below the user-friendly hint so users can still diagnose issues
from the error message without needing to check log files separately.

Fixes #733

Signed-off-by: Eric Curtin <eric.curtin@docker.com>

Development

Successfully merging this pull request may close these issues.

Keep verbose error message
