
include verbose output alongside errors#772

Open
ericcurtin wants to merge 1 commit into main from fix-issue

Conversation

@ericcurtin
Contributor

Summary

  • When ExtractLlamaCppError matches a known error pattern (e.g. CUDA OOM, Metal OOM, model loading failure), it now includes both the user-friendly hint and the original verbose llama.cpp output
  • Previously the verbose output was discarded, making it harder to diagnose edge cases

Example output before:

llama.cpp failed: not enough GPU memory to load the model (CUDA)

Example output after:

llama.cpp failed: not enough GPU memory to load the model (CUDA)

Verbose output:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10723.15 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA0 buffer of size 11244037120
...

Fixes #733

Test plan

  • Updated existing unit tests in errors_test.go to expect the new format
  • All tests pass: go test ./pkg/inference/backends/llamacpp/...

🤖 Generated with Claude Code


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • Consider whether including the full raw llama.cpp output in user-facing errors could leak sensitive paths or configuration details, and if so, restrict or sanitize what is surfaced depending on context.
  • For very large llama.cpp logs, returning the entire verbose output in the error string might become unwieldy; you may want to truncate to a reasonable size or include only the most relevant lines while keeping the rest available in debug logs.
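The truncation idea in the second point could look something like the sketch below; the helper name and cutoff strategy are hypothetical, not part of this PR:

```go
package main

import "fmt"

// Hypothetical helper: cap the verbose section at max bytes, keeping
// the tail of the log, where allocation failures typically appear.
func truncateVerbose(out string, max int) string {
	if len(out) <= max {
		return out
	}
	return "...(truncated)...\n" + out[len(out)-max:]
}

func main() {
	log := "line1\nline2\nalloc_tensor_range: failed to allocate CUDA0 buffer"
	fmt.Println(truncateVerbose(log, 48))
}
```

Keeping the tail rather than the head is a judgment call: llama.cpp usually emits the fatal allocation line last, while the full untruncated log can still go to debug logging.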
## Individual Comments

### Comment 1
<location path="pkg/inference/backends/llamacpp/errors_test.go" line_range="16-17" />
<code_context>
-			expected: "not enough GPU memory to load the model (Metal)",
+			name:  "Metal buffer allocation failure",
+			input: "ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
+			expected: "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
+				"ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
 		},
 		{
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for inputs with leading/trailing whitespace to verify the TrimSpace + formatting behavior.

Since `ExtractLlamaCppError` now wraps the friendly message with the TrimSpace’d verbose output, please add a case where the input has leading/trailing newlines or spaces (like a typical llama.cpp log ending in a newline). This will confirm we don’t introduce extra blank lines, don’t strip meaningful content, and that the `\n\nVerbose output:\n` layout remains stable with such inputs.
</issue_to_address>
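The suggested whitespace case could be added as another table entry along these lines. The extraction logic below is a simplified stand-in for the real function, included only so the test is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-in for the real extraction function, modeling only
// the TrimSpace + "\n\nVerbose output:\n" layout discussed above.
func extractLlamaCppError(output string) string {
	if strings.Contains(output, "ggml_metal_buffer_init: error") {
		return "not enough GPU memory to load the model (Metal)" +
			"\n\nVerbose output:\n" + strings.TrimSpace(output)
	}
	return strings.TrimSpace(output)
}

func main() {
	// A typical llama.cpp log ends in a newline; trailing whitespace
	// should not introduce extra blank lines in the formatted error.
	cases := []struct{ name, input, expected string }{
		{
			name:  "Metal failure with leading/trailing whitespace",
			input: "\nggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB\n",
			expected: "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
				"ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
		},
	}
	for _, c := range cases {
		if got := extractLlamaCppError(c.input); got != c.expected {
			fmt.Printf("%s: mismatch\n%q\n", c.name, got)
		} else {
			fmt.Printf("%s: ok\n", c.name)
		}
	}
}
```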


@ericcurtin ericcurtin changed the title fix: include verbose output alongside friendly error message in llama.cpp errors include verbose output alongside errors Mar 21, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances error reporting for the llama.cpp backend by including the original verbose output alongside a user-friendly error message when a known error pattern is detected. This is a useful improvement for debugging. The implementation in ExtractLlamaCppError is straightforward, and the unit tests have been correctly updated to reflect the new error format. The changes are correct and achieve the stated goal.

When a known error pattern is matched, preserve the original verbose
output below the user-friendly hint so users can still diagnose issues
from the error message without needing to check log files separately.

Fixes #733

Signed-off-by: Eric Curtin <eric.curtin@docker.com>

Development

Successfully merging this pull request may close these issues.

Keep verbose error message
