Hey - I've found 1 issue, and left some high-level feedback:
- Consider whether including the full raw llama.cpp output in user-facing errors could leak sensitive paths or configuration details, and if so, restrict or sanitize what is surfaced depending on context.
- For very large llama.cpp logs, returning the entire verbose output in the error string might become unwieldy; you may want to truncate to a reasonable size or include only the most relevant lines while keeping the rest available in debug logs.
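To illustrate the second point, here is one possible shape for such truncation. The `truncateVerbose` helper and its keep-the-tail policy are hypothetical sketches, not code from this PR; llama.cpp typically logs the fatal error near the end of its output, which is why the tail is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateVerbose caps the verbose llama.cpp output surfaced in an error
// string: it keeps the last maxLines lines and notes how many earlier
// lines were dropped (which would remain available in debug logs).
func truncateVerbose(output string, maxLines int) string {
	lines := strings.Split(strings.TrimSpace(output), "\n")
	if len(lines) <= maxLines {
		return strings.Join(lines, "\n")
	}
	dropped := len(lines) - maxLines
	kept := lines[dropped:]
	return fmt.Sprintf("... (%d earlier lines omitted)\n%s",
		dropped, strings.Join(kept, "\n"))
}

func main() {
	log := "line1\nline2\nline3\nline4\nline5"
	fmt.Println(truncateVerbose(log, 2))
}
```

A cap of a few dozen lines would usually preserve the decisive error while keeping user-facing messages readable.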
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider whether including the full raw llama.cpp output in user-facing errors could leak sensitive paths or configuration details, and if so, restrict or sanitize what is surfaced depending on context.
- For very large llama.cpp logs, returning the entire verbose output in the error string might become unwieldy; you may want to truncate to a reasonable size or include only the most relevant lines while keeping the rest available in debug logs.
## Individual Comments
### Comment 1
<location path="pkg/inference/backends/llamacpp/errors_test.go" line_range="16-17" />
<code_context>
- expected: "not enough GPU memory to load the model (Metal)",
+ name: "Metal buffer allocation failure",
+ input: "ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
+ expected: "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
+ "ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB",
},
{
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for inputs with leading/trailing whitespace to verify the TrimSpace + formatting behavior.
Since `ExtractLlamaCppError` now wraps the friendly message with the TrimSpace’d verbose output, please add a case where the input has leading/trailing newlines or spaces (like a typical llama.cpp log ending in a newline). This will confirm we don’t introduce extra blank lines, don’t strip meaningful content, and that the `\n\nVerbose output:\n` layout remains stable with such inputs.
</issue_to_address>
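A sketch of the suggested test input, using a stand-in `extractLlamaCppError` that mimics the reviewed behavior (the real implementation lives in `pkg/inference/backends/llamacpp`; the matched pattern string here is an assumption based on the test diff above):

```go
package main

import (
	"fmt"
	"strings"
)

// extractLlamaCppError mimics the behavior under review: match a known
// error pattern, then append the TrimSpace'd raw output under a
// "Verbose output:" header.
func extractLlamaCppError(output string) string {
	trimmed := strings.TrimSpace(output)
	if strings.Contains(trimmed, "ggml_metal_buffer_init: error") {
		return "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" + trimmed
	}
	return trimmed
}

func main() {
	// A typical llama.cpp log ends in a newline; the suggested test case
	// checks that TrimSpace keeps the "\n\nVerbose output:\n" layout
	// stable and introduces no extra blank lines.
	input := "\nggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB\n"
	want := "not enough GPU memory to load the model (Metal)\n\nVerbose output:\n" +
		"ggml_metal_buffer_init: error: failed to allocate buffer, size = 2048.00 MiB"
	fmt.Println(extractLlamaCppError(input) == want)
}
```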
Code Review
This pull request enhances error reporting for the llama.cpp backend by including the original verbose output alongside a user-friendly error message when a known error pattern is detected. This is a useful improvement for debugging. The implementation in ExtractLlamaCppError is straightforward, and the unit tests have been correctly updated to reflect the new error format. The changes are correct and achieve the stated goal.
When a known error pattern is matched, preserve the original verbose output below the user-friendly hint so users can still diagnose issues from the error message without needing to check log files separately.

Fixes #733

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Summary
When ExtractLlamaCppError matches a known error pattern (e.g. CUDA OOM, Metal OOM, model loading failure), it now includes both the user-friendly hint and the original verbose llama.cpp output.

Example output before:
Example output after:
Fixes #733
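On the review's concern that surfacing raw llama.cpp output may leak local filesystem paths: a minimal sketch of one sanitization approach. The `redactPaths` helper and its regex are my own illustration, not part of this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathPattern matches absolute Unix-style filesystem paths so they can be
// masked before the verbose output reaches a user-facing error message.
var pathPattern = regexp.MustCompile(`(/[\w.-]+)+`)

// redactPaths replaces each absolute path in the verbose output with a
// placeholder, leaving the rest of the log line intact.
func redactPaths(verbose string) string {
	return pathPattern.ReplaceAllString(verbose, "<path>")
}

func main() {
	fmt.Println(redactPaths("llama_model_load: failed to open /home/alice/models/llama.gguf"))
}
```

Whether to redact could depend on context, e.g. sanitize in user-facing errors but keep full paths in debug logs.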
Test plan
- errors_test.go updated to expect the new format
- go test ./pkg/inference/backends/llamacpp/...

🤖 Generated with Claude Code