Add tool usage visibility to chat messages with popover details#880
Add tool usage visibility to chat messages with popover details#880
Conversation
Add a dedicated tool usage indicator to assistant messages in both
document and corpus chat views. When an agent uses tools, a blue
wrench badge ("X tools used") appears in the message header. Hovering
over it shows a rich popover with each tool call's name, input
arguments, and output result.
Backend changes capture tool result content in timeline entries (previously
only the tool name was stored). Search tools (similarity_search,
search_exact_text) now also emit tool_result timeline entries with
human-readable summaries.
- Add `result` field to TimelineEntry schema and TypedDict
- Capture tool result content in TimelineBuilder (truncated to 500 chars)
- Pass tool result through ThoughtEvent metadata in pydantic_ai_agents
- Add ToolUsageIndicator component with hover ToolCallPopover
- Add MessageHeader layout for inline badge display
- Pass tool_result from WebSocket data in CorpusChat and ChatTray
https://claude.ai/code/session_01GHGe5Ye7Ag9uDrw31mSBXc
Code Review: PR #880 - Add tool usage visibility to chat messagesOverall AssessmentThis PR adds valuable UX improvements by exposing tool usage information in the chat UI. The implementation is well-structured with good separation of concerns between frontend and backend. However, there are several areas that need attention before merging. ✅ Strengths
🔴 Critical Issues1. Missing Cleanup in ToolUsageIndicator (frontend/src/components/widgets/chat/ChatMessage.tsx:1158)The Fix needed: const ToolUsageIndicator: React.FC<{
timeline: TimelineEntry[];
}> = ({ timeline }) => {
const toolCalls = useMemo(() => extractToolCalls(timeline), [timeline]);
const [isOpen, setIsOpen] = useState(false);
const closeTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);
// ADD THIS useEffect:
useEffect(() => {
return () => {
if (closeTimeoutRef.current) {
clearTimeout(closeTimeoutRef.current);
}
};
}, []);
// ... rest of component2. Tool Call Pairing Logic May Fail (frontend/src/components/widgets/chat/ChatMessage.tsx:1119-1142)The
Issue: If you call Suggested improvement: 3. Potential XSS Risk (frontend/src/components/widgets/chat/ChatMessage.tsx:1095)Tool arguments are rendered with <ToolCallResultBlock>{call.result}</ToolCallResultBlock>If tool results contain user-generated content or malicious strings, this could be an XSS vector. The CLAUDE.md mentions: "User-generated content in JSON fields must be escaped on frontend." Fix needed: Either sanitize
|
Critical fixes: - Fix memory leak: add useEffect cleanup for closeTimeoutRef on unmount - Fix tool call pairing: use consumed-set approach so duplicate calls to the same tool are correctly paired with their respective results - Add keyboard navigation (Enter/Space/Escape) and ARIA attributes (role, aria-expanded, aria-haspopup, aria-describedby) to popover Code quality: - Extract _extract_tool_result_summary() helper in pydantic_ai_agents to eliminate 4x duplicated result extraction with bare except blocks - Use TOOL_UNKNOWN_LABEL constant from constants.ts instead of magic string - Use stable call.id keys instead of array index in React list rendering - Add JSDoc to extractToolCalls, formatToolName, ToolUsageIndicator - Document XSS safety (React text nodes auto-escape, no innerHTML) - Improve truncation UX: include original char count in truncation suffix - Add inline doc to timeline_schema.py result field Tests: - Add backend test_timeline_utils.py: 14 tests covering tool result capture, truncation, pairing, reset, dict passthrough, type coercion - Add frontend chatMessageUtils.test.ts: 12 tests covering extractToolCalls pairing, duplicate tools, stable IDs, formatToolName https://claude.ai/code/session_01GHGe5Ye7Ag9uDrw31mSBXc
PR Review: Add tool usage visibility to chat messages with popover detailsSummaryThis is a well-implemented feature that significantly improves user visibility into LLM agent tool usage. The PR adds a polished UI indicator showing which tools were used in assistant responses, with detailed input/output information available via hover popover. Code Quality: Excellent ✅Strengths1. Comprehensive Testing
2. DRY Principle Applied
3. Memory Leak Prevention
4. Accessibility
5. Correct Pairing Algorithm
|
|
6. Security Considerations
7. Documentation
8. Performance Optimizations
Best Practices Adherence ✅Follows CLAUDE.md conventions:
|
Areas for Consideration1. Backend: Inconsistent Result Formatting The _extract_tool_result_summary() helper returns different formats:
Suggestion: Consider standardizing the format. If some tools return actual data (like ask_document answer text) while others return counts, users may be confused about what to expect. Either:
File: opencontractserver/llms/agents/pydantic_ai_agents.py:674-697 2. Frontend: TypeScript Type Safety The args field uses any type in TimelineEntry interface (line 52). Consider using Record<string, unknown> or JsonValue type instead of any for better type safety, consistent with ToolCallInfo.args. File: frontend/src/components/widgets/chat/ChatMessage.tsx:52 3. UX: Truncation Visibility When results are truncated, users see ... (truncated from 600 chars) but have no way to view the full content. Suggestion (optional enhancement): Consider adding a View full output button/link when truncated, or increase MAX_TOOL_RESULT_LENGTH if 500 chars proves too limiting in practice. File: opencontractserver/llms/agents/timeline_utils.py:84-88 4. UI Polish: Popover Positioning Popover is positioned right: 0 which works well for most cases, but on mobile its adjusted to right: -1rem. The popover could overflow viewport on very narrow screens or when badge is at screen edge. The current implementation is reasonable, but consider testing on various device widths. File: frontend/src/components/widgets/chat/ChatMessage.tsx:161-164 |
Performance Considerations ✅Strengths:
No concerns - Performance appears well-optimized. Security Concerns ✅Strengths:
No security vulnerabilities identified. Test Coverage ✅Backend (test_timeline_utils.py):
Frontend (chatMessageUtils.test.ts):
Coverage is comprehensive and follows project testing patterns. |
Bugs or IssuesNo blocking issues identified. The code is production-ready. Minor NitFile: opencontractserver/llms/agents/timeline_utils.py:87 The truncation message converts tool_result_content to string twice (once on line 83, once in the f-string). For efficiency, consider storing the original length before truncation. This is a very minor optimization and does not affect correctness. Overall AssessmentLGTM with minor suggestions 🚀 This is high-quality work that demonstrates:
The suggestions above are minor improvements, not blockers. The PR is ready to merge as-is, with those items potentially addressed in future iterations based on user feedback. Recommendation: Approve and merge ✅ |
PR Review: Tool Usage Visibility in Chat UII've completed a thorough review of this PR. Overall, this is excellent work with solid implementation, comprehensive testing, and good documentation. The code quality is high and follows the project's conventions well. ✅ Strengths1. Comprehensive Test Coverage
2. Security and XSS Prevention
3. Code Quality
4. Accessibility
5. Performance Considerations
6. Changelog and Documentation
🔍 Issues and RecommendationsCritical: Potential Pairing Bug in extractToolCallsLocation: frontend/src/components/widgets/chat/ChatMessage.tsx:876-940 The current implementation searches forward from each tool_call to find the first unconsumed tool_result with matching tool name. However, if tool_result entries appear before their corresponding tool_call in the timeline, they will never be paired. Recommended fix: Either:
Impact: Low if backend guarantees ordering, but could cause silent failures if timeline order changes. Minor: Inconsistent Error LoggingLocation: opencontractserver/llms/agents/pydantic_ai_agents.py:621-638 Using logger.error for diagnostic/informational messages. These should be logger.debug or logger.info. Impact: Low, but pollutes error logs with non-error information. Recommendation: Change diagnostic logs to appropriate level at lines 621-625, 635-637, and similar patterns. Code Style: Trailing CommasThe PR includes many formatting changes adding trailing commas. While this follows project conventions, these changes add noise to the diff. Recommendation: For future PRs, consider running formatters before starting work to avoid mixing formatting changes with feature changes. Impact: None functionally, but reduces review clarity. 🔒 Security Review✅ XSS Prevention: Properly handled via React text nodes 🎯 Performance Review✅ Memoization: useMemo for extractToolCalls and hasToolUsage Potential Concern: The extractToolCalls function uses O(n²) algorithm. For very large timelines this could be slow, though timelines are typically small. 📋 Test Coverage AssessmentFrontend: ✅ Excellent (10 test cases, all edge cases covered) 🎨 UI/UX Review✅ Visual Design: Clean, professional badge design 📝 Documentation Quality✅ Changelog: Excellent detail with file locations ✨ Adherence to CLAUDE.md Guidelines✅ No Claude credits: Correctly avoided in commit messages 🎯 Final RecommendationApproval Status: ✅ Approve with minor suggestions This is high-quality work that significantly improves the user experience. The implementation is solid, well-tested, and follows project conventions. Required Before Merge:
Optional Improvements (Post-Merge):
Great work on this feature! The tool usage visibility will be very helpful for users understanding what the agent is doing. 🚀 |
…age-k5Q20 # Conflicts: # CHANGELOG.md
Code Review: Tool Usage Visibility FeatureThis PR successfully adds tool usage visibility to the chat UI with a badge and popover showing tool call details. The implementation is well-structured with comprehensive test coverage on both frontend and backend. ✅ Strengths1. Excellent Architecture & Design
2. Outstanding Test Coverage
3. Security & Performance
4. Excellent User Experience
5. Code Quality
🔍 Issues & ConcernsCRITICAL: Potential React Key Stability IssueLocation: frontend/src/components/widgets/chat/ChatMessage.tsx:891 The key generation uses calls.length which can create non-unique keys: id: `${entry.tool ?? TOOL_UNKNOWN_LABEL}-${calls.length}`,Problem: For duplicate tool calls, this can cause React rendering issues or component state loss. Fix: Use timeline index instead: id: `${entry.tool ?? TOOL_UNKNOWN_LABEL}-${i}`, // Use timeline index iMINOR: Timeline Indicator LogicLocation: frontend/src/components/widgets/chat/ChatMessage.tsx:1898 The old timeline indicator is hidden when tools are used. Consider adding a clarifying comment: {/* Show step count indicator only when no tools were used (tool badge replaces it) */}
{effectiveHasTimeline && !hasToolUsage && (MINOR: Backend Tool Result Extraction DuplicationLocation: opencontractserver/llms/agents/pydantic_ai_agents.py:626-700 Duplicated result extraction logic for similarity_search and search_exact_text. Consider extracting to the _extract_tool_result_summary helper function to reduce duplication. MINOR: Error Handling Fallback MessageLocation: opencontractserver/llms/agents/pydantic_ai_agents.py:1300-1315 The fallback "Completed" is fine, but consider "Tool execution completed (details unavailable)" for clarity. 📊 Performance ConsiderationsGood:
Note: extractToolCalls has O(n²) complexity but acceptable for typical small timelines (<20 entries) 🔒 Security ReviewExcellent XSS Protection:
🎯 Recommendations SummaryMust Fix Before Merge
Nice to Have
Future Enhancements (Not blocking)
✅ Overall AssessmentHigh-quality PR with:
Recommendation: Approve with one critical fix (React key stability) The implementation successfully exposes tool usage in a polished, accessible way. Great work! 🧪 Testing NotesTo verify before merge:
Per CLAUDE.md baseline rules, ensure all tests pass and TypeScript compiles. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…cation - Use timeline index for stable React keys in extractToolCalls - Document forward-only pairing assumption in extractToolCalls - Change args type from any to Record<string, unknown> on TimelineEntry - Downgrade DIAGNOSTIC logger.error calls to logger.debug - Store string length before truncation to avoid double conversion - Fix prettier formatting across affected files
PR Review: Add Tool Usage Visibility to Chat MessagesThis PR adds a helpful feature that exposes tool usage information to users through a badge and popover UI. The implementation is comprehensive with good test coverage. Here's my detailed feedback: ✅ StrengthsCode Quality
Architecture
UX
🔍 Issues & Suggestions1. Backend: Changed logger.error to logger.debug
|
…over 10 functional tests covering badge visibility, hover/keyboard interaction, ARIA accessibility, multi-tool display, and pending tool state. 3 screenshot tests capturing the badge, single-tool popover, and multi-tool popover for auto-generated documentation.
Code Review: Tool Usage Visibility in Chat MessagesI've completed a thorough review of PR #880. This is an excellent feature addition that enhances transparency by exposing agent tool usage to users. The implementation is well-structured with comprehensive tests and documentation. Below are my findings: ✅ StrengthsArchitecture & Design
Code Quality
Testing
🔍 Issues & Recommendations1. Memory Leak Already Fixed ✅The 2. React Key Stability ✅Using timeline index in the key is stable for this use case since timeline entries are append-only and never reordered (line 1144 in ChatMessage.tsx). 3. Tool Call Pairing Logic ✅The consumed-set approach (lines 1133-1163) correctly handles duplicate tool calls. The forward-only search assumption is documented and valid. 4. Performance Consideration: MemoizationThe
5. XSS Safety ✅Line 842-844 documentation correctly notes that content is rendered as React text nodes (auto-escaped), not via 6. Backend: Type Coercion Safety ✅In 7. Logging Levels ✅The switch from 8. Frontend: Styled Component Complexity
9. Truncation UX ✅Line 87-89 shows truncated char count: 10. Popover PositioningThe popover uses Recommendation: Consider using a tooltip library (like Radix UI Popover) that handles auto-positioning based on viewport boundaries. Not blocking - current implementation works for typical use cases. 🔒 Security Review
📊 Performance Assessment
📝 Documentation & Changelog
🧪 Test Coverage AssessmentBackend
Frontend Unit
Frontend Component
Coverage is excellent - all critical paths tested. ✨ Final VerdictAPPROVED ✅ This is a well-engineered feature that follows the project's conventions and best practices. The code is:
Minor Suggestions (Non-Blocking)
Both are nice-to-haves and can be addressed in follow-up work if needed. 📋 Checklist
Great work! 🎉 |
…ettle Screenshots were faded because ChatMessage has transparent backgrounds designed for the chat panel's #f8fafc bg and a framer-motion entrance animation (opacity 0→1, 300ms). Added the proper background wrapper and a 500ms settle delay before capture.
Code Review: Tool Usage Visibility EnhancementThis PR successfully implements tool usage visibility in the chat UI with comprehensive testing and well-structured code. Here's my detailed feedback: ✅ Strengths1. Excellent Test Coverage
2. Clean Architecture
3. Security & Accessibility
4. Performance Considerations
5. User Experience
🔍 Potential Issues & Suggestions1. Backend: Tool Result Summary Extraction
|
Expand step-7 with screenshots showing the tool usage badge, single-tool popover, and multi-tool popover. Includes a reference table of available agent tools.
…hangelog - Truncate tool result summaries at source in _extract_tool_result_summary() using MAX_TOOL_RESULT_LENGTH (500 chars) to avoid passing large strings through ThoughtEvent metadata - Replace word-break: break-all with word-break: break-word for more natural text wrapping in message content - Fix CI screenshot workflow: backup freshly generated screenshots before rebase, abort failed rebase and fall back to merge, then force-restore screenshots regardless of conflict state. Also add --reporter=list flag to prevent test hangs in CI. - Enhance changelog with file locations, line numbers, and impact statements
PR Review: Add Tool Usage Visibility to Chat MessagesOverviewThis PR successfully implements tool usage transparency in the chat UI by adding a visual indicator badge and detailed popover showing tool call inputs and outputs. The implementation follows the project's architectural patterns and includes comprehensive testing. ✅ Strengths1. Excellent Architecture & Code Organization
2. Strong Testing Coverage
3. Performance & UX Considerations
4. Documentation & Maintainability
🔍 Areas for Improvement1. Security ConsiderationsXSS Risk in Tool Result Display (Medium Priority)
// Suggested addition in ChatMessage.tsx
import DOMPurify from 'dompurify';
// In ToolCallResultBlock rendering:
<ToolCallResultBlock>
{DOMPurify.sanitize(call.result, { ALLOWED_TAGS: [] })}
</ToolCallResultBlock>2. Code Quality IssuesPotential Race Condition in Timeout Management (Low Priority)
const handleMouseLeave = () => {
if (closeTimeoutRef.current) {
clearTimeout(closeTimeoutRef.current);
}
closeTimeoutRef.current = setTimeout(() => setIsOpen(false), 200);
};Missing Null Check in Backend (Low Priority)
if not hasattr(event, 'result') or not hasattr(event.result, 'content'):
return "Completed"
result_content = event.result.content3. Testing GapsMissing Test Cases:
Recommendation: Add test cases: // Frontend test
it("handles extremely long tool names gracefully", () => {
const longName = "very_long_tool_name_with_many_segments_that_exceeds_normal_length";
const result = formatToolName(longName);
expect(result.length).toBeLessThan(100); // Add length constraint
});4. Performance ConsiderationsUnnecessary Re-renders (Low Priority)
const toolCallInfo = useMemo(() => {
const calls = extractToolCalls(timeline);
return { calls, hasToolUsage: calls.length > 0 };
}, [timeline]);Timeline Builder N+1 Pattern (Low Priority)
5. Documentation IssuesIncomplete ARIA Labels
<ToolBadge
aria-label={`${toolCalls.length} ${toolCalls.length === 1 ? "tool" : "tools"} used. Press Enter to view details.`}
// ... other props
/>Missing JSDoc Comments
🎯 Critical IssuesNone Found ✅The PR is well-implemented with no blocking issues. The suggestions above are enhancements. 📊 Code Metrics
🚀 Recommendations for MergeBefore Merge (Optional but Recommended):
Follow-up PRs (Non-blocking):
💡 SuggestionsFuture Enhancements:
Consistency with CLAUDE.md:
Final Verdict: APPROVE with Minor Suggestions ✅This is a high-quality PR that adds valuable functionality with excellent test coverage and documentation. The implementation follows project conventions and architectural patterns. The suggested improvements are minor and non-blocking. Estimated Risk: Low |
Code Review - PR #880: Tool Usage VisibilityI've completed my review of this PR. Overall, this is excellent work with strong implementation quality, comprehensive testing, and good documentation. Here's my detailed feedback: ✅ StrengthsArchitecture & Design
Code Quality
Testing
Documentation
🔍 Issues FoundCritical IssuesNone found. Code Quality Issues1. Inconsistent truncation location (Minor)Location: The PR description claims truncation happens "at source" in
Impact: Double truncation logic creates maintenance burden. If Recommendation: Remove truncation from 2. Pairing algorithm could be more efficient (Minor)Location: The current O(n²) nested loop works fine for typical timelines (5-10 entries), but could be O(n) with a single pass using a stack. Current approach: for (let i = 0; i < timeline.length; i++) {
if (entry.type !== "tool_call") continue;
for (let j = i + 1; j < timeline.length; j++) { // O(n²)
if (timeline[j].type === "tool_result" && timeline[j].tool === entry.tool) {
// pair found
}
}
}Impact: Negligible for real-world usage (timelines rarely exceed 20 entries), but worth noting. Recommendation: Keep current implementation for clarity, but add a comment noting the complexity tradeoff. 3. word-break CSS property (Very Minor)Location: Changed from Recommendation: Use 💡 SuggestionsPerformance Optimizations1. Memoize tool call extractionThe Suggestion: Consider memoizing based on 2. Debounced hover close timeoutThe 200ms delay (line 1206) is reasonable, but consider making it a constant at the file level for easier tuning. Accessibility Enhancements1. Popover focus managementWhen the popover opens via keyboard (Enter/Space), focus should move into the popover for screen reader navigation. Currently focus stays on the badge. Suggestion: Add 2. Live region for tool resultsWhen streaming tool results in real-time, screen reader users don't know when results arrive. Consider adding Security1. XSS prevention documentation is excellentThe comment on lines 841-843 is perfect: // XSS safety note: Tool result and argument content is rendered as React text
// nodes (not via dangerouslySetInnerHTML), so HTML entities are auto-escaped.
// Content originates from backend agent tool execution, not user input.No issues here - just wanted to highlight this as a best practice example. 📋 Minor ObservationsFrontend
Backend
CI/Screenshots
🎯 Test CoverageWhat's tested:✅ Tool call/result pairing (including duplicates) What could be tested:
None of these are critical, but worth considering for future test additions. 📊 Performance ConsiderationsPositive:
Potential concerns:
🔐 Security AssessmentThreat Model:
No security concerns found. 🏁 ConclusionThis PR is ready to merge pending minor fixes: Required before merge:
Recommended (can be follow-up):
Nice to have:
🎖️ Overall AssessmentCode Quality: ⭐⭐⭐⭐⭐ (5/5) Total: ⭐⭐⭐⭐⭐ Excellent work! This feature significantly improves agent transparency and user trust. The implementation follows all project conventions, has comprehensive tests, and demonstrates strong engineering practices. cc: @JSv4 |
Summary
This PR enhances the chat UI to expose tool usage information to users by adding a tool usage badge to assistant messages and a detailed popover showing each tool call's inputs and outputs.
Key Changes
Frontend (ChatMessage Component)
ToolUsageIndicatorcomponent that displays a wrench icon badge ("X tools used") in the message header for assistant messages that executed toolsextractToolCalls(): Pairs tool_call and tool_result timeline entriesformatToolName(): Converts tool names to user-friendly display formatMessageHeadercontainer alongside the username, replacing the previousTimelineIndicatorwhen tools are usedBackend (Timeline & Agent Updates)
resultfield to timeline entries to store tool output contentsimilarity_search: Emits result summary ("Found X matching annotations")search_exact_text: Emits result summary ("Found X exact text matches")ask_document: Extracts and emits the answer textData Model Updates
TimelineEntryinterface to include optionalresultfieldToolCallInfointerface for popover data structuretool_resultfrom metadata to timeline entriesImplementation Details
timeline.some((e) => e.type === "tool_call")to determine if badge should displayhttps://claude.ai/code/session_01GHGe5Ye7Ag9uDrw31mSBXc