Context
StringyMcStringFace is a binary string extraction tool that analyzes executables and extracts meaningful strings with semantic classification and scoring. The tool needs a flexible output formatting system to support multiple output formats optimized for different use cases (interactive analysis, automation, YARA rule generation).
Currently, src/output/mod.rs exists but only contains a placeholder comment. The architecture is well-defined in docs/src/output-formats.md and docs/src/architecture.md, and the core data structures (FoundString, Encoding, Tag, etc.) are already implemented in src/types.rs.
Problem Statement
The tool requires a robust output formatting framework that:
- Supports multiple output formats (Human-readable tables, JSON Lines, YARA rules)
- Provides a consistent trait-based interface for formatters
- Allows easy extension for future formats (CSV, XML, Markdown)
- Handles edge cases (special character escaping, long string truncation, color coding)
- Integrates with the existing FoundString data structure
- Supports output customization (filtering, sorting, field selection)
Requirements (from Requirement 6.1)
- Trait Design: Create a Formatter trait that defines the interface for all output formatters
- Configuration: Implement output configuration options (format selection, color support, truncation limits)
- Core Implementations: Provide initial implementations for:
  - Human-readable table format (default)
  - JSON Lines format (machine-readable)
  - YARA rule format (detection rules)
- Error Handling: Proper error types for formatting failures
- Testing: Unit tests for each formatter with edge cases
- Documentation: Inline documentation for the trait and configuration options
Proposed Solution
1. Module Structure
src/output/
├── mod.rs # Public API, Formatter trait, OutputConfig
├── human.rs # HumanFormatter (interactive table view)
├── json.rs # JsonFormatter (JSON Lines)
└── yara.rs # YaraFormatter (YARA rules)
2. Trait Design
```rust
use std::io::Write;

use crate::types::FoundString;

// `Result` is assumed to be a crate-level alias over the output module's error type.
pub trait Formatter {
    /// Format a collection of strings for output
    fn format(&self, strings: &[FoundString], config: &OutputConfig) -> Result<String>;
    /// Format a single string (for streaming output)
    fn format_one(&self, string: &FoundString, config: &OutputConfig) -> Result<String>;
    /// Write header/preamble if needed
    fn write_header(&self, writer: &mut dyn Write, config: &OutputConfig) -> Result<()>;
    /// Write footer/postamble if needed
    fn write_footer(&self, writer: &mut dyn Write, config: &OutputConfig) -> Result<()>;
}
```
3. Configuration Structure
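```rust
use crate::types::Tag;

pub struct OutputConfig {
    pub format: OutputFormat,
    /// Enable score-based color coding (human format).
    pub color: bool,
    /// Truncate displayed strings longer than this many characters.
    pub truncate_length: Option<usize>,
    /// Drop strings scoring below this value.
    pub min_score: Option<i32>,
    /// Only keep strings found in these sections.
    pub sections_filter: Option<Vec<String>>,
    /// Only keep strings carrying these tags.
    pub tags_filter: Option<Vec<Tag>>,
    pub max_results: Option<usize>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OutputFormat {
    #[default]
    Human,
    Json,
    Yara,
}
```
4. Implementation Details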
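All three formatters need the same filtering and ordering before anything is rendered. The sketch below shows that shared step under stated assumptions: FoundString and Tag are simplified stand-ins (the real types in src/types.rs carry more fields), FilterOptions is a hypothetical struct holding only the filter-related OutputConfig fields, and select is a hypothetical helper name. Treating a string with no section as a non-match for a section filter, and keeping a string when at least one of its tags matches, are assumptions rather than decisions made in this issue.

```rust
/// Simplified stand-in for FoundString (hypothetical subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub score: i32,
    pub section: Option<String>,
    pub tags: Vec<Tag>,
}

/// Simplified stand-in for the real Tag enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Url,
    Guid,
    FilePath,
}

/// Hypothetical subset of OutputConfig: only the filtering options.
#[derive(Debug, Default)]
pub struct FilterOptions {
    pub min_score: Option<i32>,
    pub sections_filter: Option<Vec<String>>,
    pub tags_filter: Option<Vec<Tag>>,
    pub max_results: Option<usize>,
}

/// Applies the filters, sorts by descending score, then caps the result count.
pub fn select<'a>(strings: &'a [FoundString], opts: &FilterOptions) -> Vec<&'a FoundString> {
    let mut kept: Vec<&FoundString> = strings
        .iter()
        .filter(|s| opts.min_score.map_or(true, |min| s.score >= min))
        .filter(|s| match &opts.sections_filter {
            // A string without a section never matches an explicit section filter.
            Some(sections) => s.section.as_ref().map_or(false, |sec| sections.contains(sec)),
            None => true,
        })
        .filter(|s| match &opts.tags_filter {
            // Keep the string if it carries at least one requested tag.
            Some(tags) => s.tags.iter().any(|t| tags.contains(t)),
            None => true,
        })
        .collect();
    // Stable sort: strings with equal scores keep their incoming order.
    kept.sort_by(|a, b| b.score.cmp(&a.score));
    if let Some(max) = opts.max_results {
        kept.truncate(max);
    }
    kept
}
```

Applying max_results after sorting means the cap always keeps the highest-scoring strings rather than the first ones encountered.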
Human Formatter
- Use comfy-table or a similar crate for table rendering
- Implement color coding based on score ranges (green: 80+, yellow: 60-79, red: below 60)
- Handle truncation with "..." indicator for long strings
- Sort by score (descending) by default
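The color ranges and truncation rule above can be sketched without any table crate. This sketch assumes raw ANSI escape codes in place of colored or termcolor, counts Unicode characters rather than terminal display width, and uses hypothetical helper names (score_color, truncate, format_row). With comfy-table, the truncated text would go into a table cell instead of a hand-padded row.

```rust
// Raw ANSI SGR codes; a real implementation might use `colored` or `termcolor`.
const GREEN: &str = "\x1b[32m";
const YELLOW: &str = "\x1b[33m";
const RED: &str = "\x1b[31m";
const RESET: &str = "\x1b[0m";

/// Maps a score to a color: green for 80+, yellow for 60-79, red below 60.
pub fn score_color(score: i32) -> &'static str {
    match score {
        s if s >= 80 => GREEN,
        s if s >= 60 => YELLOW,
        _ => RED,
    }
}

/// Truncates to `max` characters (not bytes), ending with "..." when cut.
pub fn truncate(text: &str, max: Option<usize>) -> String {
    match max {
        Some(max) if text.chars().count() > max => {
            // Reserve room for the indicator so the result stays within `max`
            // characters (for any `max` of 3 or more).
            let keep = max.saturating_sub(3);
            let mut out: String = text.chars().take(keep).collect();
            out.push_str("...");
            out
        }
        _ => text.to_string(),
    }
}

/// Renders one row: right-aligned score (optionally colored) followed by the text.
pub fn format_row(score: i32, text: &str, color: bool, max: Option<usize>) -> String {
    let shown = truncate(text, max);
    if color {
        format!("{}{:>4}{}  {}", score_color(score), score, RESET, shown)
    } else {
        format!("{:>4}  {}", score, shown)
    }
}
```

Truncating by characters keeps multi-byte UTF-8 text from being split mid-codepoint, which slicing a &str by byte index would panic on.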
JSON Formatter
- One JSON object per line (JSON Lines format)
- Leverage the existing serde derives on FoundString
- No pretty-printing, for pipeline compatibility
- Each object contains all fields from FoundString
YARA Formatter
- Proper escaping for YARA string syntax
- Group strings by semantic tag (URLs, GUIDs, paths, etc.)
- Include metadata (source file, generation timestamp)
- Add appropriate ascii/wide modifiers based on encoding
- Filter to high-confidence strings (score >= 80) by default
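A minimal sketch of the YARA output, assuming a hypothetical Candidate struct and a three-variant Encoding stand-in (the real Encoding enum in src/types.rs may differ). Escaping works byte by byte: double quote and backslash get a backslash, newline and tab use \n and \t, and every other byte outside printable ASCII becomes \xNN. The score threshold (80 in the proposal) is a parameter, the timestamp is passed in as a string to keep the sketch dependency-free, and when no string passes the filter the condition falls back to false, since `any of them` needs at least one defined string.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;

/// Hypothetical stand-in for the real Encoding enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding { Ascii, Utf8, Utf16Le }

/// Hypothetical, flattened view of a FoundString for rule generation.
pub struct Candidate { pub text: String, pub encoding: Encoding, pub score: i32, pub tag: String }

/// Escapes text for a double-quoted YARA text string.
pub fn escape_yara(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for b in text.bytes() {
        match b {
            b'"' => out.push_str("\\\""),
            b'\\' => out.push_str("\\\\"),
            b'\n' => out.push_str("\\n"),
            b'\t' => out.push_str("\\t"),
            0x20..=0x7e => out.push(b as char),
            // Control characters and non-ASCII UTF-8 bytes become hex escapes.
            _ => { let _ = write!(out, "\\x{:02x}", b); }
        }
    }
    out
}

/// Chooses the YARA string modifier for an encoding (`wide` = UTF-16LE layout).
pub fn modifier(enc: Encoding) -> &'static str {
    match enc {
        Encoding::Ascii | Encoding::Utf8 => "ascii",
        Encoding::Utf16Le => "wide",
    }
}

/// Builds one rule from strings scoring at least `min_score`, grouped by tag.
pub fn yara_rule(name: &str, source: &str, generated: &str, items: &[Candidate], min_score: i32) -> String {
    // BTreeMap keeps tag groups in a stable, alphabetical order.
    let mut groups: BTreeMap<&str, Vec<&Candidate>> = BTreeMap::new();
    for c in items.iter().filter(|c| c.score >= min_score) {
        groups.entry(c.tag.as_str()).or_default().push(c);
    }
    let mut out = String::new();
    let _ = writeln!(out, "rule {} {{", name);
    let _ = writeln!(out, "    meta:");
    let _ = writeln!(out, "        source = \"{}\"", escape_yara(source));
    let _ = writeln!(out, "        generated = \"{}\"", escape_yara(generated));
    if !groups.is_empty() {
        let _ = writeln!(out, "    strings:");
        let mut n = 0;
        for (tag, group) in &groups {
            let _ = writeln!(out, "        // {}", tag);
            for c in group {
                let _ = writeln!(out, "        $s{} = \"{}\" {}", n, escape_yara(&c.text), modifier(c.encoding));
                n += 1;
            }
        }
    }
    // Without any strings, `any of them` has nothing to refer to.
    let condition = if groups.is_empty() { "false" } else { "any of them" };
    let _ = writeln!(out, "    condition:");
    let _ = writeln!(out, "        {}", condition);
    out.push_str("}\n");
    out
}
```

One limitation of this sketch: hex-escaping non-ASCII UTF-8 bytes is fine for `ascii` strings, but combined with `wide` those bytes will not match real UTF-16 text, so non-ASCII UTF-16 strings would likely need hex strings instead.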
5. Testing Strategy
- Unit tests for each formatter with sample FoundString instances
- Edge cases:
  - Empty string collections
  - Strings with special characters (quotes, backslashes, newlines)
  - Very long strings (truncation)
  - UTF-16 encoded strings in YARA output
  - Null/missing fields (section, rva)
- Integration tests with real binary analysis output
6. Integration Points
- Called from main.rs after the classification and ranking phases
- Receives a sorted Vec<FoundString> from the analysis pipeline
- Configuration driven by CLI arguments (parsed by clap)
- Output to stdout by default; file redirection supported
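The integration points above (stdout by default, a file when requested) can be sketched as a small driver that streams write_header, then format_one for each string, then write_footer into any std::io::Write. This sketch assumes a cut-down Formatter trait without the OutputConfig parameter and with io::Result in place of the crate's Result, a toy PlainFormatter, and hypothetical emit and emit_to functions; the optional path stands in for whatever output option the clap CLI ends up exposing.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical stand-in; the real FoundString has many more fields.
pub struct FoundString { pub text: String, pub score: i32 }

/// Cut-down variant of the proposed trait (no OutputConfig, io::Result).
pub trait Formatter {
    fn format_one(&self, s: &FoundString) -> io::Result<String>;
    fn write_header(&self, _w: &mut dyn Write) -> io::Result<()> { Ok(()) }
    fn write_footer(&self, _w: &mut dyn Write) -> io::Result<()> { Ok(()) }
}

/// Toy formatter used only to exercise the driver.
pub struct PlainFormatter;

impl Formatter for PlainFormatter {
    fn format_one(&self, s: &FoundString) -> io::Result<String> {
        Ok(format!("{:>4}  {}", s.score, s.text))
    }
    fn write_header(&self, w: &mut dyn Write) -> io::Result<()> {
        writeln!(w, "SCORE  STRING")
    }
}

/// Streams header, one line per string, then footer, and flushes the writer.
pub fn emit(f: &dyn Formatter, strings: &[FoundString], w: &mut dyn Write) -> io::Result<()> {
    f.write_header(w)?;
    for s in strings {
        writeln!(w, "{}", f.format_one(s)?)?;
    }
    f.write_footer(w)?;
    w.flush()
}

/// Writes to `path` when one is given, otherwise to stdout.
pub fn emit_to(f: &dyn Formatter, strings: &[FoundString], path: Option<&Path>) -> io::Result<()> {
    let mut out: Box<dyn Write> = match path {
        Some(p) => Box::new(BufWriter::new(File::create(p)?)),
        None => Box::new(BufWriter::new(io::stdout())),
    };
    emit(f, strings, &mut *out)
}
```

Because emit only sees a &mut dyn Write, the same formatter code serves stdout, files, and in-memory buffers (a Vec<u8> works well in unit tests).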
Acceptance Criteria
- Formatter trait defined with documentation
- OutputConfig structure implemented with all options
- HumanFormatter implementation with color coding and tables
- JsonFormatter implementation producing valid JSON Lines
- YaraFormatter implementation with proper escaping and modifiers
- Unit tests achieving >80% code coverage for the output module
- Integration test demonstrating all three formats
- Inline documentation for public APIs
- README updated with output format examples (reference existing docs/src/output-formats.md)
References
- Architecture Documentation: docs/src/architecture.md
- Output Format Specifications: docs/src/output-formats.md
- Core Data Structures: src/types.rs
- Related Issue: Task-ID stringy-analyzer/output-formatting-framework
Dependencies
- serde / serde_json (already in Cargo.toml)
- Consider adding: comfy-table or prettytable-rs for human-readable output
- Consider adding: colored or termcolor for color support
Estimated Effort
Medium complexity - requires trait design, three formatter implementations, and comprehensive testing. Estimated 1-2 days of focused development.