-
-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Overview
This task implements a YARA rule formatter for StringyMcStringFace that outputs extracted strings in valid YARA rule syntax. YARA is a widely-used pattern matching tool in malware research and incident response, making YARA-compatible output a critical feature for security analysts who want to create detection rules from extracted strings.
Context
The binary analyzer currently extracts strings from ELF, PE, and Mach-O binaries with rich metadata including:
- Encoding types (ASCII, UTF-8, UTF-16LE, UTF-16BE)
- Semantic tags (URLs, domains, IPs, file paths, etc.)
- Offsets, RVAs, and section information
- Relevance scores
To maximize utility for security practitioners, we need to format this data as valid YARA rules that can be immediately used for threat hunting and detection.
Technical Requirements
1. String Escaping
Implement proper C-style escaping for YARA text strings:
- Escape double quotes:
\" - Escape backslashes:
\\ - Escape newlines:
\n - Escape carriage returns:
\r - Escape tabs:
\t - Non-printable bytes:
\xNN(hex notation)
2. Encoding Mapping
Map FoundString encoding types to YARA string modifiers:
Encoding::Ascii/Encoding::Utf8→ascii(default)Encoding::Utf16Le/Encoding::Utf16Be→widemodifier- Consider adding
ascii widefor broader matching when appropriate
3. String Truncation
Apply truncation rules for excessively long strings:
- Maximum string length: 256 bytes (configurable)
- Truncate with indicator comment (e.g.,
// truncated from N bytes) - Consider hex string format for binary data or very long strings
4. Semantic Tag Integration
Leverage semantic tags to enhance rule conditions:
- Group strings by tag type in rule conditions
- Add metadata comments indicating tag classifications
- Generate compound conditions (e.g.,
any of ($url*))
5. Rule Structure
Generate complete, valid YARA rules:
rule binary_name_strings {
meta:
description = "Extracted strings from binary_name"
format = "ELF/PE/MachO"
generated_by = "StringyMcStringFace"
strings:
$s1 = "extracted_string" ascii
$s2 = "wide_string" wide
$url1 = "https://example.com" nocase
condition:
any of them
}Proposed Implementation
File: src/output/yara.rs
pub struct YaraFormatter {
max_string_length: usize,
include_metadata: bool,
rule_name: String,
}
impl YaraFormatter {
pub fn format_rule(&self, strings: &[FoundString], binary_info: &ContainerInfo) -> String;
fn escape_string(&self, s: &str) -> String;
fn get_string_modifiers(&self, string: &FoundString) -> Vec<&str>;
fn truncate_if_needed(&self, s: &str) -> (String, bool);
fn generate_condition(&self, strings: &[FoundString]) -> String;
}Module Registration: src/output/mod.rs
pub mod yara;
pub enum OutputFormat {
Json,
Yara,
// ... other formats
}
pub trait Formatter {
fn format(&self, strings: &[FoundString], info: &ContainerInfo) -> String;
}Example Output
Given extracted strings from a binary, the formatter should produce:
rule suspicious_binary_strings {
meta:
description = "Strings extracted from suspicious.exe"
format = "PE"
generated_by = "StringyMcStringFace v0.1"
extracted_count = 47
strings:
// URLs and network indicators
$url1 = "http://malicious.example.com/payload" nocase
$ip1 = "192.168.1.100" ascii
// File paths
$path1 = "C:\\Windows\\System32\\evil.dll" nocase
// Wide strings (UTF-16)
$wide1 = "WideString" wide
// Import/Export names
$imp1 = "CreateRemoteThread" ascii
$imp2 = "VirtualAllocEx" ascii
condition:
any of ($url*) or
2 of ($imp*) or
any of ($path*)
}Test Scenarios
Unit tests should cover:
-
String Escaping
- Quotes, backslashes, newlines in strings
- Non-ASCII characters (hex escape)
- Already-escaped content (no double-escaping)
-
Encoding Handling
- ASCII strings →
asciimodifier - UTF-16LE/BE →
widemodifier - Mixed encoding in single rule
- ASCII strings →
-
Truncation
- Strings under limit → no truncation
- Strings over limit → truncated with comment
- Extreme cases (empty strings, very long strings)
-
Rule Generation
- Valid YARA syntax (parseable by YARA)
- Proper section formatting (meta, strings, condition)
- Special characters in rule names
-
Semantic Tags
- URLs grouped and commented
- Network indicators (IPs, domains)
- Import/Export grouping
Dependencies
- Blocked by: Output Formatting Framework (Implement Output Formatter Framework with Trait-Based Architecture #25)
- Need base
OutputFormatenum andFormattertrait - May proceed with initial implementation if framework is defined
- Need base
Acceptance Criteria
-
src/output/yara.rsimplementsYaraFormatter - All special characters properly escaped per YARA specification
- Encoding types correctly mapped to YARA modifiers
- Long strings truncated at configurable threshold
- Generated rules pass YARA syntax validation
- Comprehensive unit tests with >90% coverage
- Integration test with sample binaries (ELF, PE, Mach-O)
- Documentation with usage examples
References
Related Issues
- Output Formatting Framework (dependency)
- Requirement 6.3 implementation
Task ID: stringy-analyzer/yara-friendly-output