Overview
This issue implements command-line filtering arguments that give users fine-grained control over string extraction from binary files. These filters are essential for reducing noise and focusing analysis on relevant string types, making Stringy more practical for real-world reverse engineering and malware analysis workflows.
Context
Stringy extracts and categorizes strings from binaries using semantic tags (URLs, file paths, GUIDs, format strings, etc.) and supports multiple encodings (ASCII, UTF-8, UTF-16LE/BE). However, without filtering capabilities, users receive all extracted strings regardless of their analysis goals. Filtering matters for several reasons:
- Malware analysts often need only URLs and network indicators
- Reverse engineers may focus on format strings and error messages
- Performance: Filtering reduces output size and processing overhead
- Pipelines: Filtered JSON output enables targeted downstream processing
Requirements
This implements requirements 7.1, 7.2, 7.3, and 7.4 from the specification.
Proposed Solution
CLI Arguments
Implement the following filtering arguments using the clap crate:
- `--min-len <LENGTH>` - Exclude strings shorter than the specified length
  - Type: `usize`
  - Default: `4` (consistent with the standard `strings` command)
  - Example: `stringy --min-len 8 binary.exe`
- `--enc <ENCODINGS>` - Comma-separated list of encodings to include
  - Type: `Vec<String>`
  - Valid values: `ascii`, `utf8`, `utf16le`, `utf16be`
  - Default: all encodings
  - Example: `stringy --enc ascii,utf8 binary.exe`
- `--only-tags <TAGS>` - Comma-separated list of semantic tags to include (whitelist)
  - Type: `Vec<String>`
  - Valid values: `url`, `domain`, `ip`, `filepath`, `registry`, `guid`, `useragent`, `fmt`, `base64`, `crypto`
  - Mutually exclusive with `--notags`
  - Example: `stringy --only-tags url,domain,ip binary.exe`
- `--notags` - Exclude strings without any semantic tags (show only classified strings)
  - Type: `bool` flag
  - Mutually exclusive with `--only-tags`
  - Example: `stringy --notags binary.exe`
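The valid-value lists above imply that each comma-separated item must be checked against a fixed set of names. A minimal sketch of that check using only the standard library follows. The `Encoding` enum, the `parse_list` helper, the error wording, and the case-insensitive matching are all assumptions made for illustration, not existing Stringy code.

```rust
use std::str::FromStr;

/// Encodings named in the `--enc` valid-value list (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Ascii,
    Utf8,
    Utf16Le,
    Utf16Be,
}

const VALID_ENCODINGS: &str = "ascii, utf8, utf16le, utf16be";

impl FromStr for Encoding {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: names are matched case-insensitively.
        match s.trim().to_ascii_lowercase().as_str() {
            "ascii" => Ok(Encoding::Ascii),
            "utf8" => Ok(Encoding::Utf8),
            "utf16le" => Ok(Encoding::Utf16Le),
            "utf16be" => Ok(Encoding::Utf16Be),
            other => Err(format!(
                "invalid encoding '{}' (valid values: {})",
                other, VALID_ENCODINGS
            )),
        }
    }
}

/// Hypothetical helper: split a comma-separated value and parse each item,
/// stopping at the first invalid name. Empty items are skipped.
pub fn parse_list<T: FromStr<Err = String>>(raw: &str) -> Result<Vec<T>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(str::parse::<T>)
        .collect()
}
```

With this sketch, `parse_list::<Encoding>("ascii,utf8")` yields the two matching variants, while an unknown name returns an error that lists the accepted values. A tag enum for `--only-tags` could follow the same pattern.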
Implementation Approach
- Argument Parsing (`src/cli.rs` or main CLI module)
  - Define filtering arguments in the clap `Args` struct
  - Implement validation for encoding names and tag names
  - Add mutual exclusivity check for `--only-tags` and `--notags`
- Filter Configuration (`src/types.rs` or new `src/filter.rs`)
  - Create a `FilterConfig` struct to hold parsed filter settings
  - Implement `FilterConfig::from_args()` to convert CLI args
  - Add validation methods for encoding and tag names
- Filtering Logic (in string extraction pipeline)
  - Apply `min-len` filter during or after extraction
  - Apply encoding filter by skipping unwanted encoding attempts
  - Apply tag filters when assembling final output
  - Ensure filtering preserves ranking/scoring when applicable
- Error Handling
  - Invalid encoding names → clear error message with valid options
  - Invalid tag names → clear error message with valid options
  - Conflicting `--only-tags` and `--notags` → clap conflict group
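The filtering logic outlined above can be sketched as a single predicate applied to each extracted string. This is only an illustration under stated assumptions: `FoundString` is a hypothetical stand-in for Stringy's real string type, and the `FilterConfig` field names, the use of plain `String` values for encodings and tags, and measuring length in characters rather than bytes are all choices made for the example.

```rust
/// Hypothetical stand-in for an extracted string; the real type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub encoding: String,  // e.g. "ascii", "utf16le"
    pub tags: Vec<String>, // e.g. ["url", "domain"]
    pub score: i32,
}

/// Parsed filter settings (field names are assumptions).
#[derive(Debug, Clone)]
pub struct FilterConfig {
    pub min_len: usize,
    pub encodings: Option<Vec<String>>, // None = all encodings
    pub only_tags: Option<Vec<String>>, // None = no tag whitelist
    pub notags: bool,                   // true = drop untagged strings
}

impl Default for FilterConfig {
    fn default() -> Self {
        // Default minimum length of 4, as stated for --min-len.
        Self { min_len: 4, encodings: None, only_tags: None, notags: false }
    }
}

impl FilterConfig {
    /// Returns true when the string passes every configured filter.
    pub fn matches(&self, s: &FoundString) -> bool {
        // Assumption: length is counted in characters, not bytes.
        if s.text.chars().count() < self.min_len {
            return false;
        }
        if let Some(encs) = &self.encodings {
            if !encs.iter().any(|e| e == &s.encoding) {
                return false;
            }
        }
        if let Some(wanted) = &self.only_tags {
            // Keep the string if it carries at least one requested tag.
            if !s.tags.iter().any(|t| wanted.contains(t)) {
                return false;
            }
        }
        if self.notags && s.tags.is_empty() {
            return false;
        }
        true
    }

    /// Filters without reordering, so upstream ranking is preserved.
    pub fn apply(&self, found: Vec<FoundString>) -> Vec<FoundString> {
        found.into_iter().filter(|s| self.matches(s)).collect()
    }
}
```

Because `apply` only removes items and never reorders them, any ranking computed before filtering carries through to the output. Skipping unwanted encodings during extraction, as the list suggests, would avoid the work entirely; this sketch shows only the post-extraction variant.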
Example Usage
# Extract only URLs and domains with minimum length 10
stringy --only-tags url,domain --min-len 10 malware.exe
# Extract only UTF-16LE strings (common in PE files)
stringy --enc utf16le malware.exe
# Extract only tagged/classified strings
stringy --notags suspicious.elf
# Combine filters for precision
stringy --min-len 8 --enc ascii,utf8 --only-tags filepath,registry binary.dll
Acceptance Criteria
- `--min-len` argument filters strings by minimum length
- `--enc` argument accepts comma-separated encoding list and filters accordingly
- `--only-tags` argument accepts comma-separated tag list and shows only matching strings
- `--notags` flag excludes untagged strings
- `--only-tags` and `--notags` are mutually exclusive (enforced by clap)
- Invalid encoding names produce helpful error messages
- Invalid tag names produce helpful error messages
- Unit tests cover:
  - Argument parsing for all flags
  - Filter configuration validation
  - Filtering logic for each argument type
  - Mutual exclusivity enforcement
  - Edge cases (empty results, no matches, etc.)
- Documentation updated with filtering examples
- Filtering works correctly in both human-readable and JSON output modes
Dependencies
- Blocked by: Basic CLI Structure (must be completed first)
- Depends on: String extraction and tagging system (framework exists)
Testing Strategy
- Unit Tests (`tests/cli_filtering.rs`):
  - Parse valid and invalid argument combinations
  - Validate filter configuration creation
  - Test filter application logic in isolation
- Integration Tests:
  - Run stringy with various filter combinations on test binaries
  - Verify output contains only expected strings
  - Test edge cases (no matches, all filtered out, etc.)
Task ID
stringy-analyzer/filtering-cli-arguments
Related Issues
- Basic CLI Structure (prerequisite)
- String Extraction Engine (#TBD)
- Semantic Tagging System (#TBD)