Open
Feature
0 of 2 issues completed
Labels: area:analyzer, lang:rust, needs:tests, status:backlog, story-points: 8, type:enhancement
Summary
Create a flexible RankingEngine that assigns importance scores to extracted strings based on their semantic tags, source location, and section characteristics. This enables prioritization of potentially interesting strings in binary analysis.
Background
The string analyzer extracts and classifies strings from binaries with semantic tags (URLs, IPs, file paths, etc.), section types (code, data, resources), and source locations (imports, exports, section data). However, not all strings are equally interesting for analysis. A ranking system is needed to:
- Prioritize high-value strings (e.g., network indicators, file paths, registry keys)
- Deprioritize noise (e.g., common debug strings, version info)
- Weight by context (strings from executable sections vs. debug sections)
- Enable customizable scoring for different analysis scenarios (malware analysis, reverse engineering, compliance scanning)
Proposed Solution
Architecture
Create src/classification/ranking.rs with the following components:
- RankingEngine struct: Main scoring engine with configurable weights
- ScoreConfig struct: Configuration for tag weights, source weights, and section type multipliers
- StringScore struct: Returned score with breakdown for transparency
- Default scoring profiles: Presets for common use cases (malware analysis, general strings, etc.)
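The technical notes suggest a builder for ScoreConfig customization. One possible shape is sketched below, with each preset expressed as a chain of overrides. The builder type, its method names (tag, source, section, build), and the trimmed-down enums are hypothetical stand-ins, not existing APIs in the codebase.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real enums in src/classification/mod.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tag { Url, FilePath }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StringSource { ImportName, DebugInfo }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SectionType { Code, Resources }

#[derive(Debug, Default, Clone)]
pub struct ScoreConfig {
    tag_weights: HashMap<Tag, f32>,
    source_weights: HashMap<StringSource, f32>,
    section_multipliers: HashMap<SectionType, f32>,
}

/// Hypothetical consuming builder: each setter takes `self` by value and returns it,
/// so calls chain, and a later call for the same key overwrites an earlier one.
#[derive(Debug, Default)]
pub struct ScoreConfigBuilder {
    config: ScoreConfig,
}

impl ScoreConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn tag(mut self, tag: Tag, weight: f32) -> Self {
        self.config.tag_weights.insert(tag, weight);
        self
    }

    pub fn source(mut self, source: StringSource, weight: f32) -> Self {
        self.config.source_weights.insert(source, weight);
        self
    }

    pub fn section(mut self, section: SectionType, multiplier: f32) -> Self {
        self.config.section_multipliers.insert(section, multiplier);
        self
    }

    pub fn build(self) -> ScoreConfig {
        self.config
    }
}
```

Under this design, a preset such as a malware-analysis profile could be a function that returns a pre-filled builder, which callers then adjust before calling build().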
Scoring Algorithm
final_score = (tag_weight + source_weight) × section_type_multiplier
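A minimal sketch of how score() could apply this formula by looking up each component in ScoreConfig. The enums are trimmed-down hypothetical stand-ins for the real types. The fallback values for unconfigured entries (0.0 for weights, 1.0 for the multiplier) are an assumption that the issue does not specify.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real Tag / StringSource / SectionType enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tag { Url, FormatString }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StringSource { SectionData, DebugInfo }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SectionType { Code, Debug }

#[derive(Debug, Default)]
pub struct ScoreConfig {
    tag_weights: HashMap<Tag, f32>,
    source_weights: HashMap<StringSource, f32>,
    section_multipliers: HashMap<SectionType, f32>,
}

/// Final score plus the components it was computed from.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StringScore {
    pub total: f32,
    pub tag_weight: f32,
    pub source_weight: f32,
    pub section_multiplier: f32,
}

pub struct RankingEngine {
    config: ScoreConfig,
}

impl RankingEngine {
    pub fn new(config: ScoreConfig) -> Self {
        Self { config }
    }

    /// final_score = (tag_weight + source_weight) × section_type_multiplier.
    /// Assumption: unconfigured weights count as 0.0 and an unconfigured
    /// multiplier as 1.0, so missing entries are neutral.
    pub fn score(&self, tag: &Tag, source: StringSource, section: SectionType) -> StringScore {
        let tag_weight = self.config.tag_weights.get(tag).copied().unwrap_or(0.0);
        let source_weight = self.config.source_weights.get(&source).copied().unwrap_or(0.0);
        let section_multiplier = self
            .config
            .section_multipliers
            .get(&section)
            .copied()
            .unwrap_or(1.0);
        StringScore {
            total: (tag_weight + source_weight) * section_multiplier,
            tag_weight,
            source_weight,
            section_multiplier,
        }
    }
}
```

Returning the components alongside the total keeps the ranking explainable: a reviewer can see whether a high score came from the tag, the source, or the section.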
Tag Weights (base importance):
- High value (8-10): URLs, Domains, IPv4/IPv6, Email, Registry paths
- Medium value (5-7): File paths, GUIDs, Base64 (potential encoding)
- Lower value (2-4): Format strings, User agents
- Contextual (variable): Imports/Exports (depends on name), Version strings
Source Weights:
- ImportName/ExportName: +3 (API calls are interesting)
- SectionData: +2 (hardcoded strings)
- ResourceString: +1 (UI strings, less critical)
- DebugInfo: -2 (usually noise)
Section Type Multipliers:
- Code sections: ×1.5 (strings in executable code are unusual)
- StringData/ReadOnlyData: ×1.0 (expected location)
- WritableData: ×1.2 (potentially modified at runtime)
- Resources: ×0.8 (often benign UI strings)
- Debug: ×0.3 (low priority noise)
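A worked example with the suggested weights helps show how much the multipliers spread the results. It assumes a URL tag weight of 9 (the issue gives only the 8-10 band) and a format-string tag weight of 3 (from the 2-4 band):

```math
\begin{aligned}
\text{URL, SectionData, Code:} &\quad (9 + 2) \times 1.5 = 16.5 \\
\text{URL, ResourceString, Resources:} &\quad (9 + 1) \times 0.8 = 8.0 \\
\text{Format string, DebugInfo, Debug:} &\quad (3 - 2) \times 0.3 = 0.3
\end{aligned}
```

The same URL ranks roughly twice as high in a code section as in resources, and a format string from debug info drops close to zero. Raw scores also stay far below 100, which is one reason the technical notes call for normalization.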
Implementation Details
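```rust
use std::collections::HashMap;

// Existing types from src/classification/mod.rs (must derive Hash and Eq to serve as HashMap keys).
use super::{SectionType, StringSource, Tag};

pub struct RankingEngine {
    config: ScoreConfig,
}

pub struct ScoreConfig {
    tag_weights: HashMap<Tag, f32>,
    source_weights: HashMap<StringSource, f32>,
    section_multipliers: HashMap<SectionType, f32>,
}

/// Final score plus the components it was computed from, for transparency.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StringScore {
    pub total: f32,
    pub tag_weight: f32,
    pub source_weight: f32,
    pub section_multiplier: f32,
}

impl RankingEngine {
    pub fn new(config: ScoreConfig) -> Self {
        Self { config }
    }
    pub fn with_defaults() -> Self {
        todo!("build the default scoring profile")
    }
    pub fn score(&self, tag: &Tag, source: StringSource, section: SectionType) -> StringScore {
        todo!("(tag_weight + source_weight) * section_multiplier")
    }
}
```
Acceptance Criteria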
- RankingEngine struct created with configurable scoring
- ScoreConfig supports custom weights for tags, sources, and sections
- Default scoring profile implemented with sensible weights
- score() method returns a detailed StringScore with a breakdown
- Unit tests for various scoring combinations
- Documentation with examples of customization
- Integration point in the classification module (mod.rs)
Technical Notes
- Use f32 for scores to allow fractional weights
- Consider using the builder pattern for ScoreConfig customization
- Scores should be normalized (0-100 range recommended)
- Future enhancement: Machine learning-based weight tuning
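With the suggested weights, raw scores range from negative values (a weak tag combined with DebugInfo's -2) up to (10 + 3) × 1.5 = 19.5, so reaching the recommended 0-100 range needs an explicit mapping. A minimal sketch, assuming linear scaling against the largest reachable raw score and clamping anything outside it. MAX_RAW_SCORE and normalize are hypothetical names, and the bound assumes no contextual tag weight exceeds 10.

```rust
/// Upper bound on the raw score under the suggested weights, assuming no tag weight
/// exceeds 10: highest tag weight (10) plus highest source weight (+3, imports/exports),
/// times the largest section multiplier (×1.5 for code sections).
pub const MAX_RAW_SCORE: f32 = (10.0 + 3.0) * 1.5;

/// Linearly maps a raw score onto 0-100, clamping out-of-range values.
/// Negative raw scores (possible with DebugInfo's -2) become 0.
pub fn normalize(raw: f32, max_raw: f32) -> f32 {
    assert!(max_raw > 0.0, "max_raw must be positive");
    (raw / max_raw * 100.0).clamp(0.0, 100.0)
}
```

Linear scaling preserves the relative order of scores. If a profile changes its largest weights, its max_raw bound has to be recomputed, otherwise scores from different profiles are not comparable.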
Dependencies
- Requires existing types from src/classification/mod.rs: Tag, StringSource, SectionType
- No external crate dependencies expected for the MVP
References
- Requirements: 5.1
- Task-ID: stringy-analyzer/ranking-system-foundation