Open
Labels: area:analyzer, lang:rust, needs:tests, priority:medium, status:backlog, story-points: 8, type:enhancement
Description
Summary
Implement the classification layer for import/export symbols and section names, converting parsed binary metadata into tagged, scored FoundString objects for analysis.
Current State
✅ Container Parsers Implemented: All three parsers (ELF, PE, Mach-O) successfully extract import/export metadata:
- ImportInfo: symbol names, library names, addresses
- ExportInfo: symbol names, addresses, ordinals
- Section metadata with full classification
🚧 Classification Module Empty: src/classification/mod.rs contains only a comment. Need to implement symbol processing logic.
Requirements
4.2: Import Name Identification and Tagging
- Convert ImportInfo objects into FoundString with Tag::Import
- Set source: StringSource::ImportName
- Apply semantic classification (e.g., crypto APIs, network APIs)
- Boost relevance scores for security-relevant imports
4.3: Export Name Identification and Tagging
- Convert ExportInfo objects into FoundString with Tag::Export
- Set source: StringSource::ExportName
- Demangle Rust symbols using rustc-demangle
- Identify entry points and special exports
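The export rules above could be sketched roughly as follows. This is a std-only illustration, not the project's implementation: `ExportTag`, `classify_export`, and `looks_rust_mangled` are hypothetical names, and the mangling check is a heuristic for the legacy `_ZN...17h<hash>E` scheme only. Real demangling would go through rustc-demangle, and the newer v0 scheme (`_R` prefix) is not covered here.

```rust
/// Hypothetical tag set for this sketch; the project's real `Tag` enum may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExportTag {
    Export,
    EntryPoint,
    RustMangled,
}

/// Entry-point names listed in this issue.
const ENTRY_POINTS: [&str; 3] = ["main", "DllMain", "_start"];

/// Heuristic: legacy Rust mangling looks like `_ZN<path>17h<16 hex digits>E`.
pub fn looks_rust_mangled(name: &str) -> bool {
    if !(name.starts_with("_ZN") && name.ends_with('E')) {
        return false;
    }
    let bytes = name.as_bytes();
    // "17h" + 16 hex digits + "E" is 20 bytes, after the 3-byte "_ZN" prefix.
    if bytes.len() < 3 + 20 {
        return false;
    }
    let tail = &bytes[bytes.len() - 20..];
    tail.starts_with(b"17h") && tail[3..19].iter().all(|b| b.is_ascii_hexdigit())
}

/// Every export gets `Export`; extra tags are added when a rule matches.
pub fn classify_export(name: &str) -> Vec<ExportTag> {
    let mut tags = vec![ExportTag::Export];
    if ENTRY_POINTS.iter().any(|e| *e == name) {
        tags.push(ExportTag::EntryPoint);
    }
    if looks_rust_mangled(name) {
        tags.push(ExportTag::RustMangled);
    }
    tags
}
```

A C++ symbol such as `_ZN3foo3barE` shares the `_ZN` prefix but lacks the hash suffix, which is why the sketch checks the tail rather than the prefix alone.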
4.4: Section Name Processing and Classification
- Treat section names as high-value strings
- Tag with relevant semantic categories
- Use section names to provide context for other strings
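One way to picture section-name tagging is a small lookup over well-known names from the three formats. The sketch below is illustrative only: `SectionKind`, `classify_section_name`, and `section_name_entries` are hypothetical names, the name list is deliberately short, and the +10 score is the value proposed later in this issue.

```rust
/// Hypothetical semantic category for a section name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Code,
    ReadOnlyData,
    WritableData,
    Imports,
    Exports,
    Resources,
    Unknown,
}

/// Relevance score for section names, as proposed in this issue.
pub const SECTION_NAME_SCORE: i32 = 10;

/// Map a few common ELF, PE, and Mach-O section names to a category.
pub fn classify_section_name(name: &str) -> SectionKind {
    match name {
        ".text" | "__text" => SectionKind::Code,
        ".rodata" | ".rdata" | "__cstring" => SectionKind::ReadOnlyData,
        ".data" | "__data" => SectionKind::WritableData,
        ".idata" => SectionKind::Imports,
        ".edata" => SectionKind::Exports,
        ".rsrc" => SectionKind::Resources,
        _ => SectionKind::Unknown,
    }
}

/// Turn section names into (text, category, score) entries, skipping empty names.
pub fn section_name_entries(names: &[&str]) -> Vec<(String, SectionKind, i32)> {
    names
        .iter()
        .filter(|n| !n.is_empty())
        .map(|n| (n.to_string(), classify_section_name(n), SECTION_NAME_SCORE))
        .collect()
}
```

Unrecognized names are kept with `Unknown` rather than dropped, since an unusual section name can itself be a useful signal.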
Proposed Solution
Implementation Structure
Create src/classification/symbols.rs module with:
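use std::collections::HashSet;

/// Classifies import/export symbol names into semantic tags.
pub struct SymbolClassifier {
    crypto_apis: HashSet<String>,
    network_apis: HashSet<String>,
    file_apis: HashSet<String>,
}

// Signature sketch: bodies are left as todo!() until implemented.
// Methods take &self so they can be called on the instance built by new().
impl SymbolClassifier {
    pub fn new() -> Self { todo!() }
    pub fn process_imports(&self, imports: &[ImportInfo]) -> Vec<FoundString> { todo!() }
    pub fn process_exports(&self, exports: &[ExportInfo]) -> Vec<FoundString> { todo!() }
    pub fn classify_symbol(&self, name: &str) -> Vec<Tag> { todo!() }
    pub fn demangle_rust_symbol(name: &str) -> Option<String> { todo!() }
}
Key Features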
- Import Processing
  - Convert each ImportInfo to FoundString
  - Base score: +20 points (high value)
  - Additional tags based on API name patterns:
    - Crypto: CryptEncrypt, EVP_*, crypto_*
    - Network: socket, connect, WSA*, curl_*
    - File I/O: CreateFile, open, fopen
  - Preserve library information in metadata
- Export Processing
  - Convert each ExportInfo to FoundString
  - Attempt Rust symbol demangling
  - Identify special exports:
    - Entry points: main, DllMain, _start
    - Mangled Rust functions
  - Base score: +15 points
- Symbol Demangling
  - Use rustc-demangle crate for Rust symbols
  - Preserve both mangled and demangled forms
  - Add context-specific tags (panic, alloc, etc.)
- Section Name Processing
  - Extract section names from SectionInfo
  - High relevance score (+10)
  - Use for contextualizing other strings in same section
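The import pattern rules and additive scoring listed above might look something like the following. This is a hedged sketch, not the planned implementation: `ApiTag`, `NamePattern`, `classify_import`, and `score_import` are hypothetical names. The +5 semantic boost is an assumption inferred from the example output in this issue (base 20, score 25 for CryptEncrypt), and treating CreateFile as a prefix (to catch CreateFileA/CreateFileW) is also an assumption.

```rust
/// Hypothetical semantic tags for API names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiTag {
    Crypto,
    Network,
    FileIo,
}

/// A name rule: exact match, or prefix match for the `*` patterns.
enum NamePattern {
    Exact(&'static str),
    Prefix(&'static str),
}

impl NamePattern {
    fn matches(&self, name: &str) -> bool {
        match self {
            NamePattern::Exact(s) => name == *s,
            NamePattern::Prefix(p) => name.starts_with(*p),
        }
    }
}

/// Pattern table transcribed from the lists in this issue.
const PATTERNS: &[(NamePattern, ApiTag)] = &[
    (NamePattern::Exact("CryptEncrypt"), ApiTag::Crypto),
    (NamePattern::Prefix("EVP_"), ApiTag::Crypto),
    (NamePattern::Prefix("crypto_"), ApiTag::Crypto),
    (NamePattern::Exact("socket"), ApiTag::Network),
    (NamePattern::Exact("connect"), ApiTag::Network),
    (NamePattern::Prefix("WSA"), ApiTag::Network),
    (NamePattern::Prefix("curl_"), ApiTag::Network),
    (NamePattern::Prefix("CreateFile"), ApiTag::FileIo), // assumption: prefix
    (NamePattern::Exact("open"), ApiTag::FileIo),
    (NamePattern::Exact("fopen"), ApiTag::FileIo),
];

const IMPORT_BASE_SCORE: i32 = 20; // from this issue
const SEMANTIC_BOOST: i32 = 5; // assumption, inferred from the example output

/// Collect each matching semantic tag once, in table order.
pub fn classify_import(name: &str) -> Vec<ApiTag> {
    let mut tags = Vec::new();
    for (pattern, tag) in PATTERNS {
        if pattern.matches(name) && !tags.contains(tag) {
            tags.push(*tag);
        }
    }
    tags
}

/// Additive scoring: base score plus one boost per semantic tag.
pub fn score_import(tags: &[ApiTag]) -> i32 {
    IMPORT_BASE_SCORE + SEMANTIC_BOOST * tags.len() as i32
}
```

Note that AES_encrypt, which the test requirements name as a crypto API, is not matched by any pattern listed here, so the pattern list would need an extra entry for that test to pass.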
API Design
// Main entry point for symbol processing
pub fn extract_symbol_strings(container_info: &ContainerInfo) -> Vec<FoundString> {
    let mut strings = Vec::new();
    let classifier = SymbolClassifier::new();
    strings.extend(classifier.process_imports(&container_info.imports));
    strings.extend(classifier.process_exports(&container_info.exports));
    strings.extend(extract_section_names(&container_info.sections));
    strings
}
Test Requirements
Create tests/classification_symbols.rs with:
- Import Classification Tests
  - Test crypto API detection (CryptEncrypt, AES_encrypt)
  - Test network API detection (socket, WSAStartup)
  - Verify Tag::Import applied correctly
  - Check library attribution preserved
- Export Classification Tests
  - Test basic export tagging
  - Test entry point detection (main, DllMain)
  - Verify Tag::Export applied
- Rust Demangling Tests
  - Test successful demangling of Rust symbols
  - Verify both forms preserved
  - Test context tag detection (panic, main, etc.)
  - Handle non-Rust symbols gracefully
- Section Name Tests
  - Verify section names extracted as strings
  - Check appropriate scoring applied
  - Test all three formats (ELF, PE, Mach-O)
Success Criteria
- SymbolClassifier implemented in src/classification/symbols.rs
- All imports converted to FoundString with Tag::Import
- All exports converted to FoundString with Tag::Export
- Rust symbol demangling working with rustc-demangle
- Semantic API classification (crypto, network, file I/O)
- Section names processed as high-value strings
- Unit tests achieving >90% coverage
- Integration tests for all three formats
- Documentation with examples
Dependencies
- ✅ Symbol Processing (container parsers extract symbols)
- 📦 rustc-demangle crate (add to Cargo.toml)
Technical Notes
- Use lazy_static for compiled API pattern sets
- Preserve all original metadata (addresses, ordinals, library names)
- Scoring should be additive: base score + semantic boosts
- Handle edge cases: empty names, duplicate symbols, stripped binaries
- Consider memory efficiency with large symbol tables
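A minimal sketch of two of these notes follows: a lazily built API set and handling of empty and duplicate names. It swaps in `std::sync::OnceLock` from the standard library where the note suggests lazy_static, purely so the example has no external dependencies. `network_apis`, `is_network_api`, and `unique_nonempty` are hypothetical helper names, and the set contents are a small subset of the patterns in this issue.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// Build the set once on first use and share it afterwards.
fn network_apis() -> &'static HashSet<&'static str> {
    static SET: OnceLock<HashSet<&'static str>> = OnceLock::new();
    SET.get_or_init(|| ["socket", "connect"].iter().copied().collect())
}

/// Exact-name lookup against the lazily initialized set.
pub fn is_network_api(name: &str) -> bool {
    network_apis().contains(name)
}

/// Drop empty names and keep only the first occurrence of each symbol,
/// preserving input order. Borrowing avoids copying large symbol tables.
pub fn unique_nonempty<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    names
        .iter()
        .copied()
        .filter(|n| !n.is_empty())
        .filter(|n| seen.insert(*n))
        .collect()
}
```

Deduplicating by name alone would discard per-symbol addresses and ordinals, so a real implementation that must preserve all metadata might key on more than the name.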
Example Output
{
  "text": "CryptEncrypt",
  "encoding": "Ascii",
  "offset": 0,
  "section": ".idata",
  "tags": ["import", "crypto"],
  "score": 25,
  "source": "ImportName"
}
References
- Architecture docs: docs/src/architecture.md (Symbol Classification section)
- Classification docs: docs/src/classification.md (Symbol Classifier implementation)
- Concept doc: concept.md (Demangling requirements)
- Related: Container parsing implementation (✅ complete)