CLI: Implement output format selection (--top, --json flags) #31

@unclesp1d3r

Summary

Implement CLI arguments that let users limit the number of results (--top) and choose between human-readable and JSONL output (--json).

Context

The Stringy CLI currently accepts only an input file path. The output system needs to support multiple formats to serve different use cases:

  • Human-readable: Interactive analysis with formatted tables
  • JSONL: Machine-readable for automation and pipelines
  • YARA: Security rule generation (future enhancement)

The core types (FoundString, Encoding, Tag, etc.) are fully defined in src/types.rs, and the output module skeleton exists at src/output/mod.rs.

Requirements

From the specification:

  • 7.5: Implement --top N argument to limit results to top N strings by score
  • 7.6: Add --json flag for JSONL output format (one JSON object per line)
  • 7.7: Default to human-readable table format when no format flag is specified
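Requirement 7.5 implies ranking before limiting: truncating an unsorted list would keep the first N strings found, not the N best. A minimal sketch of that step, assuming a hypothetical ScoredString stand-in with an integer score (the real FoundString in src/types.rs may differ):

```rust
// Hypothetical stand-in for FoundString; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct ScoredString {
    score: i32,
    text: String,
}

/// Returns the `top` highest-scoring entries, highest first.
/// `None` means "no limit".
fn top_by_score(mut strings: Vec<ScoredString>, top: Option<usize>) -> Vec<ScoredString> {
    // Stable sort: entries with equal scores keep their original order.
    strings.sort_by(|a, b| b.score.cmp(&a.score));
    if let Some(n) = top {
        strings.truncate(n);
    }
    strings
}

fn main() {
    let found = vec![
        ScoredString { score: 10, text: "libc.so.6".into() },
        ScoredString { score: 95, text: "https://example.com/update".into() },
        ScoredString { score: 40, text: "/etc/passwd".into() },
    ];
    // Keeps the two highest-scoring entries.
    for s in top_by_score(found, Some(2)) {
        println!("{:5}  {}", s.score, s.text);
    }
}
```

Taking Option<usize> mirrors the proposed `top: Option<usize>` CLI field, so the caller can pass the parsed value through unchanged.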

Proposed Solution

1. CLI Argument Structure

Extend the existing Cli struct in src/main.rs:

use clap::{Parser, ValueEnum};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "stringy")]
#[command(about = "Extract meaningful strings from binary files")]
#[command(version)]
struct Cli {
    /// Input binary file to analyze
    #[arg(value_name = "FILE")]
    input: PathBuf,
    
    /// Output format
    #[arg(short = 'f', long, value_enum, default_value = "human")]
    format: OutputFormat,
    
    /// Shorthand for --format json
    #[arg(long, conflicts_with = "format")]
    json: bool,
    
    /// Limit to top N results by score
    #[arg(short = 't', long, value_name = "N")]
    top: Option<usize>,
}

#[derive(Clone, Copy, ValueEnum)]
enum OutputFormat {
    Human,
    Json,
}

2. Output Module Implementation

Create output formatters in src/output/:

src/output/mod.rs:

mod human;
mod json;

pub use human::HumanFormatter;
pub use json::JsonFormatter;

use crate::types::{FoundString, Result};

pub trait OutputFormatter {
    fn format(&self, strings: &[FoundString]) -> Result<String>;
}

src/output/human.rs:

use crate::output::OutputFormatter;
use crate::types::{FoundString, Result};

pub struct HumanFormatter;

impl OutputFormatter for HumanFormatter {
    fn format(&self, strings: &[FoundString]) -> Result<String> {
        let mut output = String::new();
        output.push_str("Score  Offset    Section      Tags           String\n");
        output.push_str("-----  ------    -------      ----           ------\n");
        
        for s in strings {
            let tags_str = s.tags.iter()
                .map(|t| format!("{:?}", t).to_lowercase())
                .collect::<Vec<_>>()
                .join(",");
            
            output.push_str(&format!(
                "{:5}  0x{:06x}  {:12} {:14} {}\n",
                s.score,
                s.offset,
                s.section.as_deref().unwrap_or("<unknown>"),
                tags_str,
                s.text.chars().take(50).collect::<String>()
            ));
        }
        
        Ok(output)
    }
}

src/output/json.rs:

use crate::output::OutputFormatter;
use crate::types::{FoundString, Result};

pub struct JsonFormatter;

impl OutputFormatter for JsonFormatter {
    fn format(&self, strings: &[FoundString]) -> Result<String> {
        let mut output = String::new();
        for s in strings {
            output.push_str(&serde_json::to_string(s)?);
            output.push('\n');
        }
        Ok(output)
    }
}
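To make the JSONL contract concrete (one complete JSON object per line, each terminated by a newline), here is a dependency-free sketch. It swaps in a hand-rolled encoder purely for illustration; the proposal itself relies on serde_json. The Record type and its field names are hypothetical and are not the actual serialized shape of FoundString.

```rust
// Hypothetical record; the real output serializes FoundString via serde_json.
struct Record<'a> {
    score: i32,
    offset: u64,
    text: &'a str,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders records as JSONL: one object per line, each terminated by '\n'.
fn to_jsonl(records: &[Record<'_>]) -> String {
    let mut out = String::new();
    for r in records {
        out.push_str(&format!(
            "{{\"score\":{},\"offset\":{},\"text\":\"{}\"}}\n",
            r.score,
            r.offset,
            escape_json(r.text)
        ));
    }
    out
}

fn main() {
    let records = [
        Record { score: 95, offset: 0x1000, text: "https://example.com" },
        Record { score: 40, offset: 0x2040, text: "line one\nline two" },
    ];
    // Embedded newlines are escaped, so each record stays on a single line.
    print!("{}", to_jsonl(&records));
}
```

Because embedded newlines are escaped, consumers can split the stream on '\n' and parse each line independently, which is what makes the format convenient for pipelines.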

3. Main Pipeline Integration

Update src/main.rs:

use stringy::output::{HumanFormatter, JsonFormatter, OutputFormatter};
// Assumes the library crate exposes src/types.rs as `stringy::types`
use stringy::types::FoundString;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    
    // --json is shorthand for --format json
    let format = if cli.json {
        OutputFormat::Json
    } else {
        cli.format
    };
    
    // TODO: Extract strings from binary (future PR)
    // let strings = extract_strings(&cli.input)?;
    
    // Placeholder: stays empty until the extraction pipeline is wired in
    let mut strings: Vec<FoundString> = Vec::new();
    
    // Rank by score (highest first) so that --top keeps the best N.
    // Assumes FoundString::score is an integer type implementing Ord.
    strings.sort_by(|a, b| b.score.cmp(&a.score));
    if let Some(top) = cli.top {
        strings.truncate(top);
    }
    
    // Format and output
    let formatter: Box<dyn OutputFormatter> = match format {
        OutputFormat::Human => Box::new(HumanFormatter),
        OutputFormat::Json => Box::new(JsonFormatter),
    };
    
    let output = formatter.format(&strings)?;
    // Formatter output already ends with a newline; println! would add a blank line
    print!("{}", output);
    
    Ok(())
}

4. Integration Tests

Create tests/cli_output_formats.rs:

use assert_cmd::Command;
use predicates::prelude::*;

#[test]
fn test_default_human_readable_output() {
    let mut cmd = Command::cargo_bin("stringy").unwrap();
    cmd.arg("tests/fixtures/elf/simple_hello_world")
        .assert()
        .success()
        .stdout(predicate::str::contains("Score"))
        .stdout(predicate::str::contains("Offset"));
}

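/// Asserts that every non-empty line of `stdout` parses as a standalone JSON value (JSONL).
/// Relies on serde_json, which the crate already depends on for JsonFormatter.
fn assert_jsonl(stdout: &[u8]) {
    for line in String::from_utf8_lossy(stdout).lines().filter(|l| !l.is_empty()) {
        serde_json::from_str::<serde_json::Value>(line)
            .unwrap_or_else(|e| panic!("invalid JSON line {:?}: {}", line, e));
    }
}

#[test]
fn test_json_flag() {
    let mut cmd = Command::cargo_bin("stringy").unwrap();
    let output = cmd.arg("--json")
        .arg("tests/fixtures/elf/simple_hello_world")
        .output()
        .unwrap();
    
    assert!(output.status.success());
    assert_jsonl(&output.stdout);
}

#[test]
fn test_format_json() {
    let mut cmd = Command::cargo_bin("stringy").unwrap();
    let output = cmd.arg("--format").arg("json")
        .arg("tests/fixtures/elf/simple_hello_world")
        .output()
        .unwrap();
    
    assert!(output.status.success());
    assert_jsonl(&output.stdout);
}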

#[test]
fn test_top_limit() {
    let mut cmd = Command::cargo_bin("stringy").unwrap();
    let output = cmd.arg("--json")
        .arg("--top").arg("5")
        .arg("tests/fixtures/elf/simple_hello_world")
        .output()
        .unwrap();
    
    let line_count = String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter(|l| !l.is_empty())
        .count();
    
    assert!(line_count <= 5, "Expected at most 5 results, got {}", line_count);
}

#[test]
fn test_json_and_format_conflict() {
    let mut cmd = Command::cargo_bin("stringy").unwrap();
    cmd.arg("--json")
        .arg("--format").arg("human")
        .arg("tests/fixtures/elf/simple_hello_world")
        .assert()
        .failure(); // Should fail due to conflicting arguments
}

Add test dependencies to Cargo.toml:

[dev-dependencies]
assert_cmd = "2.0"
predicates = "3.0"

Implementation Checklist

  • Update Cli struct in src/main.rs with new arguments
  • Implement OutputFormatter trait in src/output/mod.rs
  • Create HumanFormatter in src/output/human.rs
  • Create JsonFormatter in src/output/json.rs
  • Integrate formatters into main pipeline
  • Handle --top limiting in main
  • Add assert_cmd and predicates dev dependencies
  • Create tests/cli_output_formats.rs with integration tests
  • Ensure all tests pass: cargo test
  • Verify CLI works: cargo run -- --help
  • Update documentation if needed

Testing Strategy

  1. Unit tests: Test formatters individually with mock FoundString data
  2. Integration tests: Test CLI argument parsing and output format selection
  3. Manual testing: Verify output appearance and usability
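The unit-test layer of this strategy could look roughly like the sketch below. It is self-contained and uses a hypothetical MockString stand-in plus a free function format_row that mirrors the row layout of the proposed HumanFormatter; the real tests would construct FoundString values and call the formatter directly.

```rust
// Hypothetical minimal stand-in for FoundString (the real type lives in src/types.rs).
struct MockString {
    score: i32,
    offset: u64,
    section: Option<String>,
    tags: Vec<String>,
    text: String,
}

/// Formats one table row with the column widths used by the proposed HumanFormatter.
fn format_row(s: &MockString) -> String {
    format!(
        "{:5}  0x{:06x}  {:12} {:14} {}",
        s.score,
        s.offset,
        s.section.as_deref().unwrap_or("<unknown>"),
        s.tags.join(","),
        // Truncate by characters, not bytes, so multi-byte UTF-8 is never split.
        s.text.chars().take(50).collect::<String>()
    )
}

/// Builds mock data for tests; all values are arbitrary.
fn mock(text: &str, section: Option<&str>) -> MockString {
    MockString {
        score: 90,
        offset: 0x1000,
        section: section.map(String::from),
        tags: vec!["url".to_string()],
        text: text.to_string(),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn row_starts_with_score_offset_and_section() {
        let row = format_row(&mock("https://example.com", Some(".rodata")));
        assert!(row.starts_with("   90  0x001000  .rodata"));
        assert!(row.ends_with("https://example.com"));
    }

    #[test]
    fn missing_section_falls_back_to_placeholder() {
        let row = format_row(&mock("hello", None));
        assert!(row.contains("<unknown>"));
    }

    #[test]
    fn long_text_is_truncated_to_50_chars() {
        let row = format_row(&mock(&"A".repeat(60), Some(".data")));
        assert!(row.ends_with(&"A".repeat(50)));
        assert!(!row.contains(&"A".repeat(51)));
    }
}

fn main() {
    println!("{}", format_row(&mock("https://example.com", Some(".rodata"))));
}
```

Asserting on prefixes and substrings rather than the full row keeps the tests stable if column padding is adjusted later.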

Dependencies

  • Blocked by: Basic CLI Structure ✅ (Complete - basic clap structure exists)
  • Blocks: Future output features (YARA format, filtering options)

Related Files

  • src/main.rs: CLI argument parsing
  • src/output/mod.rs: Output module (currently empty)
  • src/types.rs: Core types (FoundString, Encoding, Tag)
  • tests/fixtures/: Test binary files for integration tests

References

  • Documentation: docs/src/output-formats.md
  • Documentation: docs/src/cli.md
  • Requirements: 7.5, 7.6, 7.7
  • Task ID: stringy-analyzer/output-format-cli-arguments
