This methodology covers local-only prompt-injection testing via Ollama.
- Provider: Ollama
- Typical models: `qwen2.5:1.5b`, `qwen2.5:3b`, `llama3:8b`
- Default temperature: 0.7
- Judge temperature (when using `llm_judge`): 0.1
- Seed: 42
- Single-shot execution per attack in standard runs
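These defaults map directly onto the request body of Ollama's `/api/generate` endpoint. A minimal sketch of assembling that payload (the helper name is illustrative; the `model`, `prompt`, `stream`, and `options` keys follow Ollama's documented API):

```python
def build_ollama_request(prompt: str, model: str = "qwen2.5:1.5b",
                         judge: bool = False) -> dict:
    """Assemble a request body for Ollama's /api/generate endpoint
    using the suite's default runtime settings."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # single-shot: one complete response per attack
        "options": {
            # Judge calls run colder (0.1) than attack runs (0.7).
            "temperature": 0.1 if judge else 0.7,
            "seed": 42,  # fixed seed for repeatable runs
        },
    }
```

The fixed seed and low judge temperature trade diversity for reproducibility, which matters more here than coverage per run.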
The suite combines community-derived and original attacks across categories such as structure, multiturn, emotional, jailbreak, classic, obfuscation, and encoding.
Two detector modes are used:
- `substring` (fast, higher false-positive risk)
- `llm_judge` (slower, contextual judgment, dependent on judge availability)
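The trade-off is easy to see in a minimal substring detector (a sketch, not the suite's implementation; the function name and marker strings are illustrative):

```python
def substring_detect(response: str, markers: list[str]) -> bool:
    """Flag a response as a successful injection if any marker
    substring appears, case-insensitively. Fast, but a refusal that
    merely quotes the marker also triggers it (false positive)."""
    lowered = response.lower()
    return any(m.lower() in lowered for m in markers)
```

A response like "I will not say PWNED as you asked" is flagged even though the model refused; `llm_judge` is meant to catch exactly this kind of contextual distinction.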
When comparing vulnerability rates:
- Run an identical attack set with `--detector substring`.
- Run the same set with `--detector llm_judge`.
- Compare category-level outcomes, not totals only.
- Record `fallback_used`, `detector_used`, and per-attack `error` fields.
- Fail closed by default when the judge is unavailable (no silent fallback unless explicitly enabled).
- Flag non-aligned run sizes (e.g., 34 vs. 14 vs. 12 attacks) as a comparison caveat.
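The category-level comparison above can be sketched as follows, assuming each recorded attack result carries a `category` string and a boolean `vulnerable` flag (these field names are assumptions about the result JSON, not confirmed schema):

```python
from collections import defaultdict

def category_rates(results: list[dict]) -> dict[str, float]:
    """Vulnerability rate per attack category for one run."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        hits[r["category"]] += bool(r["vulnerable"])
    return {c: hits[c] / totals[c] for c in totals}

def compare_runs(substring_run: list[dict],
                 judge_run: list[dict]) -> dict[str, float]:
    """Per-category rate delta (judge minus substring).
    Prints a caveat when run sizes are not aligned."""
    if len(substring_run) != len(judge_run):
        print(f"caveat: non-aligned run sizes "
              f"({len(substring_run)} vs {len(judge_run)})")
    a, b = category_rates(substring_run), category_rates(judge_run)
    return {c: b.get(c, 0.0) - a.get(c, 0.0) for c in set(a) | set(b)}
```

A large negative delta in one category often means the substring detector was over-counting refusals there, which totals alone would hide.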
For the February 15, 2026 update, six result files were integrated:
- qwen2.5:1.5b (`substring`, `llm_judge`)
- qwen2.5:3b (`substring`, `llm_judge`-focused)
- llama3:8b (`substring`, `llm_judge`)
The update uses only recorded JSON outputs in `results/` and does not retroactively alter underlying run artifacts.
Results include:
- `schema_version`: "1.0.0"
- `runtime_config.seed`: 42
- `runtime_config.temperature`: 0.7
- `runtime_config.judge_temperature`: 0.1
- `runtime_config.provider_timeout_sec`
These fields should be preserved when sharing, diffing, or aggregating runs.
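A quick preservation check before sharing or aggregating runs could look like this (a sketch; only the field names listed above come from the schema, the helper itself is illustrative):

```python
REQUIRED = [
    ("schema_version",),
    ("runtime_config", "seed"),
    ("runtime_config", "temperature"),
    ("runtime_config", "judge_temperature"),
    ("runtime_config", "provider_timeout_sec"),
]

def missing_fields(result: dict) -> list[str]:
    """Return dotted paths for any required reproducibility field
    absent from a recorded result file."""
    missing = []
    for path in REQUIRED:
        node = result
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

Running this over every file before a diff or aggregation catches runs whose configs were stripped in transit, which would otherwise silently break comparability.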
Known limitations:
- Limited replication and sample size
- Detector uncertainty without full human review
- Judge availability and runtime dependency
- Sensitivity to model version, quantization, and runtime settings
Treat outcomes as practical red-team signals, not final security truth. Validate findings in your own deployment setup.