AI-Powered Synthetic Security Log Dataset + Benchmark for Detection Engineering
A realistic enterprise security log dataset and reproducible detection benchmark featuring:
- 7,920,291 total logs over 25 days of continuous enterprise activity
- Multi-user pivot attack campaign (initial compromise to higher-privilege user to admin account abuse)
- 133 users across departments with role-based behavior
- 55 service accounts with authentic high-volume background activity
- Defense observability logs (EDR / DLP / SIEM / PAM / MFA-style) at SOC volume:
- 142,184 defense observability events total
- 342 attacker actions (ground truth)
- 269 attack-triggered observability events (not de-duplicated; one action can trigger multiple alert types)
- Row-level ground truth: attacker actions are labeled by `attack_id` (non-null)
How it's generated: both benign and attack activity come from an AI-driven simulation framework. The attacker is guided by a defensive-environment model with 100+ parameters (visibility, controls, enforcement, noise, alert overlap) and optimizes its campaign behavior under that posture.
Built by an ML engineer working in cybersecurity who hit the same wall most teams hit: no realistic, labeled enterprise data for training and validating detection systems.
- Background & Motivation
- Scenario at a Glance
- Repository Contents
- Quick Start
- BigQuery (public) — browse with SQL (no local setup)
- Attack Chain
- Evaluate detections
- Why This Attack Is Hard to Detect
- Dataset Statistics
- Schema Overview
- How This Dataset Differs
- Dataset Scope
- License & Attribution
Security teams face a data challenge.
Training detection systems requires labeled attack data, but real breaches are rare, sensitive, and can't be freely shared. Most teams resort to:
- Static datasets from years ago (DARPA 2000, CICIDS)
- Lab exercises with clean, compressed attack scenarios
- Limited red team engagements that can't run continuously
Validating detections is difficult without realistic test data that mirrors actual enterprise environments, complete with background noise, service account activity, and the tool overlap that makes real attacks hard to detect.
This dataset aims to help by providing enterprise-scale, multi-week, high-noise logs with event-level ground truth, plus a standard evaluator so teams can compare detections apples-to-apples.
This release models an intermediate-skill living-off-the-land campaign with a multi-identity pivot:
| Identity | Context | Attacker Actions |
|---|---|---|
| `daniel.davis004` on `WS-SAL-0005` | Initial compromised user (Sales) | 136 |
| `daniel.thomas070` on `WS-IT-0071` | Pivot target (IT, normal account); attacker and legitimate IT activity mixed | 142 |
| `daniel.thomas070_admin` on `WS-IT-0071` | Privilege context (IT, admin account); admin-like attacker actions blended with normal admin work | 23 |
Hosts touched by attacker actions:
| Host | Attacker-Action Events |
|---|---|
| WS-IT-0071 | 165 |
| WS-SAL-0005 | 136 |
| DB-SRV-02 | 24 |
| APP-SRV-02 | 11 |
| WEB-SRV-01 | 3 |
| DC-01 | 3 |
Ground truth: all rows where `attack_id` is non-null.
```
├── data/
│   └── two_day_sample_cyber_simulator_json_format.zip   # 2-day JSON sample (included)
├── notebooks/
│   └── explore_dataset.ipynb                            # dataset exploration
├── docs/
│   └── SCHEMA.md                                        # complete field documentation
├── evaluate.py                                          # benchmark evaluator
└── README.md
```
Full Dataset (External):
- JSON Format (HuggingFace): zipped JSONL files split by day, see Quick Start
The repo includes a small, ready-to-use two-day JSON sample at `data/two_day_sample_cyber_simulator_json_format.zip`.
```python
import json
import zipfile

records = []
with zipfile.ZipFile("data/two_day_sample_cyber_simulator_json_format.zip") as zf:
    for name in zf.namelist():
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    line = line.decode("utf-8").strip()
                    if line:
                        records.append(json.loads(line))

print("Loaded records:", len(records))

# Attacker actions = non-null attack_id
attack = [r for r in records if r.get("attack_id") is not None]
print("Attack records:", len(attack))
```

To fetch the full dataset from HuggingFace:

```shell
curl -L -o data/cyber_simulator_json_format.zip \
  https://huggingface.co/datasets/gregalr/cyber_simulation_json_format/resolve/main/cyber_simulator_json_format.zip
```

Load it the same way as the sample:

```python
import json
import zipfile

records = []
with zipfile.ZipFile("data/cyber_simulator_json_format.zip") as zf:
    for name in zf.namelist():
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    line = line.decode("utf-8").strip()
                    if line:
                        records.append(json.loads(line))

print("Loaded records:", len(records))
```

Tip: for the full dataset, consider streaming into BigQuery / DuckDB / ClickHouse instead of loading all records into memory.
All benchmark metrics are computed only on Windows attacker-action telemetry:
- Eligible rows: `log_type == "windows_security_event"`
- Positive label: `attack_id` is present (non-null / not `"NA"`) within eligible rows
- All other log types (DLP blocks, SIEM alerts, EDR alerts, PAM denials, etc.) are included for context but are not scored
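For concreteness, the eligibility and labeling rules can be written as two small predicates (a sketch of the rules as stated, not the evaluator's internal code):

```python
def is_eligible(row: dict) -> bool:
    """Only Windows attacker-action telemetry is scored."""
    return row.get("log_type") == "windows_security_event"

def is_positive(row: dict) -> bool:
    """Ground-truth positive: an eligible row whose attack_id is
    present, i.e. non-null and not the string "NA"."""
    if not is_eligible(row):
        return False
    attack_id = row.get("attack_id")
    return attack_id is not None and attack_id != "NA"
```

Note that a defense observability row carrying an `attack_id` is still not a positive here, because it falls outside the eligible universe.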
Submit a file listing the `row_id` values you flagged as suspicious:
- `row_id` is the 0-based row number in the canonical dataset ordering (header excluded)
- Do not shuffle or re-sort the dataset before generating `row_id`s
- Alerts outside the eligible universe are ignored
Accepted formats:
- CSV/TSV with a `row_id` column
- JSON list of integers: `[12, 19, 204, ...]`
- Plain text: one integer `row_id` per line
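To illustrate how the three formats relate, a loader that normalizes any of them into a set of integers might look like this (a sketch for your own tooling, not the official parser inside `evaluate.py`):

```python
import csv
import json
from pathlib import Path

def load_submission(path):
    """Parse a submission in any accepted format into a set of row_ids:
    CSV/TSV with a row_id column, a JSON list of integers, or one
    integer per line."""
    text = Path(path).read_text().strip()
    if text.startswith("["):                       # JSON list of integers
        return set(json.loads(text))
    lines = text.splitlines()
    header = lines[0]
    if "," in header or "\t" in header or header.strip() == "row_id":
        delim = "\t" if "\t" in header else ","    # CSV or TSV with header
        reader = csv.DictReader(lines, delimiter=delim)
        return {int(r["row_id"]) for r in reader}
    return {int(ln) for ln in lines if ln.strip()}  # plain text, one per line
```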
Important: when working from the JSON zip, generate `row_id` using a deterministic order: iterate zip members in `sorted(zf.namelist())` order, then iterate lines in file order.
```python
import csv
import json
import zipfile

row_id = 0
alert_row_ids = []
with zipfile.ZipFile("data/cyber_simulator_json_format.zip") as zf:
    for name in sorted(zf.namelist()):
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    obj = json.loads(line.decode("utf-8"))
                    cmd = (obj.get("command_line") or "").lower()
                    # Replace this rule with your detector/model
                    if obj.get("log_type") == "windows_security_event" and "invoke-command" in cmd:
                        alert_row_ids.append(row_id)
                    row_id += 1

with open("submission.csv", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(["row_id"])
    for rid in alert_row_ids:
        w.writerow([rid])

print("Wrote submission.csv with", len(alert_row_ids), "alerts")
```

Then score the submission:

```shell
python evaluate.py --data data/cyber_simulator_json_format.zip --pred submission.csv --out metrics.json
```

The evaluator outputs precision/recall/F1, FP/day, recall by stage/technique, and time-to-detect.
Type: living-off-the-land (PowerShell / remote execution / discovery / lateral movement / exfiltration)
Key realism feature: the attacker pivots from a low-privilege endpoint to IT context, then abuses a separate admin account.
High-level phases:
- Initial compromise + foothold (Sales workstation)
- Discovery + pivot attempts (IT workstation context)
- Admin-account abuse (dual-account realism)
- Server access + staging + exfiltration
Attacker actions are labeled via `attack_id` (non-null) and `stage_number` (0–15).
Privilege transitions are the hard part. The pivot from Sales to IT to IT Admin puts attacker behavior inside the legitimate admin "shape."
Tool overlap. PowerShell and remote admin behaviors are normal for IT, so signatures alone fail.
SOC-like alert noise. Defense observability exists at high volume, with overlapping alert types and duplicates.
Living-off-the-land. No malware required—mostly built-in tooling and normal protocols.
| Metric | Value |
|---|---|
| Total logs | 7,920,291 |
| Defense observability logs | 142,184 |
| Attacker actions (`attack_id` non-null) | 342 (~0.004% of total) |
| Attack-triggered observability events | 269 (not de-duplicated) |
| Duration | 25 days |
| Users | 133 |
| Service accounts | 55 |
| Pivot identities | 3 (Sales user to IT user to IT admin) |
| Hosts touched by attacker | 6 |
| Attack stages | 16 |
| Field | Description |
|---|---|
| `timestamp` | ISO 8601: `"2025-12-21T01:32:03.000-08:00"` |
| `log_type` | `"windows_security_event"`, `"defender_atp_alert"`, `"pam_access_denied"`, ... |
| `user` | Human identity |
| `account` | Security principal (user or `svc_*`) |
| `hostname` | Device: `"WS-IT-0071"`, `"DB-SRV-02"`, ... |
| `device_type` | `workstation`, `database_server`, `domain_controller`, ... |
| `location` | `NYC_HQ`, `SF_Office`, `London`, `Remote_VPN`, ... |
| `department` | Sales, IT, Finance, ... |
| Field | Description |
|---|---|
| `process_name` | `"powershell.exe"`, `"cmd.exe"`, ... |
| `command_line` | Full command with arguments |
| `event_type` | `process_start`, `network_connection`, `file_access`, ... |
| `source_ip` | Internal: `10.x.x.x`, VPN: `192.168.x.x` |
| `destination_ip` | Internal or external |
| `port` | 135, 443, 3389, 8443, ... |
| `protocol` | TCP, UDP, HTTPS, RDP, ... |
| Field | Description |
|---|---|
| `attack_id` | `"ATK_70246"` or null |
| `attack_type` | MITRE technique (optional per-row) or null |
| `stage_number` | `"0"` through `"15"` or null |
| Field | Description |
|---|---|
| `severity` | `low`, `medium`, `high`, `critical` |
| `action_taken` | `blocked`, `logged`, `quarantined`, `denied` |
| `vendor` | Defense product vendor (if present) |
| `detection_confidence` | 0.0–1.0 (if present) |
| `alert_name` | e.g. `"Suspicious PowerShell Activity"` |
| `reason` | e.g. `"Unauthorized service account access attempt"` |
See `docs/SCHEMA.md` for complete documentation.
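As one way to put these fields together, the ground-truth labels can be combined into a per-stage view of the campaign (a sketch over the schema above; the sample rows here are illustrative, not real dataset rows):

```python
from collections import defaultdict

def stage_timeline(records):
    """Group attacker-action rows by stage_number and report, per stage,
    the hosts touched and the event count."""
    stages = defaultdict(lambda: {"hosts": set(), "events": 0})
    for r in records:
        if r.get("attack_id") is None:
            continue  # benign row, skip
        s = stages[r.get("stage_number")]
        s["hosts"].add(r.get("hostname"))
        s["events"] += 1
    return dict(stages)

# Illustrative rows shaped like the schema above
rows = [
    {"attack_id": "ATK_70246", "stage_number": "0", "hostname": "WS-SAL-0005"},
    {"attack_id": "ATK_70246", "stage_number": "1", "hostname": "WS-IT-0071"},
    {"attack_id": None,        "stage_number": None, "hostname": "WS-IT-0071"},
]
print(stage_timeline(rows))
```

On the real data this surfaces the pivot structure directly: early stages confined to the Sales workstation, later stages spreading across the IT workstation and servers.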
This dataset is built to test detections under realistic enterprise conditions:
- Multi-user pivot across privilege boundaries (Sales to IT to IT Admin)
- Dual-account admin modeling (normal + admin accounts)
- SOC-scale observability noise (alerts/blocks/denials with overlap)
- Row-level ground truth + evaluator so detectors can be compared reproducibly
This is a static, synthetic dataset representing a single high-fidelity campaign. It is designed to be repeatable and shareable, and to support:
- Detection benchmarking (rules + ML)
- SOC analyst training and investigation drills
- Research on alert noise, privilege transitions, and pivot detection
- For Research & Education: Free to use
- For Commercial Use: Contact for licensing
Project: Phantom Armor — Enterprise Attack Simulator
Author: Greg Rothman
Contact: gregralr@phantomarmor.com
All data is fully synthetic. No real users, systems, or organizations are represented.
```bibtex
@dataset{phantom_armor_2026,
  author    = {Rothman, Greg},
  title     = {Enterprise Attack Simulator: AI-Powered Synthetic Security Log Dataset and Detection Benchmark},
  year      = {2026},
  publisher = {Phantom Armor},
  url       = {https://github.com/gregdiy/cyber_simulation}
}
```

- Issues: Report bugs or request features via GitHub Issues
- Discussions: Share detection techniques and ask questions
- Contributions: PRs welcome for notebooks and analysis scripts
Motivated by real-world gaps encountered while building ML-based detection systems in enterprise security operations. Thanks to the security research community for public threat intelligence and documentation that informed the modeled tradecraft.