AI-Powered Synthetic Security Log Dataset + Benchmark for Detection Engineering
A realistic enterprise security log dataset and reproducible detection benchmark featuring:
- 7,920,291 total logs over 25 days of continuous enterprise activity
- Multi-user pivot attack campaign (initial compromise to higher-privilege user to admin account abuse)
- 133 users across departments with role-based behavior
- 55 service accounts with authentic high-volume background activity
- Defense observability logs (EDR / DLP / SIEM / PAM / MFA-style) at SOC volume:
- 142,184 defense observability events total
- 342 attacker actions (ground truth)
- 269 attack-triggered observability events (not de-duplicated; one action can trigger multiple alert types)
- Row-level ground truth: attacker actions are labeled by `attack_id` (non-null)
How it's generated: both benign and attack activity come from an AI-driven simulation framework. The attacker is guided by a defensive-environment model with 100+ parameters (visibility, controls, enforcement, noise, alert overlap) and optimizes its campaign behavior under that posture.
Built by an ML engineer working in cybersecurity who hit the same wall most teams hit: no realistic, labeled enterprise data for training and validating detection systems.
- Background & Motivation
- Scenario at a Glance
- Repository Contents
- Quick Start
- BigQuery (public) — browse with SQL (no local setup)
- Attack Chain
- Evaluate detections
- Why This Attack Is Hard to Detect
- Dataset Statistics
- Schema Overview
- How This Dataset Differs
- Dataset Scope
- License & Attribution
Security teams face a data challenge.
Training detection systems requires labeled attack data, but real breaches are rare, sensitive, and can't be freely shared. Most teams resort to:
- Static datasets from years ago (DARPA 2000, CICIDS)
- Lab exercises with clean, compressed attack scenarios
- Limited red team engagements that can't run continuously
Validating detections is difficult without realistic test data that mirrors actual enterprise environments, complete with background noise, service account activity, and the tool overlap that makes real attacks hard to detect.
This dataset aims to help by providing enterprise-scale, multi-week, high-noise logs with event-level ground truth, plus a standard evaluator so teams can compare detections apples-to-apples.
This release models an intermediate-skill living-off-the-land campaign with a multi-identity pivot:
| Identity | Context | Attacker Actions |
|---|---|---|
| `daniel.davis004` on `WS-SAL-0005` | Initial compromised user (Sales) | 136 |
| `daniel.thomas070` on `WS-IT-0071` | Pivot target (IT, normal account); attacker and legitimate IT activity mixed | 142 |
| `daniel.thomas070_admin` on `WS-IT-0071` | Privilege context (IT, admin account); admin-like attacker actions blended with normal admin work | 23 |
Hosts touched by attacker actions:
| Host | Attacker-Action Events |
|---|---|
| WS-IT-0071 | 165 |
| WS-SAL-0005 | 136 |
| DB-SRV-02 | 24 |
| APP-SRV-02 | 11 |
| WEB-SRV-01 | 3 |
| DC-01 | 3 |
Ground truth: all rows where `attack_id` is non-null.
```
├── data/
│   └── two_day_sample_cyber_simulator_json_format.zip   # 2-day JSON sample (included)
├── notebooks/
│   └── explore_dataset.ipynb                            # dataset exploration
├── docs/
│   └── SCHEMA.md                                        # complete field documentation
├── evaluate.py                                          # benchmark evaluator
└── README.md
```
Full Dataset (External):
- JSON Format (HuggingFace): zipped JSONL files split by day, see Quick Start
The repo includes a small, ready-to-use two-day JSON sample at `data/two_day_sample_cyber_simulator_json_format.zip`.
```python
import json
import zipfile

records = []
with zipfile.ZipFile("data/two_day_sample_cyber_simulator_json_format.zip") as zf:
    for name in zf.namelist():
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    line = line.decode("utf-8").strip()
                    if line:
                        records.append(json.loads(line))

print("Loaded records:", len(records))

# Attacker actions = non-null attack_id
attack = [r for r in records if r.get("attack_id") is not None]
print("Attack records:", len(attack))
```

To fetch the full dataset from HuggingFace:

```shell
curl -L -o data/cyber_simulator_json_format.zip \
  https://huggingface.co/datasets/gregalr/cyber_simulation_json_format/resolve/main/cyber_simulator_json_format.zip
```

Load it the same way as the sample:

```python
import json
import zipfile

records = []
with zipfile.ZipFile("data/cyber_simulator_json_format.zip") as zf:
    for name in zf.namelist():
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    line = line.decode("utf-8").strip()
                    if line:
                        records.append(json.loads(line))

print("Loaded records:", len(records))
```

Tip: for the full dataset, consider streaming into BigQuery / DuckDB / ClickHouse instead of loading all records into memory.
All benchmark metrics are computed only on Windows attacker-action telemetry:
- Eligible rows: `log_type == "windows_security_event"`
- Positive label: `attack_id` is present (non-null / not `"NA"`) within eligible rows
- All other log types (DLP blocks, SIEM alerts, EDR alerts, PAM denials, etc.) are included for context but are not scored
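For concreteness, the eligibility and labeling rules can be written as two small predicates (a sketch of the rules as stated, not the evaluator's internal code):

```python
def is_eligible(row: dict) -> bool:
    """Only Windows attacker-action telemetry is scored."""
    return row.get("log_type") == "windows_security_event"

def is_positive(row: dict) -> bool:
    """Ground-truth positive: an eligible row whose attack_id is
    present, i.e. non-null and not the string "NA"."""
    if not is_eligible(row):
        return False
    attack_id = row.get("attack_id")
    return attack_id is not None and attack_id != "NA"
```

Note that a defense observability row carrying an `attack_id` is still not a positive here, because it falls outside the eligible universe.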
Submit a file listing the `row_id` values you flagged as suspicious:
- `row_id` is the 0-based row number in the canonical dataset ordering (header excluded)
- Do not shuffle or re-sort the dataset before generating `row_id`s
- Alerts outside the eligible universe are ignored
Accepted formats:
- CSV/TSV with a `row_id` column
- JSON list of integers: `[12, 19, 204, ...]`
- Plain text: one integer `row_id` per line
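To illustrate how the three formats relate, a loader that normalizes any of them into a set of integers might look like this (a sketch for your own tooling, not the official parser inside `evaluate.py`):

```python
import csv
import json
from pathlib import Path

def load_submission(path):
    """Parse a submission in any accepted format into a set of row_ids:
    CSV/TSV with a row_id column, a JSON list of integers, or one
    integer per line."""
    text = Path(path).read_text().strip()
    if text.startswith("["):                       # JSON list of integers
        return set(json.loads(text))
    lines = text.splitlines()
    header = lines[0]
    if "," in header or "\t" in header or header.strip() == "row_id":
        delim = "\t" if "\t" in header else ","    # CSV or TSV with header
        reader = csv.DictReader(lines, delimiter=delim)
        return {int(r["row_id"]) for r in reader}
    return {int(ln) for ln in lines if ln.strip()}  # plain text, one per line
```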
Important: when working from the JSON zip, generate `row_id` using a deterministic order: iterate zip members in `sorted(zf.namelist())` order, then iterate lines in file order.
```python
import csv
import json
import zipfile

row_id = 0
alert_row_ids = []
with zipfile.ZipFile("data/cyber_simulator_json_format.zip") as zf:
    for name in sorted(zf.namelist()):
        if name.endswith(".json"):
            with zf.open(name) as f:
                for line in f:
                    obj = json.loads(line.decode("utf-8"))
                    cmd = (obj.get("command_line") or "").lower()
                    # Replace this rule with your detector/model
                    if obj.get("log_type") == "windows_security_event" and "invoke-command" in cmd:
                        alert_row_ids.append(row_id)
                    row_id += 1

with open("submission.csv", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(["row_id"])
    for rid in alert_row_ids:
        w.writerow([rid])

print("Wrote submission.csv with", len(alert_row_ids), "alerts")
```

Then score the submission:

```shell
python evaluate.py --data data/cyber_simulator_json_format.zip --pred submission.csv --out metrics.json
```

The evaluator outputs precision/recall/F1, FP/day, recall by stage/technique, and time-to-detect.
Type: living-off-the-land (PowerShell / remote execution / discovery / lateral movement / exfiltration)
Key realism feature: the attacker pivots from a low-privilege endpoint to IT context, then abuses a separate admin account.
High-level phases:
- Initial compromise + foothold (Sales workstation)
- Discovery + pivot attempts (IT workstation context)
- Admin-account abuse (dual-account realism)
- Server access + staging + exfiltration
Attacker actions are labeled via `attack_id` (non-null) and `stage_number` (0–15).
Privilege transitions are the hard part. The pivot from Sales to IT to IT Admin puts attacker behavior inside the legitimate admin "shape."
Tool overlap. PowerShell and remote admin behaviors are normal for IT, so signatures alone fail.
SOC-like alert noise. Defense observability exists at high volume, with overlapping alert types and duplicates.
Living-off-the-land. No malware required—mostly built-in tooling and normal protocols.
| Metric | Value |
|---|---|
| Total logs | 7,920,291 |
| Defense observability logs | 142,184 |
| Attacker actions (`attack_id` non-null) | 342 (~0.004% of total) |
| Attack-triggered observability events | 269 (not de-duplicated) |
| Duration | 25 days |
| Users | 133 |
| Service accounts | 55 |
| Pivot identities | 3 (Sales user to IT user to IT admin) |
| Hosts touched by attacker | 6 |
| Attack stages | 16 |
| Field | Description |
|---|---|
| `timestamp` | ISO 8601: `"2025-12-21T01:32:03.000-08:00"` |
| `log_type` | `"windows_security_event"`, `"defender_atp_alert"`, `"pam_access_denied"`, ... |
| `user` | Human identity |
| `account` | Security principal (user or `svc_*`) |
| `hostname` | Device: `"WS-IT-0071"`, `"DB-SRV-02"`, ... |
| `device_type` | `workstation`, `database_server`, `domain_controller`, ... |
| `location` | `NYC_HQ`, `SF_Office`, `London`, `Remote_VPN`, ... |
| `department` | Sales, IT, Finance, ... |
| Field | Description |
|---|---|
| `process_name` | `"powershell.exe"`, `"cmd.exe"`, ... |
| `command_line` | Full command with arguments |
| `event_type` | `process_start`, `network_connection`, `file_access`, ... |
| `source_ip` | Internal: `10.x.x.x`, VPN: `192.168.x.x` |
| `destination_ip` | Internal or external |
| `port` | 135, 443, 3389, 8443, ... |
| `protocol` | TCP, UDP, HTTPS, RDP, ... |
| Field | Description |
|---|---|
| `attack_id` | `"ATK_70246"` or null |
| `attack_type` | MITRE technique (optional per-row) or null |
| `stage_number` | `"0"` through `"15"` or null |
| Field | Description |
|---|---|
| `severity` | `low`, `medium`, `high`, `critical` |
| `action_taken` | `blocked`, `logged`, `quarantined`, `denied` |
| `vendor` | Defense product vendor (if present) |
| `detection_confidence` | 0.0–1.0 (if present) |
| `alert_name` | e.g. `"Suspicious PowerShell Activity"` |
| `reason` | e.g. `"Unauthorized service account access attempt"` |
See `docs/SCHEMA.md` for complete documentation.
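As one way to put these fields together, the ground-truth labels can be combined into a per-stage view of the campaign (a sketch over the schema above; the sample rows here are illustrative, not real dataset rows):

```python
from collections import defaultdict

def stage_timeline(records):
    """Group attacker-action rows by stage_number and report, per stage,
    the hosts touched and the event count."""
    stages = defaultdict(lambda: {"hosts": set(), "events": 0})
    for r in records:
        if r.get("attack_id") is None:
            continue  # benign row, skip
        s = stages[r.get("stage_number")]
        s["hosts"].add(r.get("hostname"))
        s["events"] += 1
    return dict(stages)

# Illustrative rows shaped like the schema above
rows = [
    {"attack_id": "ATK_70246", "stage_number": "0", "hostname": "WS-SAL-0005"},
    {"attack_id": "ATK_70246", "stage_number": "1", "hostname": "WS-IT-0071"},
    {"attack_id": None,        "stage_number": None, "hostname": "WS-IT-0071"},
]
print(stage_timeline(rows))
```

On the real data this surfaces the pivot structure directly: early stages confined to the Sales workstation, later stages spreading across the IT workstation and servers.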
This dataset is built to test detections under realistic enterprise conditions:
- Multi-user pivot across privilege boundaries (Sales to IT to IT Admin)
- Dual-account admin modeling (normal + admin accounts)
- SOC-scale observability noise (alerts/blocks/denials with overlap)
- Row-level ground truth + evaluator so detectors can be compared reproducibly
This is a static, synthetic dataset representing a single high-fidelity campaign. It is designed to be repeatable and shareable, and to support:
- Detection benchmarking (rules + ML)
- SOC analyst training and investigation drills
- Research on alert noise, privilege transitions, and pivot detection
- For Research & Education: Free to use
- For Commercial Use: Contact for licensing
Project: Phantom Armor — Enterprise Attack Simulator
Author: Greg Rothman
Contact: gregralr@phantomarmor.com
All data is fully synthetic. No real users, systems, or organizations are represented.
```bibtex
@dataset{phantom_armor_2026,
  author    = {Rothman, Greg},
  title     = {Enterprise Attack Simulator: AI-Powered Synthetic Security Log Dataset and Detection Benchmark},
  year      = {2026},
  publisher = {Phantom Armor},
  url       = {https://github.com/gregdiy/cyber_simulation}
}
```

- Issues: Report bugs or request features via GitHub Issues
- Discussions: Share detection techniques and ask questions
- Contributions: PRs welcome for notebooks and analysis scripts
Motivated by real-world gaps encountered while building ML-based detection systems in enterprise security operations. Thanks to the security research community for public threat intelligence and documentation that informed the modeled tradecraft.