diff --git a/BACKLOG.md b/BACKLOG.md new file mode 100644 index 0000000..f5477b9 --- /dev/null +++ b/BACKLOG.md @@ -0,0 +1,94 @@ +# Backlog + +## 1. Kernel CPU accounting under-reports vs hardware PMU counters + +**Priority**: Low (understanding only — use `perf stat` as ground truth) +**Status**: Root cause narrowed, further experiments possible + +### Problem + +The Linux kernel's CPU time accounting (`/proc/pid/stat`, `schedstat`) consistently under-reports +CPU utilization compared to `perf stat` (hardware PMU counters) for the Netty custom scheduler workload. + +**Measured on NETTY_SCHEDULER with 4 carrier threads pinned to 4 physical cores (Ryzen 9 7950X):** + +| Source | CPUs utilized | Notes | +|--------|--------------|-------| +| `perf stat` (PMU task-clock) | **3.96** | Hardware counter ground truth | +| `/proc/pid/stat` (utime+stime) | **3.19** | Kernel accounting | +| `schedstat` (sum of all thread run_ns) | **3.19** | CFS-level accounting | +| `pidstat` process-level | **2.84** | Even lower (pidstat's own sampling) | +| `pidstat` per-thread carrier sum | **2.72** | 4 x ~68% | + +### Key findings + +- **pidstat is NOT lying** — it faithfully reports what the kernel provides +- `/proc/pid/stat` and `schedstat` agree perfectly (both at 3.19 CPUs) +- The kernel itself under-counts by ~0.77 CPUs (19%) vs PMU hardware counters +- This is NOT caused by `CONFIG_TICK_CPU_ACCOUNTING` — the kernel uses + `CONFIG_VIRT_CPU_ACCOUNTING_GEN=y` (full dyntick, precise at context switch boundaries) +- All 31 kernel-visible threads were accounted for — no "hidden threads" from + `-Djdk.trackAllThreads=false` (VTs are invisible to the kernel by design, their CPU time + is charged to carrier threads) + +### Hypotheses for the 0.77 CPU gap + +1. **Kernel scheduling overhead**: `__schedule()` / `finish_task_switch()` runs during + thread transitions. Some of this CPU time may be attributed to idle/swapper rather + than the thread being switched in/out. + +2. 
**Interrupt handling**: hardware interrupts (NIC, timer) steal cycles from the process. + `perf stat` counts all cycles on cores used by the process (task-clock includes time + when PMU is inherited by children or interrupted contexts), while `/proc/stat` only + counts time explicitly attributed to the thread. + +3. **`task-clock` semantics**: `perf stat`'s `task-clock` measures wall-clock time that + at least one thread of the process was running. With 4 threads on 4 cores, task-clock + closely approximates 4.0 * elapsed. This includes interrupt handling time on those cores + that `/proc/stat` charges elsewhere. + +4. **Carrier thread park/unpark transitions**: even with VIRT_CPU_ACCOUNTING_GEN, the + accounting happens at `schedule()` boundaries. CPU cycles consumed during the entry/exit + paths of `LockSupport.park()` (before the actual `schedule()` call and after the wakeup) + may be partially lost. + +### Further experiments (if desired) + +1. **Compare `perf stat -e task-clock` vs `perf stat -e cpu-clock`**: `task-clock` counts + per-thread time, `cpu-clock` counts wall time. If they differ, it reveals interrupt overhead. + +2. **Run with `nohz_full=4-7` (isolated CPUs)**: removes timer tick interrupts from server + cores. If the gap shrinks, interrupt overhead is the cause. + +3. **Spin-wait instead of park**: replace `LockSupport.park()` with `Thread.onSpinWait()` + in `FifoEventLoopScheduler`. If gap shrinks, park/unpark accounting is lossy. + +4. **Check `/proc/interrupts`** delta during benchmark: quantify how many interrupts hit + cores 4-7 and estimate their CPU cost. + +5. **`perf stat` per-thread (`-t TID`)** for each carrier: compare PMU task-clock per + carrier vs schedstat per carrier to see if the gap is evenly distributed. + +### Conclusion + +For benchmarking purposes, **always use `perf stat` as the ground truth** for CPU utilization. 
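For reference, the kernel-accounting figure in the table above (utime+stime from `/proc/pid/stat`) can be re-derived with a short shell sketch. Linux only; the PID and the 1-second window below are illustrative (the current shell) — point it at the server PID during a benchmark run:

```shell
#!/usr/bin/env bash
# Sketch: compute "CPUs utilized" from /proc/<pid>/stat deltas, the same
# kernel-accounting number compared against perf's task-clock above.
pid=$$                          # illustrative; use the server PID in practice
clk_tck=$(getconf CLK_TCK)      # kernel clock ticks per second, usually 100

cpu_ticks() {
  # utime (field 14) + stime (field 15), in ticks. Strip everything through
  # the ") " closing the comm field so spaces in comm can't shift fields.
  awk '{ sub(/.*\) /, ""); print $12 + $13 }' "/proc/$1/stat"
}

t0=$(cpu_ticks "$pid")
sleep 1                         # 1 s sampling window
t1=$(cpu_ticks "$pid")
cpus=$(awk -v d=$((t1 - t0)) -v hz="$clk_tck" 'BEGIN { printf "%.2f", d / hz }')
echo "CPUs utilized (kernel accounting): $cpus"
```

Running this alongside `perf stat -p <pid>` over the same window reproduces the gap directly.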
+pidstat is still useful for relative thread balance analysis and for monitoring non-server +components (mock server, load generator) where the gap is less significant. + +--- + +## 2. Add spin-wait phase before carrier thread parking + +**Priority**: Medium (performance optimization) +**Status**: Not started + +In `FifoEventLoopScheduler.virtualThreadSchedulerLoop()`, the carrier thread parks immediately +when the queue drains. Adding a brief spin-wait phase (e.g., 100-1000 iterations of +`Thread.onSpinWait()`) before calling `LockSupport.park()` could: + +- Reduce wake-up latency for incoming work (avoid kernel schedule/deschedule) +- Reduce context switch count (currently ~20/sec, could go to near-zero) +- Trade-off: slightly higher idle CPU consumption + +### Key file +- `core/src/main/java/io/netty/loom/FifoEventLoopScheduler.java` line ~199 diff --git a/benchmark-runner/README.md b/benchmark-runner/README.md index 99b619e..64b7442 100644 --- a/benchmark-runner/README.md +++ b/benchmark-runner/README.md @@ -49,46 +49,44 @@ The `run-benchmark.sh` script will automatically build the JAR if missing. ## Configuration -All configuration is via environment variables: +Configuration via CLI flags (preferred) or environment variables (fallback). CLI flags take precedence. + +Run `./run-benchmark.sh --help` for the full list. 
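The precedence rule follows from the standard bash pattern: the env var seeds the default via `${VAR:-default}`, then CLI parsing overwrites it. A minimal sketch, reduced to a single flag (names mirror the script, but this parser is a toy):

```shell
#!/usr/bin/env bash
# Demonstrates: flag > env var > built-in default.
unset SERVER_THREADS   # clean slate so the demo output is deterministic

resolve_threads() {
  local threads="${SERVER_THREADS:-2}"        # env var, else built-in default
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --threads) threads="$2"; shift 2 ;;     # CLI flag overrides env
      *) shift ;;
    esac
  done
  echo "$threads"
}

echo "built-in default: $(resolve_threads)"                               # → 2
echo "env only:         $(SERVER_THREADS=8 resolve_threads)"              # → 8
echo "flag beats env:   $(SERVER_THREADS=8 resolve_threads --threads 4)"  # → 4
```

So `SERVER_THREADS=8 ./run-benchmark.sh --threads 4` runs with 4 threads.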
+ +### Server +| CLI flag | Env var | Default | Description | +|----------|---------|---------|-------------| +| `--mode` | `SERVER_MODE` | NON_VIRTUAL_NETTY | Mode: NON_VIRTUAL_NETTY, REACTIVE, VIRTUAL_NETTY, NETTY_SCHEDULER | +| `--threads` | `SERVER_THREADS` | 2 | Number of event loop threads | +| `--mockless` | `SERVER_MOCKLESS` | false | Skip mock server; do Jackson work inline | +| `--io` | `SERVER_IO` | epoll | I/O type: epoll, nio, io_uring | +| `--poller-mode` | `SERVER_POLLER_MODE` | | jdk.pollerMode: 1, 2, or 3 | +| `--fj-parallelism` | `SERVER_FJ_PARALLELISM` | | ForkJoinPool parallelism | +| `--server-cpuset` | `SERVER_CPUSET` | 2,3 | CPU pinning | +| `--jvm-args` | `SERVER_JVM_ARGS` | | Additional JVM arguments | ### Mock Server -| Variable | Default | Description | -|----------|---------|-------------| -| `MOCK_PORT` | 8080 | Mock server port | -| `MOCK_THINK_TIME_MS` | 1 | Simulated processing delay (ms) | -| `MOCK_THREADS` | auto | Number of Netty threads (empty = available processors) | -| `MOCK_TASKSET` | 4,5,6,7 | CPU affinity (e.g., "0-1") | - -### Handoff Server -| Variable | Default | Description | -|----------|---------|-------------| -| `SERVER_PORT` | 8081 | Server port | -| `SERVER_THREADS` | 2 | Number of event loop threads | -| `SERVER_REACTIVE` | false | Use reactive handler with Reactor | -| `SERVER_USE_CUSTOM_SCHEDULER` | false | Use custom Netty scheduler | -| `SERVER_IO` | epoll | I/O type: epoll, nio, or io_uring | -| `SERVER_NO_TIMEOUT` | false | Disable HTTP client timeout | -| `SERVER_TASKSET` | 2,3 | CPU affinity (e.g., "2-5") | -| `SERVER_JVM_ARGS` | | Additional JVM arguments | -| `SERVER_POLLER_MODE` | 3 | jdk.pollerMode value: 1, 2, or 3 | -| `SERVER_FJ_PARALLELISM` | | ForkJoinPool parallelism (empty = JVM default) | +| CLI flag | Env var | Default | Description | +|----------|---------|---------|-------------| +| `--mock-port` | `MOCK_PORT` | 8080 | Mock server port | +| `--mock-think-time` | `MOCK_THINK_TIME_MS` | 1 | 
Simulated processing delay (ms) | +| `--mock-threads` | `MOCK_THREADS` | 1 | Number of Netty threads | +| `--mock-cpuset` | `MOCK_CPUSET` | 4,5 | CPU pinning | ### Load Generator -| Variable | Default | Description | -|----------|---------|-------------| -| `LOAD_GEN_CONNECTIONS` | 100 | Number of connections | -| `LOAD_GEN_THREADS` | 2 | Number of threads | -| `LOAD_GEN_RATE` | | Target rate (empty = max throughput with wrk) | -| `LOAD_GEN_TASKSET` | 0,1 | CPU affinity (e.g., "6-7") | -| `LOAD_GEN_URL` | http://localhost:8081/fruits | Target URL | +| CLI flag | Env var | Default | Description | +|----------|---------|---------|-------------| +| `--connections` | `LOAD_GEN_CONNECTIONS` | 100 | Number of connections | +| `--load-threads` | `LOAD_GEN_THREADS` | 2 | Number of threads | +| `--duration` | `LOAD_GEN_DURATION` | 30s | Test duration | +| `--rate` | `LOAD_GEN_RATE` | | Target rate for wrk2 (omit for max throughput) | +| `--load-cpuset` | `LOAD_GEN_CPUSET` | 0,1 | CPU pinning | ### Timing -| Variable | Default | Description | -|----------|---------|-------------| -| `WARMUP_DURATION` | 10s | Warmup duration (no profiling) | -| `TOTAL_DURATION` | 30s | Total test duration (steady-state must be >= 20s) | -| `PROFILING_DELAY_SECONDS` | 5 | Delay before starting profiling/perf/JFR | -| `PROFILING_DURATION_SECONDS` | 10 | Profiling/perf/JFR duration in seconds | +| CLI flag | Env var | Default | Description | +|----------|---------|---------|-------------| +| `--warmup` | `WARMUP_DURATION` | 10s | Warmup duration | +| `--total-duration` | `TOTAL_DURATION` | 30s | Total test duration (steady-state >= 20s) | ### Profiling | Variable | Default | Description | @@ -153,73 +151,126 @@ perf stat uses `PROFILING_DELAY_SECONDS` and `PROFILING_DURATION_SECONDS`. ## Example Runs -### Basic comparison: custom vs default scheduler +### Choosing CPU pinning with `lscpu -e` + +Good benchmarking requires NUMA-aware CPU pinning. 
Start by inspecting your topology: ```bash -# With custom scheduler -JAVA_HOME=/path/to/jdk \ -SERVER_USE_CUSTOM_SCHEDULER=true \ -./run-benchmark.sh +$ lscpu -e +CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE + 0 0 0 0 0:0:0:0 yes # NUMA 0, physical core 0 + 1 0 0 1 1:1:1:0 yes # NUMA 0, physical core 1 + ... + 8 1 0 8 8:8:8:1 yes # NUMA 1, physical core 8 + ... + 16 0 0 0 0:0:0:0 yes # NUMA 0, SMT sibling of core 0 + 17 0 0 1 1:1:1:0 yes # NUMA 0, SMT sibling of core 1 +``` + +Key rules: +- **Keep all benchmark components on the same NUMA node** to avoid cross-node memory latency +- **Use physical cores only** (avoid SMT siblings) for more stable results +- **Isolate noisy processes** (IDEs, browsers) on the other NUMA node + +Example layout for a 16-core/2-NUMA system with 4 server threads: + +| Component | CPUs | Rationale | +|-----------|------|-----------| +| Load generator | 0-1 | 2 physical cores, enough to saturate | +| Mock server | 2-3 | 2 physical cores for backend simulation | +| Handoff server | 4-7 | 4 physical cores, one per event loop thread | +| Other processes | 8-15 | Isolated on NUMA node 1 | + +### NETTY_SCHEDULER with 4 threads + +```bash +./run-benchmark.sh --mode NETTY_SCHEDULER --threads 4 --io nio \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --jvm-args "-Xms8g -Xmx8g" \ + --connections 10000 --load-threads 4 \ + --mock-think-time 30 --mock-threads 4 \ + --perf-stat +``` + +### Analyzing bottlenecks with perf stat + +Use `--perf-stat` to get reliable hardware-level metrics. The `perf-stat.txt` output is the +ground truth for CPU utilization — pidstat per-thread numbers can be misleading with virtual threads. 
-# With default scheduler -JAVA_HOME=/path/to/jdk \ -SERVER_USE_CUSTOM_SCHEDULER=false \ -./run-benchmark.sh ``` +Performance counter stats for process id '95868': + + 39,741,757,754 task-clock # 3.970 CPUs utilized + 806 context-switches # 20.281 /sec + 199,114,762,646 instructions # 1.17 insn per cycle + 1,338,722,757 branch-misses # 3.08% of all branches +``` + +Key metrics to watch: +- **CPUs utilized**: how many cores the server is actually using (3.97 of 4 = fully saturated) +- **Context switches/sec**: lower is better; custom scheduler typically achieves 20-80/sec +- **IPC (insn per cycle)**: higher is better; >1.0 is good, <0.5 suggests memory stalls +- **Branch misses**: >5% suggests unpredictable control flow -### With CPU pinning +If CPUs utilized equals your allocated core count, the server is CPU-bound — add more cores. +If context switches are high (>10K/sec), the scheduler or OS is thrashing. + +pidstat is still useful for spotting **mock server or load generator bottlenecks** — +check `pidstat-mock.log` and `pidstat-loadgen.log` to ensure they aren't saturated. 
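The rules of thumb above can be applied mechanically. A sketch that pulls the headline numbers out of a `perf-stat.txt`-style file — the sample is the run shown earlier, and the awk field positions assume perf's usual annotated column layout:

```shell
#!/usr/bin/env bash
# Extract "CPUs utilized" and context-switch rate from perf stat output.
perf_stat=$(cat <<'EOF'
 Performance counter stats for process id '95868':

     39,741,757,754      task-clock          #    3.970 CPUs utilized
                806      context-switches    #   20.281 /sec
    199,114,762,646      instructions        #    1.17  insn per cycle
      1,338,722,757      branch-misses       #    3.08% of all branches
EOF
)

cpus=$(awk '/CPUs utilized/     { print $(NF-2) }' <<<"$perf_stat")
csps=$(awk '/context-switches/  { print $(NF-1) }' <<<"$perf_stat")

echo "CPUs utilized: $cpus, context switches/sec: $csps"
# Apply the thrashing rule of thumb from the list above (>10K/sec is bad).
awk -v cs="$csps" 'BEGIN { if (cs > 10000) print "WARN: scheduler/OS thrashing" }'
```

In a real run, point the `cat` at `perf-stat.txt` in the output directory instead of the here-doc sample.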
+ +### NON_VIRTUAL_NETTY (default mode) ```bash -JAVA_HOME=/path/to/jdk \ -MOCK_TASKSET="0" \ -SERVER_TASKSET="1-4" \ -LOAD_GEN_TASKSET="5-7" \ -SERVER_THREADS=4 \ -SERVER_USE_CUSTOM_SCHEDULER=true \ -./run-benchmark.sh +./run-benchmark.sh --threads 4 \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --connections 10000 --mock-think-time 30 ``` -### With profiling +### VIRTUAL_NETTY mode ```bash -JAVA_HOME=/path/to/jdk \ -ENABLE_PROFILER=true \ -ASYNC_PROFILER_PATH=/path/to/async-profiler \ -PROFILER_EVENT=cpu \ -SERVER_USE_CUSTOM_SCHEDULER=true \ -WARMUP_DURATION=15s \ -TOTAL_DURATION=45s \ -./run-benchmark.sh +./run-benchmark.sh --mode VIRTUAL_NETTY --threads 4 --io nio \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --connections 10000 --mock-think-time 30 ``` -### With JFR events enabled (subset) +### Mockless mode (skip HTTP call to mock, inline Jackson work) ```bash -JAVA_HOME=/path/to/jdk \ -ENABLE_JFR=true \ -JFR_EVENTS=NettyRunIo,VirtualThreadTaskRuns \ -SERVER_USE_CUSTOM_SCHEDULER=true \ -./run-benchmark.sh +./run-benchmark.sh --mode NETTY_SCHEDULER --threads 4 --mockless \ + --server-cpuset "4-7" --load-cpuset "0-1" \ + --connections 10000 +``` + +### With async-profiler + +```bash +./run-benchmark.sh --mode NETTY_SCHEDULER --threads 4 \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --profiler --profiler-path /path/to/async-profiler \ + --warmup 15s --total-duration 45s ``` ### Rate-limited test with wrk2 ```bash -JAVA_HOME=/path/to/jdk \ -LOAD_GEN_RATE=10000 \ -LOAD_GEN_CONNECTIONS=200 \ -TOTAL_DURATION=60s \ -WARMUP_DURATION=15s \ -./run-benchmark.sh +./run-benchmark.sh --mode NETTY_SCHEDULER --threads 4 \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --rate 120000 --connections 10000 --total-duration 60s --warmup 15s ``` -### With pidstat monitoring +### With JFR events ```bash -JAVA_HOME=/path/to/jdk \ -ENABLE_PIDSTAT=true \ -PIDSTAT_INTERVAL=1 \ -./run-benchmark.sh 
+./run-benchmark.sh --mode NETTY_SCHEDULER --threads 4 \ + --server-cpuset "4-7" --mock-cpuset "2-3" --load-cpuset "0-1" \ + --jfr --jfr-events NettyRunIo,VirtualThreadTaskRuns +``` + +### Mixed: CLI flags + env vars + +```bash +SERVER_JVM_ARGS="-XX:+PrintGCDetails" ./run-benchmark.sh --mode VIRTUAL_NETTY --threads 2 ``` ## Output @@ -260,7 +311,7 @@ java -cp benchmark-runner/target/benchmark-runner.jar \ 8080 1 # port, thinkTimeMs (threads defaults to available processors) ``` -### Handoff Server (with custom scheduler) +### Handoff Server (custom scheduler mode) ```bash java \ @@ -275,11 +326,11 @@ java \ --port 8081 \ --mock-url http://localhost:8080/fruits \ --threads 2 \ - --use-custom-scheduler true \ + --mode netty_scheduler \ --io epoll ``` -### Handoff Server (with default scheduler) +### Handoff Server (default split topology) ```bash java \ @@ -292,6 +343,5 @@ java \ --port 8081 \ --mock-url http://localhost:8080/fruits \ --threads 2 \ - --use-custom-scheduler false \ --io epoll ``` diff --git a/benchmark-runner/scripts/run-benchmark.sh b/benchmark-runner/scripts/run-benchmark.sh index 40944f7..d437db5 100755 --- a/benchmark-runner/scripts/run-benchmark.sh +++ b/benchmark-runner/scripts/run-benchmark.sh @@ -25,22 +25,21 @@ JAVA_OPTS="${JAVA_OPTS:--Xms1g -Xmx1g}" MOCK_PORT="${MOCK_PORT:-8080}" MOCK_THINK_TIME_MS="${MOCK_THINK_TIME_MS:-1}" MOCK_THREADS="${MOCK_THREADS:-1}" -MOCK_TASKSET="${MOCK_TASKSET:-4,5}" # CPUs for mock server +MOCK_CPUSET="${MOCK_CPUSET:-4,5}" # CPUs for mock server # Handoff server configuration SERVER_PORT="${SERVER_PORT:-8081}" SERVER_THREADS="${SERVER_THREADS:-2}" -SERVER_USE_CUSTOM_SCHEDULER="${SERVER_USE_CUSTOM_SCHEDULER:-false}" SERVER_IO="${SERVER_IO:-epoll}" -SERVER_TASKSET="${SERVER_TASKSET:-2,3}" # CPUs for handoff server +SERVER_CPUSET="${SERVER_CPUSET:-2,3}" # CPUs for handoff server SERVER_JVM_ARGS="${SERVER_JVM_ARGS:-}" -SERVER_POLLER_MODE="${SERVER_POLLER_MODE:-3}" # jdk.pollerMode value (1, 2, or 3) 
+SERVER_POLLER_MODE="${SERVER_POLLER_MODE:-}" # jdk.pollerMode value (1, 2, or 3); empty = JVM default, NETTY_SCHEDULER defaults to 3
 SERVER_FJ_PARALLELISM="${SERVER_FJ_PARALLELISM:-}" # ForkJoinPool parallelism (empty = JVM default)
-SERVER_NO_TIMEOUT="${SERVER_NO_TIMEOUT:-false}" # Disable HTTP client timeout
-SERVER_REACTIVE="${SERVER_REACTIVE:-false}" # Use reactive handler with Project Reactor
+SERVER_MODE="${SERVER_MODE:-NON_VIRTUAL_NETTY}" # Server mode: NON_VIRTUAL_NETTY, REACTIVE, VIRTUAL_NETTY, NETTY_SCHEDULER
+SERVER_MOCKLESS="${SERVER_MOCKLESS:-false}" # Skip mock server; do Jackson work inline

 # Load generator configuration
-LOAD_GEN_TASKSET="${LOAD_GEN_TASKSET:-0,1}" # CPUs for load generator
+LOAD_GEN_CPUSET="${LOAD_GEN_CPUSET:-0,1}" # CPUs for load generator
 LOAD_GEN_CONNECTIONS="${LOAD_GEN_CONNECTIONS:-100}"
 LOAD_GEN_THREADS="${LOAD_GEN_THREADS:-2}"
 LOAD_GEN_DURATION="${LOAD_GEN_DURATION:-30s}"
@@ -57,6 +56,7 @@ PROFILING_DURATION_SECONDS="${PROFILING_DURATION_SECONDS:-10}"
 # Profiling configuration
 ENABLE_PROFILER="${ENABLE_PROFILER:-false}"
 PROFILER_EVENT="${PROFILER_EVENT:-cpu}"
+PROFILER_FORMAT="${PROFILER_FORMAT:-flamegraph}" # Output format: flamegraph, collapsed, jfr
 PROFILER_OUTPUT="${PROFILER_OUTPUT:-profile.html}"
 ASYNC_PROFILER_PATH="${ASYNC_PROFILER_PATH:-}" # Path to async-profiler
@@ -346,7 +346,7 @@ build_jars() {
 start_mock_server() {
 log "Starting mock HTTP server..."
- local taskset_cmd=$(build_taskset_cmd "$MOCK_TASKSET")
+ local taskset_cmd=$(build_taskset_cmd "$MOCK_CPUSET")
 local java_cmd="$JAVA_HOME/bin/java"
 local mock_threads_arg=""
@@ -373,7 +373,7 @@ start_mock_server() {
 start_handoff_server() {
 log "Starting handoff HTTP server..."
- local taskset_cmd=$(build_taskset_cmd "$SERVER_TASKSET") + local taskset_cmd=$(build_taskset_cmd "$SERVER_CPUSET") local java_cmd="$JAVA_HOME/bin/java" # Build JVM args @@ -382,9 +382,19 @@ start_handoff_server() { jvm_args="$jvm_args -XX:-DoJVMTIVirtualThreadTransitions" jvm_args="$jvm_args -Djdk.trackAllThreads=false" - if [[ "$SERVER_USE_CUSTOM_SCHEDULER" == "true" ]]; then - jvm_args="$jvm_args -Djdk.virtualThreadScheduler.implClass=io.netty.loom.NettyScheduler" - jvm_args="$jvm_args -Djdk.pollerMode=$SERVER_POLLER_MODE" + # Mode-specific JVM args + local poller_mode="$SERVER_POLLER_MODE" + case "$SERVER_MODE" in + NETTY_SCHEDULER) + jvm_args="$jvm_args -Djdk.virtualThreadScheduler.implClass=io.netty.loom.NettyScheduler" + # Default pollerMode to 3 for custom scheduler if not explicitly set + poller_mode="${poller_mode:-3}" + ;; + esac + + # Apply pollerMode if set (explicitly or via mode default) + if [[ -n "$poller_mode" ]]; then + jvm_args="$jvm_args -Djdk.pollerMode=$poller_mode" fi if [[ -n "$SERVER_FJ_PARALLELISM" ]]; then @@ -402,15 +412,19 @@ start_handoff_server() { jvm_args="$jvm_args $SERVER_JVM_ARGS" fi + local mockless_flag="" + if [[ "$SERVER_MOCKLESS" == "true" ]]; then + mockless_flag="--mockless" + fi + local cmd="$taskset_cmd $java_cmd $JAVA_OPTS $jvm_args -cp $RUNNER_JAR \ io.netty.loom.benchmark.runner.HandoffHttpServer \ --port $SERVER_PORT \ --mock-url http://localhost:$MOCK_PORT/fruits \ --threads $SERVER_THREADS \ - --use-custom-scheduler $SERVER_USE_CUSTOM_SCHEDULER \ --io $SERVER_IO \ - --no-timeout $SERVER_NO_TIMEOUT \ - --reactive $SERVER_REACTIVE \ + --mode $SERVER_MODE \ + $mockless_flag \ --silent" log "Handoff server command: $cmd" @@ -435,7 +449,7 @@ run_warmup() { log "Running warmup for $WARMUP_DURATION..." 
- local taskset_cmd=$(build_taskset_cmd "$LOAD_GEN_TASKSET") + local taskset_cmd=$(build_taskset_cmd "$LOAD_GEN_CPUSET") # Use wrk for warmup (no rate limiting) $taskset_cmd jbang wrk@hyperfoil \ @@ -467,7 +481,8 @@ start_profiler() { ( sleep "$PROFILING_DELAY_SECONDS" - "$asprof" --threads -e "$PROFILER_EVENT" -o flamegraph -d "$PROFILING_DURATION_SECONDS" -f "$output_file" "$SERVER_PID" + # --record-cpu + "$asprof" --threads -e "$PROFILER_EVENT" -o "$PROFILER_FORMAT" -d "$PROFILING_DURATION_SECONDS" -f "$output_file" "$SERVER_PID" ) & PROFILER_PID=$! @@ -566,14 +581,16 @@ start_pidstat() { log "pidstat running (PID: $PIDSTAT_PID)" - log "Starting pidstat for mock server (PID: $MOCK_PID)..." + if [[ -n "${MOCK_PID:-}" ]]; then + log "Starting pidstat for mock server (PID: $MOCK_PID)..." - local mock_output_file="$OUTPUT_DIR/$PIDSTAT_MOCK_OUTPUT" + local mock_output_file="$OUTPUT_DIR/$PIDSTAT_MOCK_OUTPUT" - pidstat -p "$MOCK_PID" "$PIDSTAT_INTERVAL" > "$mock_output_file" 2>&1 & - PIDSTAT_MOCK_PID=$! + pidstat -p "$MOCK_PID" "$PIDSTAT_INTERVAL" > "$mock_output_file" 2>&1 & + PIDSTAT_MOCK_PID=$! - log "pidstat running for mock server (PID: $PIDSTAT_MOCK_PID)" + log "pidstat running for mock server (PID: $PIDSTAT_MOCK_PID)" + fi } stop_pidstat() { @@ -637,7 +654,7 @@ run_load_test() { log "Running load test for ${test_secs}s..." 
- local taskset_cmd=$(build_taskset_cmd "$LOAD_GEN_TASKSET") + local taskset_cmd=$(build_taskset_cmd "$LOAD_GEN_CPUSET") local output_file="$OUTPUT_DIR/wrk-results.txt" if [[ -n "$LOAD_GEN_RATE" ]]; then @@ -695,25 +712,28 @@ print_config() { log " Port: $MOCK_PORT" log " Think Time: ${MOCK_THINK_TIME_MS}ms" log " Threads: ${MOCK_THREADS:-}" - log " CPU Affinity: ${MOCK_TASKSET:-}" + log " CPU Affinity: ${MOCK_CPUSET:-}" log "" log "Handoff Server:" log " Port: $SERVER_PORT" log " Threads: $SERVER_THREADS" - log " Reactive: $SERVER_REACTIVE" - log " Custom Sched: $SERVER_USE_CUSTOM_SCHEDULER" + log " Mode: $SERVER_MODE" + log " Mockless: $SERVER_MOCKLESS" log " I/O Type: $SERVER_IO" - log " No Timeout: $SERVER_NO_TIMEOUT" - log " Poller Mode: $SERVER_POLLER_MODE" + local effective_poller="$SERVER_POLLER_MODE" + if [[ -z "$effective_poller" && "$SERVER_MODE" == "NETTY_SCHEDULER" ]]; then + effective_poller="3 (default for NETTY_SCHEDULER)" + fi + log " Poller Mode: ${effective_poller:-}" log " FJ Parallelism: ${SERVER_FJ_PARALLELISM:-}" - log " CPU Affinity: ${SERVER_TASKSET:-}" + log " CPU Affinity: ${SERVER_CPUSET:-}" log " Extra JVM Args: ${SERVER_JVM_ARGS:-}" log "" log "Load Generator:" log " Connections: $LOAD_GEN_CONNECTIONS" log " Threads: $LOAD_GEN_THREADS" log " Rate: ${LOAD_GEN_RATE:-}" - log " CPU Affinity: ${LOAD_GEN_TASKSET:-}" + log " CPU Affinity: ${LOAD_GEN_CPUSET:-}" log "" log "Timing:" log " Warmup: $WARMUP_DURATION" @@ -766,120 +786,124 @@ print_config() { # ============================================================================ main() { - # Parse command line arguments + # Parse command line arguments (override env vars) while [[ $# -gt 0 ]]; do case "$1" in + # Server + --mode) SERVER_MODE="$2"; shift 2 ;; + --threads) SERVER_THREADS="$2"; shift 2 ;; + --mockless) SERVER_MOCKLESS=true; shift ;; + --io) SERVER_IO="$2"; shift 2 ;; + --poller-mode) SERVER_POLLER_MODE="$2"; shift 2 ;; + --fj-parallelism) SERVER_FJ_PARALLELISM="$2"; shift 2 ;; 
+ --server-cpuset) SERVER_CPUSET="$2"; shift 2 ;; + --jvm-args) SERVER_JVM_ARGS="$2"; shift 2 ;; + # Mock + --mock-port) MOCK_PORT="$2"; shift 2 ;; + --mock-think-time) MOCK_THINK_TIME_MS="$2"; shift 2 ;; + --mock-threads) MOCK_THREADS="$2"; shift 2 ;; + --mock-cpuset) MOCK_CPUSET="$2"; shift 2 ;; + # Load generator + --connections) LOAD_GEN_CONNECTIONS="$2"; shift 2 ;; + --load-threads) LOAD_GEN_THREADS="$2"; shift 2 ;; + --duration) LOAD_GEN_DURATION="$2"; shift 2 ;; + --rate) LOAD_GEN_RATE="$2"; shift 2 ;; + --load-cpuset) LOAD_GEN_CPUSET="$2"; shift 2 ;; + # Timing + --warmup) WARMUP_DURATION="$2"; shift 2 ;; + --total-duration) TOTAL_DURATION="$2"; shift 2 ;; + # Profiling + --profiler) ENABLE_PROFILER=true; shift ;; + --profiler-path) ASYNC_PROFILER_PATH="$2"; shift 2 ;; + --profiler-event) PROFILER_EVENT="$2"; shift 2 ;; + --jfr) ENABLE_JFR=true; shift ;; + --jfr-events) JFR_EVENTS="$2"; shift 2 ;; + --perf-stat) ENABLE_PERF_STAT=true; shift ;; + --perf-stat-args) PERF_STAT_ARGS="$2"; shift 2 ;; + --no-pidstat) ENABLE_PIDSTAT=false; shift ;; + # Output + --output-dir) OUTPUT_DIR="$2"; shift 2 ;; + # Help --help|-h) cat << 'EOF' Benchmark Runner Script Usage: ./run-benchmark.sh [OPTIONS] -Environment Variables (can also be set via command line options): +All options can also be set via environment variables (shown in parentheses). +CLI flags take precedence over environment variables. + +Server: + --mode Server mode (SERVER_MODE, default: NON_VIRTUAL_NETTY) + Modes: NON_VIRTUAL_NETTY, REACTIVE, VIRTUAL_NETTY, NETTY_SCHEDULER + --threads Event loop threads (SERVER_THREADS, default: 2) + --mockless Skip mock server, inline Jackson work (SERVER_MOCKLESS) + --io I/O type: epoll, nio, io_uring (SERVER_IO, default: epoll) + --poller-mode jdk.pollerMode: 1, 2, or 3 (SERVER_POLLER_MODE) + --fj-parallelism ForkJoinPool parallelism (SERVER_FJ_PARALLELISM) + --server-cpuset Server CPU pinning, e.g. 
"2,3" (SERVER_CPUSET, default: 2,3) + --jvm-args Additional JVM arguments (SERVER_JVM_ARGS) Mock Server: - MOCK_PORT Mock server port (default: 8080) - MOCK_THINK_TIME_MS Response delay in ms (default: 1) - MOCK_THREADS Number of threads (default: auto = available processors) - MOCK_TASKSET CPU affinity range (default: "4,5,6,7") - -Handoff Server: - SERVER_PORT Server port (default: 8081) - SERVER_THREADS Number of event loop threads (default: 2) - SERVER_REACTIVE Use reactive handler with Reactor (default: false) - SERVER_USE_CUSTOM_SCHEDULER Use custom Netty scheduler (default: false) - SERVER_IO I/O type: epoll, nio, or io_uring (default: epoll) - SERVER_NO_TIMEOUT Disable HTTP client timeout (default: false) - SERVER_TASKSET CPU affinity range (default: "2,3") - SERVER_JVM_ARGS Additional JVM arguments - SERVER_POLLER_MODE jdk.pollerMode value: 1, 2, or 3 (default: 3) - SERVER_FJ_PARALLELISM ForkJoinPool parallelism (empty = JVM default) + --mock-port Mock server port (MOCK_PORT, default: 8080) + --mock-think-time Response delay in ms (MOCK_THINK_TIME_MS, default: 1) + --mock-threads Number of threads (MOCK_THREADS, default: 1) + --mock-cpuset Mock server CPU pinning (MOCK_CPUSET, default: 4,5) Load Generator: - LOAD_GEN_CONNECTIONS Number of connections (default: 100) - LOAD_GEN_THREADS Number of threads (default: 2) - LOAD_GEN_DURATION Test duration (default: 30s) - LOAD_GEN_RATE Target rate for wrk2 (empty = use wrk) - LOAD_GEN_TASKSET CPU affinity range (default: "0,1") + --connections Number of connections (LOAD_GEN_CONNECTIONS, default: 100) + --load-threads Number of threads (LOAD_GEN_THREADS, default: 2) + --duration Test duration (LOAD_GEN_DURATION, default: 30s) + --rate Target rate for wrk2; omit for max throughput (LOAD_GEN_RATE) + --load-cpuset Load generator CPU pinning (LOAD_GEN_CPUSET, default: 0,1) Timing: - WARMUP_DURATION Warmup duration (default: 10s) - TOTAL_DURATION Total test duration (default: 30s, must keep steady state >= 20s) - 
PROFILING_DELAY_SECONDS Profiling start delay in seconds (default: 5) - PROFILING_DURATION_SECONDS Profiling duration in seconds (default: 10) + --warmup Warmup duration (WARMUP_DURATION, default: 10s) + --total-duration Total test duration (TOTAL_DURATION, default: 30s) Profiling: - ENABLE_PROFILER Enable async-profiler (default: false) - ASYNC_PROFILER_PATH Path to async-profiler installation - PROFILER_EVENT Profiler event type (default: cpu) - PROFILER_OUTPUT Profiler output file (default: profile.html) - Note: profiling uses PROFILING_DELAY_SECONDS and PROFILING_DURATION_SECONDS. - -JFR: - ENABLE_JFR Enable Netty Loom JFR events (default: false) - JFR_EVENTS Comma-separated event list or "all" (default: all) - Options: NettyRunIo, NettyRunTasks, - VirtualThreadTaskRuns, VirtualThreadTaskRun, - VirtualThreadTaskSubmit - JFR_SETTINGS_FILE Path to a JFR settings (.jfc) file (default: auto) - JFR_OUTPUT JFR output file (default: netty-loom.jfr) - JFR_RECORDING_NAME JFR recording name (default: netty-loom-benchmark) - JFR_TIMELINE_OUTPUT Timeline output file (default: netty-loom-timeline.jsonl, empty = skip export) - Note: JFR uses PROFILING_DELAY_SECONDS and PROFILING_DURATION_SECONDS. 
-
-pidstat:
- ENABLE_PIDSTAT Enable pidstat collection (default: true)
- PIDSTAT_INTERVAL Collection interval in seconds (default: 1)
- PIDSTAT_OUTPUT Output file (default: pidstat.log)
- PIDSTAT_MOCK_OUTPUT Mock server output file (default: pidstat-mock.log)
- PIDSTAT_LOAD_GEN_OUTPUT Load generator output file (default: pidstat-loadgen.log)
- PIDSTAT_HANDOFF_DETAILED Include per-thread detail for handoff server (default: true)
-
-perf stat:
- ENABLE_PERF_STAT Enable perf stat collection (default: false)
- PERF_STAT_OUTPUT Output file (default: perf-stat.txt)
- PERF_STAT_ARGS Extra perf stat arguments (default: empty)
-
-General:
+ --profiler Enable async-profiler (ENABLE_PROFILER)
+ --profiler-path Path to async-profiler (ASYNC_PROFILER_PATH)
+ --profiler-event Profiler event type (PROFILER_EVENT, default: cpu)
+ --jfr Enable JFR events (ENABLE_JFR)
+ --jfr-events Comma-separated JFR events or "all" (JFR_EVENTS, default: all)
+ --perf-stat Enable perf stat (ENABLE_PERF_STAT)
+ --perf-stat-args Extra perf stat arguments (PERF_STAT_ARGS)
+ --no-pidstat Disable pidstat collection (ENABLE_PIDSTAT)
+
+Output:
+ --output-dir Output directory (OUTPUT_DIR, default: ./benchmark-results)
+
+Environment-only settings:
 JAVA_HOME Path to Java installation (required)
- OUTPUT_DIR Output directory (default: ./benchmark-results)
- CONFIG_OUTPUT Configuration output filename (default: benchmark-config.txt)
+ JAVA_OPTS JVM options (default: -Xms1g -Xmx1g)
+ PROFILING_DELAY_SECONDS Profiling start delay (default: 5)
+ PROFILING_DURATION_SECONDS Profiling duration (default: 10)

 Examples:
- # Basic run with custom scheduler
- JAVA_HOME=/path/to/jdk SERVER_USE_CUSTOM_SCHEDULER=true ./run-benchmark.sh
-
- # Run with CPU pinning and profiling
- JAVA_HOME=/path/to/jdk \
- MOCK_TASKSET="0" \
- SERVER_TASKSET="1-2" \
- LOAD_GEN_TASKSET="3" \
- ENABLE_PROFILER=true \
- ASYNC_PROFILER_PATH=/path/to/async-profiler \
- ./run-benchmark.sh
-
- # Rate-limited test
-
JAVA_HOME=/path/to/jdk \ - LOAD_GEN_RATE=10000 \ - TOTAL_DURATION=60s \ - WARMUP_DURATION=15s \ - ./run-benchmark.sh - - # Reactive handler test - JAVA_HOME=/path/to/jdk \ - SERVER_REACTIVE=true \ - SERVER_THREADS=2 \ - ./run-benchmark.sh + # Virtual Netty mode, mockless + ./run-benchmark.sh --mode virtual_netty --threads 2 --mockless + + # With CPU pinning + ./run-benchmark.sh --mode netty_scheduler --threads 4 \ + --server-cpuset 2,3 --mock-cpuset 4,5 --load-cpuset 0,1 + + # With profiling + ./run-benchmark.sh --mode netty_scheduler --profiler --profiler-path /path/to/ap + # Rate-limited test + ./run-benchmark.sh --rate 10000 --total-duration 60s --warmup 15s + + # JVM args override + ./run-benchmark.sh --mode virtual_netty --jvm-args "-XX:+PrintGCDetails" EOF exit 0 ;; *) - error "Unknown option: $1" + error "Unknown option: $1. Use --help for usage." ;; esac - shift done # Validate configuration @@ -898,7 +922,11 @@ EOF build_jars # Start servers - start_mock_server + if [[ "$SERVER_MOCKLESS" != "true" ]]; then + start_mock_server + else + log "Mockless mode: skipping mock server" + fi start_handoff_server # Run warmup (no profiling/pidstat) diff --git a/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/HandoffHttpServer.java b/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/HandoffHttpServer.java index 8fdc450..cb231d0 100644 --- a/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/HandoffHttpServer.java +++ b/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/HandoffHttpServer.java @@ -37,8 +37,8 @@ import io.netty.handler.codec.http.HttpServerCodec; import io.netty.handler.codec.http.HttpUtil; import io.netty.handler.codec.http.HttpVersion; -import io.netty.loom.EventLoopSchedulerType; import io.netty.loom.VirtualMultithreadIoEventLoopGroup; +import io.netty.loom.VirtualMultithreadManualIoEventLoopGroup; import io.netty.util.AsciiString; import io.netty.util.CharsetUtil; import 
org.apache.hc.client5.http.ConnectionKeepAliveStrategy; @@ -72,7 +72,6 @@ *

 * Usage: java -cp benchmark-runner.jar
 * io.netty.loom.benchmark.runner.HandoffHttpServer \ --port 8081 \ --mock-url
 * http://localhost:8080/fruits \ --threads 2 \ --use-custom-scheduler true \
 * --io epoll
 */
public class HandoffHttpServer {
@@ -81,54 +80,53 @@ public enum IO {
         EPOLL, NIO, IO_URING
     }
 
+    public enum Mode {
+        NON_VIRTUAL_NETTY, REACTIVE, VIRTUAL_NETTY, NETTY_SCHEDULER
+    }
+
     private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
     private static final ByteBuf HEALTH_RESPONSE = Unpooled
             .unreleasableBuffer(Unpooled.copiedBuffer("OK", CharsetUtil.UTF_8));
 
+    // Same JSON as MockHttpServer.CACHED_RESPONSE — used in mockless mode
+    private static final byte[] CACHED_JSON_BYTES = """
+            {
+              "fruits": [
+                {"name": "Apple", "color": "Red", "price": 1.20},
+                {"name": "Banana", "color": "Yellow", "price": 0.50},
+                {"name": "Orange", "color": "Orange", "price": 0.80},
+                {"name": "Grape", "color": "Purple", "price": 2.00},
+                {"name": "Mango", "color": "Yellow", "price": 1.50},
+                {"name": "Strawberry", "color": "Red", "price": 3.00},
+                {"name": "Blueberry", "color": "Blue", "price": 4.00},
+                {"name": "Pineapple", "color": "Yellow", "price": 2.50},
+                {"name": "Watermelon", "color": "Green", "price": 5.00},
+                {"name": "Kiwi", "color": "Brown", "price": 1.00}
+              ]
+            }
+            """.getBytes(java.nio.charset.StandardCharsets.UTF_8);
+
     private final int port;
     private final String mockUrl;
     private final int threads;
-    private final boolean useCustomScheduler;
     private final IO io;
     private final boolean silent;
-    private final boolean noTimeout;
-    private final boolean useReactive;
-    private final EventLoopSchedulerType schedulerType;
+    private final boolean mockless;
+    private final Mode mode;
 
     private MultiThreadIoEventLoopGroup workerGroup;
     private Channel serverChannel;
     private Supplier<ThreadFactory> threadFactorySupplier;
 
-    public HandoffHttpServer(int port, String mockUrl, int threads, boolean useCustomScheduler, IO io) {
-        this(port, mockUrl, threads, useCustomScheduler, io, false, false, false);
-    }
-
-    public HandoffHttpServer(int port, String mockUrl, int threads, boolean useCustomScheduler, IO io, boolean silent) {
-        this(port, mockUrl, threads, useCustomScheduler, io, silent, false, false);
-    }
-
-    public HandoffHttpServer(int port, String mockUrl, int threads, boolean useCustomScheduler, IO io, boolean silent,
-            boolean noTimeout) {
-        this(port, mockUrl, threads, useCustomScheduler, io, silent, noTimeout, false);
-    }
-
-    public HandoffHttpServer(int port, String mockUrl, int threads, boolean useCustomScheduler, IO io, boolean silent,
-            boolean noTimeout, boolean useReactive) {
-        this(port, mockUrl, threads, useCustomScheduler, io, silent, noTimeout, useReactive,
-                EventLoopSchedulerType.FIFO);
-    }
-
-    public HandoffHttpServer(int port, String mockUrl, int threads, boolean useCustomScheduler, IO io, boolean silent,
-            boolean noTimeout, boolean useReactive, EventLoopSchedulerType schedulerType) {
+    public HandoffHttpServer(int port, String mockUrl, int threads, IO io, boolean silent, boolean mockless,
+            Mode mode) {
         this.port = port;
         this.mockUrl = mockUrl;
         this.threads = threads;
-        this.useCustomScheduler = useCustomScheduler;
         this.io = io;
         this.silent = silent;
-        this.noTimeout = noTimeout;
-        this.useReactive = useReactive;
-        this.schedulerType = schedulerType == null ? EventLoopSchedulerType.FIFO : schedulerType;
+        this.mockless = mockless;
+        this.mode = mode;
     }
 
     public void start() throws InterruptedException {
@@ -138,26 +136,48 @@ public void start() throws InterruptedException {
             case IO_URING -> IoUringIoHandler.newFactory();
         };
 
-        Class<? extends ServerChannel> serverChannelClass = switch (io) {
-            case NIO -> NioServerSocketChannel.class;
-            case EPOLL -> EpollServerSocketChannel.class;
-            case IO_URING -> IoUringServerSocketChannel.class;
-        };
-
-        Class<? extends Channel> clientChannelClass = switch (io) {
-            case NIO -> io.netty.channel.socket.nio.NioSocketChannel.class;
-            case EPOLL -> io.netty.channel.epoll.EpollSocketChannel.class;
-            case IO_URING -> io.netty.channel.uring.IoUringSocketChannel.class;
-        };
-
-        if (useCustomScheduler) {
-            var group = new VirtualMultithreadIoEventLoopGroup(threads, ioHandlerFactory, schedulerType);
-            threadFactorySupplier = group::vThreadFactory;
-            workerGroup = group;
-        } else {
-            workerGroup = new MultiThreadIoEventLoopGroup(threads, ioHandlerFactory);
-            var defaultFactory = Thread.ofVirtual().factory();
-            threadFactorySupplier = () -> defaultFactory;
+        final Class<? extends ServerChannel> serverChannelClass;
+        final Class<? extends Channel> clientChannelClass;
+
+        switch (mode) {
+            case VIRTUAL_NETTY -> {
+                var group = new VirtualMultithreadManualIoEventLoopGroup(threads, NioIoHandler.newFactory());
+                workerGroup = group;
+                var defaultFactory = Thread.ofVirtual().factory();
+                threadFactorySupplier = () -> defaultFactory;
+                serverChannelClass = NioServerSocketChannel.class;
+                clientChannelClass = io.netty.channel.socket.nio.NioSocketChannel.class;
+            }
+            case NETTY_SCHEDULER -> {
+                serverChannelClass = switch (io) {
+                    case NIO -> NioServerSocketChannel.class;
+                    case EPOLL -> EpollServerSocketChannel.class;
+                    case IO_URING -> IoUringServerSocketChannel.class;
+                };
+                clientChannelClass = switch (io) {
+                    case NIO -> io.netty.channel.socket.nio.NioSocketChannel.class;
+                    case EPOLL -> io.netty.channel.epoll.EpollSocketChannel.class;
+                    case IO_URING -> io.netty.channel.uring.IoUringSocketChannel.class;
+                };
+                var group = new VirtualMultithreadIoEventLoopGroup(threads, ioHandlerFactory);
+                threadFactorySupplier = group::vThreadFactory;
+                workerGroup = group;
+            }
+            default -> {
+                serverChannelClass = switch (io) {
+                    case NIO -> NioServerSocketChannel.class;
+                    case EPOLL -> EpollServerSocketChannel.class;
+                    case IO_URING -> IoUringServerSocketChannel.class;
+                };
+                clientChannelClass = switch (io) {
+                    case NIO -> io.netty.channel.socket.nio.NioSocketChannel.class;
+                    case EPOLL -> io.netty.channel.epoll.EpollSocketChannel.class;
+                    case IO_URING -> io.netty.channel.uring.IoUringSocketChannel.class;
+                };
+                workerGroup = new MultiThreadIoEventLoopGroup(threads, ioHandlerFactory);
+                var defaultFactory = Thread.ofVirtual().factory();
+                threadFactorySupplier = () -> defaultFactory;
+            }
         }
 
         ServerBootstrap b = new ServerBootstrap();
         b.group(workerGroup).channel(serverChannelClass).childOption(ChannelOption.TCP_NODELAY, true)
@@ -167,8 +187,8 @@ protected void initChannel(SocketChannel ch) {
             ChannelPipeline p = ch.pipeline();
             p.addLast(new HttpServerCodec());
             p.addLast(new HttpObjectAggregator(65536));
-            if (useReactive) {
-                p.addLast(new ReactiveHandoffHandler(mockUrl, noTimeout, clientChannelClass));
+            if (mode == Mode.REACTIVE) {
+                p.addLast(new ReactiveHandoffHandler(mockUrl, clientChannelClass));
             } else {
                 p.addLast(new HandoffHandler());
             }
@@ -178,17 +198,18 @@ protected void initChannel(SocketChannel ch) {
         serverChannel = b.bind(port).sync().channel();
 
         if (!silent) {
             System.out.printf("Handoff HTTP Server started on port %d%n", port);
-            System.out.printf("  Mode: %s%n", useReactive ? "Reactive (Project Reactor)" : "Virtual Thread");
-            System.out.printf("  Mock URL: %s%n", mockUrl);
-            System.out.printf("  Threads: %d%n", threads);
-            if (!useReactive) {
-                System.out.printf("  Custom Scheduler: %s%n", useCustomScheduler);
-                if (useCustomScheduler) {
-                    System.out.printf("  Scheduler Type: %s%n", schedulerType);
-                }
+            System.out.printf("  Mode: %s%n", switch (mode) {
+                case NON_VIRTUAL_NETTY -> "Non-Virtual Netty (platform IO + VT blocking)";
+                case REACTIVE -> "Reactive (pure async, no VTs)";
+                case VIRTUAL_NETTY -> "Virtual Netty (IO loops as VTs on ForkJoinPool)";
+                case NETTY_SCHEDULER -> "Netty Scheduler (platform IO + Netty VT scheduler)";
+            });
+            System.out.printf("  Mockless: %s%n", mockless);
+            if (!mockless) {
+                System.out.printf("  Mock URL: %s%n", mockUrl);
             }
-            System.out.printf("  I/O: %s%n", io);
-            System.out.printf("  No Timeout: %s%n", noTimeout);
+            System.out.printf("  Threads: %d%n", threads);
+            System.out.printf("  I/O: %s%n", mode == Mode.VIRTUAL_NETTY ? "NIO (forced)" : io);
         }
     }
 
@@ -218,12 +239,16 @@ private class HandoffHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
-            ConnectionKeepAliveStrategy keepAliveStrategy = (HttpResponse response,
-                    HttpContext context) -> TimeValue.NEG_ONE_MILLISECOND;
-            BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
-            RequestConfig requestConfig = noTimeout ? NO_TIMEOUT_HTTP_REQUEST_CONFIG : RequestConfig.DEFAULT;
-            httpClient = HttpClientBuilder.create().setConnectionManager(cm).setDefaultRequestConfig(requestConfig)
-                    .setConnectionManagerShared(false).setKeepAliveStrategy(keepAliveStrategy).build();
+            if (mockless) {
+                httpClient = null;
+            } else {
+                ConnectionKeepAliveStrategy keepAliveStrategy = (HttpResponse response,
+                        HttpContext context) -> TimeValue.NEG_ONE_MILLISECOND;
+                BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
+                httpClient = HttpClientBuilder.create().setConnectionManager(cm)
+                        .setDefaultRequestConfig(NO_TIMEOUT_HTTP_REQUEST_CONFIG).setConnectionManagerShared(false)
+                        .setKeepAliveStrategy(keepAliveStrategy).build();
+            }
         }
 
         @Override
@@ -244,10 +269,16 @@ protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request)
             }
 
             if (uri.equals("/") || uri.startsWith("/fruits")) {
-                // Hand off to virtual thread for blocking processing
-                orderedExecutorService.execute(() -> {
-                    doBlockingProcessing(ctx, eventLoop, keepAlive);
-                });
+                // Hand off to virtual thread for processing
+                if (mockless) {
+                    orderedExecutorService.execute(() -> {
+                        doMocklessProcessing(ctx, eventLoop, keepAlive);
+                    });
+                } else {
+                    orderedExecutorService.execute(() -> {
+                        doBlockingProcessing(ctx, eventLoop, keepAlive);
+                    });
+                }
                 return;
             }
 
@@ -294,6 +325,28 @@ private void doBlockingProcessing(ChannelHandlerContext ctx, IoEventLoop eventLoop, boolean keepAlive) {
             }
         }
 
+        private void doMocklessProcessing(ChannelHandlerContext ctx, IoEventLoop eventLoop, boolean keepAlive) {
+            try {
+                // Same Jackson work as doBlockingProcessing, without the HTTP call
+                FruitsResponse fruitsResponse = OBJECT_MAPPER.readValue(CACHED_JSON_BYTES, FruitsResponse.class);
+                byte[] responseBytes = OBJECT_MAPPER.writeValueAsBytes(fruitsResponse);
+                eventLoop.execute(() -> {
+                    ByteBuf content = Unpooled.wrappedBuffer(responseBytes);
+                    sendResponse(ctx, content, HttpHeaderValues.APPLICATION_JSON, keepAlive);
+                });
+            } catch (Throwable e) {
+                eventLoop.execute(() -> {
+                    ByteBuf content = Unpooled.copiedBuffer("{\"error\":\"" + e.getMessage() + "\"}",
+                            CharsetUtil.UTF_8);
+                    FullHttpResponse response = new DefaultFullHttpResponse(HttpVersion.HTTP_1_1,
+                            HttpResponseStatus.INTERNAL_SERVER_ERROR, content);
+                    response.headers().set(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_JSON);
+                    response.headers().setInt(HttpHeaderNames.CONTENT_LENGTH, content.readableBytes());
+                    ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
+                });
+            }
+        }
+
         private void sendResponse(ChannelHandlerContext ctx, ByteBuf content, AsciiString contentType,
                 boolean keepAlive) {
             FullHttpResponse response = new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
@@ -319,7 +372,9 @@ public void channelInactive(ChannelHandlerContext ctx) throws Exception {
             try {
                 orderedExecutorService.execute(() -> {
                     try {
-                        httpClient.close();
+                        if (httpClient != null) {
+                            httpClient.close();
+                        }
                     } catch (IOException e) {
                     } finally {
                         orderedExecutorService.shutdown();
@@ -332,28 +387,23 @@ public void channelInactive(ChannelHandlerContext ctx) throws Exception {
     }
 
     public static void main(String[] args) throws InterruptedException {
-        // Parse arguments
         int port = 8081;
         String mockUrl = "http://localhost:8080/fruits";
        int threads = 1;
-        boolean useCustomScheduler = false;
        IO io = IO.EPOLL;
        boolean silent = false;
-        boolean noTimeout = false;
-        boolean useReactive = false;
-        EventLoopSchedulerType schedulerType = EventLoopSchedulerType.FIFO;
+        boolean mockless = false;
+        Mode mode = Mode.NON_VIRTUAL_NETTY;
 
         for (int i = 0; i < args.length; i++) {
             switch (args[i]) {
                 case "--port" -> port = Integer.parseInt(args[++i]);
                 case "--mock-url" -> mockUrl = args[++i];
                 case "--threads" -> threads = Integer.parseInt(args[++i]);
-                case "--use-custom-scheduler" -> useCustomScheduler = Boolean.parseBoolean(args[++i]);
-                case "--scheduler-type" -> schedulerType = EventLoopSchedulerType.valueOf(args[++i].toUpperCase());
                 case "--io" -> io = IO.valueOf(args[++i].toUpperCase());
                 case "--silent" -> silent = true;
-                case "--no-timeout" -> noTimeout = Boolean.parseBoolean(args[++i]);
-                case "--reactive" -> useReactive = Boolean.parseBoolean(args[++i]);
+                case "--mockless" -> mockless = true;
+                case "--mode" -> mode = Mode.valueOf(args[++i].toUpperCase());
                 case "--help" -> {
                     printUsage();
                     return;
@@ -361,8 +411,7 @@ public static void main(String[] args) throws InterruptedException {
             }
         }
 
-        HandoffHttpServer server = new HandoffHttpServer(port, mockUrl, threads, useCustomScheduler, io, silent,
-                noTimeout, useReactive, schedulerType);
+        HandoffHttpServer server = new HandoffHttpServer(port, mockUrl, threads, io, silent, mockless, mode);
        server.start();
 
        // Shutdown hook
@@ -372,25 +421,24 @@ public static void main(String[] args) throws InterruptedException {
     }
 
     private static void printUsage() {
-        System.out.println(
-                """
-                Usage: java -cp benchmark-runner.jar io.netty.loom.benchmark.runner.HandoffHttpServer [options]
-
-                Options:
-                  --port                  HTTP port (default: 8081)
-                  --mock-url              Mock server URL (default: http://localhost:8080/fruits)
-                  --threads               Number of event loop threads (default: 1)
-                  --use-custom-scheduler  Use custom Netty scheduler (default: false, ignored if --reactive is true)
-                  --scheduler-type        Scheduler type for custom scheduler (default: fifo)
-                  --io                    I/O type (default: epoll)
-                  --no-timeout            Disable HTTP client timeout (default: false)
-                  --reactive              Use reactive handler with Reactor (default: false)
-                  --silent                Suppress output messages
-                  --help                  Show this help
-
-                Modes:
-                  Virtual Thread (default): Uses virtual threads with blocking Apache HttpClient
-                  Reactive (--reactive true): Uses Project Reactor with non-blocking Reactor Netty HTTP client
-                """);
+        System.out.println("""
+                Usage: java -cp benchmark-runner.jar io.netty.loom.benchmark.runner.HandoffHttpServer [options]
+
+                Options:
+                  --port      HTTP port (default: 8081)
+                  --mock-url  Mock server URL (default: http://localhost:8080/fruits)
+                  --threads   Number of event loop threads (default: 1)
+                  --io        I/O type (default: epoll)
+                  --mockless  Skip mock server; do Jackson work inline (default: off)
+                  --mode      Server mode (default: non_virtual_netty)
+                  --silent    Suppress output messages
+                  --help      Show this help
+
+                Modes:
+                  NON_VIRTUAL_NETTY (default): Platform thread IO + virtual thread blocking work
+                  REACTIVE: Pure Netty async, no virtual threads
+                  VIRTUAL_NETTY: Netty IO event loops as VTs on ForkJoinPool
+                  NETTY_SCHEDULER: Platform thread IO + Netty custom VT scheduler
+                """);
    }
}
diff --git a/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/ReactiveHandoffHandler.java b/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/ReactiveHandoffHandler.java
index 31eed11..0d96f18 100644
--- a/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/ReactiveHandoffHandler.java
+++ b/benchmark-runner/src/main/java/io/netty/loom/benchmark/runner/ReactiveHandoffHandler.java
@@ -40,13 +40,11 @@
 import io.netty.handler.codec.http.HttpResponseStatus;
 import io.netty.handler.codec.http.HttpUtil;
 import io.netty.handler.codec.http.HttpVersion;
-import io.netty.handler.timeout.ReadTimeoutHandler;
 import io.netty.util.AsciiString;
 import io.netty.util.CharsetUtil;
 
 import java.net.InetSocketAddress;
 import java.net.URI;
-import java.util.concurrent.TimeUnit;
 
 /**
  * Pure Netty non-blocking HTTP handler.
@@ -67,8 +65,6 @@ public class ReactiveHandoffHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
 
     private final String mockUrl;
     private final Class<? extends Channel> channelClass;
-    private final boolean noTimeout;
-
     // This handler owns exactly ONE client channel
     private Channel clientChannel;
 
@@ -76,10 +72,9 @@ public class ReactiveHandoffHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
 
-    public ReactiveHandoffHandler(String mockUrl, boolean noTimeout, Class<? extends Channel> channelClass) {
+    public ReactiveHandoffHandler(String mockUrl, Class<? extends Channel> channelClass) {
         this.mockUrl = mockUrl;
         this.channelClass = channelClass;
-        this.noTimeout = noTimeout;
         try {
             this.parsedUri = new URI(mockUrl);
         } catch (Exception e) {
@@ -104,15 +99,10 @@ public void channelActive(ChannelHandlerContext ctx) throws Exception {
         Bootstrap bootstrap = new Bootstrap().group(eventLoop) // Critical: use the SAME event loop
                 .channel(channelClass) // Use same channel type as server
                 .option(ChannelOption.SO_KEEPALIVE, true).option(ChannelOption.TCP_NODELAY, true)
-                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, noTimeout ? 0 : 30000)
-                .handler(new ChannelInitializer<SocketChannel>() {
+                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 0).handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ChannelPipeline p = ch.pipeline();
-                        // Add read timeout handler if timeout is enabled
-                        if (!noTimeout) {
-                            p.addLast(new ReadTimeoutHandler(30, TimeUnit.SECONDS));
-                        }
                        p.addLast(new HttpClientCodec());
                        p.addLast(new HttpObjectAggregator(65536));
                        // Add permanent response handler
diff --git a/benchmark-runner/src/test/java/io/netty/loom/benchmark/runner/BenchmarkIntegrationTest.java b/benchmark-runner/src/test/java/io/netty/loom/benchmark/runner/BenchmarkIntegrationTest.java
index 5e60abf..4dcad60 100644
--- a/benchmark-runner/src/test/java/io/netty/loom/benchmark/runner/BenchmarkIntegrationTest.java
+++ b/benchmark-runner/src/test/java/io/netty/loom/benchmark/runner/BenchmarkIntegrationTest.java
@@ -36,10 +36,7 @@
  *
  * Tests cover:
  * <ul>
- * <li>NIO I/O with default scheduler</li>
- * <li>NIO I/O with custom scheduler</li>
- * <li>EPOLL I/O with default scheduler</li>
- * <li>EPOLL I/O with custom scheduler</li>
+ * <li>NIO I/O with virtual Netty mode</li>
  * </ul>
 */
class BenchmarkIntegrationTest {
@@ -53,16 +50,11 @@ class BenchmarkIntegrationTest {
 
     static Stream<Arguments> serverConfigurations() {
         return Stream.of(
-                // IO type, use custom scheduler, use reactive, description
-                Arguments.of(HandoffHttpServer.IO.NIO, false, false, "NIO with default scheduler"),
-                Arguments.of(HandoffHttpServer.IO.NIO, true, false, "NIO with custom scheduler"),
-                Arguments.of(HandoffHttpServer.IO.NIO, false, true, "NIO with reactive handler"),
-                Arguments.of(HandoffHttpServer.IO.EPOLL, false, false, "EPOLL with default scheduler"),
-                Arguments.of(HandoffHttpServer.IO.EPOLL, true, false, "EPOLL with custom scheduler"),
-                Arguments.of(HandoffHttpServer.IO.EPOLL, false, true, "EPOLL with reactive handler"));
+                // IO type, mode, description
+                Arguments.of(HandoffHttpServer.IO.NIO, HandoffHttpServer.Mode.VIRTUAL_NETTY, "NIO with Netty on FJ"));
     }
 
-    void startServers(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive) throws Exception {
+    void startServers(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode) throws Exception {
         // Use unique ports for each test to avoid conflicts
         mockPort = PORT_COUNTER.getAndIncrement();
         handoffPort = PORT_COUNTER.getAndIncrement();
@@ -81,8 +73,8 @@ void startServers(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive) throws Exception {
         });
 
         // Start handoff server with specified configuration
-        handoffServer = new HandoffHttpServer(handoffPort, "http://localhost:" + mockPort + "/fruits", 1,
-                useCustomScheduler, ioType, true, false, useReactive);
+        handoffServer = new HandoffHttpServer(handoffPort, "http://localhost:" + mockPort + "/fruits", 1, ioType, true,
+                false, mode);
         handoffServer.start();
 
         // Wait for handoff server to be ready
@@ -107,64 +99,64 @@ void stopServers() {
         }
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void mockServerHealthEndpoint(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void mockServerHealthEndpoint(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(mockPort).when().get("/health").then().statusCode(200).body(equalTo("OK"));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void mockServerFruitsEndpoint(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void mockServerFruitsEndpoint(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(mockPort).when().get("/fruits").then().statusCode(200).contentType(ContentType.JSON)
                 .body("fruits", hasSize(10)).body("fruits[0].name", equalTo("Apple"))
                 .body("fruits[0].color", equalTo("Red")).body("fruits[0].price", equalTo(1.20f));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServerHealthEndpoint(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServerHealthEndpoint(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(handoffPort).when().get("/health").then().statusCode(200).body(equalTo("OK"));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServerFruitsEndpoint(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServerFruitsEndpoint(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(handoffPort).when().get("/fruits").then().statusCode(200).contentType(ContentType.JSON)
                 .body("fruits", hasSize(10)).body("fruits[0].name", equalTo("Apple"))
                .body("fruits[0].color", equalTo("Red"));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServerRootEndpoint(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServerRootEndpoint(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(handoffPort).when().get("/").then().statusCode(200).contentType(ContentType.JSON).body("fruits",
                 hasSize(10));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServer404ForUnknownPath(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServer404ForUnknownPath(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         given().port(handoffPort).when().get("/unknown").then().statusCode(404);
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServerReturnsAllFruits(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServerReturnsAllFruits(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         List<String> fruitNames = given().port(handoffPort).when().get("/fruits").then().statusCode(200).extract()
                .jsonPath().getList("fruits.name", String.class);
@@ -175,11 +167,11 @@ void handoffServerReturnsAllFruits(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
         assertTrue(fruitNames.contains("Kiwi"));
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void handoffServerHandlesMultipleRequests(HandoffHttpServer.IO ioType, boolean useCustomScheduler,
-            boolean useReactive, String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void handoffServerHandlesMultipleRequests(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode,
+            String description) throws Exception {
+        startServers(ioType, mode);
 
         // Send multiple requests to verify server handles concurrent load
         for (int i = 0; i < 10; i++) {
@@ -187,11 +179,11 @@ void handoffServerHandlesMultipleRequests(HandoffHttpServer.IO ioType, boolean useCustomScheduler,
         }
     }
 
-    @ParameterizedTest(name = "{3}")
+    @ParameterizedTest(name = "{2}")
     @MethodSource("serverConfigurations")
-    void verifyEndToEndJsonParsing(HandoffHttpServer.IO ioType, boolean useCustomScheduler, boolean useReactive,
-            String description) throws Exception {
-        startServers(ioType, useCustomScheduler, useReactive);
+    void verifyEndToEndJsonParsing(HandoffHttpServer.IO ioType, HandoffHttpServer.Mode mode, String description)
+            throws Exception {
+        startServers(ioType, mode);
 
         // This test verifies the complete flow:
         // 1. HandoffHttpServer receives request
diff --git a/core/src/main/java/io/netty/loom/ManualIoEventLoopTask.java b/core/src/main/java/io/netty/loom/ManualIoEventLoopTask.java
new file mode 100644
index 0000000..fc89084
--- /dev/null
+++ b/core/src/main/java/io/netty/loom/ManualIoEventLoopTask.java
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2026 The Netty VirtualThread Scheduler Project
+ *
+ * The Netty VirtualThread Scheduler Project licenses this file to you under the Apache License,
+ * version 2.0 (the "License"); you may not use this file except in compliance with the
+ * License. You may obtain a copy of the License at:
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the
+ * License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+ * either express or implied. See the License for the specific language governing permissions
+ * and limitations under the License.
+ */
+package io.netty.loom;
+
+import io.netty.channel.IoEventLoopGroup;
+import io.netty.channel.IoHandlerFactory;
+import io.netty.channel.ManualIoEventLoop;
+
+import java.util.concurrent.TimeUnit;
+
+public class ManualIoEventLoopTask extends ManualIoEventLoop implements Runnable {
+
+    private static final long RUNNING_YIELD_US = TimeUnit.MICROSECONDS
+            .toNanos(Integer.getInteger("io.netty.loom.running.yield.us", 1));
+
+    public ManualIoEventLoopTask(IoEventLoopGroup parent, Thread owningThread, IoHandlerFactory factory) {
+        super(parent, owningThread, factory);
+    }
+
+    @Override
+    public void run() {
+        while (!isShuttingDown()) {
+            run(0, RUNNING_YIELD_US);
+            Thread.yield();
+            runNonBlockingTasks(RUNNING_YIELD_US);
+            Thread.yield();
+        }
+        while (!isTerminated()) {
+            runNow();
+            Thread.yield();
+        }
+    }
+}
diff --git a/core/src/main/java/io/netty/loom/VirtualMultithreadManualIoEventLoopGroup.java b/core/src/main/java/io/netty/loom/VirtualMultithreadManualIoEventLoopGroup.java
new file mode 100644
index 0000000..f33e1c8
--- /dev/null
+++ b/core/src/main/java/io/netty/loom/VirtualMultithreadManualIoEventLoopGroup.java
@@ -0,0 +1,43 @@
+/*
+ * Copyright 2026 The Netty VirtualThread Scheduler Project
+ *
+ * The Netty VirtualThread Scheduler Project licenses this file to you under the Apache License,
+ * version 2.0 (the "License"); you may not use this file except in compliance with the
+ * License. You may obtain a copy of the License at:
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the
+ * License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+ * either express or implied. See the License for the specific language governing permissions
+ * and limitations under the License.
+ */
+package io.netty.loom;
+
+import io.netty.channel.IoEventLoop;
+import io.netty.channel.IoHandlerFactory;
+import io.netty.channel.MultiThreadIoEventLoopGroup;
+
+import java.util.concurrent.Executor;
+import java.util.concurrent.ThreadFactory;
+
+public class VirtualMultithreadManualIoEventLoopGroup extends MultiThreadIoEventLoopGroup {
+
+    private ThreadFactory threadFactory;
+
+    public VirtualMultithreadManualIoEventLoopGroup(int nThreads, IoHandlerFactory factory) {
+        super(nThreads, (Executor) null, factory);
+    }
+
+    @Override
+    protected IoEventLoop newChild(Executor executor, IoHandlerFactory ioHandlerFactory, Object... args) {
+        if (threadFactory == null) {
+            threadFactory = Thread.ofVirtual().factory();
+        }
+        var manualTask = new ManualIoEventLoopTask(this, null, ioHandlerFactory);
+        var newThread = threadFactory.newThread(manualTask);
+        manualTask.setOwningThread(newThread);
+        newThread.start();
+        return manualTask;
+    }
+}
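
The new `ManualIoEventLoopTask` interleaves a bounded slice of event-loop work with `Thread.yield()`, so that when the loop runs as a virtual thread its carrier can schedule sibling virtual threads between iterations. A dependency-free sketch of that poll-then-yield shape (the `PollYieldLoop` class and its task queue are illustrative stand-ins, not Netty APIs):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class PollYieldLoop implements Runnable {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean shuttingDown = new AtomicBoolean();

    void execute(Runnable task) { tasks.add(task); }
    void shutdown() { shuttingDown.set(true); }

    @Override
    public void run() {
        // Mirrors the first loop in ManualIoEventLoopTask.run(): do a bounded
        // amount of work, then Thread.yield() so the carrier can run siblings.
        while (!shuttingDown.get()) {
            Runnable t = tasks.poll();
            if (t != null) {
                t.run();
            }
            Thread.yield();
        }
        // Mirrors the second loop: drain remaining work before terminating.
        for (Runnable t; (t = tasks.poll()) != null; ) {
            t.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        var loop = new PollYieldLoop();
        // Run the loop on a virtual thread, as the group's newChild does.
        Thread owner = Thread.ofVirtual().start(loop);
        var done = new CountDownLatch(1);
        loop.execute(done::countDown);
        done.await();
        loop.shutdown();
        owner.join();
    }
}
```

Note that `Thread.yield()` on a virtual thread is a scheduling point: it unmounts the virtual thread and lets the underlying scheduler pick another one, which is what keeps this busy-poll from starving the other IO loops sharing the pool.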