Fix Perl vs TypeScript comparison gaps #103

Merged: benbernard merged 1 commit into master from feature/comparison-gap-fixes on Feb 27, 2026
@benbernard (Owner)

Summary

  • Fix eval chomp behavior, multiplex bucket key merging, collate --cube flag
  • Refactor chain from buffered spawnSync to streaming Bun.spawn pipes
  • Add --shell to generate, multi-DB to todb, --woothee to fromapache
  • Add comprehensive packet parsing to fromtcpdump (MAC, IP, TCP, UDP, DNS, ARP)
  • Add multi-unit durations and chrono-node to normalizetime
  • Add default UID conversion to fromps, deaggregator registry to decollate
  • Add Ord2Bivariate/Ord2Univariate statistical aggregators
  • Add ~190 new tests across 17 operations (1886 pass, 0 fail)
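
For readers unfamiliar with cube aggregation, here is a minimal sketch of what generating ALL combinations could look like. It assumes every key field is either kept or replaced by the literal label `ALL`, which yields 2^n bucket keys for n key fields. `cubeCombinations` and its parameters are hypothetical names for illustration, not the project's actual implementation.

```typescript
// Hypothetical sketch: enumerate every cube bucket key for one record.
type KeyValues = { [field: string]: string };

function cubeCombinations(keys: KeyValues, allLabel: string = "ALL"): KeyValues[] {
  const names = Object.keys(keys);
  const combos: KeyValues[] = [];
  // Bit i of the mask decides whether field i is replaced by the ALL label.
  for (let mask = 0; mask < (1 << names.length); mask++) {
    const combo: KeyValues = {};
    names.forEach((name, i) => {
      combo[name] = (mask & (1 << i)) !== 0 ? allLabel : keys[name];
    });
    combos.push(combo);
  }
  return combos;
}

// Two key fields produce four bucket keys, from fully specific to fully rolled up.
const cubeExample = cubeCombinations({ city: "NYC", year: "2026" });
```

Under this assumption, a record keyed by two fields contributes to four buckets, so the aggregated output includes per-field subtotals and a grand total alongside the fully specific rows.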

Test plan

  • bun test — 1886 pass, 12 skip, 0 fail
  • bunx oxlint — 0 errors (2 pre-existing warnings)
  • tsc --noEmit — clean
  • check-no-private — clean

🤖 Generated with Claude Code

Bug fixes:
- Fix eval chomp to remove only last newline (matching Perl behavior)
- Fix multiplex to merge bucket keys into output records (Perl parity)
- Fix collate --cube flag to actually generate ALL combinations
- Refactor chain from buffered spawnSync to streaming Bun.spawn pipes
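
As an illustration of the eval chomp fix: by default, Perl's `chomp` removes at most one trailing newline. A minimal sketch of that behavior, contrasted with a greedy strip, might look like this. `chompOnce` and `chompAll` are hypothetical helpers, not functions from this PR.

```typescript
// Hypothetical helper: remove a single trailing "\n" if present.
function chompOnce(text: string): string {
  const endsWithNewline = text.length > 0 && text[text.length - 1] === "\n";
  return endsWithNewline ? text.slice(0, -1) : text;
}

// Greedy variant, shown only for contrast: strips every trailing newline.
function chompAll(text: string): string {
  return text.replace(/\n+$/, "");
}

const chompSample = "a\n\n";
const chompedOnce = chompOnce(chompSample); // "a\n": one newline is preserved
const chompedAll = chompAll(chompSample);   // "a": both newlines are removed
```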

New features:
- Add --shell flag to generate for shell command execution
- Add PostgreSQL and MySQL support to todb (matching fromdb)
- Add --woothee user agent parsing to fromapache
- Add comprehensive packet detail parsing to fromtcpdump (MAC, IP, TCP, UDP, DNS, ARP)
- Add multi-unit duration parsing and chrono-node date support to normalizetime
- Add default UID-to-username conversion to fromps
- Add --show-deaggregator, --dldeaggregator, and deaggregator registry to decollate
- Add Ord2Bivariate and Ord2Univariate statistical aggregators
- Add * postfix sort documentation and tests for ALL-to-end sorting
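
Multi-unit duration parsing can be sketched as summing each number-and-unit pair in the input. The accepted syntax and unit names below are assumptions for illustration; the actual normalizetime grammar may differ, and `parseDurationSeconds` is a hypothetical name.

```typescript
// Assumed unit table for this sketch only; not taken from the project.
const UNIT_SECONDS: { [unit: string]: number } = {
  s: 1, sec: 1, second: 1,
  m: 60, min: 60, minute: 60,
  h: 3600, hr: 3600, hour: 3600,
  d: 86400, day: 86400,
  w: 604800, week: 604800,
};

// Look up a unit without picking up inherited object properties.
function unitSeconds(unit: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(UNIT_SECONDS, unit)
    ? UNIT_SECONDS[unit]
    : undefined;
}

// Sum every "<number> <unit>" pair, e.g. "1 hour 30 minutes" or "2d 3h".
// Text that does not match the pair pattern is ignored by this sketch.
function parseDurationSeconds(input: string): number {
  const pattern = /(\d+(?:\.\d+)?)\s*([a-z]+)/gi;
  let total = 0;
  let pairs = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    const amount = parseFloat(match[1]);
    const raw = match[2].toLowerCase();
    // Try the unit as written, then with a plural "s" removed.
    const seconds = unitSeconds(raw) ?? unitSeconds(raw.replace(/s$/, ""));
    if (seconds === undefined) {
      throw new Error("Unknown duration unit: " + raw);
    }
    total += amount * seconds;
    pairs++;
  }
  if (pairs === 0) {
    throw new Error("No duration found in: " + input);
  }
  return total;
}

const ninetyMinutes = parseDurationSeconds("1 hour 30 minutes"); // 5400
```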

Test coverage (~190 new tests):
- xform: 4 → 46 tests (context ops, pre/post snippets, push helpers, edge cases)
- collate: +22 tests (cube, domain language, null handling, no-bucket, mr-agg, ii-agg)
- fromtcpdump: +28 tests (all packet types with synthetic fixtures)
- decollate: +17 tests (unhash, unarray, chained, show-deagg, dl-deagg)
- todb: +17 tests (multi-DB option parsing, SQL generation)
- chain: +12 tests (shell streaming, early exit, error handling, edge cases)
- toptable: +11 tests (record mode, KeyGroups, ordering, pins)
- normalizetime: +8 tests (multi-unit durations, chrono-node)
- generate: +7 tests (shell mode execution)
- fromps: +4 tests (UID conversion)
- annotate, eval, fromcsv, fromxml, multiplex, sort: +1-4 tests each
- statistical aggregators: +12 tests (Ord2Uni, Ord2Biv)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

⚠️ 12 regressions detected out of 103 benchmarks (threshold: 25%)

| Benchmark | Median | Baseline | Delta |
| --- | --- | --- | --- |
| KeySpec — simple key (name) | 405.3µs | 228.9µs | +77.1% 🔴 |
| grep — 10K records (r.age > 50) | 706.2µs | 368.6µs | +91.6% 🔴 |
| chain — 5 ops (grep eval grep eval | | | |
| new Record() — 10K objects | 341.7µs | 73.4µs | +365.3% 🔴 |
| Record.dataRef — 10K records (zero-copy) | 81.7µs | 37.6µs | +117.5% 🔴 |
| chain — 2 ops (grep eval), 100 records | 143.7µs | 78.6µs | +82.8% 🔴 |
| implicit — 2 ops (grep eval), 100 records | 133.6µs | 83.6µs | +59.8% 🔴 |
| implicit — 2 ops (grep eval), 1K records | 211.7µs | 130.8µs | +61.8% 🔴 |
| chain — 2 ops (grep eval), 10K records | 659.6µs | 469.9µs | +40.4% 🔴 |
| implicit — 2 ops (grep eval), 10K records | 670.7µs | 500.5µs | +34.0% 🔴 |
| chain — 3 ops (grep eval grep), 1K records | 206.1µs | 134.6µs | |
| chain — 3 ops (grep eval grep), 10K records | 680.1µs | 480.7µs | |

103 benchmarks: 13 faster, 19 slower, 71 within noise (10%)

ℹ️ Note: Benchmarks are advisory-only. GitHub Actions shared runners have variable performance, so results may fluctuate ±25% between runs. For reliable benchmarking, run locally with bun run bench.
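
The delta column and the summary counts above can be reproduced with a short sketch. It assumes that delta is the percent change of the median relative to the baseline, that a regression is a delta above the 25% threshold, and that anything within ±10% counts as noise. The function names and the handling of exact boundary values are assumptions, not the benchmark script's actual code.

```typescript
type Verdict = "faster" | "slower" | "within noise";

// Percent change of the current median relative to the baseline.
function percentDelta(median: number, baseline: number): number {
  return ((median - baseline) / baseline) * 100;
}

// Classify a delta against the noise band (10% in the report above).
function classifyDelta(deltaPct: number, noisePct: number = 10): Verdict {
  if (deltaPct > noisePct) return "slower";
  if (deltaPct < -noisePct) return "faster";
  return "within noise";
}

// A regression is a slowdown beyond the reporting threshold (25% above).
function isRegression(deltaPct: number, thresholdPct: number = 25): boolean {
  return deltaPct > thresholdPct;
}

// KeySpec simple key row: 405.3µs median against a 228.9µs baseline.
const keySpecDelta = percentDelta(405.3, 228.9); // roughly +77.1%
const keySpecVerdict = classifyDelta(keySpecDelta); // "slower"
const keySpecRegressed = isRegression(keySpecDelta); // true
```

Under these assumptions, a row such as collate at +17.1% is counted as slower but falls below the 25% regression threshold.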

Full benchmark results

JSON Parsing

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| Record.fromJSON — 100 lines | 156.5µs | 155.0µs | +1.0% | 638.91K rec/s |
| Record.fromJSON — 10K lines | 13.77ms | 12.66ms | +8.8% | 726.13K rec/s, 215.0 MB/s |
| InputStream.fromString — 100 records | 209.8µs | 212.7µs | -1.3% | 476.56K rec/s |
| InputStream.fromString — 10K records | 18.21ms | 18.36ms | -0.8% | 549.15K rec/s, 162.6 MB/s |
| JSON.parse baseline — 10K lines (no Record) | 12.95ms | 12.44ms | +4.1% | 771.99K rec/s, 228.6 MB/s |
| JSON.parse single array — 10K records | 12.05ms | 12.34ms | -2.4% | 830.02K rec/s, 245.8 MB/s |

JSON Serialization

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| Record.toString — 100 records | 87.6µs | 87.4µs | +0.2% | 1.14M rec/s |
| Record.toString — 10K records | 8.56ms | 8.12ms | +5.4% | 1.17M rec/s, 346.0 MB/s |
| Record.toJSON — 10K records | 286.9µs | 288.1µs | -0.4% | 34.86M rec/s |
| JSON.stringify baseline — 10K objects (no Record) | 8.03ms | 8.14ms | -1.4% | 1.24M rec/s, 368.7 MB/s |
| Batch join — 10K records (map+join) | 8.33ms | 8.62ms | -3.3% | 1.20M rec/s, 355.6 MB/s |

KeySpec Access

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| KeySpec — simple key (name) | 405.3µs | 228.9µs | +77.1% 🔴 | 24.67M rec/s |
| KeySpec — nested key (address/zip) | 542.3µs | 532.0µs | +1.9% | 18.44M rec/s |
| KeySpec — deep nested (address/coords/lat) | 523.7µs | 538.2µs | -2.7% | 19.10M rec/s |
| KeySpec — array index (tags/#0) | 509.0µs | 518.2µs | -1.8% | 19.65M rec/s |
| Direct property access baseline (rec['name']) | 41.1µs | 49.9µs | -17.6% 🟢 | 243.06M rec/s |
| Direct nested access baseline (rec.address.coords.lat) | 109.3µs | 104.6µs | +4.5% | 91.51M rec/s |
| KeySpec construction — cached (same spec 10K times) | 275.4µs | 274.7µs | +0.3% | 36.31M rec/s |
| KeySpec construction — unique specs (10K different) | 2.24ms | 2.87ms | -22.1% 🟢 | 4.47M rec/s |
| Compiled KeySpec.resolveValue — nested (address/zip) | 163.2µs | 170.2µs | -4.1% | 61.29M rec/s |
| Compiled KeySpec.resolveValue — deep (address/coords/lat) | 126.0µs | 124.4µs | +1.2% | 79.39M rec/s |
| Compiled KeySpec.resolveValue — array (tags/#0) | 152.2µs | 152.5µs | -0.2% | 65.71M rec/s |
| Compiled KeySpec.setValue — nested (address/zip) | 144.3µs | 146.8µs | -1.7% | 69.28M rec/s |

Core Operations

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| grep — 10K records (r.age > 50) | 706.2µs | 368.6µs | +91.6% 🔴 | 14.16M rec/s |
| grep — 10K records (string match) | 363.5µs | 370.6µs | -1.9% | 27.51M rec/s |
| eval — 10K records (add computed field) | 1.45ms | 1.43ms | +1.6% | 6.89M rec/s |
| xform — 10K records (push each record) | 1.08ms | 1.28ms | -16.0% 🟢 | 9.29M rec/s |
| sort — 100 records (by score, numeric) | 152.9µs | 158.7µs | -3.6% | 654.00K rec/s |
| sort — 10K records (by score, numeric) | 17.76ms | 18.76ms | -5.3% | 563.21K rec/s |
| sort — 10K records (by name, lexical) | 12.45ms | 12.75ms | -2.4% | 803.19K rec/s |
| collate — 100 records (count by city) | 413.7µs | 353.3µs | +17.1% 🔴 | 241.73K rec/s |
| collate — 10K records (count by city) | 12.13ms | 13.14ms | -7.7% | 824.66K rec/s |
| fromcsv — 10K rows (parse CSV to records) | 14.36ms | 14.89ms | -3.5% | 696.17K rec/s, 45.8 MB/s |

Pipeline Overhead

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| chain — single op (grep), 10K records | 7.05ms | 7.29ms | -3.3% | 1.42M rec/s |
| chain — 3 ops (grep eval grep), 10K records | 6.96ms | 7.80ms | | |
| chain — 5 ops (grep eval grep eval grep), 10K records | | | | |
| passthrough baseline — 10K records (direct collector) | 5.96ms | 6.56ms | -9.1% | 1.68M rec/s |

Record Creation & Serialization

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| new Record() — 10K objects | 341.7µs | 73.4µs | +365.3% 🔴 | 29.27M rec/s |
| new Record() empty — 10K | 94.0µs | 111.5µs | -15.7% 🟢 | 106.41M rec/s |
| Record.get — 10K records × 3 fields | 51.6µs | 56.2µs | -8.3% | 581.89M rec/s |
| Record.set — 10K records × 1 field | 125.2µs | 124.9µs | +0.3% | 79.88M rec/s |
| Record.toJSON — 10K records | 274.1µs | 288.1µs | -4.9% | 36.48M rec/s |
| Record.toString — 10K records | 8.73ms | 8.12ms | +7.5% | 1.15M rec/s |
| Record.clone — 10K records | 6.06ms | 6.18ms | -1.9% | 1.65M rec/s |
| Record.fromJSON — 10K lines | 12.91ms | 12.66ms | +2.0% | 774.76K rec/s, 229.4 MB/s |
| Record.dataRef — 10K records (zero-copy) | 81.7µs | 37.6µs | +117.5% 🔴 | 122.39M rec/s |
| Record.sort — 10K records (numeric field) | 11.68ms | 11.14ms | +4.8% | 856.34K rec/s |
| Record.sort — 10K records (lexical field) | 5.83ms | 5.80ms | +0.4% | 1.72M rec/s |
| Record.cmp — 1M comparisons (single field) | 111.46ms | 105.72ms | +5.4% | 8.97M rec/s |
| Record.sort — 10K records (nested field numeric) | 16.03ms | 15.10ms | +6.1% | 623.70K rec/s |
| Record.cmp — 1M comparisons (multi-field cached) | 89.01ms | 88.49ms | +0.6% | 11.23M rec/s |
| Record.sort — 10K records (cached comparator reuse) | 12.57ms | 11.48ms | +9.5% | 795.84K rec/s |

Chain vs Pipe

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| chain — 2 ops (grep eval), 100 records | 143.7µs | 78.6µs | +82.8% 🔴 | |
| pipe — 2 ops (grep eval), 100 records | 577.12ms | 521.60ms | +10.6% 🔴 | |
| implicit — 2 ops (grep eval), 100 records | 133.6µs | 83.6µs | +59.8% 🔴 | |
| chain — 2 ops (grep eval), 1K records | 131.2µs | 140.2µs | -6.4% | |
| pipe — 2 ops (grep eval), 1K records | 583.99ms | 544.17ms | +7.3% | |
| implicit — 2 ops (grep eval), 1K records | 211.7µs | 130.8µs | +61.8% 🔴 | |
| chain — 2 ops (grep eval), 10K records | 659.6µs | 469.9µs | +40.4% 🔴 | |
| pipe — 2 ops (grep eval), 10K records | 587.26ms | 527.56ms | +11.3% 🔴 | |
| implicit — 2 ops (grep eval), 10K records | 670.7µs | 500.5µs | +34.0% 🔴 | |
| chain — 3 ops (grep eval grep), 100 records | 158.1µs | 149.5µs | | |
| pipe — 3 ops (grep eval grep), 100 records | 875.56ms | 788.17ms | | |
| implicit — 3 ops (grep eval grep), 100 records | 93.6µs | 101.2µs | | |
| chain — 3 ops (grep eval grep), 1K records | 206.1µs | 134.6µs | | |
| pipe — 3 ops (grep eval grep), 1K records | 870.94ms | 773.77ms | | |
| implicit — 3 ops (grep eval grep), 1K records | 155.8µs | 140.6µs | | |
| chain — 3 ops (grep eval grep), 10K records | 680.1µs | 480.7µs | | |
| pipe — 3 ops (grep eval grep), 10K records | 867.28ms | 782.74ms | | |
| implicit — 3 ops (grep eval grep), 10K records | 459.8µs | 691.8µs | | |
| chain — 5 ops (grep eval grep eval grep), 100 records | | | | |
| pipe — 5 ops (grep eval grep eval grep), 100 records | | | | |
| implicit — 5 ops (grep eval grep eval grep), 100 records | | | | |
| chain — 5 ops (grep eval grep eval grep), 1K records | | | | |
| pipe — 5 ops (grep eval grep eval grep), 1K records | | | | |
| implicit — 5 ops (grep eval grep eval grep), 1K records | | | | |
| chain — 5 ops (grep eval grep eval grep), 10K records | | | | |
| pipe — 5 ops (grep eval grep eval grep), 10K records | | | | |
| implicit — 5 ops (grep eval grep eval grep), 10K records | | | | |

Line Reading

| Benchmark | Median | Baseline | Delta | Throughput |
| --- | --- | --- | --- | --- |
| InputStream.fromFile — 100 lines | 501.1µs | 505.7µs | -0.9% | 199.57K rec/s, 58.9 MB/s |
| InputStream.fromString — 100 lines | 180.2µs | 189.6µs | -5.0% | 555.04K rec/s, 163.9 MB/s |
| manual buffer (isolated) — 100 lines | 269.8µs | 277.8µs | -2.9% | 370.63K rec/s, 109.5 MB/s |
| bulk text + split — 100 lines | 99.9µs | 104.2µs | -4.1% | 1.00M rec/s, 295.6 MB/s |
| node readline — 100 lines | 478.9µs | 462.5µs | +3.6% | 208.81K rec/s, 61.7 MB/s |
| TextDecoderStream — 100 lines | 302.2µs | 321.0µs | -5.8% | 330.90K rec/s, 97.7 MB/s |
| binary newline scan — 100 lines | 320.6µs | 301.1µs | +6.5% | 311.87K rec/s, 92.1 MB/s |
| bun native stdin — 100 lines | 25.99ms | 28.07ms | -7.4% | 3.85K rec/s, 1.1 MB/s |
| InputStream.fromFile — 10K lines | 22.97ms | 24.80ms | -7.4% | 435.26K rec/s, 128.9 MB/s |
| InputStream.fromString — 10K lines | 18.14ms | 18.89ms | -4.0% | 551.23K rec/s, 163.2 MB/s |
| manual buffer (isolated) — 10K lines | 6.01ms | 6.11ms | -1.7% | 1.66M rec/s, 493.0 MB/s |
| bulk text + split — 10K lines | 2.39ms | 2.55ms | -6.1% | 4.18M rec/s, 1239.1 MB/s |
| node readline — 10K lines | 7.48ms | 7.78ms | -3.9% | 1.34M rec/s, 395.9 MB/s |
| TextDecoderStream — 10K lines | 3.94ms | 5.44ms | -27.6% 🟢 | 2.54M rec/s, 751.8 MB/s |
| binary newline scan — 10K lines | 6.57ms | 7.84ms | -16.1% 🟢 | 1.52M rec/s, 450.5 MB/s |
| bun native stdin — 10K lines | 42.27ms | 42.44ms | -0.4% | 236.58K rec/s, 70.1 MB/s |
| InputStream.fromFile — 100K lines | 259.51ms | 268.41ms | -3.3% | 385.35K rec/s, 114.5 MB/s |
| InputStream.fromString — 100K lines | 210.08ms | 233.47ms | -10.0% 🟢 | 476.00K rec/s, 141.4 MB/s |
| manual buffer (isolated) — 100K lines | 33.25ms | 36.64ms | -9.3% | 3.01M rec/s, 893.8 MB/s |
| bulk text + split — 100K lines | 27.03ms | 30.18ms | -10.4% 🟢 | 3.70M rec/s, 1099.3 MB/s |
| node readline — 100K lines | 66.50ms | 68.00ms | -2.2% | 1.50M rec/s, 446.8 MB/s |
| TextDecoderStream — 100K lines | 38.86ms | 38.42ms | +1.1% | 2.57M rec/s, 764.7 MB/s |
| binary newline scan — 100K lines | 70.46ms | 76.65ms | -8.1% | 1.42M rec/s, 421.7 MB/s |
| bun native stdin — 100K lines | 113.16ms | 113.78ms | -0.5% | 883.67K rec/s, 262.6 MB/s |

@benbernard merged commit 405f7d6 into master on Feb 27, 2026
3 of 4 checks passed