Fix Perl vs TypeScript comparison gaps by benbernard · Pull Request #103 · benbernard/RecordStream

benbernard · 2026-02-27T15:15:10Z

Summary

- Fix eval chomp behavior, multiplex bucket key merging, collate --cube flag
- Refactor chain from buffered spawnSync to streaming Bun.spawn pipes
- Add --shell to generate, multi-DB to todb, --woothee to fromapache
- Add comprehensive packet parsing to fromtcpdump (MAC, IP, TCP, UDP, DNS, ARP)
- Add multi-unit durations and chrono-node to normalizetime
- Add default UID conversion to fromps, deaggregator registry to decollate
- Add Ord2Bivariate/Ord2Univariate statistical aggregators
- Add ~190 new tests across 17 operations (1886 pass, 0 fail)
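The first summary item changes eval's chomp so that it removes only the last newline, as Perl's chomp does. Below is a minimal sketch of that rule. It assumes Perl's default input record separator of "\n", and the helper name chomp is illustrative, not taken from the project:

```typescript
// Illustrative helper (not the project's API): Perl-style chomp that strips at
// most one trailing "\n". Assumes Perl's default $/ = "\n"; paragraph and
// slurp modes are not modeled.
function chomp(line: string): string {
  return line.endsWith("\n") ? line.slice(0, -1) : line;
}

// A greedy alternative such as line.replace(/\n+$/, "") would also remove
// intentional blank lines at the end of a multi-line value.
console.log(JSON.stringify(chomp("hello\n\n"))); // "hello\n"
console.log(JSON.stringify(chomp("hello")));     // "hello"
console.log(JSON.stringify(chomp("a\r\n")));     // "a\r"
```

With the default separator, Perl's chomp likewise removes only the "\n" from a "\r\n" ending.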

Test plan

- bun test — 1886 pass, 12 skip, 0 fail
- bunx oxlint — 0 errors (2 pre-existing warnings)
- tsc --noEmit — clean
- check-no-private — clean

🤖 Generated with Claude Code

Bug fixes:
- Fix eval chomp to remove only last newline (matching Perl behavior)
- Fix multiplex to merge bucket keys into output records (Perl parity)
- Fix collate --cube flag to actually generate ALL combinations
- Refactor chain from buffered spawnSync to streaming Bun.spawn pipes

New features:
- Add --shell flag to generate for shell command execution
- Add PostgreSQL and MySQL support to todb (matching fromdb)
- Add --woothee user agent parsing to fromapache
- Add comprehensive packet detail parsing to fromtcpdump (MAC, IP, TCP, UDP, DNS, ARP)
- Add multi-unit duration parsing and chrono-node date support to normalizetime
- Add default UID-to-username conversion to fromps
- Add --show-deaggregator, --dldeaggregator, and deaggregator registry to decollate
- Add Ord2Bivariate and Ord2Univariate statistical aggregators
- Add * postfix sort documentation and tests for ALL-to-end sorting

Test coverage (~190 new tests):
- xform: 4 → 46 tests (context ops, pre/post snippets, push helpers, edge cases)
- collate: +22 tests (cube, domain language, null handling, no-bucket, mr-agg, ii-agg)
- fromtcpdump: +28 tests (all packet types with synthetic fixtures)
- decollate: +17 tests (unhash, unarray, chained, show-deagg, dl-deagg)
- todb: +17 tests (multi-DB option parsing, SQL generation)
- chain: +12 tests (shell streaming, early exit, error handling, edge cases)
- toptable: +11 tests (record mode, KeyGroups, ordering, pins)
- normalizetime: +8 tests (multi-unit durations, chrono-node)
- generate: +7 tests (shell mode execution)
- fromps: +4 tests (UID conversion)
- annotate, eval, fromcsv, fromxml, multiplex, sort: +1-4 tests each
- statistical aggregators: +12 tests (Ord2Uni, Ord2Biv)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
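The commit message adds Ord2Univariate and Ord2Bivariate aggregators but does not describe their internals. The names suggest second-order moment accumulators. Under that assumption, the following is a minimal one-pass sketch: it keeps running sums of x, x², y², and xy, and derives mean, variance, covariance, and correlation at the end. The class names are hypothetical. The population (divide-by-n) formulas are also an assumption, since the port may divide by n - 1 instead:

```typescript
// Hypothetical univariate accumulator: running count, sum, and sum of squares.
class Ord2UniSketch {
  n = 0;
  sum = 0;
  sumSq = 0;
  add(x: number): void {
    this.n++;
    this.sum += x;
    this.sumSq += x * x;
  }
  mean(): number {
    return this.sum / this.n;
  }
  // Population variance E[x^2] - E[x]^2, clamped so rounding never goes negative.
  variance(): number {
    const m = this.mean();
    return Math.max(0, this.sumSq / this.n - m * m);
  }
}

// Hypothetical bivariate accumulator: adds cross and square sums for x and y.
class Ord2BiSketch {
  n = 0; sx = 0; sy = 0; sxx = 0; syy = 0; sxy = 0;
  add(x: number, y: number): void {
    this.n++;
    this.sx += x;
    this.sy += y;
    this.sxx += x * x;
    this.syy += y * y;
    this.sxy += x * y;
  }
  // Population covariance E[xy] - E[x]E[y].
  covariance(): number {
    return this.sxy / this.n - (this.sx / this.n) * (this.sy / this.n);
  }
  // Pearson correlation: covariance divided by the product of standard deviations.
  correlation(): number {
    const vx = this.sxx / this.n - (this.sx / this.n) ** 2;
    const vy = this.syy / this.n - (this.sy / this.n) ** 2;
    return this.covariance() / Math.sqrt(vx * vy);
  }
}

const uni = new Ord2UniSketch();
[2, 4, 4, 4, 5, 5, 7, 9].forEach((x) => uni.add(x));
console.log(uni.mean(), uni.variance()); // 5 4
```

Raw power sums are cheap to update and to merge across buckets, but they can lose precision when the variance is small relative to the mean. Welford's update is the usual alternative when that matters.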

github-actions · 2026-02-27T15:16:47Z

Performance Benchmark Results

⚠️ 12 regressions detected out of 103 benchmarks (threshold: 25%)

Benchmark	Median	Baseline	Delta
KeySpec — simple key (name)	405.3µs	228.9µs	+77.1% 🔴
grep — 10K records (r.age > 50)	706.2µs	368.6µs	+91.6% 🔴
chain — 5 ops (grep | eval | grep | eval …
new Record() — 10K objects	341.7µs	73.4µs	+365.3% 🔴
Record.dataRef — 10K records (zero-copy)	81.7µs	37.6µs	+117.5% 🔴
chain — 2 ops (grep | eval), 100 records	143.7µs	78.6µs	+82.8% 🔴
implicit — 2 ops (grep | eval), 100 records	133.6µs	83.6µs	+59.8% 🔴
implicit — 2 ops (grep | eval), 1K records	211.7µs	130.8µs	+61.8% 🔴
chain — 2 ops (grep | eval), 10K records	659.6µs	469.9µs	+40.4% 🔴
implicit — 2 ops (grep | eval), 10K records	670.7µs	500.5µs	+34.0% 🔴
chain — 3 ops (grep | eval | grep), 1K records	206.1µs	134.6µs
chain — 3 ops (grep | eval | grep), 10K records	680.1µs	480.7µs

103 benchmarks: 13 faster, 19 slower, 71 within noise (10%)

ℹ️ Note: Benchmarks are advisory-only. GitHub Actions shared runners have variable performance, so results may fluctuate ±25% between runs. For reliable benchmarking, run locally with bun run bench.
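The comment does not show how Delta and Throughput are computed. The figures are consistent with Delta = (median - baseline) / baseline and Throughput = records / median. They are also consistent with 🟢/🔴 markers appearing past the 10% noise band and with regressions being counted past the 25% threshold. The sketch below is written under those assumptions and checked against the KeySpec simple-key row. The function names are hypothetical, and whether the boundaries are inclusive is a guess:

```typescript
// Assumed thresholds, taken from the comment text above.
const NOISE = 0.1;
const REGRESSION = 0.25;

// Relative change of the current median against the stored baseline.
function delta(median: number, baseline: number): number {
  return (median - baseline) / baseline;
}

// Regressions are treated as the subset of "slower" results past 25%.
function classify(d: number): "regression" | "slower" | "faster" | "noise" {
  if (d > REGRESSION) return "regression";
  if (d > NOISE) return "slower";
  if (d < -NOISE) return "faster";
  return "noise";
}

// Records processed per second, given the median wall time in seconds.
function throughput(records: number, medianSeconds: number): number {
  return records / medianSeconds;
}

// KeySpec simple key: 405.3µs median vs 228.9µs baseline over 10K records.
const d = delta(405.3, 228.9);
console.log((d * 100).toFixed(1), classify(d)); // 77.1 regression
console.log((throughput(10000, 405.3e-6) / 1e6).toFixed(2)); // 24.67
```

Because medians are shown rounded, recomputing a delta from the displayed values can differ from the reported one in the last digit.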

Full benchmark results

JSON Parsing

Benchmark	Median	Baseline	Delta	Throughput
Record.fromJSON — 100 lines	156.5µs	155.0µs	+1.0%	638.91K rec/s
Record.fromJSON — 10K lines	13.77ms	12.66ms	+8.8%	726.13K rec/s, 215.0 MB/s
InputStream.fromString — 100 records	209.8µs	212.7µs	-1.3%	476.56K rec/s
InputStream.fromString — 10K records	18.21ms	18.36ms	-0.8%	549.15K rec/s, 162.6 MB/s
JSON.parse baseline — 10K lines (no Record)	12.95ms	12.44ms	+4.1%	771.99K rec/s, 228.6 MB/s
JSON.parse single array — 10K records	12.05ms	12.34ms	-2.4%	830.02K rec/s, 245.8 MB/s

JSON Serialization

Benchmark	Median	Baseline	Delta	Throughput
Record.toString — 100 records	87.6µs	87.4µs	+0.2%	1.14M rec/s
Record.toString — 10K records	8.56ms	8.12ms	+5.4%	1.17M rec/s, 346.0 MB/s
Record.toJSON — 10K records	286.9µs	288.1µs	-0.4%	34.86M rec/s
JSON.stringify baseline — 10K objects (no Record)	8.03ms	8.14ms	-1.4%	1.24M rec/s, 368.7 MB/s
Batch join — 10K records (map+join)	8.33ms	8.62ms	-3.3%	1.20M rec/s, 355.6 MB/s

KeySpec Access

Benchmark	Median	Baseline	Delta	Throughput
KeySpec — simple key (name)	405.3µs	228.9µs	+77.1% 🔴	24.67M rec/s
KeySpec — nested key (address/zip)	542.3µs	532.0µs	+1.9%	18.44M rec/s
KeySpec — deep nested (address/coords/lat)	523.7µs	538.2µs	-2.7%	19.10M rec/s
KeySpec — array index (tags/#0)	509.0µs	518.2µs	-1.8%	19.65M rec/s
Direct property access baseline (rec['name'])	41.1µs	49.9µs	-17.6% 🟢	243.06M rec/s
Direct nested access baseline (rec.address.coords.lat)	109.3µs	104.6µs	+4.5%	91.51M rec/s
KeySpec construction — cached (same spec 10K times)	275.4µs	274.7µs	+0.3%	36.31M rec/s
KeySpec construction — unique specs (10K different)	2.24ms	2.87ms	-22.1% 🟢	4.47M rec/s
Compiled KeySpec.resolveValue — nested (address/zip)	163.2µs	170.2µs	-4.1%	61.29M rec/s
Compiled KeySpec.resolveValue — deep (address/coords/lat)	126.0µs	124.4µs	+1.2%	79.39M rec/s
Compiled KeySpec.resolveValue — array (tags/#0)	152.2µs	152.5µs	-0.2%	65.71M rec/s
Compiled KeySpec.setValue — nested (address/zip)	144.3µs	146.8µs	-1.7%	69.28M rec/s
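The KeySpec rows use paths like address/coords/lat and tags/#0, where / descends one level and #N indexes into an array. Below is a minimal sketch of resolving such a path that assumes only those two rules; escaping, fuzzy @ matching, and writes are left out. Parsing the path once and reusing the resulting closure is the idea the "Compiled KeySpec.resolveValue" rows suggest. However, compileKeySpec is a hypothetical name and not the project's KeySpec class:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Parse the spec once into steps: "#N" becomes a numeric array index,
// anything else is an object key. Returns a reusable resolver.
function compileKeySpec(spec: string): (rec: Json) => Json | undefined {
  const steps = spec
    .split("/")
    .map((part) => (part.startsWith("#") ? Number(part.slice(1)) : part));

  return (rec) => {
    let cur: Json | undefined = rec;
    for (const step of steps) {
      // Stop as soon as there is nothing left to descend into.
      if (cur === null || typeof cur !== "object") return undefined;
      cur =
        typeof step === "number"
          ? Array.isArray(cur) ? cur[step] : undefined
          : Array.isArray(cur) ? undefined : cur[step];
    }
    return cur;
  };
}

const lat = compileKeySpec("address/coords/lat");
console.log(lat({ address: { coords: { lat: 40.7 } } })); // 40.7
```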

Core Operations

Benchmark	Median	Baseline	Delta	Throughput
grep — 10K records (r.age > 50)	706.2µs	368.6µs	+91.6% 🔴	14.16M rec/s
grep — 10K records (string match)	363.5µs	370.6µs	-1.9%	27.51M rec/s
eval — 10K records (add computed field)	1.45ms	1.43ms	+1.6%	6.89M rec/s
xform — 10K records (push each record)	1.08ms	1.28ms	-16.0% 🟢	9.29M rec/s
sort — 100 records (by score, numeric)	152.9µs	158.7µs	-3.6%	654.00K rec/s
sort — 10K records (by score, numeric)	17.76ms	18.76ms	-5.3%	563.21K rec/s
sort — 10K records (by name, lexical)	12.45ms	12.75ms	-2.4%	803.19K rec/s
collate — 100 records (count by city)	413.7µs	353.3µs	+17.1% 🔴	241.73K rec/s
collate — 10K records (count by city)	12.13ms	13.14ms	-7.7%	824.66K rec/s
fromcsv — 10K rows (parse CSV to records)	14.36ms	14.89ms	-3.5%	696.17K rec/s, 45.8 MB/s
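These collate rows benchmark plain count-by-city bucketing. The summary's collate --cube fix concerns generating ALL combinations. With cube enabled, each record contributes to every bucket formed by replacing any subset of its key values with ALL, so n keys produce 2^n buckets. The following sketches that expansion. cubeBuckets is a hypothetical name, and using the literal string "ALL" as the rollup value is an assumption based on the commit wording:

```typescript
type Rec = { [key: string]: unknown };

// Expand one record's bucket keys into every rollup combination.
// Bit i of `mask` set means key i is replaced by "ALL".
function cubeBuckets(record: Rec, keys: string[]): Rec[] {
  const out: Rec[] = [];
  for (let mask = 0; mask < 1 << keys.length; mask++) {
    const bucket: Rec = {};
    keys.forEach((key, i) => {
      bucket[key] = mask & (1 << i) ? "ALL" : record[key];
    });
    out.push(bucket);
  }
  return out;
}

const buckets = cubeBuckets({ city: "NYC", state: "NY", count: 1 }, ["city", "state"]);
// buckets, in order: NYC/NY, ALL/NY, NYC/ALL, ALL/ALL
console.log(buckets.length); // 4
```

An aggregator would then be fed once per bucket, which is why a cube over many keys multiplies the work.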

Pipeline Overhead

Benchmark	Median	Baseline	Delta	Throughput
chain — single op (grep), 10K records	7.05ms	7.29ms	-3.3%	1.42M rec/s
chain — 3 ops (grep | eval | grep), 10K records	6.96ms	7.80ms
chain — 5 ops (grep | eval | grep | eval | grep), 10K records
passthrough baseline — 10K records (direct collector)	5.96ms	6.56ms	-9.1%	1.68M rec/s

Record Creation & Serialization

Benchmark	Median	Baseline	Delta	Throughput
new Record() — 10K objects	341.7µs	73.4µs	+365.3% 🔴	29.27M rec/s
new Record() empty — 10K	94.0µs	111.5µs	-15.7% 🟢	106.41M rec/s
Record.get — 10K records × 3 fields	51.6µs	56.2µs	-8.3%	581.89M rec/s
Record.set — 10K records × 1 field	125.2µs	124.9µs	+0.3%	79.88M rec/s
Record.toJSON — 10K records	274.1µs	288.1µs	-4.9%	36.48M rec/s
Record.toString — 10K records	8.73ms	8.12ms	+7.5%	1.15M rec/s
Record.clone — 10K records	6.06ms	6.18ms	-1.9%	1.65M rec/s
Record.fromJSON — 10K lines	12.91ms	12.66ms	+2.0%	774.76K rec/s, 229.4 MB/s
Record.dataRef — 10K records (zero-copy)	81.7µs	37.6µs	+117.5% 🔴	122.39M rec/s
Record.sort — 10K records (numeric field)	11.68ms	11.14ms	+4.8%	856.34K rec/s
Record.sort — 10K records (lexical field)	5.83ms	5.80ms	+0.4%	1.72M rec/s
Record.cmp — 1M comparisons (single field)	111.46ms	105.72ms	+5.4%	8.97M rec/s
Record.sort — 10K records (nested field numeric)	16.03ms	15.10ms	+6.1%	623.70K rec/s
Record.cmp — 1M comparisons (multi-field cached)	89.01ms	88.49ms	+0.6%	11.23M rec/s
Record.sort — 10K records (cached comparator reuse)	12.57ms	11.48ms	+9.5%	795.84K rec/s
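The Record.sort and Record.cmp rows compare records field by field, numerically or lexically, and the commit message adds tests for a * postfix that sorts ALL to the end. Below is a sketch of such a multi-key comparator. The SortKey shape is hypothetical and does not reproduce the real --key string syntax. Keeping ALL last even for descending keys is a choice made for this sketch:

```typescript
// Hypothetical sort-key description; the real --key syntax is not modeled.
interface SortKey {
  field: string;
  numeric: boolean;
  descending?: boolean;
  allLast?: boolean; // stands in for the "*" postfix
}

type Row = { [field: string]: string | number };

function makeComparator(keys: SortKey[]): (a: Row, b: Row) => number {
  return (a, b) => {
    for (const key of keys) {
      const va = a[key.field];
      const vb = b[key.field];
      if (key.allLast) {
        const aAll = va === "ALL";
        const bAll = vb === "ALL";
        // "ALL" stays last regardless of direction in this sketch.
        if (aAll !== bAll) return aAll ? 1 : -1;
        if (aAll) continue; // both are ALL: let the next key decide
      }
      let c: number;
      if (key.numeric) {
        c = Number(va) - Number(vb) || 0; // NaN (non-numeric) compares equal
      } else {
        const sa = String(va);
        const sb = String(vb);
        c = sa < sb ? -1 : sa > sb ? 1 : 0; // code-unit order, like Perl's cmp on ASCII
      }
      if (key.descending) c = -c;
      if (c !== 0) return c;
    }
    return 0;
  };
}

const sample: Row[] = [{ city: "ALL", n: 5 }, { city: "b", n: 1 }, { city: "a", n: 3 }];
sample.sort(makeComparator([{ field: "city", numeric: false, allLast: true }]));
console.log(sample.map((r) => r.city).join(", ")); // a, b, ALL
```

Without allLast, a plain lexical sort puts "ALL" first because uppercase letters sort before lowercase ones. That is the case the postfix exists to handle.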

Chain vs Pipe

Benchmark	Median	Baseline	Delta	Throughput
chain — 2 ops (grep | eval), 100 records	143.7µs	78.6µs	+82.8% 🔴
pipe — 2 ops (grep | eval), 100 records	577.12ms	521.60ms	+10.6% 🔴
implicit — 2 ops (grep | eval), 100 records	133.6µs	83.6µs	+59.8% 🔴
chain — 2 ops (grep | eval), 1K records	131.2µs	140.2µs	-6.4%
pipe — 2 ops (grep | eval), 1K records	583.99ms	544.17ms	+7.3%
implicit — 2 ops (grep | eval), 1K records	211.7µs	130.8µs	+61.8% 🔴
chain — 2 ops (grep | eval), 10K records	659.6µs	469.9µs	+40.4% 🔴
pipe — 2 ops (grep | eval), 10K records	587.26ms	527.56ms	+11.3% 🔴
implicit — 2 ops (grep | eval), 10K records	670.7µs	500.5µs	+34.0% 🔴
chain — 3 ops (grep | eval | grep), 100 records	158.1µs	149.5µs
pipe — 3 ops (grep | eval | grep), 100 records	875.56ms	788.17ms
implicit — 3 ops (grep | eval | grep), 100 records	93.6µs	101.2µs
chain — 3 ops (grep | eval | grep), 1K records	206.1µs	134.6µs
pipe — 3 ops (grep | eval | grep), 1K records	870.94ms	773.77ms
implicit — 3 ops (grep | eval | grep), 1K records	155.8µs	140.6µs
chain — 3 ops (grep | eval | grep), 10K records	680.1µs	480.7µs
pipe — 3 ops (grep | eval | grep), 10K records	867.28ms	782.74ms
implicit — 3 ops (grep | eval | grep), 10K records	459.8µs	691.8µs
chain — 5 ops (grep | eval | grep | eval | grep), 100 records
pipe — 5 ops (grep | eval | grep | eval | grep), 100 records
implicit — 5 ops (grep | eval | grep | eval | grep), 100 records
chain — 5 ops (grep | eval | grep | eval | grep), 1K records
pipe — 5 ops (grep | eval | grep | eval | grep), 1K records
implicit — 5 ops (grep | eval | grep | eval | grep), 1K records
chain — 5 ops (grep | eval | grep | eval | grep), 10K records
pipe — 5 ops (grep | eval | grep | eval | grep), 10K records
implicit — 5 ops (grep | eval | grep | eval | grep), 10K records
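The chain and implicit rows stay in the microsecond range, while the pipe rows take hundreds of milliseconds. That gap is consistent with pipe paying process start-up and serialization costs for each stage. This is an inference from the numbers, since the comment does not say how pipe is measured. Below is a minimal in-process analogue of chaining grep and eval with lazy generators. Here grep, evalOp, and chain are hypothetical stand-ins rather than the project's operations. The PR's own chain change concerns streaming shell stages through Bun.spawn pipes, which this sketch does not model:

```typescript
type Rec = { [key: string]: unknown };
type Op = (input: Iterable<Rec>) => Iterable<Rec>;

// Hypothetical grep stage: lazily passes through records matching pred.
const grep = (pred: (r: Rec) => boolean): Op =>
  function* (input: Iterable<Rec>) {
    for (const r of input) if (pred(r)) yield r;
  };

// Hypothetical eval stage: runs fn on a shallow copy of each record.
const evalOp = (fn: (r: Rec) => void): Op =>
  function* (input: Iterable<Rec>) {
    for (const r of input) {
      const copy = { ...r };
      fn(copy);
      yield copy;
    }
  };

// Compose stages left to right. Nothing runs until the result is iterated,
// and each record passes through every stage before the next one is read.
const chain = (...ops: Op[]): Op => (input) =>
  ops.reduce<Iterable<Rec>>((acc, op) => op(acc), input);

const run = chain(
  grep((r) => Number(r.age) > 50),
  evalOp((r) => {
    r.senior = true;
  }),
);
for (const rec of run([{ age: 60 }, { age: 30 }, { age: 70 }])) console.log(rec);
```

This prints the age-60 and age-70 records, each with senior set to true. Because stages pull one record at a time, a consumer that stops early also stops the upstream stages.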

Line Reading

Benchmark	Median	Baseline	Delta	Throughput
InputStream.fromFile — 100 lines	501.1µs	505.7µs	-0.9%	199.57K rec/s, 58.9 MB/s
InputStream.fromString — 100 lines	180.2µs	189.6µs	-5.0%	555.04K rec/s, 163.9 MB/s
manual buffer (isolated) — 100 lines	269.8µs	277.8µs	-2.9%	370.63K rec/s, 109.5 MB/s
bulk text + split — 100 lines	99.9µs	104.2µs	-4.1%	1.00M rec/s, 295.6 MB/s
node readline — 100 lines	478.9µs	462.5µs	+3.6%	208.81K rec/s, 61.7 MB/s
TextDecoderStream — 100 lines	302.2µs	321.0µs	-5.8%	330.90K rec/s, 97.7 MB/s
binary newline scan — 100 lines	320.6µs	301.1µs	+6.5%	311.87K rec/s, 92.1 MB/s
bun native stdin — 100 lines	25.99ms	28.07ms	-7.4%	3.85K rec/s, 1.1 MB/s
InputStream.fromFile — 10K lines	22.97ms	24.80ms	-7.4%	435.26K rec/s, 128.9 MB/s
InputStream.fromString — 10K lines	18.14ms	18.89ms	-4.0%	551.23K rec/s, 163.2 MB/s
manual buffer (isolated) — 10K lines	6.01ms	6.11ms	-1.7%	1.66M rec/s, 493.0 MB/s
bulk text + split — 10K lines	2.39ms	2.55ms	-6.1%	4.18M rec/s, 1239.1 MB/s
node readline — 10K lines	7.48ms	7.78ms	-3.9%	1.34M rec/s, 395.9 MB/s
TextDecoderStream — 10K lines	3.94ms	5.44ms	-27.6% 🟢	2.54M rec/s, 751.8 MB/s
binary newline scan — 10K lines	6.57ms	7.84ms	-16.1% 🟢	1.52M rec/s, 450.5 MB/s
bun native stdin — 10K lines	42.27ms	42.44ms	-0.4%	236.58K rec/s, 70.1 MB/s
InputStream.fromFile — 100K lines	259.51ms	268.41ms	-3.3%	385.35K rec/s, 114.5 MB/s
InputStream.fromString — 100K lines	210.08ms	233.47ms	-10.0% 🟢	476.00K rec/s, 141.4 MB/s
manual buffer (isolated) — 100K lines	33.25ms	36.64ms	-9.3%	3.01M rec/s, 893.8 MB/s
bulk text + split — 100K lines	27.03ms	30.18ms	-10.4% 🟢	3.70M rec/s, 1099.3 MB/s
node readline — 100K lines	66.50ms	68.00ms	-2.2%	1.50M rec/s, 446.8 MB/s
TextDecoderStream — 100K lines	38.86ms	38.42ms	+1.1%	2.57M rec/s, 764.7 MB/s
binary newline scan — 100K lines	70.46ms	76.65ms	-8.1%	1.42M rec/s, 421.7 MB/s
bun native stdin — 100K lines	113.16ms	113.78ms	-0.5%	883.67K rec/s, 262.6 MB/s
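The Line Reading rows compare ways of turning a byte stream into lines. Below is a sketch of the "manual buffer" idea. It assumes the approach works as follows: decode each chunk, emit every complete line, and carry the unterminated tail into the next chunk. splitLines is a hypothetical name, and the benchmark's actual implementation may differ:

```typescript
// Yield complete lines from a sequence of byte chunks, carrying any partial
// line (and any split multi-byte UTF-8 sequence) over to the next chunk.
function* splitLines(chunks: Iterable<Uint8Array>): Generator<string> {
  const decoder = new TextDecoder();
  let carry = "";
  for (const chunk of chunks) {
    // stream: true makes the decoder hold back an incomplete UTF-8 sequence.
    carry += decoder.decode(chunk, { stream: true });
    let start = 0;
    let nl: number;
    // Scan from `start` instead of re-slicing after every line.
    while ((nl = carry.indexOf("\n", start)) !== -1) {
      yield carry.slice(start, nl);
      start = nl + 1;
    }
    carry = carry.slice(start);
  }
  carry += decoder.decode(); // flush whatever the decoder still buffers
  if (carry.length > 0) yield carry; // last line had no trailing newline
}

const enc = new TextEncoder();
const parts = [enc.encode("a\nb"), enc.encode("c\n"), enc.encode("d")];
console.log(Array.from(splitLines(parts)).join("|")); // a|bc|d
```

The "bulk text + split" rows avoid the carry logic entirely by decoding everything first. That trade only works when the whole input fits comfortably in memory.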

benbernard merged commit 405f7d6 into master Feb 27, 2026
3 of 4 checks passed

benbernard merged 1 commit into master from feature/comparison-gap-fixes
