escape: rewrite InternalEscapeBytes with lead-byte scanning #39

Open
dhartunian wants to merge 2 commits into master from
davidh/push-syupywymrmml
@dhartunian dhartunian commented Mar 19, 2026

InternalEscapeBytes is called on every buffer mode transition during
redactable string formatting, scanning newly-written bytes for marker
characters (‹ › †) that must be escaped. Format operations like
Sprintf with SafeFormatter structs trigger many mode transitions,
making this function a significant cost in redactable output paths.

The previous implementation compared every byte against each 3-byte
marker using bytes.Equal. Since all markers share the same first two
UTF-8 bytes (0xE2 0x80), replace the per-byte scan with
bytes.IndexByte to skip ASCII runs in bulk (leveraging SIMD on
arm64/amd64), then check only the third byte to distinguish markers.
For the breakNewLines path, use dual bytes.IndexByte calls to find
either the lead byte or newline.
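
A minimal sketch of the lead-byte scan described above, not the actual InternalEscapeBytes code. The names `escapeMarkers`, `isMarkerTail`, `nextStop`, and the `'?'` placeholder are assumptions made for illustration; the replacement the library emits is not shown in this PR text. The marker encodings are ‹ = E2 80 B9, › = E2 80 BA, † = E2 80 A0.

```go
package main

import (
	"bytes"
	"fmt"
)

// placeholder is an assumed replacement byte for an escaped marker.
const placeholder = '?'

// isMarkerTail reports whether c is the third UTF-8 byte of
// ‹ (E2 80 B9), › (E2 80 BA) or † (E2 80 A0).
func isMarkerTail(c byte) bool {
	return c == 0xB9 || c == 0xBA || c == 0xA0
}

// escapeMarkers is a hypothetical stand-in for the rewritten scan: it uses
// bytes.IndexByte to jump to the next 0xE2 and inspects only what follows it.
func escapeMarkers(b []byte) []byte {
	var out []byte // stays nil until the first marker is found
	start := 0     // start of the span not yet copied to out
	for i := 0; i < len(b); {
		off := bytes.IndexByte(b[i:], 0xE2)
		if off < 0 {
			break // no lead byte left, so no marker left
		}
		i += off
		if i+2 < len(b) && b[i+1] == 0x80 && isMarkerTail(b[i+2]) {
			out = append(out, b[start:i]...)
			out = append(out, placeholder)
			i += 3
			start = i
		} else {
			i++ // truncated sequence or a non-marker such as an em dash
		}
	}
	if out == nil {
		return b // nothing to escape: return the input without copying
	}
	return append(out, b[start:]...)
}

// nextStop sketches the dual-IndexByte idea for the breakNewLines path:
// it returns the index of the earliest 0xE2 or '\n' in b, or -1 if neither occurs.
func nextStop(b []byte) int {
	lead := bytes.IndexByte(b, 0xE2)
	nl := bytes.IndexByte(b, '\n')
	switch {
	case lead < 0:
		return nl
	case nl < 0:
		return lead
	case nl < lead:
		return nl
	default:
		return lead
	}
}

func main() {
	fmt.Printf("%s\n", escapeMarkers([]byte("safe ‹unsafe› done"))) // safe ?unsafe? done
	fmt.Println(nextStop([]byte("ab\ncd‹")))                        // 2
}
```

What the real code does when it stops at a newline is not covered here; `nextStop` only shows how two `bytes.IndexByte` results can be combined into a single stopping point.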

name                                          old time/op    new time/op    delta
SprintfWithSafeFormatter/multiple_structs-10    1.85µs ± 2%    1.25µs ± 2%  -32.34%  (p=0.000 n=8+9)
SprintfWithSafeFormatter/nested_structs-10      1.01µs ± 1%    0.65µs ± 4%  -35.40%  (p=0.000 n=8+9)
SprintfWithSafeFormatter/single_struct-10        517ns ± 1%     328ns ± 1%  -36.50%  (p=0.000 n=9+8)
SprintfWithSafeFormatter/sprint_mixed-10        1.80µs ± 2%    1.29µs ± 3%  -28.52%  (p=0.000 n=9+9)

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>



dhartunian and others added 2 commits March 18, 2026 14:21
Add test cases for edge cases including truncated lead bytes,
non-marker UTF-8 sequences sharing the same lead byte (e.g. em dash,
euro sign, ellipsis), trailing invalid UTF-8, multiple consecutive
markers, and interleaved markers with newlines.

Also add benchmarks exercising redactable string printing with
interpolated SafeFormatter structs.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

InternalEscapeBytes is called on every buffer mode transition during
redactable string formatting, scanning newly-written bytes for marker
characters (‹ › †) that must be escaped. Format operations like
Sprintf with SafeFormatter structs trigger many mode transitions,
making this function a significant cost in redactable output paths.

The previous implementation compared every byte against each 3-byte
marker using bytes.Equal. Since all markers share the same first two
UTF-8 bytes (0xE2 0x80), replace the per-byte scan with
bytes.IndexByte to skip ASCII runs in bulk (leveraging SIMD on
arm64/amd64), then check only the third byte to distinguish markers.
For the breakNewLines path, use dual bytes.IndexByte calls to find
either the lead byte or newline.

benchdiff --old 8f1ddc0 --new 82c8dd4 --count 10 --run BenchmarkSprintfWithSafeFormatter .

name                                          old time/op    new time/op    delta
SprintfWithSafeFormatter/multiple_structs-10    1.85µs ± 2%    1.25µs ± 2%  -32.34%  (p=0.000 n=8+9)
SprintfWithSafeFormatter/nested_structs-10      1.01µs ± 1%    0.65µs ± 4%  -35.40%  (p=0.000 n=8+9)
SprintfWithSafeFormatter/single_struct-10        517ns ± 1%     328ns ± 1%  -36.50%  (p=0.000 n=9+8)
SprintfWithSafeFormatter/sprint_mixed-10        1.80µs ± 2%    1.29µs ± 3%  -28.52%  (p=0.000 n=9+9)

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@dhartunian

Not sure yet if I'll merge this, but putting it up since it's an interesting result.
