Make constrainByteLength work #751

Open

tsdko wants to merge 2 commits into ynoproject:master from tsdko:fix-constrain-byte-length
Conversation

@tsdko (Contributor) commented Feb 9, 2026

Might require more testing; I have run the tests on Firefox and Chromium and tested the input manually with Firefox (and IME input with Mozc on Linux) but have not tested on other platforms.


Should hopefully prevent overly long non-ASCII messages from getting eaten during send attempts.

It seems the implementation currently in master could have worked with a bit more space in `buf` (enough to fit the next-largest UTF-8 character) and by comparing `written` instead of `read`, since `read` is measured in UTF-16 code units rather than bytes. Even then it behaves oddly when the caret is not at the end of the string: on regular input the caret is forced to the end, and if you paste something that makes the entire string too long, the existing text at the end gets cut off.
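As an aside, a minimal sketch (standard `TextEncoder` API, not the PR's code) of why `read` and `written` must not be used interchangeably against a byte limit:

```javascript
// TextEncoder.encodeInto reports `read` in UTF-16 code units and
// `written` in UTF-8 bytes, so only `written` is comparable to a
// byte limit.
const buf = new Uint8Array(8);
const { read, written } = new TextEncoder().encodeInto("あい", buf);
console.log(read, written); // 2 code units read, 6 bytes written
```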

The behavior of the built-in maxlength attribute is not very consistent across browsers: if the user attempts to replace currently selected text and not even one character from the replacement string fits, Firefox preserves the selection while Chromium discards it instead. This implementation discards the selection.

Shortcoming: hitting the length limit breaks undo (does nothing). (This is a problem with the current implementation as well, it's just a bit more hidden as ASCII inputs get properly constrained via HTML maxlength.)

Test code:

```javascript
// "|" is the caret
const tests = [
  // below the byte limit, unchanged
  "🐱|",       "🐱",
  "あい|",     "あい",
  "abc🐱|",    "abc🐱",
  "abcdあ|",   "abcdあ",
  "abcdefg|",  "abcdefg",
  // above the byte limit, truncated
  "abcdefgh|", "abcdefg",
  "あabcde|",  "あabcd",
  "abcdeあ|",  "abcde",
  "abcd🐱|",   "abcd",
  "あいう|",   "あい",
  "🐱🦈|",     "🐱",
  // above the byte limit, caret in the middle of the string
  "abcd|efgh", "abcefgh",
  "あb|cdef",  "あcdef",
  "abc|deあ",  "abdeあ",
  "abc|d🐱",   "abd🐱",
  "あい|う",   "あう",
  "🐱|🦈",     "🦈",
];
const cbl = constrainByteLength(7);
for (let i = 0; i < tests.length; i += 2) {
  const sel = tests[i].indexOf('|');
  console.assert(sel >= 0, `no caret in ${tests[i]}`);
  const inVal = tests[i].substring(0, sel) + tests[i].substring(sel+1);
  const event = {target: {value: inVal, selectionStart: sel, selectionEnd: sel}};
  cbl(event);
  const actual = event.target.value, expected = tests[i+1];
  console.assert(expected === actual, `expected ${expected}, got ${actual}`);
}
```
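For reference, a hypothetical re-implementation (not the PR's actual code) of a handler with the shape the test snippet above assumes: it deletes whole code points just before the caret until the UTF-8 length fits, falling back to trimming from the end when the caret is at the start.

```javascript
// Hypothetical sketch: return an input handler that constrains
// event.target.value to at most `limit` UTF-8 bytes, removing the
// most recently typed code points (those before the caret) first.
function constrainByteLength(limit) {
  const enc = new TextEncoder();
  return (event) => {
    const el = event.target;
    let value = el.value;
    let caret = el.selectionStart;
    while (enc.encode(value).length > limit) {
      if (caret > 0) {
        // delete the code point immediately before the caret
        const before = [...value.slice(0, caret)]; // split by code point
        const removed = before.pop();
        value = before.join('') + value.slice(caret);
        caret -= removed.length; // length in UTF-16 code units
      } else {
        // caret at the start: trim from the end instead
        value = [...value].slice(0, -1).join('');
      }
    }
    el.value = value;
    el.selectionStart = el.selectionEnd = caret;
  };
}
```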

@zebraed (Contributor) commented Feb 10, 2026

I can help test this on another platform; please wait a little while.

@zebraed (Contributor) commented Feb 14, 2026

I tested your test snippet on several platforms:

| Platform | Browser | Assertion Test |
| --- | --- | --- |
| macOS | Chrome | Passed |
| macOS | Safari | Passed |
| Windows | Chrome | Passed |
| Windows | Firefox | Passed |

Manual test on macOS Chrome + Japanese IME (haven't tested anything else yet, sorry). I can reproduce a difference between pasting and IME typing:

  1. Pasting a >150-byte Japanese string gets trimmed immediately (as expected).
    Example: pasting
    あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市、郊外のぎらぎらひかる草の波。
    trims down to
    あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市

  2. However, after the paste-trim, I can continue typing (with the Japanese IME) and the input can exceed 150 UTF-8 bytes again.
    e.g. appending 、あああああ keeps being accepted.

This seems related to IME composition: many input events are emitted with isComposing=true, and trimming is skipped during composition, so byte overflow can accumulate unless we also trim on compositionend and/or enforce the limit once right before sending.
As you mentioned in your comment, skipping trimming during composition is correct, but the implementation also needs a guaranteed recovery point after composition completes.

  • Apply the same byte-limiting function on compositionend
  • Enforce a final constrainByteLength(150) call immediately before sending

With these additions, IME input and normal typing both remain within the 150 UTF-8 byte limit across the tested environments.
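The wiring described above could look roughly like this (a hypothetical sketch; `attachByteLimit` and `limitHandler` are illustrative names, not code from the PR):

```javascript
// Hypothetical sketch: skip byte-trimming while an IME composition is
// in progress, then re-apply the limit once on compositionend so
// composed text cannot evade it.
function attachByteLimit(el, limitHandler) {
  el.addEventListener('input', (e) => {
    if (e.isComposing) return; // let the IME finish first
    limitHandler(e);
  });
  el.addEventListener('compositionend', (e) => limitHandler(e));
}
```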

Displaying the current UTF-8 byte count as an indicator in the chat box could also improve the overall UX. I will experiment with this on my side once your implementation is finalized.

@tsdko (Contributor, Author) commented Feb 14, 2026

> Manual test on macOS Chrome + Japanese IME (haven't tested anything else yet, sorry). I can reproduce a difference between pasting and IME typing:
>
> 1. Pasting a >150-byte Japanese string gets trimmed immediately (as expected).
>    Example: pasting
>    `あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市、郊外のぎらぎらひかる草の波。`
>    trims down to
>    `あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市`
>
> 2. However, after the paste-trim, I can continue typing (with the Japanese IME) and the input can exceed 150 UTF-8 bytes again.
>    e.g. appending `、あああああ` keeps being accepted.
>
> This seems related to IME composition: many input events are emitted with isComposing=true, and trimming is skipped during composition, so byte overflow can accumulate unless we also trim on compositionend and/or enforce the limit once right before sending. As you mentioned in your comment, skipping trimming during composition is correct, but the implementation also needs a guaranteed recovery point after composition completes.

True; compositionend has been addressed now. I should've tested more thoroughly; this was a case I could reproduce. It seems browsers differ in how they send input events: I was working on the assumption that one would always get an input event with isComposing=false after composition, which on Chrome seems to happen only if the user closes the IME without submitting via the Enter key.

As for limiting right before sending: I was not able to find a case where it was needed after adding a handler for compositionend, but I wouldn't mind having it added if there is one. It is a bit bad for UX to have parts of the message cut off only after it's submitted, but arguably the current behavior of eating the entire message is much worse.

> Displaying the current UTF-8 byte count as an indicator in the chat box could also improve the overall UX. I will experiment with this on my side once your implementation is finalized.

I think doing this and letting the user freely exceed the limit (but prevent them from submitting if text is too long) is a much better idea than the current approach of emulating HTML maxlength behavior with byte counts. Ideally this would apply to the "in-game" input (gameChatInput) as well. My focus with this PR was just to make the current handler work.
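A sketch of that alternative (function names here are illustrative, not from the PR): count the bytes for display and gate submission, with no live trimming at all.

```javascript
// Hypothetical sketch: measure the message's UTF-8 byte length and
// block submission (rather than trimming) when it exceeds the limit.
const utf8ByteLength = (s) => new TextEncoder().encode(s).length;
const canSend = (msg, limit = 150) => utf8ByteLength(msg) <= limit;
```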

@tsdko force-pushed the fix-constrain-byte-length branch from ba5ebf3 to 26a1c69 on February 14, 2026
@zebraed (Contributor) commented Feb 15, 2026

> As for limiting right before sending: I was not able to find a case where it was needed after adding a handler for compositionend, but I wouldn't mind having it added if there is one. It is a bit bad for UX to have parts of the message cut off only after it's submitted, but arguably the current behavior of eating the entire message is much worse.

That makes sense; it does seem somewhat redundant after adding the compositionend handler.

> My focus with this PR was just to make the current handler work.

Yes. In that case, would it be okay if we merge this PR first so that the current design is completed as intended? This is an important fix.

> I think doing this and letting the user freely exceed the limit (but prevent them from submitting if text is too long) is a much better idea than the current approach of emulating HTML maxlength behavior with byte counts.

Okay; after that I will triage it as a separate issue for UX improvements and additional features. I also think this is the better approach, as you said, since real-time trimming based on UTF-8 byte limits will likely always conflict with IME behavior.
