[BUG] Rust panic in Ucs2String case conversion on UCS-2 surrogate code points (CEA-708 input) #2232

@NexionisJake

Summary

Ucs2String::to_lowercase() and Ucs2String::to_uppercase() panic on any
UCS-2 code unit in the surrogate range (0xD800–0xDFFF). These are legal u16
values that can appear in CEA-708 subtitle streams but are not valid Unicode
scalar values, so char::from_u32() returns None and the .expect() call
crashes the process.

Location

src/rust/lib_ccxr/src/util/encoding.rs

  • Line ~245: Ucs2String::to_lowercase()
  • Line ~257: Ucs2String::to_uppercase()

Code

char::from_u32(c as u32).expect("Invalid u32 character")

c is a u16 from a Ucs2String. Values in 0xD800–0xDFFF are valid u16
surrogate code units, but char::from_u32() returns None for them because a Rust
char must be a valid Unicode scalar value. The .expect() therefore panics on
every such value.
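
The behaviour described above can be checked in isolation with a minimal sketch. The helper names is_surrogate and unit_to_char are hypothetical and do not exist in lib_ccxr; they only mirror the conversion step from the snippet, returning the Option instead of calling .expect().

```rust
/// Hypothetical helper: true if `unit` lies in the surrogate range 0xD800..=0xDFFF.
pub fn is_surrogate(unit: u16) -> bool {
    (0xD800..=0xDFFF).contains(&unit)
}

/// Hypothetical helper: the same conversion as in the issue, but without
/// `.expect()`, so the `None` case can be observed instead of panicking.
pub fn unit_to_char(unit: u16) -> Option<char> {
    char::from_u32(unit as u32)
}
```

For every u16 value, unit_to_char returns None exactly when is_surrogate is true, so the panic is reachable for 2048 of the 65536 possible code units.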

Impact

Any CEA-708 subtitle stream that delivers code units in the surrogate range
(0xD800–0xDFFF) will crash CCExtractor during case conversion. UCS-2 itself has
no surrogate pairs, but nothing prevents these values from being stored in a
Ucs2String. This is triggerable from real-world broadcast input with no
malicious intent required.

Suggested Fix

Replace both .expect("Invalid u32 character") calls with:

char::from_u32(c as u32).unwrap_or('\u{FFFD}')

U+FFFD (Unicode Replacement Character) is the standard substitution for
unrepresentable code points and is consistent with how the rest of the codebase
handles unavailable characters (UNAVAILABLE_CHAR).
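
As a rough illustration of the suggested fix, the sketch below applies the same unwrap_or fallback to a plain slice of u16 values. It is not the actual Ucs2String implementation: the function names, the REPLACEMENT_UNIT constant, and the use of Vec<u16> as a stand-in for Ucs2String are all assumptions made for the example.

```rust
/// U+FFFD as a UCS-2 code unit (assumed stand-in for the crate's own constant).
const REPLACEMENT_UNIT: u16 = 0xFFFD;

/// Converts a code unit to a char, substituting U+FFFD for surrogates
/// instead of panicking.
fn unit_to_char_lossy(unit: u16) -> char {
    char::from_u32(unit as u32).unwrap_or('\u{FFFD}')
}

/// Converts a char back to a code unit, substituting U+FFFD if the
/// case-mapped char does not fit in 16 bits.
fn char_to_unit_lossy(ch: char) -> u16 {
    let cp = ch as u32;
    if cp <= 0xFFFF { cp as u16 } else { REPLACEMENT_UNIT }
}

/// Hypothetical lossy lowercase over raw UCS-2 code units.
pub fn to_lowercase_lossy(units: &[u16]) -> Vec<u16> {
    units
        .iter()
        .flat_map(|&u| unit_to_char_lossy(u).to_lowercase())
        .map(char_to_unit_lossy)
        .collect()
}

/// Hypothetical lossy uppercase over raw UCS-2 code units.
pub fn to_uppercase_lossy(units: &[u16]) -> Vec<u16> {
    units
        .iter()
        .flat_map(|&u| unit_to_char_lossy(u).to_uppercase())
        .map(char_to_unit_lossy)
        .collect()
}
```

With this shape, an input such as [0x0041, 0xD800, 0x0042] lowercases to [0x0061, 0xFFFD, 0x0062] rather than panicking. Note that the sketch keeps multi-character case expansions, so the output can be longer than the input; the real methods may handle that differently.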

Environment

  • Affects all platforms
  • Triggered by CEA-708 streams carrying code units in the surrogate range (0xD800–0xDFFF)
