Summary
Ucs2String::to_lowercase() and Ucs2String::to_uppercase() panic on any
UCS-2 code unit in the surrogate range (0xD800–0xDFFF). These are legal u16
values that can appear in CEA-708 subtitle streams but are not valid Unicode
scalar values, so char::from_u32() returns None and the .expect() call
crashes the process.
Location
src/rust/lib_ccxr/src/util/encoding.rs
- Line ~245: Ucs2String::to_lowercase()
- Line ~257: Ucs2String::to_uppercase()
Code
char::from_u32(c as u32).expect("Invalid u32 character")
c is a u16 from a Ucs2String. Values in 0xD800–0xDFFF are valid u16
surrogate code units, but char::from_u32() returns None for them (a Rust char
must be a valid Unicode scalar value), so the .expect() panics on every such
value.
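The behavior described above can be reproduced outside the codebase. The following is a minimal sketch, not the actual CCExtractor code: `ucs2_unit_to_char` is a hypothetical helper that wraps the same `char::from_u32(c as u32)` expression but returns the `Option` instead of calling `.expect()` on it.

```rust
/// Hypothetical helper (not part of lib_ccxr): performs the same conversion
/// as the failing expression, but hands back the Option so the None case
/// can be inspected instead of panicking.
fn ucs2_unit_to_char(c: u16) -> Option<char> {
    // char::from_u32 returns None for 0xD800..=0xDFFF, because surrogates
    // are not Unicode scalar values.
    char::from_u32(c as u32)
}
```

Under this sketch, `ucs2_unit_to_char(0x0041)` yields `Some('A')`, while `ucs2_unit_to_char(0xD800)` and `ucs2_unit_to_char(0xDFFF)` both yield `None`. Calling `.expect()` on those last two results is what triggers the panic.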
Impact
Any CEA-708 subtitle stream containing code units in the surrogate range
(0xD800–0xDFFF), which a UCS-2 u16 sequence can carry, will crash CCExtractor
during case conversion. This is triggerable from real-world broadcast input
with no malicious intent required.
Suggested Fix
Replace both .expect("Invalid u32 character") calls with:
char::from_u32(c as u32).unwrap_or('\u{FFFD}')
U+FFFD (Unicode Replacement Character) is the standard substitution for
unrepresentable code points and is consistent with how the rest of the codebase
handles unavailable characters (UNAVAILABLE_CHAR).
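As a rough illustration of the suggested fix, the sketch below applies the `unwrap_or('\u{FFFD}')` substitution to a slice of code units. The function names and the `&[u16]` to `String` signature are assumptions made for the example. The real `Ucs2String` methods may be structured differently and return a different type.

```rust
/// Hypothetical stand-in for the per-unit conversion inside
/// to_lowercase()/to_uppercase(): surrogate code units map to U+FFFD
/// instead of panicking.
fn unit_to_char_lossy(c: u16) -> char {
    char::from_u32(c as u32).unwrap_or('\u{FFFD}')
}

/// Hypothetical helper: lowercases a sequence of UCS-2 code units.
/// Returning a String is an assumption made to keep the sketch short.
fn lowercase_units(units: &[u16]) -> String {
    units
        .iter()
        .map(|&c| unit_to_char_lossy(c))
        // char::to_lowercase yields an iterator, since one char may
        // lowercase to several chars.
        .flat_map(|ch| ch.to_lowercase())
        .collect()
}
```

With this approach, an input such as `[0x0041, 0xD800, 0x0042]` is converted without a panic, and the surrogate unit is replaced by U+FFFD in the result. The trade-off is that the original surrogate value is lost. That seems acceptable here, because it cannot be represented as a `char` in any case.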
Environment
- Affects all platforms
- Triggered by CEA-708 streams carrying code units in the surrogate range (0xD800–0xDFFF)