libobs: Fix crash with non-ASCII characters in formatted filenames #12964

Open
kellemar wants to merge 1 commit into obsproject:master from kellemar:fix/strftime-utf8-encoding

Conversation

@kellemar

Description
Added os_strftime_utf8() to handle strftime correctly on Windows. The existing strftime() calls in os_generate_formatted_filename() now use this new function.

On Windows, strftime() returns strings in the system's ANSI codepage rather than UTF-8. The new function uses wcsftime() and converts to UTF-8 properly.

Motivation and Context
Fixes #12953

Using %Z (timezone name) in the filename format crashes OBS on German Windows because the timezone "Mitteleuropäische Zeit" contains an umlaut. The ANSI-encoded string gets passed to os_fopen() which expects UTF-8, causing MultiByteToWideChar to fail.

Same issue would happen with any locale that has non-ASCII characters in timezone names or other strftime output.
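
To illustrate why the ANSI-encoded name is rejected, here is a minimal sketch of a structural UTF-8 check. It assumes the German system uses code page 1252, where "ä" is the single byte 0xE4; the helper name `looks_like_utf8` is hypothetical and not part of libobs, and the check is deliberately simplified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the NUL-terminated byte string is structurally valid UTF-8.
 * Simplified: only lead/continuation byte structure is checked; overlong
 * encodings and surrogate code points are not rejected. */
static bool looks_like_utf8(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;

	while (*p) {
		size_t extra;

		if (*p < 0x80)
			extra = 0; /* ASCII */
		else if ((*p & 0xE0) == 0xC0)
			extra = 1; /* lead byte of a 2-byte sequence */
		else if ((*p & 0xF0) == 0xE0)
			extra = 2; /* lead byte of a 3-byte sequence */
		else if ((*p & 0xF8) == 0xF0)
			extra = 3; /* lead byte of a 4-byte sequence */
		else
			return false; /* stray continuation or invalid lead byte */

		p++;
		for (; extra > 0; extra--, p++) {
			if ((*p & 0xC0) != 0x80)
				return false; /* continuation byte missing (also catches NUL) */
		}
	}
	return true;
}
```

With the code page 1252 bytes `"Mitteleurop\xE4" "ische Zeit"` the check fails, because 0xE4 looks like the lead byte of a 3-byte sequence but is followed by a plain ASCII "i". The UTF-8 form `"Mitteleurop\xC3\xA4" "ische Zeit"` passes.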

How Has This Been Tested?
I don't have a German Windows installation to test directly. The fix follows the same pattern OBS already uses elsewhere for Windows string handling (wcs functions + os_wcs_to_utf8).

Would appreciate if someone with a German Windows setup could verify.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

@Dankirk

Dankirk commented Dec 28, 2025

I have looked into locales and character encodings a bit and have a pull request here #12624 that also fixes this. I can confirm the crash does happen on Windows with settings set to German and that this PR would also fix that.

The timezone name is a bit interesting though: with our current OBS locale being the minimal 'C' on Windows (and not the system locale), the %Z flag is still translated using the Windows display language. This only happens with the %Z flag; %A (weekday name), for example, properly follows the locale setting ('C') and is thus always in English, even with German display language, locale and region.

@PatTheMav
Member

Could you please sync with @Dankirk on whether this PR supersedes (or is superseded by) the PR(s) that already exist for this topic?

It would be great for reviewers to understand which PR they should prioritise.

@Dankirk

Dankirk commented Jan 27, 2026

#12624 covers more. This method would still work if the OS failed to set the UTF-8 C runtime locale, but that locale has been supported since Windows 10 version 1803 (released on April 30, 2018).

@PatTheMav
Member

> #12624 covers more. This method would still work if OS failed to set utf-8 C runtime locale, but it has been supported since Windows 10 v1803 (released in April 30, 2018).

@kellemar do you agree with that? Would you be willing to look over that PR and possibly leave comments so it addresses your concerns?

In that case we could focus on getting that one reviewed instead.

Comment thread libobs/util/platform.c
*str = new_str;
}

size_t os_strftime_utf8(char *dst, size_t dst_size, const char *format, const struct tm *tm)
Member

As this function's implementation is platform-specific, it should be implemented that way in platform-nix.c and platform-windows.c.

Comment thread libobs/util/platform.c
*
* Use wcsftime() to get proper wide characters, then convert to UTF-8.
*/
wchar_t wformat[64];
Member

It is good practice to use os_utf8_to_wcs once with a NULL pointer to have it return the buffer size required for a successful conversion and then call it again with an appropriately sized buffer.

As the format string is UTF-8 encoded by convention, you cannot make any simple assumptions about the required size for the UTF-16 string (as both are variable length). Users are free to combine the format string with any text that is valid UTF-8 with the actual format modifiers, so this has to be reactive to that.
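
The measure-then-convert pattern described above could look roughly like the following sketch. `utf8_to_utf16_sketch` and `format_to_utf16` are hypothetical stand-ins written for illustration, not the libobs `os_utf8_to_wcs` API, and the converter assumes well-formed UTF-8 input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a UTF-8 -> UTF-16 converter (not the libobs API).
 * Assumes well-formed UTF-8. With dst == NULL it only counts the UTF-16 code
 * units required (terminator excluded); otherwise it writes at most
 * dst_len - 1 units plus a terminator and returns the units written. */
static size_t utf8_to_utf16_sketch(const char *src, uint16_t *dst, size_t dst_len)
{
	const unsigned char *p = (const unsigned char *)src;
	size_t out = 0;

	while (*p) {
		uint32_t cp;
		size_t extra;
		size_t units;

		if (*p < 0x80) {
			cp = *p;
			extra = 0;
		} else if ((*p & 0xE0) == 0xC0) {
			cp = *p & 0x1F;
			extra = 1;
		} else if ((*p & 0xF0) == 0xE0) {
			cp = *p & 0x0F;
			extra = 2;
		} else {
			cp = *p & 0x07;
			extra = 3;
		}
		p++;
		for (; extra > 0 && *p; extra--)
			cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);

		units = cp >= 0x10000 ? 2 : 1;
		if (dst) {
			if (out + units >= dst_len)
				break; /* no room for this code point plus terminator */
			if (units == 2) {
				cp -= 0x10000;
				dst[out] = (uint16_t)(0xD800 | (cp >> 10));
				dst[out + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
			} else {
				dst[out] = (uint16_t)cp;
			}
		}
		out += units;
	}
	if (dst && dst_len > 0)
		dst[out] = 0;
	return out;
}

/* Two-call pattern: measure first, then allocate exactly what is needed.
 * The caller frees the returned buffer. */
static uint16_t *format_to_utf16(const char *format, size_t *out_len)
{
	size_t needed = utf8_to_utf16_sketch(format, NULL, 0);
	uint16_t *wformat = malloc((needed + 1) * sizeof(*wformat));

	if (!wformat)
		return NULL;
	*out_len = utf8_to_utf16_sketch(format, wformat, needed + 1);
	return wformat;
}
```

Sizing the buffer from the first call means a long format string with arbitrary user text is never silently truncated the way it would be with a fixed `wformat[64]`.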

Comment thread libobs/util/platform.c
* Use wcsftime() to get proper wide characters, then convert to UTF-8.
*/
wchar_t wformat[64];
wchar_t wdst[256];
Member

@PatTheMav PatTheMav Feb 3, 2026

The maximum size of the intermediary buffer should depend on the provided size of the destination buffer in dst_size.

The constant 256 just happens to work here because the function is only called by os_generate_formatted_filename in this PR, but any other caller in the future might provide a different buffer size.

Assuming "2 bytes for every byte available in the output" seems reasonable to me: 256 bytes of UTF-16 holding only ASCII-range code points convert to just 128 bytes of UTF-8, other BMP characters take 2-3 bytes each in UTF-8, and characters from U+10000 upward take 4 bytes in both encodings.

So a 2x intermediary buffer will be able to hold at most as much ASCII data as the destination buffer can hold and will easily have enough capacity for any code points that might not even fit into the output buffer after conversion.
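
A minimal sketch of that sizing rule, using only standard C: the intermediary buffer is allocated with `dst_size` wide characters, which on Windows (2-byte `wchar_t`) is twice the byte size of the destination. The function name `format_time_wide` and its signature are assumptions made for illustration and do not come from the PR.

```c
#include <stdlib.h>
#include <time.h>
#include <wchar.h>

/* Formats into a heap buffer holding dst_size wide characters. On Windows,
 * where wchar_t is 2 bytes, that matches "2 bytes per output byte".
 * Returns NULL on allocation failure or when wcsftime reports 0, which
 * means the result did not fit (or was empty). The caller frees the buffer. */
static wchar_t *format_time_wide(size_t dst_size, const wchar_t *wformat,
				 const struct tm *tm, size_t *out_len)
{
	wchar_t *wdst;
	size_t wlen;

	if (dst_size == 0)
		return NULL;

	wdst = malloc(dst_size * sizeof(*wdst));
	if (!wdst)
		return NULL;

	wlen = wcsftime(wdst, dst_size, wformat, tm);
	if (wlen == 0) {
		free(wdst);
		return NULL;
	}

	*out_len = wlen;
	return wdst;
}
```

Because the capacity follows `dst_size`, a future caller with a larger or smaller destination buffer gets a matching intermediary buffer instead of the fixed 256 elements.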

Comment thread libobs/util/platform.c
return 0;
}

return os_wcs_to_utf8(wdst, wlen, dst, dst_size);
Member

@PatTheMav PatTheMav Feb 3, 2026

Both strftime and wcsftime return 0 when the output buffer is not large enough to hold the converted result string (and the state of the output buffer is indeterminate).

When called on POSIX, os_strftime_utf8 will mimic that behaviour, but it won't on Windows, because os_wcs_to_utf8 will not abort conversion when the output buffer is too small, which introduces a non-obvious difference in behaviour between platforms.

And os_wcs_to_utf8 should be called once with a NULL pointer to figure out the size of the converted data and abort if that exceeds the size of the output buffer.

That's because there's not a single "2-to-1" ratio between UTF-16 and UTF-8:

All characters in the Basic Multilingual Plane (BMP) are encoded in 2 bytes in UTF-16, but can take anywhere between 1 to 3 bytes in UTF-8. Characters beyond that always take up 4 bytes in UTF-16 and UTF-8.

So while a 256-byte UTF-16 buffer can potentially hold just 128 bytes worth of ASCII data (see above), even a 128-byte UTF-16 buffer can potentially require more than 128 bytes to properly encode all code points in UTF-8. Such a result would exceed the output buffer size provided by os_generate_formatted_filename, in which case the entire function should abort the conversion.
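
The size check suggested here can be sketched as follows. Both helpers are hypothetical and operate on raw UTF-16 code units rather than calling `os_wcs_to_utf8`; they assume well-formed UTF-16 in which every high surrogate is followed by a low surrogate.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the UTF-8 bytes needed to encode len UTF-16 code units
 * (terminator excluded). Assumes well-formed UTF-16. */
static size_t utf16_to_utf8_size(const uint16_t *src, size_t len)
{
	size_t bytes = 0;

	for (size_t i = 0; i < len; i++) {
		uint16_t u = src[i];

		if (u < 0x80) {
			bytes += 1; /* ASCII */
		} else if (u < 0x800) {
			bytes += 2; /* e.g. U+00E4 (a-umlaut) */
		} else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len) {
			bytes += 4; /* surrogate pair: code point beyond the BMP */
			i++;
		} else {
			bytes += 3; /* remaining BMP code points */
		}
	}
	return bytes;
}

/* Returns 1 if the converted string plus its terminator fits into
 * dst_size bytes, 0 if the caller should abort and report failure. */
static int utf8_result_fits(const uint16_t *src, size_t len, size_t dst_size)
{
	return utf16_to_utf8_size(src, len) < dst_size;
}
```

Four CJK code units occupy 8 bytes in UTF-16 but need 12 bytes in UTF-8, so an 8-byte destination would be rejected even though the same destination holds four ASCII characters comfortably. Returning 0 in that case would keep the Windows path consistent with what strftime does on POSIX.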

RytoEX added the "Bug Fix" label (Non-breaking change which fixes an issue) on Feb 10, 2026
@RytoEX
Member

RytoEX commented Feb 10, 2026

@kellemar Are you able to respond to the feedback here?

Labels

Bug Fix Non-breaking change which fixes an issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OBS crashes if filename contains german Umlaut

4 participants