Respect multibyte UTF8 when splitting task logs#2971
adenwuts wants to merge 1 commit into Novik:develop
Fixes an error where task logs would be split on multibyte UTF8 characters, resulting in _tasks/actions.php returning a response with an empty body and a parsererror being triggered in the ruTorrent GUI. This was causing issues when creating torrents with many files containing multibyte UTF8 characters (e.g. those with non-English characters), where ruTorrent would lose track of the torrent creation task and hang, despite the .torrent file being successfully created.
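Below is a minimal sketch of the failure mode and the boundary check. It is written in Python rather than ruTorrent's PHP. A byte-oriented split can land inside a multi-byte character, which leaves a chunk that is not valid UTF-8. One plausible path from there to an empty response body is that PHP's json_encode() returns false on malformed UTF-8 by default. Whether that is the exact path through _tasks/actions.php is an assumption here. The helper name utf8_safe_cut is hypothetical and is not taken from the PR.

```python
def utf8_safe_cut(buf: bytes, limit: int) -> int:
    """Return the largest cut point <= limit that does not split a UTF-8 character.

    Hypothetical helper; assumes `buf` is valid UTF-8. Cutting at index i is safe
    when buf[i] is not a continuation byte (bit pattern 10xxxxxx), i.e. when a new
    character starts there.
    """
    if limit >= len(buf):
        return len(buf)
    cut = limit
    # A UTF-8 character is at most 4 bytes long, so at most 3 continuation
    # bytes can sit between the requested cut and the character's lead byte.
    for _ in range(3):
        if cut == 0 or (buf[cut] & 0xC0) != 0x80:
            break
        cut -= 1
    return cut


if __name__ == "__main__":
    data = "한국어 파일".encode("utf-8")  # each Hangul syllable is 3 bytes

    # Naive split: 4 bytes = one full syllable plus the first byte of the next.
    try:
        data[:4].decode("utf-8")
    except UnicodeDecodeError as exc:
        print("naive chunk is invalid:", exc.reason)

    cut = utf8_safe_cut(data, 4)
    print(cut, data[:cut].decode("utf-8"))
```

The check relies on every UTF-8 continuation byte matching the bit pattern 10xxxxxx. A boundary can therefore be found by looking back at most three bytes, without decoding anything.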
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Keepalive.
@adenwuts, could you provide repro instructions? (A bash script or anything else that generates the test condition, or a tgz with files that trigger this.)
Hi @xirvik, it's an extremely rare edge case, but as always Murphy's law prevails. I've reproduced the issue after creating a torrent containing many files with multi-byte UTF-8 characters in their names. I generated those files with this script: https://gist.github.com/adenwuts/36e4cbc8bc4cb7f664cfa1b50475b1e8

It can take a few attempts to trigger the error, but the larger files provide more opportunities for the calls to hit a bad line. Once the error is triggered, you'll see one of:

The torrent creation progress pop-up will also stall.
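The gist's contents are not reproduced in this thread, so the following is only a rough, hypothetical stand-in for generating the test condition. It creates many small files whose names are mostly Hangul, which takes 3 bytes per syllable in UTF-8. The syllable pool and the name length are illustrative assumptions. The default count of 400 is borrowed from the reproduction reported later in the thread.

```python
import os
import tempfile

# Hypothetical syllable pool (3 bytes each in UTF-8) used to build non-ASCII names.
SYLLABLES = "가나다라마바사아자차카타파하한국어파일이름"


def make_multibyte_tree(root: str, count: int = 400, name_len: int = 20) -> list[str]:
    """Create `count` small files whose names are mostly multi-byte UTF-8."""
    os.makedirs(root, exist_ok=True)
    paths = []
    for i in range(count):
        # Rotate through the pool so names vary; the numeric suffix keeps them unique.
        stem = "".join(SYLLABLES[(i + j) % len(SYLLABLES)] for j in range(name_len))
        path = os.path.join(root, f"{stem}_{i:04d}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(stem + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    target = tempfile.mkdtemp(prefix="rutorrent-utf8-")
    created = make_multibyte_tree(target)
    print(f"created {len(created)} files in {target}")
```

To use it, point ruTorrent's create-torrent dialog at the printed directory. As noted in the comment, several attempts may be needed before a read lands on a bad split.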
Thanks for the repro script, @adenwuts; we were able to reproduce the issue. However, we found a bug in the fix itself: the outer

The original code avoided this because

We reproduced this on rtorrent 0.16.8 with 400 files with Korean names (82KB log file).

The UTF-8 boundary detection logic itself looks correct; only the file pointer management in the outer loop needs fixing. Would you be able to take a look?
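The thread does not show the exact outer-loop bug. The following is therefore a sketch of the general invariant under stated assumptions, not the PR's code. The invariant is this: after a chunk is trimmed back to a character boundary, the next read must start at the first byte that was not emitted. One way such a loop can go wrong is to advance by the full read size, which silently drops the bytes of the straddling character. In PHP the same idea would be expressed with ftell()/fseek() on the log handle. The Python names complete_prefix_len and utf8_chunks are hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def complete_prefix_len(buf: bytes) -> int:
    """Length of the longest prefix of `buf` that does not end mid-character."""
    i = len(buf) - 1
    # Skip back over at most 3 trailing continuation bytes (10xxxxxx).
    while i >= 0 and len(buf) - i <= 3 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    if i < 0:
        return len(buf)  # only continuation bytes: not valid UTF-8, leave as-is
    lead = buf[i]
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        return len(buf)  # not a valid lead byte: leave as-is
    # If the last character needs more bytes than remain, cut just before it.
    return i if i + need > len(buf) else len(buf)


def utf8_chunks(fh: BinaryIO, size: int = 4096) -> Iterator[str]:
    """Yield decoded chunks of at most `size` bytes without splitting characters.

    Assumes a seekable stream of valid UTF-8, and that a short read means EOF.
    """
    if size < 4:
        raise ValueError("size must be at least 4 (the longest UTF-8 sequence)")
    while True:
        start = fh.tell()
        buf = fh.read(size)
        if not buf:
            return
        # A short read is the final chunk: emit it whole instead of re-reading it.
        cut = complete_prefix_len(buf) if len(buf) == size else len(buf)
        # Reposition to the first byte NOT emitted, so the partial character is
        # re-read at the start of the next chunk rather than being dropped.
        fh.seek(start + cut)
        yield buf[:cut].decode("utf-8")


if __name__ == "__main__":
    sample = "파일 😀 이름".encode("utf-8")
    for piece in utf8_chunks(io.BytesIO(sample), size=5):
        print(repr(piece))
```

Seeking back is only one option. Carrying the leftover bytes into the next iteration's buffer would work equally well, and it would also suit non-seekable streams.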
Will try to take a look at this over the next couple of days and let you know. |
This setup assumes that incoming data is UTF8, but there's room to incorporate or extend some of the UTF checks from php/utlity/utf.php to potentially allow for non-UTF8 filesystem data. Let me know if that's intended to be supported by ruTorrent, or if there's any guidance on how invalid use cases should be handled.
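On the question of non-UTF8 filesystem data, one possible policy is sketched below. It assumes nothing about what php/utlity/utf.php actually provides. The sketch tries strict UTF-8 first, then either substitutes U+FFFD for invalid sequences or reinterprets the buffer as Latin-1. The function name and the policy strings are hypothetical. PHP has comparable knobs, such as the JSON_INVALID_UTF8_SUBSTITUTE flag for json_encode() and mb_convert_encoding().

```python
def decode_log_bytes(raw: bytes, policy: str = "replace") -> str:
    """Decode task-log bytes that are *expected* to be UTF-8 (hypothetical helper).

    policy="strict"  -> raise UnicodeDecodeError on invalid input
    policy="replace" -> substitute U+FFFD for each invalid sequence
    policy="latin-1" -> fall back to Latin-1 for the whole buffer (never fails,
                        but non-Latin names come out as mojibake)
    """
    if policy == "latin-1":
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw.decode("latin-1")
    if policy in ("strict", "replace"):
        # "strict" and "replace" are both standard codec error handlers.
        return raw.decode("utf-8", errors=policy)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    for p in ("replace", "latin-1"):
        print(p, repr(decode_log_bytes(b"ok \xff", policy=p)))
```

Whichever policy is chosen, it should be applied to chunks that were already cut on character boundaries. If the replace policy ran on a chunk that had been split mid-character, it would silently turn a valid character into U+FFFD rather than fix the split.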