In trying to use Zstd::StreamingCompress.new, I seem to have encountered a couple of different data corruption issues. They show up in the following ways:
- If you use Zstd::StreamingCompress.new with << or .write to pass in a string of roughly 16,000 bytes up to just under 131,072 bytes, it will intermittently produce a compressed string that Zstd.decompress decodes to output that differs from the input (no error is raised, but the output is different and corrupt). However, the zstd CLI tool decompresses the same string successfully, and its output matches the input exactly. So the compressed data itself doesn't necessarily seem corrupt; rather, Zstd.decompress seems unable to handle this string.
- If you use Zstd::StreamingCompress.new with << or .write to pass in a string of 131,072 bytes or more, Zstd::StreamingCompress will consistently produce corrupt content that cannot be decompressed (by either the gem or the CLI tool).
- If you use Zstd::StreamingCompress.new with .compress (instead of << or .write) to pass in data, only the first issue above occurs, not the second. Data longer than ~16,000 bytes passed to .compress will intermittently produce output that Zstd.decompress decodes incorrectly but that the zstd CLI decodes correctly. Inputs of 131,072 bytes or more do not fail outright as in the second issue; they continue to exhibit the first issue (mismatched Zstd.decompress output).
I cannot reproduce these issues if I use Zstd.compress, so this seems specific to the streaming compression.
Reproduction script
I've reproduced this with both zstd-ruby 1.5.7.0 and 2.0.0.pre.preview1 on the following platforms:
ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [arm64-darwin24]
ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [x86_64-linux]
Here's a script that reproduces this; save the following as test_zstd.rb. Sorry that it's a bit convoluted since it tests all 3 situations, but usage notes and some abbreviated output follow below:
require "bundler/inline"
require "digest"
require "tempfile"

gemfile do
  source "https://rubygems.org"
  gem "zstd-ruby", "1.5.7.0"
end

def compare_compressed(original:, compressed:)
  begin
    decompressed = Zstd.decompress(compressed)
  rescue => e
    decompress_error = e
  end

  if original != decompressed
    if decompress_error
      puts "Decompression error for #{original.bytesize} bytes input (#{decompress_error})"
    else
      puts "Content mismatch for #{original.bytesize} bytes input"
    end

    puts " Original: #{original.bytesize} bytes, #{Digest::SHA256.hexdigest(original)[0, 10]} checksum"
    if decompressed
      puts " Zstd.decompress: #{decompressed.bytesize} bytes, #{Digest::SHA256.hexdigest(decompressed)[0, 10]} checksum"
    end

    begin
      cli_decompressed = Tempfile.create(binmode: true) do |temp_write|
        temp_write.write(compressed)
        temp_write.close
        Tempfile.create(binmode: true) do |temp_read|
          system "zstd", "--decompress", "--quiet", "--force", "-o", temp_read.path, temp_write.path, exception: true
          File.read(temp_read.path, binmode: true)
        end
      end
      puts " zstd cli: #{cli_decompressed.bytesize} bytes, #{Digest::SHA256.hexdigest(cli_decompressed)[0, 10]} checksum"
    rescue => e
      puts " zstd cli error: #{e}"
    end
  end
end

def test_stream_write
  (1..256_000).each do |length|
    original = "a" * length
    stream = Zstd::StreamingCompress.new
    stream << original
    res = stream.finish
    compare_compressed(original: original, compressed: res)
  end
end

def test_stream_compress
  (1..256_000).each do |length|
    original = "a" * length
    stream = Zstd::StreamingCompress.new
    res = stream.compress(original)
    res << stream.finish
    compare_compressed(original: original, compressed: res)
  end
end

def test_compress
  (1..256_000).each do |length|
    original = "a" * length
    res = Zstd.compress(original)
    compare_compressed(original: original, compressed: res)
  end
end

case ARGV[0]
when "stream_write"
  puts "=== Zstd::StreamingCompress.new with << ==="
  test_stream_write
when "stream_compress"
  puts "=== Zstd::StreamingCompress.new with .compress ==="
  test_stream_compress
when "compress"
  puts "=== Zstd.compress ==="
  test_compress
else
  abort "Unknown test mode: #{ARGV[0].inspect}"
end
Reproduction script usage
- Run ruby test_zstd.rb stream_write to test Zstd::StreamingCompress.new with <<. This should exhibit the first issue above intermittently for input sizes from ~16,000 bytes up to just under 131,072 bytes, and the second issue consistently for inputs of 131,072 bytes or more.
- Run ruby test_zstd.rb stream_compress to test Zstd::StreamingCompress.new with .compress. This should exhibit the third issue described above, with input sizes greater than ~16,000 bytes intermittently producing mismatches.
- Run ruby test_zstd.rb compress to test Zstd.compress, which generates no errors for me.
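Because the first issue is intermittent, a single pass over each size can under-report it. Here is a minimal sketch of a helper that repeats the round trip for one input size and reports the failure rate. The name mismatch_rate and its keyword arguments are hypothetical and not part of test_zstd.rb. The compressor and decompressor are passed in as callables so that any of the three code paths can be plugged in, and the runnable part uses stand-in lambdas so it does not need the gem.

```ruby
# Hypothetical helper (not part of test_zstd.rb): repeat a compress/decompress
# round trip and return the fraction of runs whose output differs from the input.
# An exception raised during the round trip also counts as a failure.
def mismatch_rate(length:, runs:, compress:, decompress:)
  original = "a" * length
  failures = runs.times.count do
    begin
      decompress.call(compress.call(original)) != original
    rescue StandardError
      true
    end
  end
  failures.fdiv(runs)
end

# Stand-in callables so the sketch runs without the gem installed.
identity = ->(data) { data }
puts mismatch_rate(length: 20_000, runs: 5, compress: identity, decompress: identity)

# With zstd-ruby loaded, the << path from the script could be plugged in as:
#   compress:   ->(data) { s = Zstd::StreamingCompress.new; s << data; s.finish }
#   decompress: ->(data) { Zstd.decompress(data) }
```

A rate strictly between 0 and 1 for a fixed length would suggest the mismatch is nondeterministic for that size, rather than tied to particular sizes.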
Reproduction script example output
- For ruby test_zstd.rb stream_write, note that the zstd CLI decompresses the Ruby streaming output back to the original input even when Zstd.decompress does not (this is what the content checksums in the output are for). However, once the input reaches 131,072 bytes, all decompression starts to fail completely.
=== Zstd::StreamingCompress.new with << ===
Content mismatch for 16937 bytes input
Original: 16937 bytes, e31be8f076 checksum
Zstd.decompress: 16937 bytes, 41fb77f572 checksum
zstd cli: 16937 bytes, e31be8f076 checksum
Content mismatch for 21515 bytes input
Original: 21515 bytes, 1cccd688d3 checksum
Zstd.decompress: 21515 bytes, 6695186c17 checksum
zstd cli: 21515 bytes, 1cccd688d3 checksum
[...]
Content mismatch for 97075 bytes input
Original: 97075 bytes, 1f642ed1a7 checksum
Zstd.decompress: 97075 bytes, acc3eb205e checksum
zstd cli: 97075 bytes, 1f642ed1a7 checksum
Decompression error for 131072 bytes input (not compressed by zstd: Unspecified error code)
Original: 131072 bytes, b44ffb72fc checksum
zstd: /var/folders/td/52lw67lj0wz36_rhqflz24_19mm_gh/T/20250813-83789-e4qsiy: unknown header
zstd cli error: Command failed with exit 1: zstd
Decompression error for 131073 bytes input (not compressed by zstd: Unspecified error code)
Original: 131073 bytes, 7e009ea4ef checksum
zstd: /var/folders/td/52lw67lj0wz36_rhqflz24_19mm_gh/T/20250813-83789-uzf2ug: unsupported format
zstd cli error: Command failed with exit 1: zstd
[...]
- For ruby test_zstd.rb stream_compress, note that it still exhibits the first issue, and it keeps behaving the same way for inputs above 131,072 bytes:
=== Zstd::StreamingCompress.new with .compress ===
Content mismatch for 16671 bytes input
Original: 16671 bytes, 75fa71ee56 checksum
Zstd.decompress: 16671 bytes, fefcb91c80 checksum
zstd cli: 16671 bytes, 75fa71ee56 checksum
Content mismatch for 16936 bytes input
Original: 16936 bytes, a9d4d5bb65 checksum
Zstd.decompress: 16936 bytes, adfd120dfd checksum
zstd cli: 16936 bytes, a9d4d5bb65 checksum
[...]
Content mismatch for 98731 bytes input
Original: 98731 bytes, 093614bb66 checksum
Zstd.decompress: 98731 bytes, fa9fe924fd checksum
zstd cli: 98731 bytes, 093614bb66 checksum
Content mismatch for 131185 bytes input
Original: 131185 bytes, b5b6d4d116 checksum
Zstd.decompress: 131185 bytes, 16ab74a052 checksum
zstd cli: 131185 bytes, b5b6d4d116 checksum
[...]
Content mismatch for 244541 bytes input
Original: 244541 bytes, 951b4d7ef8 checksum
Zstd.decompress: 244541 bytes, 170cce21a4 checksum
zstd cli: 244541 bytes, 951b4d7ef8 checksum
Content mismatch for 247016 bytes input
Original: 247016 bytes, 2b51d7363f checksum
Zstd.decompress: 247016 bytes, c5bc5222b8 checksum
zstd cli: 247016 bytes, 2b51d7363f checksum
- For ruby test_zstd.rb compress (not using the streaming compressor), everything seems to work and the test reports no mismatches.
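To help narrow down where Zstd.decompress goes wrong in the mismatch cases, it may be useful to report the first differing byte offset rather than only a checksum. Here is a rough sketch; first_mismatch_offset is a hypothetical helper and is not part of the script above.

```ruby
# Hypothetical helper (not part of test_zstd.rb): return the byte offset of the
# first difference between two strings, or nil if they are byte-for-byte equal.
# If one string is a prefix of the other, the offset is the shorter length.
def first_mismatch_offset(expected, actual)
  a = expected.b
  b = actual.b
  limit = [a.bytesize, b.bytesize].min
  offset = (0...limit).find { |i| a.getbyte(i) != b.getbyte(i) }
  return offset if offset
  a.bytesize == b.bytesize ? nil : limit
end

p first_mismatch_offset("aaaa", "aaba") # => 2
p first_mismatch_offset("aaa", "aaa")   # => nil
p first_mismatch_offset("aaa", "aaaa")  # => 3
```

Inside compare_compressed, this could be called with original and decompressed in the "Content mismatch" branch, to see whether the corruption starts at a consistent offset.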