Conversation
Implement the Snappy framing format as a new compression algorithm, following the same pattern as the existing codecs (zstd, lz4, etc.). The encoder produces Snappy frames with stream identifiers, compressed/uncompressed chunks (falling back to uncompressed when compression ratio is poor), and masked CRC32C checksums. The decoder is a state machine that parses frame headers, buffers chunk data, and verifies checksums before emitting decompressed output. New dependencies: `snap` for raw Snappy compression and `crc32c` for checksum computation. Both are optional behind the `snappy` feature flag.
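For orientation, the frame layout the encoder emits can be sketched from the Snappy framing spec: each chunk starts with a one-byte chunk type followed by a little-endian u24 length. The helper name below is illustrative, not from the PR.

```rust
// Sketch of writing a Snappy frame chunk header, per the framing format:
// byte 0 is the chunk type (0xFF = stream identifier, 0x00 = compressed,
// 0x01 = uncompressed, 0xFE = padding); bytes 1..=3 are the chunk length
// as a little-endian u24. `write_chunk_header` is a made-up helper name.
fn write_chunk_header(buf: &mut Vec<u8>, chunk_type: u8, len: u32) {
    buf.push(chunk_type);
    buf.extend_from_slice(&len.to_le_bytes()[..3]);
}

fn main() {
    let mut buf = Vec::new();
    // A stream identifier chunk has length 6 (the magic bytes "sNaPpY").
    write_chunk_header(&mut buf, 0xFF, 6);
    buf.extend_from_slice(b"sNaPpY");
    assert_eq!(&buf[..4], &[0xFF, 6, 0, 0]);
    println!("{:02X?}", buf);
}
```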
I'm ok with either, though based on the size of the PR and that its CI is failing, I think getting a baseline implementation merged is a more sensible decision.
That's fair. Let me fix the …
@NobodyXu I agree to implement that in another PR. I fixed the tests except the one I mentioned in the description: do you have any opinion on how I should fix it? Or should I just skip this test for snappy?
NobodyXu left a comment
Thank you! I'm really unhappy with all the repeated Vec creation, which results in repeated copying and heap allocation.
```rust
0xFF => Self::Stream,
0x00 => Self::Compressed,
0x01 => Self::Uncompressed,
0xFE => Self::Padding,
0x02..=0x7f => Self::ReservedUnskippable(value),
0x80..=0xFD => Self::ReservedSkippable(value),
```
Do we have to hardcode it here?
Ideally the ChunkType should be in snap
It is, but the enum is not public.
I think it'd be better if snap makes it public under the raw module
```rust
fn mask_crc(crc: u32) -> u32 {
    (crc.wrapping_shr(15) | crc.wrapping_shl(17)).wrapping_add(0xA282EAD8)
}
```
Another magic value that would be better in snap; I assume this is duplicate code?
Yes it's duplicate code. The function is not public.
Can we have it public under snap::raw module?
Hardcoding it here really feels wrong
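For reference, the masking discussed above rotates the CRC right by 15 bits and adds a constant (per the framing spec, so that data containing embedded CRCs doesn't produce pathological checksums). It is invertible; this std-only sketch with my own helper names shows the roundtrip:

```rust
// Mask a CRC32C as the Snappy framing format requires:
// rotate right by 15, then add the spec's constant.
fn mask_crc(crc: u32) -> u32 {
    (crc.wrapping_shr(15) | crc.wrapping_shl(17)).wrapping_add(0xA282EAD8)
}

// The inverse (illustrative helper, not in the PR): subtract the
// constant, then rotate left by 15 to undo the rotation.
fn unmask_crc(masked: u32) -> u32 {
    let x = masked.wrapping_sub(0xA282EAD8);
    x.wrapping_shr(17) | x.wrapping_shl(15)
}

fn main() {
    let crc = 0xDEAD_BEEFu32;
    assert_eq!(unmask_crc(mask_crc(crc)), crc);
    println!("roundtrip ok");
}
```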
NobodyXu left a comment
Question regarding why the encoder is designed with extra buffering.
I can understand that the decoder needs it, as we cannot control how the chunks are created.
Does snappy have a "file header" that records the size of the entire stream? If we have it, then we can configure the decoder to stop decoding once that limit is reached, and the only way to continue would be … Otherwise we cannot fix it.
No, the first "stream" chunk is just there to mark the start of the file, but it doesn't include the length of the whole stream.
You're right, I apologize for that... I tried to apply your suggestions. It's not perfect (I don't like the …
NobodyXu left a comment
Thanks!
My feedback is that the copying around of uncompressed sections is unnecessary; they should be copied directly into the output.
Also, snap::raw should expose more public items to prevent duplication, and it needs to support uninitialised buffers for maximum efficiency.
```rust
        let mut decoder = snap::raw::Decoder::new();
        decoder.decompress(data, out_buf)?;
    }
    ChunkType::Uncompressed => out_buf.extend_from_slice(data),
```
We should skip this copy by putting the chunk type into a new variant, UncompressedCopy, so that it copies straight from data, along with an offset initially set to 4 to skip the header.
Using that offset we can construct a PartialBuffer.
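A minimal std-only sketch of the suggestion, assuming the shape of the state machine; the names (State, UncompressedCopy, copy_uncompressed) are illustrative, not the PR's actual types:

```rust
// Hypothetical state variant: instead of staging uncompressed chunk
// bytes through an intermediate Vec, remember an offset into the chunk
// and copy straight into the caller's output buffer.
#[allow(dead_code)]
enum State {
    Header,
    // offset starts at 4 to skip the 4-byte masked CRC32C at chunk start
    UncompressedCopy { offset: usize },
}

// Copy as much of `chunk` (from `offset`) as fits into `out`,
// advancing the offset; returns the number of bytes written.
fn copy_uncompressed(chunk: &[u8], offset: &mut usize, out: &mut [u8]) -> usize {
    let src = &chunk[*offset..];
    let n = src.len().min(out.len());
    out[..n].copy_from_slice(&src[..n]);
    *offset += n;
    n
}

fn main() {
    // 4-byte CRC placeholder followed by the payload "hello".
    let chunk = [0u8, 0, 0, 0, b'h', b'e', b'l', b'l', b'o'];
    let mut offset = 4; // skip the masked CRC
    let mut out = [0u8; 3];
    let n = copy_uncompressed(&chunk, &mut offset, &mut out);
    assert_eq!((n, &out[..]), (3, &b"hel"[..]));
    let mut out2 = [0u8; 8];
    let n2 = copy_uncompressed(&chunk, &mut offset, &mut out2);
    assert_eq!(&out2[..n2], b"lo"); // resumes where the last call stopped
    println!("ok");
}
```

The offset doubles as the resume point when the output buffer fills mid-chunk, which is what makes the variant work with a PartialBuffer-style API.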
```rust
out_buf.resize(uncompress_length, 0);
let mut decoder = snap::raw::Decoder::new();
decoder.decompress(data, out_buf)?;
```
It's a shame that snap does not support writing to an uninitialised buffer; zeroing the whole Vec would indeed take quite some time.
```rust
impl DecodeV2 for SnappyDecoder {
    fn reinit(&mut self) -> std::io::Result<()> {
        *self = Self::new();
```
```diff
- *self = Self::new();
+ self.state = State::default();
+ self.in_buf.clear();
+ self.out_buf.reset();
```
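A std-only sketch of why resetting fields beats `*self = Self::new()`: clearing a Vec keeps its capacity, so the next decode reuses the existing allocation instead of dropping and reallocating it. Struct and field names here are illustrative, not the PR's exact types.

```rust
// Simplified stand-in for the decoder's buffers.
struct Decoder {
    in_buf: Vec<u8>,
    out_buf: Vec<u8>,
}

impl Decoder {
    // In-place reset: len -> 0, but capacity (the heap allocation)
    // is retained for the next stream.
    fn reinit(&mut self) {
        self.in_buf.clear();
        self.out_buf.clear();
    }
}

fn main() {
    let mut d = Decoder {
        in_buf: Vec::with_capacity(4096),
        out_buf: Vec::new(),
    };
    d.in_buf.extend_from_slice(&[1, 2, 3]);
    let cap = d.in_buf.capacity();
    d.reinit();
    assert_eq!(d.in_buf.len(), 0);
    assert_eq!(d.in_buf.capacity(), cap); // allocation reused
    println!("capacity retained: {}", cap);
}
```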
```rust
pub fn read_u24_le(slice: &[u8; 3]) -> u32 {
```
```diff
- pub fn read_u24_le(slice: &[u8; 3]) -> u32 {
+ fn read_u24_le(slice: &[u8; 3]) -> u32 {
```
This function is only used by snappy submodules and doesn't need to be public.
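A std-only sketch of what a function with this signature presumably does (the PR's body isn't quoted here): interpret three bytes as a little-endian 24-bit value, which is how the framing format encodes chunk lengths.

```rust
// Read a little-endian u24 (e.g. the chunk-length field of a Snappy
// frame header) by zero-extending it to a u32.
fn read_u24_le(slice: &[u8; 3]) -> u32 {
    u32::from_le_bytes([slice[0], slice[1], slice[2], 0])
}

fn main() {
    assert_eq!(read_u24_le(&[0x01, 0x02, 0x03]), 0x0003_0201);
    assert_eq!(read_u24_le(&[0xFF, 0xFF, 0xFF]), 0x00FF_FFFF);
    println!("ok");
}
```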
```rust
))?;

let chunk_type = ChunkType::from(header_part[0]);
// SAFETY: header_part is guaranteed to have at least 4 bytes due to split_first_chunk
```
```diff
- // SAFETY: header_part is guaranteed to have at least 4 bytes due to split_first_chunk
+ // header_part is guaranteed to have at least 4 bytes due to split_first_chunk
```
SAFETY comments are reserved for unsafe blocks and functions.
```rust
Skipping(usize),
Buffering {
    remaining: usize,
```
```diff
- Skipping(usize),
- Buffering {
-     remaining: usize,
+ Skipping(NonZeroUsize),
+ Buffering {
+     remaining: NonZeroUsize,
```
Using NonZeroUsize would make the invariant of the state clear.
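A sketch of the point being made, with illustrative names: NonZeroUsize makes "zero bytes remaining" unrepresentable in these states, so every transition out of Skipping/Buffering is forced through an Option instead of silently holding a stale zero.

```rust
use std::num::NonZeroUsize;

// Illustrative state machine fragment; the real enum has more variants.
#[allow(dead_code)]
enum State {
    Skipping(NonZeroUsize),
    Buffering { remaining: NonZeroUsize },
}

// Consume up to `n` bytes from the state's counter.
// `None` means the state is finished; the caller must transition,
// because a zero counter cannot be stored back into the state.
fn consume(remaining: NonZeroUsize, n: usize) -> Option<NonZeroUsize> {
    NonZeroUsize::new(remaining.get().saturating_sub(n))
}

fn main() {
    let r = NonZeroUsize::new(10).unwrap();
    assert_eq!(consume(r, 4).map(NonZeroUsize::get), Some(6));
    assert_eq!(consume(r, 10), None); // exhausted: must leave this state
    println!("ok");
}
```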
```rust
out_buf.resize(max_compress_size + 8, 0);

let mut encoder = snap::raw::Encoder::new();
let compress_data = encoder.compress(in_buffer, &mut out_buf[8..])?;
```
It's a shame snap::raw doesn't support uninitialised buffer
```rust
let chunk_type = if compress_data >= in_buffer.len() - (in_buffer.len() / 8) {
    out_buf.clear();
    out_buf.resize(in_buffer.len() + 8, 0);
    out_buf[8..].copy_from_slice(in_buffer);
```
We should have a new state, UncompressedCopy, instead of doing this intermediate copy.
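The condition in the quoted hunk stores the chunk uncompressed when compression saves less than 1/8 of the input. Extracted as a std-only helper (my own name) to make the threshold visible:

```rust
// Fallback test matching the check in the PR hunk above: if the
// compressed size is within 12.5% of the input size, the compression
// ratio is considered too poor and the chunk is stored uncompressed.
fn should_store_uncompressed(compressed_len: usize, input_len: usize) -> bool {
    compressed_len >= input_len - input_len / 8
}

fn main() {
    assert!(should_store_uncompressed(95, 100));  // only 5% saved: store raw
    assert!(!should_store_uncompressed(50, 100)); // 50% saved: keep compressed
    println!("ok");
}
```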
Yeah, it's the right direction; it looks ugly because …
Hello there!
This pull request introduces support for a framed snappy decoder and encoder. I'm not an expert in compression/decompression, but I feel the snappy one is quite different from the others. My work was inspired by the gzip decoding (for the state machine), the work done in the snap crate in the framing reader and writer, and of course the actual spec. One big difference is that the whole frame must be buffered before compressing or decompressing it.
I have a few questions/notes:
- One test fails (snappy::tokio::bufread::decompress::trailer), but I don't know how to fix it. If I understand the test correctly, it checks whether the decoder can stop decoding with some trailing data in the buffer. But with the framing format, I don't know how to address that. Should I stop decoding when the chunk type is invalid, or should I stop and fail (current behavior)?
- Because the proptest inputs ([1, 2, 3...]) are not compressible in snappy, the encoder and decoder never compress or decompress anything. But is it the job of the proptests to create some random test data?