feat(RISC-V): Add RVV-optimized implementation for memcopy64#213
feat(RISC-V): Add RVV-optimized implementation for memcopy64#213danilak-G merged 19 commits intogoogle:mainfrom
Conversation
…peed For RISC-V[skip ci]
…into add_rvv_support
…into add_rvv_support
…into add_rvv_support
…into add_rvv_support
|
@danilak-G |
Hi @danilak-G, |
|
Overall LGTM, thanks for fixing many things and now it looks greate, some small comments |
snappy.cc
Outdated
| size_t remaining_bytes = size; | ||
| //Loop as long as there are bytes remaining to be copied. | ||
| while (remaining_bytes > 0) { | ||
| //Set vector configuration: e8 (8-bit elements), m2 (LMUL=2). |
There was a problem hiding this comment.
Can you fix the indentation to 4 spaces?
snappy.cc
Outdated
| // RVV acceleration available on RISC-V when compiled with -march=rv64gcv | ||
| #elif defined(__riscv) && SNAPPY_HAVE_RVV | ||
| // Cast pointers to the type we will operate on. | ||
| unsigned char* dst_ptr = (unsigned char*)dst; |
There was a problem hiding this comment.
Use reinterpret_cast, please
snappy.cc
Outdated
| vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl); | ||
| //Store data to the current destination pointer. | ||
| VSE8_V_U8M2(dst_ptr, vec, vl); | ||
| //Update pointers and the remaining count. |
There was a problem hiding this comment.
Set the space after // for all comments
snappy.cc
Outdated
| static_cast<const uint8_t*>(src) + kShortMemCopy, | ||
| 64 - kShortMemCopy); | ||
| } | ||
| static_cast<const uint8_t*>(src) + kShortMemCopy, |
There was a problem hiding this comment.
Extra not-needed space added
…into add_rvv_support
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
Fixed, thanks for the review! |
| std::memmove(dst + kShortMemCopy, | ||
| static_cast<const uint8_t*>(src) + kShortMemCopy, | ||
| 64 - kShortMemCopy); | ||
| } |
There was a problem hiding this comment.
This closing bracket should not be deleted
There was a problem hiding this comment.
Sure—bracket restored. Thanks for catching that!
snappy.cc
Outdated
| #else | ||
| std::memmove(dst, src, kShortMemCopy); | ||
| // Profiling shows that nearly all copies are short. | ||
| //Profiling shows that nearly all copies are short. |
snappy.cc
Outdated
| size_t vl = VSETVL_E8M2(remaining_bytes); | ||
| // Load data from the current source pointer. | ||
| vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl); | ||
| // Store data to the current destination pointer. |
There was a problem hiding this comment.
Align comments with code
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
snappy.cc
Outdated
| // Suppress -Wignored-attributes warning for __m128i in x86 SSE2 environment | ||
| // warning: ignoring attributes on template argument 'snappy::internal::V128' {aka '__vector(2) long long int'} [-Wignored-attributes] | ||
| // This occurs because __m128i has vector attributes (e.g., __attribute__((vector_size(16)))) that are ignored in template parameters. | ||
| #ifdef __SSE2__ |
There was a problem hiding this comment.
This is deleting LoadPatternAndReshuffleMask completely for non SSE2. SNAPPY_HAVE_VECTOR_BYTE_SHUFFLE should be correct, I guess
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64

This PR add RVV to optimize Memcopy64 function , improving compression speed by ~46%.
Optimize Snappy 1.2.2 performance
Added RVV support for RISC-V in Snappy, optimizing Memcopy64 by leveraging RVV vector load/store instructions (e.g., vle8_v_u8m2, vse8_v_u8m2) to reduce memory copy overhead and improve decompression performance. lzbench 2.1 tests onsilesia.tar (GCC 13.2.1, 64-bit Linux) show:Snappy 1.2.2 unittest ([ PASSED ] 21 tests.)
Running main() from gmock_main.cc
[==========] Running 21 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CorruptedTest
[ RUN ] CorruptedTest.VerifyCorrupted
Crazy decompression lengths not checked on 64-bit build
[ OK ] CorruptedTest.VerifyCorrupted (5 ms)
[----------] 1 test from CorruptedTest (5 ms total)
[----------] 17 tests from Snappy
[ RUN ] Snappy.SimpleTests
[ OK ] Snappy.SimpleTests (22 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCases
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCases (3 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive (4830 ms)
[ RUN ] Snappy.MaxBlowup
[ OK ] Snappy.MaxBlowup (11 ms)
[ DISABLED ] Snappy.DISABLED_MoreThan4GB
[ RUN ] Snappy.RandomData
[ OK ] Snappy.RandomData (20311 ms)
[ RUN ] Snappy.FourByteOffset
[ OK ] Snappy.FourByteOffset (1 ms)
[ RUN ] Snappy.IOVecSourceEdgeCases
[ OK ] Snappy.IOVecSourceEdgeCases (0 ms)
[ RUN ] Snappy.IOVecSinkEdgeCases
[ OK ] Snappy.IOVecSinkEdgeCases (0 ms)
[ RUN ] Snappy.IOVecLiteralOverflow
[ OK ] Snappy.IOVecLiteralOverflow (0 ms)
[ RUN ] Snappy.IOVecCopyOverflow
[ OK ] Snappy.IOVecCopyOverflow (0 ms)
[ RUN ] Snappy.ReadPastEndOfBuffer
[ OK ] Snappy.ReadPastEndOfBuffer (0 ms)
[ RUN ] Snappy.ZeroOffsetCopy
[ OK ] Snappy.ZeroOffsetCopy (0 ms)
[ RUN ] Snappy.ZeroOffsetCopyValidation
[ OK ] Snappy.ZeroOffsetCopyValidation (0 ms)
[ RUN ] Snappy.FindMatchLength
[ OK ] Snappy.FindMatchLength (0 ms)
[ RUN ] Snappy.FindMatchLengthRandom
[ OK ] Snappy.FindMatchLengthRandom (1145 ms)
[ RUN ] Snappy.VerifyCharTable
[ OK ] Snappy.VerifyCharTable (0 ms)
[ RUN ] Snappy.TestBenchmarkFiles
[ OK ] Snappy.TestBenchmarkFiles (406 ms)
[----------] 17 tests from Snappy (26734 ms total)
[----------] 3 tests from SnappyCorruption
[ RUN ] SnappyCorruption.TruncatedVarint
[ OK ] SnappyCorruption.TruncatedVarint (0 ms)
[ RUN ] SnappyCorruption.UnterminatedVarint
[ OK ] SnappyCorruption.UnterminatedVarint (0 ms)
[ RUN ] SnappyCorruption.OverflowingVarint
[ OK ] SnappyCorruption.OverflowingVarint (0 ms)
[----------] 3 tests from SnappyCorruption (0 ms total)
[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (26740 ms total)
[ PASSED ] 21 tests.
YOU HAVE 1 DISABLED TEST
[ PASSED ] 21 tests.