Greetings! I am using the library and really like it: it is very flexible and performs quite well. However, there's always room for optimization. I noticed that activity on this project is low and most comments are very old, so I should ask first: should I be looking into a newer project that has effectively replaced this one?
That being said, my experience with C++-based sorting showed that two improvements can produce very significant results:
- Pipelining: during the block sort phase, instead of doing the accumulate/blocksort/write steps in a procedural loop, fill each block and then lob it into an execution pipeline that runs the sort and the write as separate parallel tasks.
- Compression: A light compression like Snappy can reduce temp space by 70% or so, and can result in faster I/O (especially if compression is done in parallel using the above pipeline technique).
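To make the two ideas concrete, here is a rough sketch of what I mean. It is not this library's code or API: every name in it (`write_sorted_runs`, `merge_runs`, the `.run` suffix) is made up for illustration. It assumes the records are strings with no embedded newlines, and it uses the standard library's `zlib` at level 1 as a stand-in for Snappy, which would need a third-party binding.

```python
import heapq
import os
import queue
import tempfile
import threading
import zlib

_DONE = object()  # sentinel that tells a stage to shut down


def _sort_stage(sort_q, write_q):
    # Stage 2: sort each filled block, then hand it to the writer stage.
    while (block := sort_q.get()) is not _DONE:
        block.sort()
        write_q.put(block)
    write_q.put(_DONE)


def _write_stage(write_q, tmp_dir, run_paths):
    # Stage 3: compress each sorted block and write it out as one run file.
    while (block := write_q.get()) is not _DONE:
        payload = zlib.compress("\n".join(block).encode("utf-8"), 1)
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        run_paths.append(path)


def write_sorted_runs(lines, block_size, tmp_dir):
    """Split `lines` into sorted, compressed run files and return their paths."""
    sort_q = queue.Queue(maxsize=2)   # bounded queues cap the blocks held in memory
    write_q = queue.Queue(maxsize=2)
    run_paths = []
    stages = [
        threading.Thread(target=_sort_stage, args=(sort_q, write_q)),
        threading.Thread(target=_write_stage, args=(write_q, tmp_dir, run_paths)),
    ]
    for t in stages:
        t.start()

    # Stage 1: accumulate records; a full block goes into the pipeline
    # and accumulation continues while earlier blocks are sorted and written.
    block = []
    for line in lines:
        block.append(line)
        if len(block) >= block_size:
            sort_q.put(block)
            block = []
    if block:
        sort_q.put(block)
    sort_q.put(_DONE)

    for t in stages:
        t.join()
    return run_paths


def _read_run(path):
    # Simplification: decompress the whole run at once instead of streaming it.
    with open(path, "rb") as f:
        yield from zlib.decompress(f.read()).decode("utf-8").split("\n")


def merge_runs(run_paths):
    """Lazily merge the sorted runs into one sorted stream."""
    return heapq.merge(*(_read_run(p) for p in run_paths))
```

Usage would be something like `runs = write_sorted_runs(records, 100_000, tmp_dir)` followed by iterating over `merge_runs(runs)`. The bounded queues provide backpressure, so at most a handful of blocks are in flight at once. The sketch has no error handling, and in CPython the sort itself holds the interpreter lock, so the overlap comes mainly from compression and file I/O; in C++ or another runtime with true parallel threads all three stages can overlap.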
I haven't dug deep enough into the code to see whether some of this is already supported, so please tell me to RTFM if I've missed something.
Thanks,
john