
enable implicit multithreading to improve compression performance #33

Open
neo-0007 wants to merge 1 commit into compiler-research:develop from neo-0007:feat/improve-compression


@neo-0007

RNTupleWriteOptions uses implicit multithreading by default (https://root.cern/doc/v636/classROOT_1_1RNTupleWriteOptions.html#a6b57fe45b76b33670c4b8b979ccff490), but this only takes effect once multithreading has been enabled with EnableImplicitMT().
I tested the samtoramntuple tool on a 37 GB SAM file converted from a BAM file obtained from the 1000 Genomes Project: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/

Observations:

BAM FILE : HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
BAM FILE SIZE : 8,752,906,441 bytes | ~8347 MB  | ~8.15 GB
SAM FILE : HG00100.sam
SAM FILE SIZE : 37,054,527,496 bytes | ~35,339 MB | ~34.5 GB

RUN 1 (develop branch)
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB
COMPRESSION RATIO: ~5.488x
CONVERSION TIME (SAM TO RAM): Real time 0:21:32 | CP time 1215.140

RUN 2 (change 1)
CHANGE: called EnableImplicitMT()
REASON: RNTupleWriteOptions uses implicit MT by default, but implicit MT has to be enabled first
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB (no change)
COMPRESSION RATIO: ~5.488x (no change)
CONVERSION TIME (SAM TO RAM): Real time 0:09:02, CP time 1950.110 (~2.4x speedup with just 12-core multithreading)

RUN 3 (change 2)
CHANGE: called EnableImplicitMT(), changed the max unzipped page size to 1024*1024, and changed the approx zipped cluster size to 256*1024*1024
REASON: implicit MT runs the compression in parallel on multiple cores, so increasing the page size seemed worth trying
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB (no change)
COMPRESSION RATIO: ~5.488x (no change)
CONVERSION TIME (SAM TO RAM): Real time 0:09:00, CP time 1889.930 (not much of an improvement over change 1)
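
For readers who want to try the RUN 2 and RUN 3 settings, here is a minimal sketch of how they might be applied. It is not the code from this PR: `WriterTuning`, `Run3Tuning` and `MakeWriteOptions` are hypothetical names, and the ROOT part assumes ROOT 6.36 or newer, where `RNTupleWriteOptions` lives in the `ROOT` namespace. The `__has_include` guard is only there so the file still compiles where ROOT is not installed.

```cpp
#include <cassert>
#include <cstddef>

#if __has_include(<ROOT/RNTupleWriteOptions.hxx>) && __has_include(<TROOT.h>)
#include <ROOT/RNTupleWriteOptions.hxx>
#include <TROOT.h>
#define SKETCH_HAS_ROOT 1
#else
#define SKETCH_HAS_ROOT 0
#endif

// Hypothetical container for the two tunables changed in RUN 3.
struct WriterTuning {
   std::size_t maxUnzippedPageSize;     // bytes
   std::size_t approxZippedClusterSize; // bytes
};

// Values quoted in RUN 3: 1024*1024 for pages, 256*1024*1024 for clusters.
inline WriterTuning Run3Tuning()
{
   WriterTuning t{std::size_t{1024} * 1024, std::size_t{256} * 1024 * 1024};
   // Sanity check for this sketch only: a page should not exceed a cluster.
   assert(t.maxUnzippedPageSize <= t.approxZippedClusterSize);
   return t;
}

#if SKETCH_HAS_ROOT
// Hypothetical helper; assumes the ROOT >= 6.36 API.
inline ROOT::RNTupleWriteOptions MakeWriteOptions(const WriterTuning &t)
{
   // Enable implicit MT globally first. The per-writer option defaults
   // to "use implicit MT", but it has no effect until this call is made.
   ROOT::EnableImplicitMT();

   ROOT::RNTupleWriteOptions opts;
   opts.SetMaxUnzippedPageSize(t.maxUnzippedPageSize);
   opts.SetApproxZippedClusterSize(t.approxZippedClusterSize);
   return opts;
}
#endif
```

With ROOT available, `MakeWriteOptions(Run3Tuning())` would yield options that can be passed to the writer. Leaving out the two setters corresponds to RUN 2, where only `EnableImplicitMT()` is called.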

complete logs can be found here

CPU utilization during the run:
Before the change (a single core used):

[image: one_core_used]

After the change (multiple cores utilized):

[image: multiple_core_used]

  • Utilizing all cores (12 on my system) reduced the conversion time by ~2.4 times
  • Data integrity was validated by running multiple region queries on both outputs using ramntupleview. All regions returned identical record counts, confirming that enabling implicit multithreading does not affect the dataset contents
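
The ~2.4 times figure can be reproduced from the reported timings. The sketch below is only illustrative: the helper names are hypothetical, and `AverageBusyCores` assumes that the reported CP time is CPU time summed over all threads of the process.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Convert an "h:mm:ss" wall-clock string, as printed in the runs above, to seconds.
inline int ParseHms(const std::string &hms)
{
   int h = 0, m = 0, s = 0;
   const int fields = std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s);
   assert(fields == 3 && "expected h:mm:ss");
   (void)fields; // avoid an unused-variable warning when NDEBUG disables assert
   return h * 3600 + m * 60 + s;
}

// Wall-clock speedup of `after` relative to `before`.
inline double Speedup(const std::string &before, const std::string &after)
{
   return static_cast<double>(ParseHms(before)) / ParseHms(after);
}

// Average number of busy cores, assuming cpuSeconds is summed over all threads.
inline double AverageBusyCores(double cpuSeconds, const std::string &realTime)
{
   return cpuSeconds / ParseHms(realTime);
}
```

`Speedup("0:21:32", "0:09:02")` is 1292 s / 542 s, about 2.38. `AverageBusyCores(1950.110, "0:09:02")` comes out near 3.6, which, if the CP time assumption holds, would suggest that the 12 cores are not all busy for the whole run.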

@neo-0007
Author

Follow-up: Can we use RNTupleParallelWriter here? That would further improve performance, but we would need to change the indexing logic, since the data would then no longer be written sequentially, which the current indexing assumes.
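
To illustrate the indexing concern, here is a rough sketch of an index that does not rely on write order. Everything in it is hypothetical: `ClusterSpan`, `FinalizeIndex` and `Query` are made-up names, this is not the current RAM index format, and it covers a single chromosome only. The idea is to record, per cluster, the position range it covers and where it landed in the file, then sort by position once writing is done.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-cluster index record; not the RAM index format.
struct ClusterSpan {
   std::int64_t minPos;      // smallest alignment position in the cluster
   std::int64_t maxPos;      // largest alignment position in the cluster
   std::uint64_t firstEntry; // entry number where the cluster landed in the file
   std::uint64_t nEntries;   // number of entries in the cluster
};

// Clusters may be committed in any order by parallel writers,
// so sort the index by genomic position after writing.
inline void FinalizeIndex(std::vector<ClusterSpan> &index)
{
   std::sort(index.begin(), index.end(),
             [](const ClusterSpan &a, const ClusterSpan &b) { return a.minPos < b.minPos; });
}

// Return the clusters overlapping [start, end]; requires a finalized index.
inline std::vector<ClusterSpan> Query(const std::vector<ClusterSpan> &index,
                                      std::int64_t start, std::int64_t end)
{
   assert(start <= end);
   std::vector<ClusterSpan> hits;
   for (const auto &c : index) {
      if (c.minPos > end)
         break; // sorted by minPos: no later cluster can overlap
      if (c.maxPos >= start)
         hits.push_back(c);
   }
   return hits;
}
```

A region query then returns entry ranges that are not necessarily contiguous in the file, so the reader would have to visit each returned span separately.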

@AdityaPandeyCN

What about the comparison with CRAM? any results on that?

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@codecov-commenter

codecov-commenter commented Mar 14, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.91%. Comparing base (71a3365) to head (1de20ca).

Files with missing lines | Patch % | Lines
src/ramcore/SamToNTuple.cxx | 57.14% | 2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (57.14%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


@@             Coverage Diff             @@
##           develop      #33      +/-   ##
===========================================
+ Coverage    59.89%   59.91%   +0.02%     
===========================================
  Files           16       16              
  Lines         1476     1477       +1     
  Branches       631      632       +1     
===========================================
+ Hits           884      885       +1     
  Misses         513      513              
  Partials        79       79              
Flag | Coverage Δ
unittests | 59.91% <57.14%> (+0.02%) ⬆️


Files with missing lines | Coverage Δ
src/ramcore/SamToNTuple.cxx | 92.52% <57.14%> (+0.04%) ⬆️

@swetank18

Great work! I noticed in my benchmarks (#34) that ZSTD achieves near-LZMA compression ratios while being faster. Combined with EnableImplicitMT(), this could give both better compression and faster conversion. Have you tested ZSTD + multithreading together?

@neo-0007
Author

Thanks @swetank18, you are right: ZSTD at level 5 is the best compression setting for RAM files, which has also been benchmarked and documented here: https://compiler-research.org/blogs/gsoc25_aditya_pandey_final_blog/ . Great work on your analysis in #34. I will also try out various settings and comparisons as you did and analyse the results on a different dataset.
And yes, the benchmarks I have provided use ZSTD together with multithreading, as that is what the current samtoramntuple uses:

samtoramntuple_split_by_chromosome(input, output, 505, quality_mode, 4);

@neo-0007
Author

neo-0007 commented Mar 14, 2026

What about the comparison with CRAM? any results on that?

Yes, I converted the BAM and SAM files to CRAM. Here are the benchmarks:

BAM FILE : HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
BAM FILE SIZE : 8,752,906,441 bytes | ~8347 MB  | ~8.15 GB
SAM FILE : HG00100.sam
SAM FILE SIZE : 37,054,527,496 bytes | ~35,339 MB | ~34.5 GB
CONVERSION TOOL : samtools

BAM to CRAM
CRAM FILE : HG00100.cram
CRAM FILE SIZE : 4,529,767,359 bytes | ~4320 MB | ~4.22 GB
COMPRESSION RATIO (vs SAM): ~8.18×

CONVERSION TIME (BAM to CRAM):
Real time  : 0:01:22
CPU time   : 7:10.928 (user) + 0:19.358 (sys)

SAM to CRAM
CRAM FILE : HG00100_from_sam.cram
CRAM FILE SIZE : 4,533,541,090 bytes | ~4324 MB | ~4.22 GB

COMPRESSION RATIO (vs SAM): ~8.17×

CONVERSION TIME (SAM to CRAM):
Real time  : 0:01:47
CPU time   : 7:21.050 (user) + 0:46.716 (sys)

Comparison with RAM

CRAM file size : ~4.22 GB
CRAM + reference : ~5.05 GB
RAM file size : ~6.29 GB

CRAM is ~2.07 GB smaller than RAM (~1.49× smaller)
CRAM + reference is ~1.24 GB smaller than RAM (~1.25× smaller)

CRAM conversion time : ~1–2 minutes ( samtools using 12 cores )
RAM conversion time (ImplicitMT) : ~9 minutes (~5–6× slower than SAM to CRAM)
RAM conversion time (develop branch) : ~21 minutes (~12–15× slower than SAM to CRAM)
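
The size ratios in this comparison follow directly from the byte counts listed earlier in the comment. The helpers below are a small illustrative sketch with hypothetical names. They also show that the "GB" figures quoted here are binary units (bytes divided by 1024^3).

```cpp
#include <cassert>
#include <cstdint>

// Ratio of original size to compressed size, e.g. SAM bytes over CRAM bytes.
inline double CompressionRatio(std::uint64_t originalBytes, std::uint64_t compressedBytes)
{
   assert(compressedBytes > 0);
   return static_cast<double>(originalBytes) / static_cast<double>(compressedBytes);
}

// Convert a byte count to binary gigabytes (GiB), the unit behind the "~GB" figures.
inline double ToGiB(std::uint64_t bytes)
{
   return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For example, `CompressionRatio(37054527496ULL, 4529767359ULL)` gives the SAM-to-CRAM ratio of about 8.18, and `CompressionRatio(6755871183ULL, 4529767359ULL)` gives the RAM-to-CRAM size ratio of about 1.49.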

Plots:

[image: size_comparison]

[image: time_comparison]

@neo-0007 neo-0007 force-pushed the feat/improve-compression branch 2 times, most recently from 3cd3a0f to 93cc85b Compare March 14, 2026 11:14
@neo-0007
Author

added clang-format suggestions

@neo-0007 neo-0007 force-pushed the feat/improve-compression branch from 93cc85b to b28dc02 Compare March 20, 2026 08:01
@neo-0007
Author

@mvassilev @AdityaPandeyCN, can this be reviewed?

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@mvassilev
Collaborator

@mvassilev @AdityaPandeyCN , Can this be reviewed .

@neo-0007 please check the clang-format error.

@neo-0007 neo-0007 force-pushed the feat/improve-compression branch from b28dc02 to 3ee8c96 Compare March 20, 2026 08:20
@neo-0007
Author

@mvassilev Rebased on top of latest develop and ran git-clang-format HEAD~ to add git-clang-format suggestions


@github-actions github-actions Bot left a comment


clang-tidy made some suggestions

Comment thread src/ramcore/SamToNTuple.cxx Outdated
@neo-0007 neo-0007 force-pushed the feat/improve-compression branch from 3ee8c96 to 1de20ca Compare March 20, 2026 09:11
@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@vgvassilev

@guitargeek, can you take a look?


@guitargeek guitargeek left a comment


Looks good to me, and a very nice demonstration of parallel RNTuple writing!

Signed-off-by: Hrishikesh Gohain <hrishikeshgohain123@gmail.com>
@vgvassilev vgvassilev force-pushed the feat/improve-compression branch from 1de20ca to 2390a85 Compare March 23, 2026 16:43