enable implicit multithreading to improve compression performance by neo-0007 · Pull Request #33 · compiler-research/ramtools

neo-0007 · 2026-03-13T13:01:59Z

RNTupleWriteOptions enables implicit multithreading by default (https://root.cern/doc/v636/classROOT_1_1RNTupleWriteOptions.html#a6b57fe45b76b33670c4b8b979ccff490), but multithreading first has to be enabled with EnableImplicitMT().
I tested the samtoramntuple tool on a 37 GB SAM file converted from a BAM file obtained from the 1000 Genomes Project: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/
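A minimal sketch of the call order described above, assuming ROOT is available at build time. The helper name `try_enable_implicit_mt` and the `HAVE_ROOT` macro are illustrative and not part of ramtools; the `__has_include` guard is only there so the snippet also compiles on a machine without ROOT headers.

```cpp
#include <cstdio>

// Illustrative guard (not part of ramtools): use ROOT only if its headers are visible.
#if defined(__has_include)
#if __has_include(<TROOT.h>)
#include <TROOT.h>
#define HAVE_ROOT 1
#endif
#endif

#ifndef HAVE_ROOT
#define HAVE_ROOT 0
#endif

// Hypothetical helper: switch on ROOT's implicit multithreading before the
// RNTuple writer is created. Returns true if implicit MT is active afterwards.
inline bool try_enable_implicit_mt(unsigned n_threads = 0) {
#if HAVE_ROOT
    ROOT::EnableImplicitMT(n_threads);  // 0 lets ROOT choose the number of threads
    return ROOT::IsImplicitMTEnabled();
#else
    (void)n_threads;
    std::puts("ROOT headers not found; implicit MT left untouched");
    return false;
#endif
}
```

In a converter, a call such as `try_enable_implicit_mt()` would presumably go at the top of the conversion routine, before any writer is constructed.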

Observations:

BAM FILE : HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
BAM FILE SIZE : 8,752,906,441 bytes | ~8347 MB  | ~8.15 GB
SAM FILE : HG00100.sam
SAM FILE SIZE : 37,054,527,496 bytes | ~35,339 MB | ~34.5 GB

RUN 1 (develop branch)
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB
COMPRESSION RATIO: ~5.485x
CONVERSION TIME (SAM TO RAM): Real time 0:21:32 | CP time 1215.140

RUN 2 (change 1)
CHANGE: called EnableImplicitMT()
REASON: RNTupleWriteOptions enables implicit MT by default, but MT has to be enabled first
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB (no change)
COMPRESSION RATIO: ~5.485x (no change)
CONVERSION TIME (SAM TO RAM): Real time 0:09:02 | CP time 1950.110 (~2.4x speedup with just 12-core multithreading)

RUN 3 (change 2)
CHANGE: called EnableImplicitMT(), changed the max unzipped page size to 1024*1024 (1 MiB), and changed the approximate zipped cluster size to 256*1024*1024 (256 MiB)
REASON: Implicit MT runs the compression in parallel on multiple cores, so increasing the page size seemed worth trying
RAM FILE : HG00100.ram
RAM FILE SIZE : 6,755,871,183 bytes | ~6443 MB | ~6.29 GB (no change)
COMPRESSION RATIO: ~5.485x (no change)
CONVERSION TIME (SAM TO RAM): Real time 0:09:00 | CP time 1889.930 (not much of an improvement over change 1)
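The size figures in the runs above can be rechecked from the raw byte counts. The sketch below is only an illustration: the function and constant names are made up for this example, and it assumes the "GB" values in the logs are binary units (GiB), which is what the numbers are consistent with.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per GiB (2^30); the "GB" figures in the logs match this binary unit.
constexpr double kBytesPerGiB = 1024.0 * 1024.0 * 1024.0;

// Convert a byte count to GiB.
inline double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / kBytesPerGiB;
}

// Compression ratio = original size / compressed size.
inline double compression_ratio(std::uint64_t original_bytes,
                                std::uint64_t compressed_bytes) {
    assert(compressed_bytes > 0);
    return static_cast<double>(original_bytes) /
           static_cast<double>(compressed_bytes);
}

// Byte counts copied from the runs above.
constexpr std::uint64_t kSamBytes = 37054527496ULL;
constexpr std::uint64_t kRamBytes = 6755871183ULL;
```

With these inputs, `compression_ratio(kSamBytes, kRamBytes)` comes out at roughly 5.48 and `to_gib(kRamBytes)` at roughly 6.29.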

complete logs can be found here

Monitoring the CPU utilization during the run:
Before change: (single core used)

After change: (multiple cores utilized)

Utilizing all cores (12 on my system) reduced the conversion time by ~2.4x.
Data integrity was validated by running multiple region queries on both outputs using ramntupleview. All regions returned identical record counts, confirming that enabling implicit multithreading does not affect the dataset contents.
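The speedup quoted above can be reproduced from the reported timings. This is a small sketch with illustrative names, assuming the `CP time` values are CPU seconds; dividing CPU time by wall time gives a rough average of how many cores were busy over the whole run.

```cpp
#include <cassert>

// Convert an h:mm:ss wall-clock reading to seconds.
constexpr double to_seconds(int h, int m, int s) {
    return h * 3600.0 + m * 60.0 + s;
}

// Speedup of a new run relative to a baseline (both in seconds).
inline double speedup(double baseline_s, double new_s) {
    assert(new_s > 0.0);
    return baseline_s / new_s;
}

// CPU time divided by wall time: roughly the average number of busy cores.
inline double avg_busy_cores(double cpu_s, double wall_s) {
    assert(wall_s > 0.0);
    return cpu_s / wall_s;
}

// Timings copied from RUN 1 (develop) and RUN 2 (EnableImplicitMT).
constexpr double kRun1Wall = to_seconds(0, 21, 32);  // 1292 s
constexpr double kRun2Wall = to_seconds(0, 9, 2);    // 542 s
constexpr double kRun1Cpu = 1215.140;
constexpr double kRun2Cpu = 1950.110;
```

Here `speedup(kRun1Wall, kRun2Wall)` is about 2.38. `avg_busy_cores` gives about 0.94 for RUN 1 and about 3.6 for RUN 2, which hints that, averaged over the full conversion, less than the 12 cores' worth of CPU was in use. That reading is an inference from these two numbers, not something measured separately here.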

neo-0007 · 2026-03-13T13:10:26Z

Follow-up: Can we use RNTupleParallelWriter here? That would further improve performance, but we would need to change the indexing logic, since the data would then no longer be written sequentially, which the current indexing assumes.

AdityaPandeyCN · 2026-03-14T04:19:38Z

What about the comparison with CRAM? Any results on that?

github-actions · 2026-03-14T06:21:20Z

clang-tidy review says "All clean, LGTM! 👍"

codecov-commenter · 2026-03-14T06:22:26Z

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.91%. Comparing base (71a3365) to head (1de20ca).

Files with missing lines	Patch %	Lines
src/ramcore/SamToNTuple.cxx	57.14%	2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (57.14%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #33      +/-   ##
===========================================
+ Coverage    59.89%   59.91%   +0.02%     
===========================================
  Files           16       16              
  Lines         1476     1477       +1     
  Branches       631      632       +1     
===========================================
+ Hits           884      885       +1     
  Misses         513      513              
  Partials        79       79

Flag	Coverage Δ
unittests	`59.91% <57.14%> (+0.02%)`	⬆️

Files with missing lines	Coverage Δ
src/ramcore/SamToNTuple.cxx	`92.52% <57.14%> (+0.04%)`	⬆️

swetank18 · 2026-03-14T09:06:15Z

Great work! I noticed in my benchmarks (#34) that ZSTD achieves near-LZMA compression ratios while being faster. Combined with EnableImplicitMT(), this could give both better compression and faster conversion. Have you tested ZSTD + multithreading together?

neo-0007 · 2026-03-14T10:16:07Z

Thanks @swetank18. You are right, ZSTD at level 5 is the best compression setting for RAM files, which has also been benchmarked and documented here: https://compiler-research.org/blogs/gsoc25_aditya_pandey_final_blog/ . Great work on your analysis in #34. I will also try out various settings and comparisons as you did and analyse the results on a different dataset.
And yes, the benchmarks I have provided use ZSTD along with multithreading, as that is what the current samtoramntuple uses:

ramtools/tools/samtoramntuple.cxx, line 52 in 3a3873a:

samtoramntuple_split_by_chromosome(input, output, 505, quality_mode, 4);
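For readers unfamiliar with the `505` in that call: ROOT encodes a compression setting as algorithm * 100 + level, and algorithm 5 is ZSTD. The sketch below decodes such a value. It assumes the third argument of `samtoramntuple_split_by_chromosome` is such a compression setting, which matches the comment above but is not checked against the function signature here; the struct and function names are illustrative only.

```cpp
#include <cassert>
#include <string>

// ROOT packs a compression setting as algorithm * 100 + level.
struct CompressionSetting {
    int algorithm;  // 1 = ZLIB, 2 = LZMA, 4 = LZ4, 5 = ZSTD
    int level;      // compression level within that algorithm
};

// Split a packed setting such as 505 into its two parts.
inline CompressionSetting decode_setting(int setting) {
    assert(setting >= 0);
    return {setting / 100, setting % 100};
}

// Human-readable name for the algorithm codes listed above.
inline std::string algorithm_name(int algorithm) {
    switch (algorithm) {
        case 1: return "ZLIB";
        case 2: return "LZMA";
        case 4: return "LZ4";
        case 5: return "ZSTD";
        default: return "other";
    }
}
```

Decoding `505` this way yields algorithm 5 (ZSTD) at level 5.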

neo-0007 · 2026-03-14T10:59:42Z

> What about the comparison with CRAM? any results on that?

Yes, I converted the BAM and SAM files to CRAM. Here are the benchmarks:

BAM FILE : HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
BAM FILE SIZE : 8,752,906,441 bytes | ~8347 MB  | ~8.15 GB
SAM FILE : HG00100.sam
SAM FILE SIZE : 37,054,527,496 bytes | ~35,339 MB | ~34.5 GB
CONVERSION TOOL : samtools

BAM to CRAM
CRAM FILE : HG00100.cram
CRAM FILE SIZE : 4,529,767,359 bytes | ~4320 MB | ~4.22 GB
COMPRESSION RATIO (vs SAM): ~8.18×

CONVERSION TIME (BAM to CRAM):
Real time  : 0:01:22
CPU time   : 7:10.928 (user) + 0:19.358 (sys)

SAM to CRAM
CRAM FILE : HG00100_from_sam.cram
CRAM FILE SIZE : 4,533,541,090 bytes | ~4324 MB | ~4.22 GB

COMPRESSION RATIO (vs SAM): ~8.17×

CONVERSION TIME (SAM to CRAM):
Real time  : 0:01:47
CPU time   : 7:21.050 (user) + 0:46.716 (sys)

Comparison with RAM

CRAM file size : ~4.22 GB
CRAM + reference : ~5.05 GB
RAM file size : ~6.29 GB

CRAM is ~2.07 GB smaller than RAM (~1.49× smaller)
CRAM + reference is ~1.24 GB smaller than RAM (~1.25× smaller)

CRAM conversion time : ~1–2 minutes (samtools using 12 cores)
RAM conversion time (ImplicitMT) : ~9 minutes (~5× slower than SAM to CRAM, ~6.6× slower than BAM to CRAM)
RAM conversion time (develop branch) : ~21 minutes (~12× slower than SAM to CRAM, ~16× slower than BAM to CRAM)
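The slowdown factors relative to CRAM follow directly from the wall-clock times reported above. A short sketch, with illustrative names, that makes the arithmetic explicit:

```cpp
#include <cassert>

// How many times longer `slow_s` is than `fast_s` (both wall-clock seconds).
inline double slowdown(double slow_s, double fast_s) {
    assert(fast_s > 0.0);
    return slow_s / fast_s;
}

// Wall-clock times copied from the benchmarks above, in seconds.
constexpr double kBamToCram = 1 * 60 + 22;      // 0:01:22
constexpr double kSamToCram = 1 * 60 + 47;      // 0:01:47
constexpr double kRamImplicitMT = 9 * 60 + 2;   // 0:09:02
constexpr double kRamDevelop = 21 * 60 + 32;    // 0:21:32
```

`slowdown(kRamImplicitMT, kSamToCram)` is about 5.1 and `slowdown(kRamDevelop, kSamToCram)` about 12.1; measured against the BAM to CRAM run instead, the factors are about 6.6 and 15.8. Note that the BAM to CRAM figure starts from a different input format than the SAM to RAM conversion, so the SAM to CRAM comparison is the more direct one.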

Plots :

neo-0007 · 2026-03-14T11:16:04Z

added clang-format suggestions

neo-0007 · 2026-03-20T08:03:53Z

@mvassilev @AdityaPandeyCN, can this be reviewed?

github-actions · 2026-03-20T08:10:50Z

clang-tidy review says "All clean, LGTM! 👍"

mvassilev · 2026-03-20T08:14:30Z

> @mvassilev @AdityaPandeyCN , Can this be reviewed .

@neo-0007 please check the clang-format error.

neo-0007 · 2026-03-20T08:21:20Z

@mvassilev Rebased on top of latest develop and ran git-clang-format HEAD~ to add git-clang-format suggestions

github-actions

clang-tidy made some suggestions

github-actions · 2026-03-20T09:22:30Z

clang-tidy review says "All clean, LGTM! 👍"

vgvassilev · 2026-03-21T16:22:05Z

@guitargeek, can you take a look?

guitargeek

Looks good to me, and a very nice demonstration of parallel RNTuple writing!

Signed-off-by: Hrishikesh Gohain <hrishikeshgohain123@gmail.com>

neo-0007 force-pushed the feat/improve-compression branch 2 times, most recently from 3cd3a0f to 93cc85b Compare March 14, 2026 11:14

neo-0007 force-pushed the feat/improve-compression branch from 93cc85b to b28dc02 Compare March 20, 2026 08:01

neo-0007 force-pushed the feat/improve-compression branch from b28dc02 to 3ee8c96 Compare March 20, 2026 08:20

github-actions Bot reviewed Mar 20, 2026

Comment thread src/ramcore/SamToNTuple.cxx Outdated

neo-0007 force-pushed the feat/improve-compression branch from 3ee8c96 to 1de20ca Compare March 20, 2026 09:11

guitargeek approved these changes Mar 23, 2026

feat: enable implicit multithreading to improve compression perf

2390a85

Signed-off-by: Hrishikesh Gohain <hrishikeshgohain123@gmail.com>

vgvassilev force-pushed the feat/improve-compression branch from 1de20ca to 2390a85 Compare March 23, 2026 16:43
