Add ramsort tool: coordinate and name sort for RAM (RNTuple) files#30
swetank18 wants to merge 5 commits into compiler-research:develop
Codecov Report
❌ Your patch status has failed because the patch coverage (76.27%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.
@@ Coverage Diff @@
## develop #30 +/- ##
==========================================
Coverage ? 57.42%
==========================================
Files ? 18
Lines ? 1543
Branches ? 837
==========================================
Hits ? 886
Misses ? 525
Partials ? 132
@@ -0,0 +1,11 @@
#ifndef RAMCORE_RAMSORT_H
#define RAMCORE_RAMSORT_H
warning: header guard does not follow preferred style [llvm-header-guard]
- #define RAMCORE_RAMSORT_H
+ #ifndef GITHUB_WORKSPACE_INC_RAMCORE_RAMSORT_H
+ #define GITHUB_WORKSPACE_INC_RAMCORE_RAMSORT_H
inc/ramcore/RAMSort.h:10:
- #endif // RAMCORE_RAMSORT_H
+ #endif // GITHUB_WORKSPACE_INC_RAMCORE_RAMSORT_H
/// \param outputFile Path to output .root RAM file
/// \param byName If true, sort by QNAME; otherwise sort by (refid, pos)
/// \return 0 on success, 1 on error
int ramsortntuple(const char *inputFile, const char *outputFile, bool byName = false);
warning: unknown type name 'bool' [clang-diagnostic-error]
int ramsortntuple(const char *inputFile, const char *outputFile, bool byName = false);
^
/// \param outputFile Path to output .root RAM file
/// \param byName If true, sort by QNAME; otherwise sort by (refid, pos)
/// \return 0 on success, 1 on error
int ramsortntuple(const char *inputFile, const char *outputFile, bool byName = false);
warning: use of undeclared identifier 'false' [clang-diagnostic-error]
int ramsortntuple(const char *inputFile, const char *outputFile, bool byName = false);
^
@@ -0,0 +1,98 @@
#include "ramcore/RAMSort.h"
warning: 'ramcore/RAMSort.h' file not found [clang-diagnostic-error]
#include "ramcore/RAMSort.h"
^
#include <exception>
#include <iostream>
#include <memory>
#include <numeric>
warning: included header memory is not used directly [misc-include-cleaner]
- #include <numeric>
+ #include <numeric>
#include <numeric>
#include <string>
#include <utility>
#include <vector>
warning: included header utility is not used directly [misc-include-cleaner]
- #include <vector>
+ #include <vector>
#include <string>
#include <utility>
#include <vector>
warning: included header vector is not used directly [misc-include-cleaner]
#include <utility>
#include <vector>

int ramsortntuple(const char *inputFile, const char *outputFile, bool byName)
warning: function 'ramsortntuple' can be made static or moved into an anonymous namespace to enforce internal linkage [misc-use-internal-linkage]
- int ramsortntuple(const char *inputFile, const char *outputFile, bool byName)
+ static int ramsortntuple(const char *inputFile, const char *outputFile, bool byName)
@@ -0,0 +1,110 @@
#include <gtest/gtest.h>
warning: 'gtest/gtest.h' file not found [clang-diagnostic-error]
#include <gtest/gtest.h>
^
class RAMSortTest : public ::testing::Test {
protected:
static constexpr int kNumReads = 200;
const char *kSamFile = "sort_test.sam";
warning: invalid case style for protected member 'kSamFile' [readability-identifier-naming]
- const char *kSamFile = "sort_test.sam";
+ const char *m_kSamFile = "sort_test.sam";
test/ramsorttests.cxx:21:
- GenerateSAMFile(kSamFile, kNumReads);
+ GenerateSAMFile(m_kSamFile, kNumReads);
test/ramsorttests.cxx:25:
- samtoramntuple(kSamFile, kUnsortedFile, true, true, true, 505, 0);
+ samtoramntuple(m_kSamFile, kUnsortedFile, true, true, true, 505, 0);
test/ramsorttests.cxx:30:
- std::remove(kSamFile);
+ std::remove(m_kSamFile);
Fixed clang-tidy warnings in RAMSort.cxx:
- Fixed include order (llvm-include-order)
All tests are passing locally. Coverage should now be above 85%.
Summary
This PR introduces a ramsort command-line tool that sorts RAM (RNTuple) files by genomic coordinate (refid, pos) or by query name (QNAME). It is the RNTuple-based equivalent of samtools sort and samtools sort -n.
Motivation
Sorted alignment files are a prerequisite for most downstream bioinformatics analyses — region queries, variant calling, duplicate marking, and indexing all require coordinate-sorted input. This PR brings native sort capability to RAMTools without requiring conversion back to SAM/BAM.
Changes
New files:
- inc/ramcore/RAMSort.h: ramsortntuple() function declaration
- src/ramcore/RAMSort.cxx: implementation using RNTupleReader field views and std::stable_sort
- tools/ramsort.cxx: command-line entry point
- test/ramsorttests.cxx: 5 Google Test cases
Modified files:
- CMakeLists.txt: added RAMSort.h and RAMSort.cxx to the ramcore library
- tools/CMakeLists.txt: added ramsort executable using ROOT_EXECUTABLE macro
- test/CMakeLists.txt: added ramsorttests test target
Usage
Implementation Notes
- Sort keys (refid, pos, qname) are read upfront into vectors for cache-efficient sorting
- std::stable_sort to preserve relative order of reads with identical keys
- RNTupleReader views: no full deserialization until write time
- WriteAllRefs() and WriteIndex()
- src/ramcore/RAMSort.cxx (library); tools/ramsort.cxx is a thin CLI wrapper
Tests
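To make the implementation notes above and the ordering properties the tests check more concrete, here is a minimal sketch of sorting an index permutation over pre-loaded keys with std::stable_sort. It is not the code from this PR: the SortKeys struct and the SortedOrder function are hypothetical names, the key types are assumed, and all RNTuple reading and writing is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory copy of the sort keys (assumed types); the real
// tool would fill these from the input file before sorting.
struct SortKeys {
   std::vector<std::int32_t> refid;
   std::vector<std::int32_t> pos;
   std::vector<std::string> qname;
};

// Returns a permutation of 0..n-1 giving the order in which entries should
// be written: by QNAME when byName is true, otherwise by (refid, pos).
// std::stable_sort keeps entries with equal keys in their input order.
std::vector<std::size_t> SortedOrder(const SortKeys &keys, bool byName)
{
   std::vector<std::size_t> order(keys.refid.size());
   std::iota(order.begin(), order.end(), std::size_t{0});

   if (byName) {
      std::stable_sort(order.begin(), order.end(),
                       [&keys](std::size_t a, std::size_t b) {
                          return keys.qname[a] < keys.qname[b];
                       });
   } else {
      std::stable_sort(order.begin(), order.end(),
                       [&keys](std::size_t a, std::size_t b) {
                          // Lexicographic comparison on (refid, pos).
                          return std::tie(keys.refid[a], keys.pos[a]) <
                                 std::tie(keys.refid[b], keys.pos[b]);
                       });
   }
   return order;
}
```

Sorting indices rather than full records means only the small key vectors are moved during the sort; a writer could then walk the returned order and copy each entry once. Because the sort is stable, re-sorting already sorted keys returns the identity permutation, which is the property an idempotence test relies on.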
5 test cases in test/ramsorttests.cxx, all passing:
- EntryCountPreserved: sorted file has same number of entries as input
- CoordinateSortOrder: records are in non-decreasing (refid, pos) order after sort
- NameSortOrder: records are in non-decreasing QNAME lexicographic order after --by-name
- IdempotentSort: sorting an already sorted file produces identical coordinate order
- MissingInputFileReturnsError: graceful failure returns non-zero on bad input
Test Results