rr_util

Utilities for generating raw reads datasets for deep neural network training

usage: bam2reads.py [-h] -b BAM -s FASTA [-o OUT_TAG] [-n NUM_SEQ_PER_CHUNK]

Takes sorted bam file and produces reads file and index for ML modeling

options:
  -h, --help            show this help message and exit
  -b BAM, --bam BAM     bam files to process
  -s FASTA, --fasta FASTA
                        FASTA of reference sequences used to align bam
  -o OUT_TAG, --out_tag OUT_TAG
                        tag used for tag.reads.txt and tags.index.csv
  -n NUM_SEQ_PER_CHUNK, --num_seq_per_chunk NUM_SEQ_PER_CHUNK
                        number of sequences
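
As a rough illustration of the interface described by the help text above, the following argparse sketch reproduces the same flags. It is not the script's actual source: the int type for -n, the absence of defaults, and the example file names (sample.sorted.bam, ref.fasta, sample) are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the bam2reads.py help text."""
    parser = argparse.ArgumentParser(
        prog="bam2reads.py",
        description="Takes sorted bam file and produces reads file and index for ML modeling",
    )
    # -b and -s appear without brackets in the usage line, so they are required.
    parser.add_argument("-b", "--bam", required=True, help="bam files to process")
    parser.add_argument(
        "-s", "--fasta", required=True,
        help="FASTA of reference sequences used to align bam",
    )
    parser.add_argument(
        "-o", "--out_tag",
        help="tag used for tag.reads.txt and tags.index.csv",
    )
    # Assumption: the chunk size is an integer; the help text does not state a type.
    parser.add_argument("-n", "--num_seq_per_chunk", type=int, help="number of sequences")
    return parser


# Hypothetical invocation; the file names are placeholders.
args = build_parser().parse_args(
    ["-b", "sample.sorted.bam", "-s", "ref.fasta", "-o", "sample", "-n", "1000"]
)
print(args.bam, args.fasta, args.out_tag, args.num_seq_per_chunk)
```

Omitting -b or -s makes the parser exit with a usage error, which matches the required arguments in the usage line.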

The outputs are two files named after OUT_TAG: a .index.csv file giving the 1-indexed start/end of the reads for each reference sequence, and a .reads.txt file containing the reads.
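
Because the index is 1-indexed, a reader working in Python has to shift the start by one before slicing. The sketch below shows that conversion under several assumptions that the README does not confirm: that the index has a header row with columns named ref, start and end, that start/end are inclusive positions of reads within the reads file, and that each read occupies one line. The data shown is made up for the example.

```python
import csv
import io


def reads_for_ref(index_rows, reads, ref_name):
    """Return the reads belonging to ref_name, given 1-indexed inclusive bounds."""
    for row in index_rows:
        if row["ref"] == ref_name:  # "ref", "start", "end" are hypothetical column names
            start, end = int(row["start"]), int(row["end"])
            # 1-indexed inclusive [start, end] maps to the Python slice [start - 1:end].
            return reads[start - 1:end]
    raise KeyError(ref_name)


# Made-up stand-ins for the .index.csv and .reads.txt contents.
index_csv = io.StringIO("ref,start,end\nchr1,1,2\nchr2,3,5\n")
reads = ["ACGT", "ACGA", "TTGA", "TTGC", "TTGG"]

rows = list(csv.DictReader(index_csv))
chr2_reads = reads_for_ref(rows, reads, "chr2")
print(chr2_reads)
```

If the real index uses different column names or exclusive end positions, only the column lookups and the slice bounds need to change.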