fastder is a C++-based tool for detecting expressed regions in RNA-seq data.
It is intended to build on the recount3 resource, which consists of over 750'000 uniformly processed RNA-seq samples across different mouse and human studies.
The tool aims to reconstruct expressed genes prior to splicing in an annotation-agnostic approach.
fastder takes genome-wide coverage bigWig files and splice junction coordinates as an input. The tool averages across samples and performs thresholding to identify
consecutive regions with above-threshold expression. Following this, fastder attempts to stitch together expressed regions (ERs)
by searching for splice junction coordinates that overlap with the start and end position of these expressed regions.
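The averaging and thresholding step described above can be sketched as follows. This is a minimal illustration and not the fastder implementation: the `ExpressedRegion` struct and the functions `average_coverage` and `find_expressed_regions` are hypothetical names, and the sketch assumes that per-base coverage has already been read from the bigWig files into equally long vectors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type for illustration; not taken from the fastder source.
struct ExpressedRegion {
    std::size_t start;      // first position of the region (0-based, inclusive)
    std::size_t end;        // last position of the region (0-based, inclusive)
    double mean_coverage;   // mean coverage over the region
};

// Average per-base coverage across samples.
// Assumption: every sample vector has the same length.
std::vector<double> average_coverage(const std::vector<std::vector<double>>& samples) {
    if (samples.empty()) return {};
    std::vector<double> mean(samples.front().size(), 0.0);
    for (const auto& sample : samples)
        for (std::size_t i = 0; i < mean.size(); ++i)
            mean[i] += sample[i];
    for (double& v : mean) v /= static_cast<double>(samples.size());
    return mean;
}

// Collect maximal runs of consecutive positions whose coverage is at least
// min_coverage, and drop runs shorter than min_length.
std::vector<ExpressedRegion> find_expressed_regions(const std::vector<double>& coverage,
                                                    double min_coverage,
                                                    std::size_t min_length) {
    std::vector<ExpressedRegion> regions;
    std::size_t run_start = 0;
    double run_sum = 0.0;
    bool in_run = false;
    // Iterate one step past the end so that a run reaching the last position is closed.
    for (std::size_t i = 0; i <= coverage.size(); ++i) {
        const bool above = i < coverage.size() && coverage[i] >= min_coverage;
        if (above) {
            if (!in_run) { run_start = i; run_sum = 0.0; in_run = true; }
            run_sum += coverage[i];
        } else if (in_run) {
            const std::size_t length = i - run_start;
            if (length >= min_length)
                regions.push_back({run_start, i - 1, run_sum / static_cast<double>(length)});
            in_run = false;
        }
    }
    return regions;
}
```

A single pass over the averaged coverage keeps the cost linear in the number of positions. Whether fastder stores the mean coverage per region in this way is an assumption made here so that regions can later be compared by coverage.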
recount3 provides uniformly processed RNA-seq data for over 8'000 human and over 10'000 mouse studies. Each study consists of multiple per-sample
coverage bigWig files and one set of per-study splice junction coordinate files, among others.
Existing input files can be downloaded from the recount3 online platform.
Thus, the user can either provide data from one of the existing studies or run the recount3 pipeline on new RNA-seq data.
For new RNA-seq data, running the recount3 pipeline is the easiest way to obtain the required input files.
fastder builds on the Monorail pipeline used by recount3. Monorail takes the FASTQ files produced by Illumina sequencing as input.
A brief summary of the relevant steps in the Monorail pipeline (used to create recount3 resources) is provided below:
- Input data:
  - unpaired or paired-end FASTQ files
  - suffix-array-based index of the reference genome sequence
- Perform spliced alignment with STAR to obtain
  - a BAM file with the spliced alignment
  - a summary of detected splice junctions
- Use Megadepth to produce bigWig coverage files
- Aggregate SJ.out.tab into
  - an MM file
  - an RR file
The following diagram provides an overview of the tables and objects used in fastder. The _File suffix indicates that the table is one of the input files.
All other tables are objects created by the Parser class to map between the three different sample IDs (in lilac) used by the splice junction and coverage files respectively.
The following sequence diagram provides an abstracted overview of the three main functional stages of fastder.
fastder can currently take only one RR and MM file as an input. Thus, users directly working with
recount3 resources can only provide samples from the same study as an input.
fastder expects all input files to be in the same folder (provided as a relative path to the build directory with --dir).
fastder allows users to optionally specify which chromosomes they wish to analyze. The flag --chr chr1 means that the tool will only output expressed regions on chromosome 1 and will ignore all coverage and splice junction information from other chromosomes.
fastder allows optionally specifying four different thresholds:
- --min-coverage 0.25 describes the coverage threshold of an expressed region (ER). A consecutive base-pair position must have at least 0.25 CPM coverage to be added to an ER.
- --min-length 5 describes the minimum length (in bp) that an ER must have. For instance, three consecutive base pairs with coverage > 0.25 CPM will be ignored if the minimum length is set to 5 bp.
- --position-tolerance 5 describes the maximum permitted offset between the end position of an ER and the start position of a splice junction. If this tolerance is set to 5, an ER with end position = 1000 bp and a splice junction with start position = 1005 bp will be stitched together (if the coverage and the end of the junction also match).
- --coverage-tolerance 0.1 describes the maximum permitted relative coverage deviation between two ERs that are separated by a spliced region. For a coverage tolerance of 0.1, two ERs with coverage = 10 CPM and 11 CPM will be stitched together (if there is a matching splice junction).
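To illustrate how the position and coverage tolerances interact, here is a minimal sketch of a stitching check. It is not the fastder implementation: the types `Region` and `SpliceJunction` and the function `can_stitch` are hypothetical, and the coverage deviation is measured relative to the larger of the two coverage values, which is an assumption (the exact definition used by fastder is not stated here).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Hypothetical types for illustration; not taken from the fastder source.
struct Region {
    std::int64_t start;   // start position [bp]
    std::int64_t end;     // end position [bp]
    double coverage;      // mean coverage [CPM]
};

struct SpliceJunction {
    std::int64_t start;   // start position of the junction [bp]
    std::int64_t end;     // end position of the junction [bp]
};

// Returns true if `left` and `right` may be stitched across `sj`:
//  - the junction start lies within position_tolerance of the end of `left`,
//  - the junction end lies within position_tolerance of the start of `right`,
//  - the coverage of both regions differs by at most coverage_tolerance,
//    relative to the larger coverage (assumption).
bool can_stitch(const Region& left, const SpliceJunction& sj, const Region& right,
                std::int64_t position_tolerance, double coverage_tolerance) {
    const bool start_ok = std::abs(sj.start - left.end) <= position_tolerance;
    const bool end_ok = std::abs(right.start - sj.end) <= position_tolerance;
    const double diff = std::abs(left.coverage - right.coverage);
    const bool coverage_ok =
        diff <= coverage_tolerance * std::max(left.coverage, right.coverage);
    return start_ok && end_ok && coverage_ok;
}
```

With the example values from the text, an ER ending at 1000 bp with 10 CPM, a junction starting at 1005 bp, and a following ER with 11 CPM pass the check for a position tolerance of 5 and a coverage tolerance of 0.1, provided the junction end also lies within 5 bp of the start of the second ER.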
A visualization of the different parameters is provided below.
Usage:
fastder \
--dir <path> ... \
[--chr <chr1> <chr2> ...] \
[--min-length <float>] \
[--min-coverage <float>] \
[--position-tolerance <int>] \
[--coverage-tolerance <float>] \
[--help]
Required inputs:
--dir <path> ... Relative path from the build directory to the directory containing the input files.
Example: --dir ../../data/test_exon_skipping
Optional inputs:
--chr <chr1> <chr2> ... List of chromosomes to process.
Default: all (chr1-chr22, chrX)
Example: --chr chr1 chr2 chr3
--min-length <float> Minimum length [#bp] required for a region to qualify as an expressed region (ER).
Default: 5 bp
Example: --min-length 5
--min-coverage <float> Minimum coverage [CPM] required for a region to qualify as an ER.
Normalized in-place by library size.
Default: 0.25 CPM
Example: --min-coverage 0.25
--position-tolerance <int> Maximum allowed positional deviation between splice junction and ER coordinates [bp].
Default: 5 bp
Example: --position-tolerance 5
--coverage-tolerance <float> Allowed relative deviation in coverage between stitched ERs (e.g. 0.1 = 10%).
Default: 0.1
Example: --coverage-tolerance 0.1
--help Show this help message.
Example:
fastder \
--dir ../../data/input \
--chr chr1 chr2 \
--position-tolerance 5 \
--min-length 5 \
--min-coverage 0.25 \
--coverage-tolerance 0.1
GPLv3
- Snakemake pipeline
- installation requirements: CMake version 4 or newer




