Conversation
| #' @param sample_sheet: matrix of file names and sample names | ||
| #' | ||
| #' @return ints_sorted: list of sample names with corresponding file names (technical replicates) | ||
|
|
There was a problem hiding this comment.
Shouldn't these roxygen style comments be places above the function definition? https://roxygen2.r-lib.org/articles/roxygen2.html#basic-process
| @@ -0,0 +1,29 @@ | |||
| # function for parse_samplesheet | |||
| generate_repl_pattern <- function(sample_sheet) { | |||
| #' Generate replication pattern list based on information in sample_sheet | |||
There was a problem hiding this comment.
So the function name should be generate_repl_pattern_list?
| #' | ||
| #' @return ints_sorted: list of sample names with corresponding file names (technical replicates) | ||
|
|
||
| # get the right columns from the samplesheet |
There was a problem hiding this comment.
In-line comments are more usefull if they explain a choice, rather than stating what the line(s) below it does.
the right column
does not make it clearer, but if you explain the rationale then it does become clearer :)
| file_name_col <- grep("File_Name|File Name", colnames(sample_sheet)) | ||
| sample_name_col <- grep("Sample_Name|Sample Name", colnames(sample_sheet)) | ||
| # get the unique sample names from the samplesheet | ||
| sample_names <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col]))))) |
There was a problem hiding this comment.
Could you consider using piping to make this more readable?
| sample_names <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col]))))) | |
| sample_names <- sample_sheet[sample_name_col] |> | |
| unlist() |> | |
| as.vector() |> | |
| trimws() |> | |
| unique() |> | |
| sort() |
| repl_pattern <- c() | ||
| for (sample_group in sample_names) { | ||
| file_indices <- which(sample_sheet[, sample_name_col] == sample_group) | ||
| file_names <- sample_sheet[file_indices, file_name_col] | ||
| repl_pattern <- c(repl_pattern, list(file_names)) | ||
| } |
There was a problem hiding this comment.
Of misschien?
| repl_pattern <- c() | |
| for (sample_group in sample_names) { | |
| file_indices <- which(sample_sheet[, sample_name_col] == sample_group) | |
| file_names <- sample_sheet[file_indices, file_name_col] | |
| repl_pattern <- c(repl_pattern, list(file_names)) | |
| } | |
| repl_pattern <- split( | |
| sample_sheet[[file_name_col]], | |
| sample_sheet[[sample_name_col]] | |
| )[sample_names] |
| repl_pattern <- generate_repl_pattern(sample_sheet) | ||
|
|
||
| # write the replication pattern to text file for troubleshooting purposes | ||
| sink("replication_pattern.txt") |
There was a problem hiding this comment.
I would add something to the output name to more clearly indicate that this is a logging output, and not actual 'data'. e.g., replication_pattern_log.txt
|
|
||
| script: | ||
| """ | ||
| Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R $samplesheet $params.preprocessing_scripts_dir |
There was a problem hiding this comment.
Using $params inside a module/process is not nextflow best practice.
The best option is to add it as an input. So input becomes:
input:
path(samplesheet)
path(preprocessing_scripts_dir)
|
|
||
| script: | ||
| """ | ||
| Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R $samplesheet $params.preprocessing_scripts_dir |
There was a problem hiding this comment.
Why is the Rscript referred from the baseDir? For stability sake, it it would make more sense to use a resource folder and put it in there. For example in the PRS repo: https://github.com/UMCUGenetics/DxNextflowPRS/blob/develop/modules/local/SNPlist/main.nf (this is python but it works in the same way)
Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R
| tag "DIMS ParseSamplesheet" | ||
| label 'ParseSamplesheet' | ||
| container = 'docker://umcugenbioinf/dims:1.3' | ||
| shell = ['/bin/bash', '-euo', 'pipefail'] |
There was a problem hiding this comment.
| shell = ['/bin/bash', '-euo', 'pipefail'] | |
Remove those here, shell directives are/should be declared in the main nextflow.config
| path('init.RData') | ||
| path('replication_pattern.txt') |
There was a problem hiding this comment.
outputs are missing emit statements
MakeInit hernoemd naar ParseSamplesheet
Code opgeschoond, replication pattern wordt nu gemaakt adhv de informatie in de samplesheet. Variabele nr_replicates is niet meer nodig. Unit test toegevoegd.