Feature/refactor_DIMS_MakeInit by mraves2 · Pull Request #94 · UMCUGenetics/CustomModules

mraves2 · 2026-02-13T15:40:16Z

Renamed MakeInit to ParseSamplesheet.
Cleaned up the code; the replication pattern is now generated from the information in the samplesheet. The variable nr_replicates is no longer needed. Added a unit test.

Jorisvansteenbrugge · 2026-03-06T09:06:11Z

DIMS/preprocessing/parse_samplesheet_functions.R

+  #' @param sample_sheet: matrix of file names and sample names
+  #'
+  #' @return ints_sorted: list of sample names with corresponding file names (technical replicates)
+


Shouldn't these roxygen-style comments be placed above the function definition? https://roxygen2.r-lib.org/articles/roxygen2.html#basic-process
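A minimal sketch of the layout the reviewer describes, with the roxygen block directly above `generate_repl_pattern` (roxygen2 attaches a `#'` block to the code that follows it). The body is assembled from the fragments shown in this diff, so the final `return()` line is an assumption. Two small changes beyond the placement: roxygen2 reads the first word after `@param` as the argument name, so the colon after `sample_sheet` is dropped, and `sample_sheet` is documented as a data frame, because single-bracket column indexing such as `sample_sheet[sample_name_col]` only selects a column on a data frame (on a matrix it selects a single element).

```r
#' Generate a replication pattern list based on the information in the sample sheet
#'
#' @param sample_sheet data frame with a file name column and a sample name column
#'
#' @return list with, per sample name, the corresponding file names (technical replicates)
generate_repl_pattern <- function(sample_sheet) {
  file_name_col <- grep("File_Name|File Name", colnames(sample_sheet))
  sample_name_col <- grep("Sample_Name|Sample Name", colnames(sample_sheet))
  sample_names <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col])))))
  repl_pattern <- c()
  for (sample_group in sample_names) {
    file_indices <- which(sample_sheet[, sample_name_col] == sample_group)
    file_names <- sample_sheet[file_indices, file_name_col]
    repl_pattern <- c(repl_pattern, list(file_names))
  }
  # Assumed: the function returns the list built above
  return(repl_pattern)
}
```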

Jorisvansteenbrugge · 2026-03-06T09:06:53Z

DIMS/preprocessing/parse_samplesheet_functions.R

@@ -0,0 +1,29 @@
+# function for parse_samplesheet
+generate_repl_pattern <- function(sample_sheet) {
+  #' Generate replication pattern list based on information in sample_sheet


So the function name should be generate_repl_pattern_list?

Jorisvansteenbrugge · 2026-03-06T09:08:43Z

DIMS/preprocessing/parse_samplesheet_functions.R

+  #'
+  #' @return ints_sorted: list of sample names with corresponding file names (technical replicates)
+
+  # get the right columns from the samplesheet


In-line comments are more useful if they explain a choice rather than state what the line(s) below them do.

"the right column" does not make it clearer, but if you explain the rationale, it does become clearer :)
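As an illustration, a comment that gives the rationale for the column lookup might look like the sketch below. `find_samplesheet_columns` is a hypothetical wrapper, and the stated reason (that sample sheets may use either underscores or spaces in their headers) is inferred from the regular expressions rather than confirmed in the PR.

```r
# Hypothetical wrapper around the column lookup from the PR, for illustration only.
find_samplesheet_columns <- function(sample_sheet) {
  # Accept both "File_Name" and "File Name" (likewise for the sample column)
  # because sample sheets may be exported with either header style.
  # (Assumed rationale, inferred from the regular expressions.)
  file_name_col <- grep("File_Name|File Name", colnames(sample_sheet))
  sample_name_col <- grep("Sample_Name|Sample Name", colnames(sample_sheet))
  c(file_name = file_name_col, sample_name = sample_name_col)
}

sheet <- data.frame(`File Name` = "run1.raw", `Sample Name` = "P1",
                    check.names = FALSE)
find_samplesheet_columns(sheet)
```

Note that `read.csv()` and `read.delim()` use `check.names = TRUE` by default, which turns "File Name" into "File.Name"; the space variant only matches if the sheet is read with `check.names = FALSE`. How the PR reads the sheet is not shown here.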

Jorisvansteenbrugge · 2026-03-06T09:11:20Z

DIMS/preprocessing/parse_samplesheet_functions.R

+  file_name_col <- grep("File_Name|File Name", colnames(sample_sheet))
+  sample_name_col <- grep("Sample_Name|Sample Name", colnames(sample_sheet))
+  # get the unique sample names from the samplesheet
+  sample_names <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col])))))


Could you consider using piping to make this more readable?

Suggested change

-  sample_names <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col])))))
+  sample_names <- sample_sheet[sample_name_col] |>
+    unlist() |>
+    as.vector() |>
+    trimws() |>
+    unique() |>
+    sort()
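A quick check that the piped version returns the same result as the nested call, using a small hypothetical sample sheet (the data and the hard-coded column index are illustrative only). The native `|>` pipe requires R 4.1 or later.

```r
# Requires R >= 4.1 for the native |> pipe.
sample_sheet <- data.frame(
  File_Name = c("run1.raw", "run2.raw", "run3.raw"),
  Sample_Name = c("P2 ", "P1", "P2")  # trailing space on the first entry
)
sample_name_col <- 2

nested <- sort(unique(trimws(as.vector(unlist(sample_sheet[sample_name_col])))))

piped <- sample_sheet[sample_name_col] |>
  unlist() |>
  as.vector() |>
  trimws() |>
  unique() |>
  sort()

# Both forms apply the same calls in the same order
stopifnot(identical(nested, piped))
print(piped)
```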

Jorisvansteenbrugge · 2026-03-06T09:29:09Z

DIMS/preprocessing/parse_samplesheet_functions.R

+  repl_pattern <- c()
+  for (sample_group in sample_names) {
+    file_indices <- which(sample_sheet[, sample_name_col] == sample_group)
+    file_names <- sample_sheet[file_indices, file_name_col]
+    repl_pattern <- c(repl_pattern, list(file_names))
+  }


Or perhaps?

Suggested change

-  repl_pattern <- c()
-  for (sample_group in sample_names) {
-    file_indices <- which(sample_sheet[, sample_name_col] == sample_group)
-    file_names <- sample_sheet[file_indices, file_name_col]
-    repl_pattern <- c(repl_pattern, list(file_names))
-  }
+  repl_pattern <- split(
+    sample_sheet[[file_name_col]],
+    sample_sheet[[sample_name_col]]
+  )[sample_names]
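A small comparison of the loop and the suggested `split()` version on a hypothetical sample sheet (data and column indices are illustrative only). `split()` groups the file names by sample name, and indexing with `[sample_names]` fixes the order of the list elements.

```r
# Hypothetical example sample sheet; column positions are hard-coded here
sample_sheet <- data.frame(
  File_Name = c("run1.raw", "run2.raw", "run3.raw", "run4.raw"),
  Sample_Name = c("P2", "P1", "P2", "P1"),
  stringsAsFactors = FALSE
)
file_name_col <- 1
sample_name_col <- 2
sample_names <- sort(unique(trimws(sample_sheet[[sample_name_col]])))

# Loop version from the PR
loop_pattern <- c()
for (sample_group in sample_names) {
  file_indices <- which(sample_sheet[, sample_name_col] == sample_group)
  file_names <- sample_sheet[file_indices, file_name_col]
  loop_pattern <- c(loop_pattern, list(file_names))
}

# split() version from the suggestion
split_pattern <- split(
  sample_sheet[[file_name_col]],
  sample_sheet[[sample_name_col]]
)[sample_names]

# split() returns a list named by sample; the loop builds an unnamed list
stopifnot(identical(unname(split_pattern), loop_pattern))
str(split_pattern)
```

Two behavioural differences are worth checking against the unit tests: the `split()` result carries sample names, and if a trimmed name in `sample_names` has no exact match in the untrimmed column, the loop yields `character(0)` for it, whereas indexing the `split()` result yields a `NULL` element named `NA`.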

Jorisvansteenbrugge · 2026-03-06T09:36:42Z

DIMS/ParseSamplesheet.R

+repl_pattern <- generate_repl_pattern(sample_sheet)
+
+# write the replication pattern to text file for troubleshooting purposes
+sink("replication_pattern.txt")


I would add something to the output name to indicate more clearly that this is logging output rather than actual 'data', e.g. replication_pattern_log.txt.
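A minimal sketch of writing the troubleshooting output under the suggested name. `write_repl_pattern_log` is a hypothetical helper, and it uses `capture.output(..., file = )` instead of a bare `sink()`; that is a swapped-in choice, not what the PR does. If the file name changes, the `path('replication_pattern.txt')` output declared in ParseSamplesheet.nf has to change with it, because Nextflow expects every declared output file to exist after the script runs.

```r
# Hypothetical helper; the default file name follows the reviewer's suggestion.
write_repl_pattern_log <- function(repl_pattern,
                                   log_file = "replication_pattern_log.txt") {
  # capture.output() redirects printed output to the file and restores normal
  # output afterwards, even if print() fails, so no sink() is left open.
  capture.output(print(repl_pattern), file = log_file)
  invisible(log_file)
}
```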

Jorisvansteenbrugge · 2026-03-06T09:39:31Z

DIMS/ParseSamplesheet.nf

+
+    script:
+        """
+        Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R $samplesheet $params.preprocessing_scripts_dir


Using $params inside a module/process is not Nextflow best practice.

The best option is to add it as an input, so the input block becomes:

input:
    path(samplesheet)
    path(preprocessing_scripts_dir)

Jorisvansteenbrugge · 2026-03-06T09:44:33Z

DIMS/ParseSamplesheet.nf

+
+    script:
+        """
+        Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R $samplesheet $params.preprocessing_scripts_dir


Why is the R script referenced from baseDir? For stability's sake, it would make more sense to put it in a resource folder. See, for example, the PRS repo: https://github.com/UMCUGenetics/DxNextflowPRS/blob/develop/modules/local/SNPlist/main.nf (this is Python, but it works the same way).

Rscript ${baseDir}/CustomModules/DIMS/ParseSamplesheet.R

Jorisvansteenbrugge · 2026-03-06T09:45:19Z

DIMS/ParseSamplesheet.nf

+    tag "DIMS ParseSamplesheet"
+    label 'ParseSamplesheet'
+    container = 'docker://umcugenbioinf/dims:1.3'
+    shell = ['/bin/bash', '-euo', 'pipefail']


Suggested change

-    shell = ['/bin/bash', '-euo', 'pipefail']

Remove these here; shell directives are (or should be) declared in the main nextflow.config.

Jorisvansteenbrugge · 2026-03-06T09:47:24Z

DIMS/ParseSamplesheet.nf

+       path('init.RData')
+       path('replication_pattern.txt')


The outputs are missing emit statements.

mraves2 added 5 commits February 9, 2026 11:56

created function for generating replication pattern from sample sheet

e1e144e

renamed MakeInit step to ParseSamplesheet

f8fbc49

added unit tests for parse_samplesheet_functions

c4d4cf1

changed process name from MakeInit to ParseSamplesheet

99860c5

corrected file name for replication_pattern.txt

6dd934e

Jorisvansteenbrugge requested changes Mar 6, 2026

This was referenced Mar 6, 2026

Feature/refactor_MakeInit UMCUGenetics/DIMS#121 (Open)

Feature/refactor_DIMS_GenerateBreaks #95 (Open)

mraves2 wants to merge 5 commits into develop from feature/refactor_DIMS_MakeInit
