Add a module and a Python script to generate QC plots of the relationship between mutation density (or omega) and sequencing depth.
This module should work with the files that compile the mutation densities (all_mutdensities.tsv and all_adjusted_mutdensities.tsv) and with the file that contains all the omega values (all_omegas_globalloc.tsv).
Ideally, if the omegas file is not available, the QC for the mutation densities should still run (provided mutdensity is activated).
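A minimal sketch of how the optional inputs could be handled, assuming all three compiled files land in one directory. The metric labels, the `METRIC_FILES` mapping and the `available_metrics` helper are hypothetical names used for illustration only; just the three file names come from this issue.

```python
from pathlib import Path

# Hypothetical mapping from a metric label to the compiled file named in this issue.
METRIC_FILES = {
    "mutdensity": "all_mutdensities.tsv",
    "adjusted_mutdensity": "all_adjusted_mutdensities.tsv",
    "omega": "all_omegas_globalloc.tsv",
}


def available_metrics(input_dir):
    """Return {metric: path} for the compiled files that exist in input_dir.

    Missing files are skipped without raising, so the mutation density QC
    can still run when the omegas file was never produced.
    """
    input_dir = Path(input_dir)
    found = {}
    for metric, filename in METRIC_FILES.items():
        path = input_dir / filename
        if path.is_file():
            found[metric] = path
    return found
```

The rest of the script could then loop over the returned dictionary and run the same QC steps for each metric that is present.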
In terms of functionality, the module should produce scatterplots of the mutation density or omega per sample (sample information comes from the samples.json generated within the pipeline) against the average sequencing depth per gene per sample (also generated within the pipeline).
If the scatterplot is too busy because there are too many genes, one option is to split it into several plots; any other clever solution is welcome.
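One possible way to keep the plots readable is to draw one small panel per gene and write a new figure every N genes. The sketch below assumes the inputs have already been merged into long-format records with the hypothetical keys `gene`, `depth` and `value` (one record per gene and sample); the real column names in the pipeline outputs may differ. It also assumes matplotlib is available in the environment.

```python
def chunk_genes(genes, per_plot=12):
    """Split a sorted, de-duplicated gene list into groups of at most per_plot."""
    if per_plot < 1:
        raise ValueError("per_plot must be >= 1")
    unique = sorted(set(genes))
    return [unique[i:i + per_plot] for i in range(0, len(unique), per_plot)]


def plot_depth_vs_metric(records, metric_name, out_prefix, per_plot=12, ncols=4):
    """Write one PNG per gene group, with one scatter panel per gene.

    records: iterable of dicts with the hypothetical keys
             'gene', 'depth' and 'value' (None when the value is missing).
    """
    # Imported lazily so chunk_genes stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend for pipeline runs
    import matplotlib.pyplot as plt

    by_gene = {}
    for rec in records:
        if rec["value"] is None:
            continue  # missing values cannot be placed on the scatterplot
        by_gene.setdefault(rec["gene"], []).append((rec["depth"], rec["value"]))

    written = []
    for idx, group in enumerate(chunk_genes(by_gene, per_plot), start=1):
        nrows = -(-len(group) // ncols)  # ceiling division
        fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                                 figsize=(3 * ncols, 2.5 * nrows))
        flat = axes.ravel()
        for ax, gene in zip(flat, group):
            depths, values = zip(*by_gene[gene])
            ax.scatter(depths, values, s=10)
            ax.set_title(gene, fontsize=8)
            ax.set_xlabel("mean depth")
            ax.set_ylabel(metric_name)
        for ax in flat[len(group):]:
            ax.set_visible(False)  # hide unused panels
        fig.tight_layout()
        path = f"{out_prefix}.{metric_name}.part{idx}.png"
        fig.savefig(path, dpi=150)
        plt.close(fig)
        written.append(path)
    return written
```

With per-gene panels each gene keeps its own axis range, which may be easier to read than a single colour-coded scatterplot, at the cost of making genes harder to compare directly.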
In addition to the plots, it would be great to have a TSV file summarizing the effect of depth on the corresponding analysis metrics.
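As an illustration of what such a summary could contain, the sketch below writes one row per gene with the number of samples and the Pearson correlation between depth and the metric. This is only one candidate statistic (a rank correlation or a regression slope might suit the data better). The record keys `gene`, `depth` and `value`, the output column names and the minimum of three points are all assumptions, not part of the pipeline.

```python
import csv
import math

SUMMARY_FIELDS = ["gene", "metric", "n_samples", "n_with_value", "pearson_r"]


def pearson_r(xs, ys):
    """Pearson correlation, or None when it is undefined or has too few points."""
    n = len(xs)
    if n < 3:  # assumed minimum; fewer points give no meaningful estimate
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # constant depth or constant metric
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def summarize_depth_effect(records, metric_name):
    """Build one summary row per gene from long-format records."""
    by_gene = {}
    for rec in records:
        by_gene.setdefault(rec["gene"], []).append(rec)

    rows = []
    for gene in sorted(by_gene):
        pairs = [(r["depth"], r["value"])
                 for r in by_gene[gene] if r["value"] is not None]
        r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
        rows.append({
            "gene": gene,
            "metric": metric_name,
            "n_samples": len(by_gene[gene]),
            "n_with_value": len(pairs),
            "pearson_r": "NA" if r is None else f"{r:.4f}",
        })
    return rows


def write_summary_tsv(rows, path):
    """Write the summary rows as a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SUMMARY_FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `summarize_depth_effect` once per available metric and concatenating the rows would give a single TSV covering mutation density, adjusted mutation density and omega.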
Also, if for a given gene many samples have missing values, this is worth reporting, so that we can see at around which depth values start to be missing, since this may have consequences for the interpretation.
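A rough sketch of such a report follows, again assuming long-format records with the hypothetical keys `gene`, `depth` and `value`, where `value` is `None` when the metric could not be computed. The 50% threshold for "many samples" is an arbitrary placeholder, and the output field names are illustrative.

```python
def missing_value_report(records, min_missing_fraction=0.5):
    """List genes where at least min_missing_fraction of samples lack a value.

    For each reported gene, the highest depth among samples without a value
    and the lowest depth among samples with a value hint at the depth range
    where values start to be missing.
    """
    by_gene = {}
    for rec in records:
        by_gene.setdefault(rec["gene"], []).append(rec)

    report = []
    for gene in sorted(by_gene):
        recs = by_gene[gene]
        missing = [r["depth"] for r in recs if r["value"] is None]
        present = [r["depth"] for r in recs if r["value"] is not None]
        fraction = len(missing) / len(recs)
        if fraction < min_missing_fraction:
            continue  # too few missing values to be worth flagging
        report.append({
            "gene": gene,
            "n_samples": len(recs),
            "n_missing": len(missing),
            "missing_fraction": fraction,
            "max_depth_missing": max(missing) if missing else None,
            "min_depth_with_value": min(present) if present else None,
        })
    return report
```

If `max_depth_missing` sits well below `min_depth_with_value` for a gene, low depth is a plausible explanation for the missing values; if the two ranges overlap, something other than depth is probably involved.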
If information on the coverage of the gene is available in addition to the mean sequencing depth, it may also be interesting to use.
This would give an overview of the impact of the average sequencing depth on the metrics of mutagenesis and selection. The overview should then be interpreted carefully, to avoid over-interpreting or misinterpreting the results.