Add coverm/contig to nf-core/modules#997

Open
vinisalazar wants to merge 43 commits into nf-core:dev from vinisalazar:dev

Conversation

@vinisalazar
Contributor

@vinisalazar vinisalazar commented Mar 12, 2026

Add CoverM to calculate read alignment metrics for contig/bin alignments (#587)

  • Run nf-core modules install coverm/contig
  • Create new module directory and modify modules.json

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@vinisalazar
Contributor Author

Converting this to draft until I add the module logic to the subworkflows, will ping when done.

@vinisalazar
Contributor Author

Update to changes:

  • in bin/get_mag_depths.py, add option to open gzip and non-gzip files
  • contig depth calculation by default is now done with COVERM_CONTIG (METABAT2 is still an option)
  • on mag_depths_convert/main.nf, define function to remove gz extension
  • Add new params to nextflow.config with coverm options
  • Update nextflow_schema.json with relevant new params
  • On subworkflows/local/binning/main.nf:
    • Add checks to make sure the CoverM mapper is bowtie2 (otherwise it doesn't generate the BAM files required for binning)
    • Enable both METABAT2 and COVERM as contig depth calculators
      • Same is done on subworkflows/local/binning_preparation for SR and LR
    • Rename output channel from metabat2depths to contig_depths
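The gzip-aware change to bin/get_mag_depths.py could look along these lines (a minimal sketch, not the actual pipeline code; `open_depths` and `strip_gz_extension` are illustrative names, and sniffing the gzip magic bytes rather than trusting the extension is an assumption):

```python
import gzip


def open_depths(path):
    """Open a depth table transparently, whether gzip-compressed or plain text.

    Checks the gzip magic bytes (0x1f 0x8b) instead of relying on the filename.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def strip_gz_extension(filename):
    """Remove a trailing .gz so downstream file names stay consistent."""
    return filename[:-3] if filename.endswith(".gz") else filename
```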

@vinisalazar vinisalazar marked this pull request as ready for review March 12, 2026 05:08
@vinisalazar
Contributor Author

This one is good to go but will require merging nf-core/modules#10588 for the lint checks to pass.

@vinisalazar
Contributor Author

PS all code generated by @claude was reviewed

@vinisalazar vinisalazar marked this pull request as draft March 12, 2026 11:50
@vinisalazar
Contributor Author

Making this into draft again until the upstream update to the coverm module is merged.

@vinisalazar vinisalazar force-pushed the dev branch 2 times, most recently from c0d42d6 to 55f610f on March 12, 2026 12:56
vinisalazar and others added 14 commits March 17, 2026 10:50
Add CoverM to calculate read alignment metrics for contig/bin alignments (nf-core#587)

  - Run `nf-core modules install coverm/contig`
  - Create new module directory and modify modules.json
Introduces --coverm_mapper (default: 'bowtie2') to control which aligner
is used when estimating contig depths. CoverM-native mappers skip bowtie2
alignment and let CoverM handle reads→depths directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- COVERM_CONTIG_SHORTREAD/LONGREAD replace both METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS variants
- bowtie2 path: BAMs → CoverM (bam_input=true)
- coverm-native path: reads+assembly → CoverM (bam_input=false)
- Add runtime error when BAM-requiring binners are used with coverm-native
- Rename emit metabat2depths → contig_depths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add COVERM_CONTIG_SHORTREAD and COVERM_CONTIG_LONGREAD blocks with
--methods metabat to output MetaBAT2-compatible depth format.
Shortread also passes --mapper for coverm-native aligner selection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CoverM outputs uncompressed depth files; update get_mag_depths.py and
CONVERT_DEPTHS to accept both .gz and plain-text inputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS references with COVERM_CONTIG
and update depth file extension from .txt.gz to .txt. Snapshots will need
regeneration after the first test run to capture updated version strings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add --depth_calculator (coverm|metabat2, default: coverm) to switch
  between COVERM_CONTIG and METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS for
  contig depth estimation
- Add --coverm_methods with full CoverM methods enum (default: metabat)
  to configure the coverage calculation method passed to CoverM
- Restore METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS_SHORTREAD/LONGREAD config
  blocks in modules.config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update coverm/contig to use eval/topic-based versioning and handle
  methods as List or string; remove versions.yml generation
- Remove ch_versions.mix for COVERM_CONTIG (topic versioning is automatic)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Sync coverm/contig main.nf and meta.yml to reference (eval/topic
  versioning, List-aware methods string handling)
- Remove ch_versions.mix for COVERM_CONTIG calls (topic versioning
  is automatic; no versions.yml emitted)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  - Add coverm to dependencies
  - Add nf-core#997 to 'Added' section
  - Reflect addition of COVERM_CONTIG as contig depth calculator
@jfy133
Member

jfy133 commented Apr 6, 2026

Hi @vinisalazar, sorry for the delay on the review, I had a busy week after all.

Regarding the nf-test errors: this was due to a malformation in the snapshot when editing it manually (when a version has 3 components X.Y.Z, it has to be in quotes, so "coverm": 0.7.0 should be "coverm": "0.7.0").
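The failure mode is easy to reproduce: a bare three-component version like 0.7.0 is not a valid JSON number, so the snapshot fails to parse until the version is quoted. A quick illustration using Python's json module:

```python
import json

malformed = '{"coverm": 0.7.0}'   # three-component version as a bare token: invalid JSON
fixed = '{"coverm": "0.7.0"}'     # quoted string: valid

try:
    json.loads(malformed)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

assert not parsed_ok                              # 0.7.0 cannot be parsed as a number
assert json.loads(fixed) == {"coverm": "0.7.0"}   # quoting makes it a plain string
```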

Regarding the implementation, this looks nice! But I still feel that this needs some deeper integration to really take advantage of CoverM. In general, I have the following ideas about how the process should look:

  1. coverm contig should replace the manual alignment step in binning_preparation (except when using bowtie2, obviously), aligning the reads against the contigs and exporting both the depth file and the BAM files. So, coverm contig should live in binning_preparation, not in the binning subworkflow. For this:
    • Instead of coverage_mapper, I think it should be called shortread_coverage_mapper to be more explicit, and the options should only include short-read mappers (minimap2-sr, bwa-mem, bwa-mem2, strobealign).
    • For long reads, coverm should also replace the manual mapping in binning_preparation_longread, choosing the appropriate mapper (minimap2-ont/minimap2-pb/minimap2-hifi) based on the long-read platform from the samplesheet metadata.
    • Right now the coverm/contig module doesn't emit the cached BAM files as outputs, so this will need a module update. I can do that tomorrow!
  2. The method for coverm contig should remain metabat, as this output is what the binners (MetaBAT2, MaxBin2, MetaBinner) expect.
  3. For calculating per-bin coverage downstream, we should replace the get_mag_depths.py script with coverm genome, using the BAMs from the previous step (coverm contig or bowtie2) as input plus the bin FASTAs. We could also expose --methods as a pipeline parameter so users can choose which metric(s) to use (trimmed_mean, mean, covered_fraction, relative_abundance, etc). The question here is: if a user chooses multiple methods, which one should we report in bin_summary.tsv? Perhaps only the first one, leave the full coverm output in its TSV?
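The platform-based mapper selection suggested in point 1 could be as simple as a lookup on the samplesheet's long-read platform field. A hypothetical sketch (the platform spellings and the function name are assumptions, not pipeline code; only the minimap2-ont/minimap2-pb/minimap2-hifi mapper values come from the discussion above):

```python
# Map a long-read platform (from samplesheet metadata) to a CoverM --mapper value.
LONGREAD_MAPPERS = {
    "OXFORD_NANOPORE": "minimap2-ont",
    "PACBIO": "minimap2-pb",
    "PACBIO_HIFI": "minimap2-hifi",
}


def select_longread_mapper(platform):
    """Return the CoverM mapper for a long-read platform, or raise if unknown."""
    try:
        return LONGREAD_MAPPERS[platform.upper()]
    except KeyError:
        raise ValueError(f"No CoverM long-read mapper for platform '{platform}'")
```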

We can definitely leave step 3 for another PR and focus on the coverm contig integration in this one to keep things manageable.

What do you think about this @jfy133, @prototaxites?

A brief look looks fine to me; only point 3 needs more careful consideration, so I agree with pushing it to another PR!

@prototaxites
Contributor

Yes, agreed, your plan sounds good to me!

@vinisalazar
Contributor Author

Thanks @dialvarezs @jfy133 @prototaxites for the comments. Hope everyone could get some days off for Easter. I will be picking this up on Monday to implement points 1 and 2 suggested by Diego.

@vinisalazar
Contributor Author

@dialvarezs I'm having some difficulty interpreting the failing checks here, could I please request another review?

Thanks

@jfy133
Member

jfy133 commented Apr 15, 2026

@dialvarezs I'm having some difficulty interpreting the failing checks here, could I please request another review?

Thanks

The error is:


The current character read is ']' with an int value of 93
Unable to determine the current character, it is not a string, number, array, or object
line number 224
index number 13623
            ],
............^

You're somehow missing a , in the .snap files most likely - or rather you've got an extra trailing one
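The trailing-comma case described above is the same class of hand-editing error, and strict JSON parsers reject it at exactly the `]` the message points at. An illustrative reproduction (the file names are made up):

```python
import json

trailing = '{"content": ["file_a.txt", "file_b.txt",]}'  # extra comma before the ]
clean = '{"content": ["file_a.txt", "file_b.txt"]}'

try:
    json.loads(trailing)
    ok = True
except json.JSONDecodeError:
    ok = False

assert not ok  # strict JSON rejects the trailing comma
assert json.loads(clean)["content"] == ["file_a.txt", "file_b.txt"]
```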

Member

@jfy133 jfy133 left a comment


Let's try this

Comment thread tests/test_alternatives.nf.test.snap Outdated
Comment thread tests/test_assembly_input.nf.test.snap Outdated
Comment thread tests/test_assembly_input.nf.test.snap Outdated
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@jfy133
Member

jfy133 commented Apr 15, 2026

I suspect it is this, as I learnt the hard way it's best not to edit snapshot files directly for the same reason 😆

@jfy133
Member

jfy133 commented Apr 15, 2026

I think I fixed the original errors but it appears there are others now

@dialvarezs
Member

Hi @vinisalazar! As @jfy133 mentioned, it’s generally a good idea to update snapshots manually, since automatic updates can result in malformed JSON or subtle issues that nf-test may not like (e.g. differences in ordering).

I’ve now regenerated all the snapshots. If you’d like a full review, I can most likely do it over the weekend.

Member


In the meantime, it seems that something is not working on long-read data. Many processes disappeared from the snapshot, so I suppose that mag is finishing early now.
Can you give it a look?

@vinisalazar
Contributor Author

Understood, apologies for that. My laptop can't handle the tests, so I was struggling to update the snapshots programmatically. Over the weekend I'll investigate the issue with the long-read data on a bigger machine that can actually run the tests.

@jfy133
Member

jfy133 commented Apr 15, 2026

Hi @vinisalazar! As @jfy133 mentioned, it’s generally a good idea to update snapshots manually, since automatic updates can result in malformed JSON or subtle issues that nf-test may not like (e.g. differences in ordering).

I’ve now regenerated all the snapshots. If you’d like a full review, I can most likely do it over the weekend.

Uhhh don't you mean the opposite? Manual bad, automatic good?

@dialvarezs
Member

Hi @vinisalazar! As @jfy133 mentioned, it’s generally a good idea to update snapshots manually, since automatic updates can result in malformed JSON or subtle issues that nf-test may not like (e.g. differences in ordering).
I’ve now regenerated all the snapshots. If you’d like a full review, I can most likely do it over the weekend.

Uhhh don't you mean the opposite? Manual bad, automatic good?

Oops, my bad. Not the smartest idea to give recommendations when I’m still half asleep 😅.

