Hi,
I've been using yacht as a way to reduce false positives in sourmash, and I wanted to ask if it's possible to update the tool to incorporate the latest features from sourmash_plugin_branchwater? This would be helpful for a couple of reasons:
- The latest version of yacht processes only one sample at a time, which is time-consuming when working with many samples.
- As the tutorial notes, training is time-consuming, especially with large databases. My training run on GTDB-R220 (all genomes) has been going for nearly a week without producing results, whereas training on the genomic representatives version took only about a morning. This performance gap is significant.
I believe improvements such as supporting the new RocksDB data format and using manysketch and/or fastmultigather could reduce processing times and allow multiple samples to be handled simultaneously.
Thanks for the great tool, and I'm looking forward to potential improvements in future releases!