Description
Hi,
I am currently using tsinfer + tsdate to reconstruct ARGs and estimate coalescence times. I recently ran this pipeline on Chromosome 22 of the Yoruba (YRI) samples from the 1000 Genomes Project and compared the results with those inferred by Relate.
I noticed a substantial discrepancy between the two methods in the TMRCA estimates for local trees. Specifically, tsdate produces considerably older estimates than Relate, especially for the deepest coalescences.
Observations
Here is the distribution of TMRCAs across Chromosome 22:
- tsdate:
  - Median TMRCA: ~2.57 million years (assuming 25 years/gen)
  - Max TMRCA: ~40 million years
- Relate:
  - Median TMRCA: ~1.46 million years (assuming 25 years/gen)
  - Max TMRCA: ~15 million years
The maximum TMRCA inferred by tsdate reaches ~40 million years, whereas Relate tops out at ~15 million years. The median is also roughly 1.8 times higher with tsdate (~2.57 vs ~1.46 million years).
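In case it helps anyone reproduce the comparison, here is a minimal sketch of how a median and maximum TMRCA like those above can be computed from per-tree values. It assumes the input is a list of (span in bp, TMRCA in generations) pairs, weights each local tree by its genomic span (an assumption; an unweighted per-tree median can come out differently), and converts with the 25 years/generation used above. The helper names `weighted_median` and `tmrca_summary` are hypothetical. With tskit, such pairs could be collected as `(tree.span, tree.time(tree.root))` for each tree in `ts.trees()` that has `tree.num_roots == 1`.

```python
YEARS_PER_GENERATION = 25  # generation time assumed in this issue


def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    if not values:
        raise ValueError("empty input")
    total = sum(weights)
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return max(values)  # only reached with zero or negative weights


def tmrca_summary(trees, years_per_generation=YEARS_PER_GENERATION):
    """Summarize (span_bp, tmrca_generations) pairs as median/max TMRCA in years."""
    trees = list(trees)
    if not trees:
        raise ValueError("no trees given")
    spans = [span for span, _ in trees]
    tmrcas = [tmrca for _, tmrca in trees]
    return {
        # Span-weighted median, so long trees count more than short ones.
        "median_years": weighted_median(tmrcas, spans) * years_per_generation,
        "max_years": max(tmrcas) * years_per_generation,
    }


# Toy input (not real data): 102,800 generations x 25 years = 2.57 million years.
toy = [(10_000, 50_000), (10_000, 102_800), (15_000, 110_000), (1_000, 1_600_000)]
print(tmrca_summary(toy))  # {'median_years': 2570000, 'max_years': 40000000}
```

Running the same summary on both tsdate and Relate output makes sure the two numbers are computed the same way before comparing them.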
Parameters & Reproducibility
I used the following parameters:
- tsinfer:
  - mismatch_ratio: 1
  - recombination_rate: 1.25e-8 (per bp per generation)
- tsdate:
  - mutation_rate: 1.25e-8 (per bp per generation)
  - Ne (effective population size): not specified in this pipeline.
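For reproducibility, here is a minimal sketch of the tsinfer and tsdate calls with the parameters listed above. It assumes tsinfer 0.3 or later (where `tsinfer.infer` accepts `recombination_rate` and `mismatch_ratio`) and tsdate 0.2 or later (whose default dating method does not need a population size). The `tsdate.preprocess_ts` step is an assumed, commonly recommended step for tsinfer output and may differ from the pipeline actually run; `infer_and_date` is a hypothetical helper, and loading the YRI chr22 data into a tsinfer input object is omitted.

```python
# Parameters as listed in this issue (rates per bp per generation).
TSINFER_PARAMS = {"recombination_rate": 1.25e-8, "mismatch_ratio": 1}
TSDATE_PARAMS = {"mutation_rate": 1.25e-8}  # no Ne / population_size given


def infer_and_date(sample_data):
    """Hypothetical helper: infer a tree sequence, then date it (times in generations)."""
    # Imported inside the function so this sketch can be loaded without the libraries.
    import tsdate
    import tsinfer

    inferred = tsinfer.infer(sample_data, **TSINFER_PARAMS)
    # Assumed preprocessing step before dating tsinfer output.
    preprocessed = tsdate.preprocess_ts(inferred)
    return tsdate.date(preprocessed, **TSDATE_PARAMS)
```

Keeping the parameters in one place makes it easy to state exactly which values and library versions produced the tsdate numbers being compared with Relate.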
Questions:
Is such a large discrepancy expected for African populations, which have deep coalescence times?
Could this be related to the prior used by tsdate?
Any insights or suggestions on how to align these estimates would be greatly appreciated.
Thanks!