Significant TMRCA discrepancy between tsdate and Relate on 1KGP YRI (Chr 22) #507

@Jesson-mark

Hi,

I am currently using tsinfer + tsdate to reconstruct ARGs and estimate coalescence times. I recently ran this pipeline on Chromosome 22 of the Yoruba (YRI) samples from the 1000 Genomes Project and compared the results with those inferred by Relate.

I noticed a substantial discrepancy between the two methods in the TMRCA estimates for local trees. Specifically, tsdate produces considerably older estimates than Relate, especially for the deepest coalescences.

Observations

Here is the distribution of TMRCAs across Chromosome 22:

  • tsdate:

    • Median TMRCA: ~2.57 million years (assuming 25 years/gen)
    • Max TMRCA: ~40 million years
    [Image: distribution of tsdate TMRCA estimates]
  • Relate:

    • Median TMRCA: ~1.46 million years (assuming 25 years/gen)
    • Max TMRCA: ~15 million years
    [Image: distribution of Relate TMRCA estimates]

As shown, the maximum TMRCA inferred by tsdate reaches ~40 million years, whereas the Relate maximum is ~15 million years. The median is also about 1.8 times larger in tsdate (~2.57 vs. ~1.46 million years).
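For reference, summaries like the ones above could be computed roughly as follows. This is a minimal sketch, not the exact script behind the numbers: it assumes the ARG is available as a dated tskit tree sequence with node times in generations, that each local tree is weighted by its genomic span, and that trees without a single root (for example empty flanking regions) are skipped. The helper names `tree_tmrcas`, `weighted_median` and `summarise` are hypothetical.

```python
GENERATION_TIME_YEARS = 25  # same assumption as in the figures quoted above


def tree_tmrcas(ts):
    """Return (span_bp, tmrca_generations) for every single-rooted local tree.

    `ts` is assumed to be a dated tskit.TreeSequence (times in generations).
    """
    pairs = []
    for tree in ts.trees():
        if tree.num_roots != 1:
            # Multi-rooted or empty trees have no single TMRCA; skip them.
            continue
        pairs.append((tree.span, tree.time(tree.root)))
    return pairs


def weighted_median(pairs):
    """Span-weighted median of the values in (weight, value) pairs."""
    ordered = sorted(pairs, key=lambda p: p[1])
    half = sum(weight for weight, _ in ordered) / 2
    running = 0.0
    for weight, value in ordered:
        running += weight
        if running >= half:
            return value
    raise ValueError("no trees to summarise")


def summarise(pairs, generation_time=GENERATION_TIME_YEARS):
    """Median and maximum TMRCA, converted from generations to years."""
    return {
        "median_years": weighted_median(pairs) * generation_time,
        "max_years": max(value for _, value in pairs) * generation_time,
    }
```

Calling `summarise(tree_tmrcas(ts))` on each method's tree sequence would give directly comparable numbers, provided both are expressed in generations. Whether the trees are weighted by span or counted equally can shift the median, so the same choice should be used for both methods.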

Parameters & Reproducibility

I used the following parameters:

tsinfer:

  • mismatch_ratio: 1
  • recombination_rate: 1.25e-8

tsdate:

  • mutation_rate: 1.25e-8
  • Ne (Effective Population Size): I didn't specify Ne in this pipeline.
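The parameters above correspond to calls along these lines. This is a sketch rather than the exact script: it assumes a recent tsinfer/tsdate release in which `tsdate.date` does not require a population size, the `infer_and_date` wrapper is a hypothetical name, and the `tsdate.preprocess_ts` step is an assumption that is not stated in the parameter list.

```python
# Parameters exactly as listed above.
TSINFER_PARAMS = {"recombination_rate": 1.25e-8, "mismatch_ratio": 1}
TSDATE_PARAMS = {"mutation_rate": 1.25e-8}


def infer_and_date(sample_data):
    """Infer a tree sequence with tsinfer, then date it with tsdate.

    `sample_data` is assumed to be the tsinfer input object for the
    YRI chromosome 22 haplotypes.
    """
    # Imported lazily so the parameter dictionaries can be inspected
    # without the libraries installed.
    import tsinfer
    import tsdate

    inferred = tsinfer.infer(sample_data, **TSINFER_PARAMS)
    # Assumption: simplify/trim before dating, as tsdate recommends.
    prepared = tsdate.preprocess_ts(inferred)
    # No population size is passed, matching the setup described above.
    return tsdate.date(prepared, **TSDATE_PARAMS)
```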

Questions

Is such a large discrepancy expected for African populations, which have deep coalescence times?

Could this be related to the prior $N_e$ used in tsdate? If I haven't specified a demographic history, would the default prior cause this overestimation in deep time?
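To put the two medians on an $N_e$ scale, here is a rough back-of-envelope conversion. It uses the standard constant-size coalescent expectation $E[T_{MRCA}] = 4 N_e (1 - 1/n)$ generations for $n$ sampled haplotypes from a diploid population. This only indicates scale: it treats a median as if it were the expectation, ignores demographic history, and says nothing about how tsdate actually constructs its prior. The function name `implied_ne` and the value of `n_haplotypes` are hypothetical.

```python
def implied_ne(tmrca_years, n_haplotypes, generation_time=25):
    """Constant-size diploid Ne whose expected TMRCA equals `tmrca_years`.

    Inverts E[TMRCA] = 4 * Ne * (1 - 1/n) generations.
    """
    tmrca_generations = tmrca_years / generation_time
    return tmrca_generations / (4 * (1 - 1 / n_haplotypes))


if __name__ == "__main__":
    n = 200  # hypothetical number of sampled haplotypes
    for label, median_years in [("tsdate", 2.57e6), ("Relate", 1.46e6)]:
        print(label, round(implied_ne(median_years, n)))
```

For large samples the factor $(1 - 1/n)$ is close to 1, so the result is approximately the median in generations divided by 4.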

Any insights or suggestions on how to align these estimates would be greatly appreciated.

Thanks!
