Description
A number of use cases have come up recently where people want to get a single ancestral haplotype out of a tree sequence. The quick-and-dirty way to do this, as used by @saurabhbelsare, is to change the flags on the ancestral node to mark it as a sample, and then use the haplotypes iterator. This will (presumably) report the unknown sections as "missing data". I wonder if we want a .haplotype() method that gets a single haplotype (possibly an ancestral one) from a tree sequence.
Alternatively, and this is maybe better, we could wrap the method suggested above by modifying the .haplotypes() iterator to (optionally) take a list of nodes: haplotypes(nodes=None, *, ...). If nodes is None, we default to the sample nodes, as currently. If nodes is not None, it can be a list of any nodes, including non-sample ones. In that case we take a copy of the tables, mark the specified nodes as samples (and all other nodes as non-samples), simplify(), and output the haplotypes. This would also let people output a few sample haplotypes without slurping the whole variant matrix into memory. At the moment we advise users who only need a few haplotypes to simplify first, but we don't actually do the work for them.
At the moment all the parameters to ts.haplotypes() are keyword-only, so adding nodes as a first positional parameter won't break API compatibility.