Less memory-intensive method for getting a single haplotype, including ancestral haplotypes

A number of use cases have come up recently where people want to get a single ancestral haplotype out of a tree sequence. The quick-and-dirty way to do this, as used by @saurabhbelsare, is to change the flag on the ancestral node to mark it as a sample, and then use the `haplotypes` iterator. This will (presumably) report the unknown sections as "missing data". I wonder if we want a `.haplotype()` method that returns a single haplotype (possibly an ancestral one) from a tree sequence.

Alternatively, and this is maybe better, we could wrap the method suggested above by modifying the `.haplotypes()` iterator to (optionally) take a list of nodes: `haplotypes(nodes=None, *, ...)`. If `nodes` is None, we default to the sample nodes, as we do currently. If `nodes` is not None, it can be a list of any nodes, including non-sample ones. In that case, we take a copy of the tables, mark the specified nodes as samples (and all other nodes as non-samples), call `simplify()`, and output the haplotypes. This would also let people output a few sample haplotypes without slurping the whole variant matrix into memory. At the moment we advise users who need only a few haplotypes to simplify first, but we don't actually do the work for them.
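For concreteness, here is a minimal sketch of what the wrapper could do, written as free functions rather than as a change to `ts.haplotypes()`. The names `haplotypes_for_nodes` and `haplotype` are hypothetical and not part of tskit. It assumes `ts` is a `tskit.TreeSequence` and relies on `simplify(samples=...)` accepting node IDs that are not currently flagged as samples, so no explicit flag editing is needed. Passing `filter_sites=False` is my assumption about what we would want, so that the output stays aligned with the original sites.

```python
def haplotypes_for_nodes(ts, nodes, **kwargs):
    """Yield one haplotype string per node in `nodes`, in the order given.

    Hypothetical helper, not part of tskit. `ts` is assumed to be a
    tskit.TreeSequence; `nodes` may include non-sample (ancestral) node IDs
    and must not contain duplicates. Extra keyword arguments are passed
    through to `haplotypes()`.
    """
    nodes = list(nodes)
    # simplify() makes the given IDs the complete sample set of the result,
    # whether or not they were flagged as samples originally, and renumbers
    # them 0..len(nodes)-1 in the order given.
    # filter_sites=False keeps every original site, so position i in each
    # haplotype still refers to site i of the original tree sequence.
    small_ts = ts.simplify(samples=nodes, filter_sites=False)
    # Only len(nodes) haplotypes are built here, not one per original sample.
    yield from small_ts.haplotypes(**kwargs)


def haplotype(ts, node, **kwargs):
    """Return the haplotype string for a single (possibly ancestral) node."""
    return next(haplotypes_for_nodes(ts, [node], **kwargs))
```

With something like this, `haplotype(ts, u)` for an ancestral node ID `u` would return one string, and the existing keyword-only options of `haplotypes()` could be forwarded unchanged through `**kwargs`.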

At the moment all the parameters to `ts.haplotypes()` are keyword only, so this won't break API compatibility.

Less memory-intensive method for getting a single haplotype, including ancestral haplotypes #1200
