Less memory-intensive method for getting a single haplotype, including ancestral haplotypes #1200

@hyanwong

A number of use cases have come up recently where people want to get a single ancestral haplotype out of a tree sequence. The quick-and-dirty way to do this, as used by @saurabhbelsare, is to change the flag on the ancestral node to mark it as a sample, and then use the haplotypes iterator. This will (presumably) report the unknown sections as "missing data". I wonder if we want a .haplotype() method that gets a single haplotype (possibly an ancestral one) from a tree sequence.

Alternatively, and this is maybe better, we could wrap the method suggested above by modifying the .haplotypes() iterator to (optionally) take a list of nodes: haplotypes(nodes=None, *, ...). If nodes is None, we default to the sample nodes, as currently; if nodes is not None, it can be a list of any nodes, including non-sample ones. In that case, we take a copy of the tables, mark the specified nodes as samples (and all other nodes as non-samples), simplify(), and output the haplotypes. This would also be a way to let people output a few sample haplotypes without slurping the whole variant matrix into memory (at the moment we advise that if only a few haplotypes are needed, the user should simplify first, but we don't actually do the work for them).

At the moment all the parameters to ts.haplotypes() are keyword only, so this won't break API compatibility.
