simGL

simGL simulates genotype likelihoods (GLs) from haplotypic genotype matrices, given per-sample coverage and sequencing error rates. It is designed to work seamlessly with msprime and tskit pipelines, but accepts any NumPy haplotype matrix.
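As a rough illustration of the kind of input this implies, a haplotype matrix can be built by hand with NumPy. The (n_sites, n_haplotypes) layout matches the quick example below; the toy values and the grouping of consecutive columns into one diploid individual are assumptions made for this sketch, so check the documentation for how simGL assigns haplotypes to individuals.

```python
import numpy as np

# Hypothetical toy data: 3 sites x 4 haplotypes (0 = reference, 1 = alternative).
gm = np.array([
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
])

ploidy = 2
n_sites, n_haplotypes = gm.shape
n_individuals = n_haplotypes // ploidy

# Assumption of this sketch: consecutive columns belong to the same individual.
# Summing within each group gives the alternative-allele dosage per individual.
dosage = gm.reshape(n_sites, n_individuals, ploidy).sum(axis=2)
print(dosage)
```

A matrix shaped like `gm` is what the quick example passes as the first argument of `simGL.sim_allelereadcounts`.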

Installation

pip install simGL

Or via conda:

conda install -c conda-forge simGL

Or from source:

git clone https://github.com/RacimoLab/simGL.git
cd simGL
pip install -e .

Quick example

import msprime
import numpy as np
import simGL

# 1. Simulate a tree sequence and extract the biallelic genotype matrix
ts = msprime.sim_ancestry(
    samples=10, ploidy=2, sequence_length=100_000,
    recombination_rate=1e-8, population_size=10_000, random_seed=1,
)
ts = msprime.sim_mutations(ts, rate=1e-4, random_seed=1)

gm_full  = ts.genotype_matrix()
biallelic = gm_full.max(axis=1) == 1
gm       = gm_full[biallelic]           # shape (n_sites, n_haplotypes)

# 2. Get reference and alternative alleles
variants = list(ts.variants())
ref = np.array([v.alleles[0] for v in variants])[biallelic]
alt = np.array([v.alleles[1] for v in variants])[biallelic]

# 3. Simulate allele read counts
arc = simGL.sim_allelereadcounts(
    gm, mean_depth=10., std_depth=2., e=0.01,
    ploidy=2, seed=42, ref=ref, alt=alt,
)
# arc shape: (n_sites, n_individuals, 4)  —  A, C, G, T read counts

# 4. Compute genotype likelihoods
GL = simGL.allelereadcounts_to_GL(arc, e=0.01, ploidy=2)
# GL shape: (n_sites, n_individuals, 10)  —  all diploid ACGT genotypes

# 5. Subset to biallelic genotypes and write a VCF
Ra     = simGL.ref_alt_to_index(ref, alt)
GL_sub = simGL.subset_GL(GL, Ra, ploidy=2)

pos   = np.array([int(v.site.position) for v in variants])[biallelic] + 1
names = [f"ind{i}" for i in range(ts.num_individuals)]
simGL.GL_to_vcf(GL_sub, arc, ref, alt, pos, names, "output.vcf")
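To give some intuition for step 4, the sketch below computes log10 genotype likelihoods for one site in one individual from A, C, G, T read counts, using only the standard library. It uses a common per-read error model in which an error turns the true base into any of the three other bases with equal probability. The function name `log10_genotype_likelihoods` is hypothetical, and the model, scaling, and genotype ordering are assumptions of this sketch that may differ from what `simGL.allelereadcounts_to_GL` does.

```python
import math
from itertools import combinations_with_replacement

ALLELES = "ACGT"

# All unordered diploid genotypes over A, C, G, T: 10 in total.
GENOTYPES = list(combinations_with_replacement(ALLELES, 2))


def log10_genotype_likelihoods(counts, e=0.01):
    """Return log10 P(reads | genotype) for each of the 10 genotypes.

    counts: read counts for A, C, G, T at one site in one individual.
    e: per-base error rate, spread evenly over the three other bases
       (an assumption of this sketch).
    """
    likelihoods = []
    for genotype in GENOTYPES:
        log_lik = 0.0
        for base, n_reads in zip(ALLELES, counts):
            # A read is drawn from either chromosome with probability 1/2.
            p_base = sum(
                (1 - e) if allele == base else e / 3 for allele in genotype
            ) / 2
            log_lik += n_reads * math.log10(p_base)
        likelihoods.append(log_lik)
    return likelihoods


# Hypothetical counts: 6 reads of A and 5 reads of G.
gl = log10_genotype_likelihoods([6, 0, 5, 0], e=0.01)
best = GENOTYPES[max(range(len(gl)), key=gl.__getitem__)]
print(len(GENOTYPES), best)
```

With reads split between A and G, the heterozygous A/G genotype gets the highest likelihood. The 10 entries correspond to the last axis of `GL` in the quick example, and subsetting to the reference and alternative alleles leaves the three biallelic genotypes.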

Documentation

Full documentation — installation, user guide, API reference, and theory — is available at https://simgl.readthedocs.io.

Citation

If you use simGL in your work, please cite:

Coll Macià, M. & Gower, G. (2026). simGL: Simulate genotype likelihoods from haplotypic genotype matrices (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.19637642

Please also cite the methodological papers listed in the Citation page of the documentation.

License

MIT