A small collection of fast, standalone Python scripts for processing genotype data from VCF files and related resources (reference FASTA, HPO, ClinVar).
Requirements: Python 3. No heavy dependencies; the scripts use only the standard library (network access is needed only for the ClinVar lookups).
`vcf_to_flanks_fasta.py` extracts flanking reference sequence for each variant in a VCF.
- Input: VCF + whole-genome FASTA
- Output: FASTA with one record per variant (configurable window, e.g. 100 bp up + ref + 100 bp down)
- Features: Uses a `.fai` index (built if missing), supports `--include-ref`, chr1/1 normalization, REF validation
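
The features listed above can be illustrated with a minimal in-memory sketch. It is not the script's actual implementation: the `normalize_contig` and `flank_record` helpers, the header format, and the assumption that `--include-ref` places the REF allele between the two flanks are all hypothetical, and the reference is held in a plain dict rather than read through a `.fai` index.

```python
def normalize_contig(name, known):
    """Return the spelling of `name` (with or without 'chr') found in `known`, else None."""
    if name in known:
        return name
    alt = name[3:] if name.startswith("chr") else "chr" + name
    return alt if alt in known else None


def flank_record(genome, chrom, pos, ref, up=100, down=100, include_ref=False):
    """Build (header, sequence) for one variant; `pos` is 1-based, as in VCF."""
    contig = normalize_contig(chrom, genome)
    if contig is None:
        raise KeyError(f"contig {chrom!r} not found in reference")
    seq = genome[contig]
    start = pos - 1            # 0-based index of the first REF base
    end = start + len(ref)     # 0-based index just past the REF allele
    observed = seq[start:end]
    # REF validation: the VCF REF allele must match the reference bases.
    if observed.upper() != ref.upper():
        raise ValueError(
            f"REF mismatch at {chrom}:{pos}: VCF has {ref!r}, reference has {observed!r}"
        )
    upstream = seq[max(0, start - up):start]   # clipped at the contig start
    downstream = seq[end:end + down]           # clipped at the contig end
    middle = observed if include_ref else ""   # assumed meaning of --include-ref
    return f">{chrom}:{pos}:{ref}", upstream + middle + downstream


# Toy reference; a real run would read a whole-genome FASTA instead.
genome = {"chr20": "ACGTACGTACGTACGTACGT"}
header, seq = flank_record(genome, "20", 9, "A", up=4, down=4, include_ref=True)
print(header)
print(seq)
```

Here the VCF contig `20` resolves to the FASTA contig `chr20`, and a wrong REF base raises an error instead of silently producing a flank. For the script's real behavior, check `--help`.
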
python vcf_to_flanks_fasta.py --fasta genome.fa --vcf variants.vcf --out flanks.fasta
python vcf_to_flanks_fasta.py --fasta genome.fa --vcf variants.vcf --out flanks.fasta --include-ref --up 100 --down 100

`vcf_clinvar_pathogenic.py` checks VCF variants against ClinVar via NCBI E-utilities and reports pathogenic / likely pathogenic hits.
- Input: VCF
- Output: Tab-delimited file of variants that are pathogenic or likely pathogenic in ClinVar
- Features: GRCh37 or GRCh38 via `--assembly`; rate-limited NCBI requests
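
As a rough sketch of what an assembly-aware, rate-limited lookup could look like (not necessarily how the script does it), the example below builds an E-utilities `esearch` URL for one variant position and spaces out requests. The `clinvar_search_url` helper and `RateLimiter` class are hypothetical, and the use of the ClinVar search fields `chr`, `chrpos37`, and `chrpos38` is an assumption about how such a query might be phrased. NCBI asks clients without an API key to stay at or below 3 requests per second.

```python
import time
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Assumed mapping from assembly name to ClinVar position search field.
POS_FIELD = {"GRCh37": "chrpos37", "GRCh38": "chrpos38"}


def clinvar_search_url(chrom, pos, assembly="GRCh38"):
    """Build an esearch URL for ClinVar records at one chromosome position."""
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    term = f"{chrom}[chr] AND {pos}[{POS_FIELD[assembly]}]"
    query = urlencode({"db": "clinvar", "term": term, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


limiter = RateLimiter(per_second=3.0)
limiter.wait()  # call before each request; the first call returns immediately
url = clinvar_search_url("chr17", 43044295, assembly="GRCh38")
print(url)  # no request is sent in this sketch
```

Fetching `url` (for example with `urllib.request.urlopen`) would return JSON whose `esearchresult.idlist` holds matching ClinVar record IDs; reading the clinical significance of each record would need a follow-up request, which is left out here.
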
python vcf_clinvar_pathogenic.py --vcf input.vcf --assembly GRCh38 --out clinvar_hits.tsv

`hpo_terms_to_gene.py` predicts likely causal genes from a list of HPO phenotype terms using an HPO annotation database.
- Input: File of HPO term IDs (one per line) + phenotype annotation (HPOA or custom TSV)
- Output: Ranked candidate genes (and optionally top diseases)
- Features: Binary or IC-weighted scoring, optional OBO propagation, disease→gene mapping
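
To make the difference between binary and IC-weighted scoring concrete, here is a small sketch under stated assumptions. The `rank_genes` function, the toy disease and gene names, and the choice to give each gene the best score among its diseases are all hypothetical; information content is taken as `-log(fraction of diseases annotated with the term)`, and OBO propagation is not modeled. The script's own scoring may differ.

```python
import math
from collections import defaultdict


def rank_genes(patient_terms, disease_terms, disease_genes, weights="binary"):
    """Rank genes by overlap between patient HPO terms and disease annotations."""
    n_diseases = len(disease_terms)
    # Count how many diseases are annotated with each term.
    counts = defaultdict(int)
    for terms in disease_terms.values():
        for term in terms:
            counts[term] += 1

    def weight(term):
        if weights == "ic":
            # Rare terms are more informative, so they get a larger weight.
            return -math.log(counts[term] / n_diseases)
        return 1.0

    gene_scores = defaultdict(float)
    for disease, terms in disease_terms.items():
        score = sum(weight(t) for t in set(patient_terms) if t in terms)
        # Disease -> gene mapping: a gene keeps its best-scoring disease.
        for gene in disease_genes.get(disease, ()):
            gene_scores[gene] = max(gene_scores[gene], score)
    # Highest score first; ties broken by gene name for stable output.
    return sorted(gene_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy annotations with placeholder disease and gene names.
disease_terms = {
    "D1": {"HP:0001250", "HP:0001263"},
    "D2": {"HP:0001250"},
    "D3": {"HP:0000252", "HP:0001263"},
}
disease_genes = {"D1": ["GENE_A"], "D2": ["GENE_B"], "D3": ["GENE_C"]}
patient = ["HP:0001250", "HP:0000252"]

binary_ranking = rank_genes(patient, disease_terms, disease_genes)
ic_ranking = rank_genes(patient, disease_terms, disease_genes, weights="ic")
print(binary_ranking)
print(ic_ranking)
```

With binary weights every gene ties, because each disease matches exactly one patient term. With IC weights `GENE_C` moves to the top, because its matching term is annotated to only one of the three toy diseases.
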
python hpo_terms_to_gene.py --hpo_terms patient_hpo.txt --hpoa phenotype.hpoa --topk 20 --show_diseases
python hpo_terms_to_gene.py --hpo_terms terms.txt --tsv disease_hpo_gene.tsv --out results.tsv --weights ic

`make_example_genome.py` builds a small example reference FASTA (contigs 16 and 20) with REF bases matching `ex2.vcf`, for testing `vcf_to_flanks_fasta.py` without a full genome.
python make_example_genome.py
# Creates genome.fa in the current directory

- Clone or download this repo.
- Run any script with `--help` for full options, e.g. `python vcf_to_flanks_fasta.py --help`
- Use `ex2.vcf` and the genome produced by `make_example_genome.py` to try the flank extraction pipeline.
Use and modify as needed for your projects.