The Practical Haplotype Graph

Optimized pangenome representation for plant breeding and genetics.

The Practical Haplotype Graph (PHG) is optimized for plant breeding and genetics, where genomic diversity can be high, phased haplotypes are common (e.g., inbred lines), and imputation with low-density markers is essential for breeding efficiency. The PHG complements other imputation tools (e.g., BEAGLE) that are designed for samples from unphased species with low genetic diversity and high-density genotyping.

Pangenomes for plant breeding and genetics.

Versatile use cases for your research program.

The PHG is a graph-based trellis representation of consecutive genic and intergenic regions (called reference ranges), representing diversity across and between samples. It can be used to:

Create custom genomes for alignment
Call rare alleles
Impute genotypes
Efficiently store genomic data from many samples (i.e., reference, assemblies, and other lines)
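
To make the trellis idea concrete, here is a minimal Python sketch of the data structure described above: an ordered list of reference ranges, with the haplotypes observed at each range and the samples that carry them. It illustrates the concept only and is not the PHG's implementation or API. All names (ReferenceRange, HaplotypeTrellis, add_sample, path, closest_path) and the sample data are hypothetical, and closest_path is a toy stand-in for imputation that simply picks the stored sample agreeing with the most observed haplotypes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ReferenceRange:
    """One genic or intergenic interval on the reference (hypothetical type)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "genic" or "intergenic"


@dataclass
class HaplotypeTrellis:
    """Toy trellis: one column of haplotype nodes per reference range."""
    ranges: list[ReferenceRange]
    # nodes[i] maps a haplotype ID to the samples carrying it at ranges[i]
    nodes: list[dict[str, set[str]]] = field(default_factory=list)
    samples: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.nodes:
            self.nodes = [{} for _ in self.ranges]

    def add_sample(self, sample: str, hap_ids: list[str]) -> None:
        """Register one haplotype ID per reference range for a sample."""
        if len(hap_ids) != len(self.ranges):
            raise ValueError("need exactly one haplotype ID per reference range")
        self.samples.append(sample)
        for column, hap_id in zip(self.nodes, hap_ids):
            column.setdefault(hap_id, set()).add(sample)

    def path(self, sample: str) -> list[Optional[str]]:
        """Return the sample's haplotype ID at each range (None if absent)."""
        return [
            next((h for h, carriers in column.items() if sample in carriers), None)
            for column in self.nodes
        ]

    def closest_path(self, observed: dict[int, str]) -> list[Optional[str]]:
        """Toy imputation: return the path of the stored sample that matches
        the most observed (range index -> haplotype ID) pairs. Ties go to the
        sample added first. This is NOT the PHG's imputation method."""
        def score(sample: str) -> int:
            p = self.path(sample)
            return sum(1 for i, hap in observed.items() if p[i] == hap)
        return self.path(max(self.samples, key=score))


# Hypothetical example data: three ranges, three inbred lines.
trellis = HaplotypeTrellis(ranges=[
    ReferenceRange("chr1", 1, 1000, "genic"),
    ReferenceRange("chr1", 1001, 5000, "intergenic"),
    ReferenceRange("chr1", 5001, 6200, "genic"),
])
trellis.add_sample("LineA", ["h1", "h1", "h1"])
trellis.add_sample("LineB", ["h1", "h2", "h2"])
trellis.add_sample("LineC", ["h2", "h2", "h3"])

# Low-density observation: haplotypes known only at ranges 0 and 2.
imputed = trellis.closest_path({0: "h1", 2: "h2"})
print(imputed)
```

Because samples that share a haplotype at a range point to the same node, each distinct haplotype is stored once per range, which is the intuition behind both the compact storage and the path-based imputation listed above.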

With our redesigned code and user interface, building a PHG database is quick and efficient. For example, using about 7 to 8 commands, we can build a pangenome database consisting of 58 sorghum assemblies in less than a day with minimal computational resources.

Efficient code. Quick turnarounds.

Community-defined standards.

Version 2 of the PHG leverages several community-defined formats and tools for enhanced productivity and cross-compatibility. These include TileDB-VCF (a performant API for storing and querying VCF data), AnchorWave (a sensitive aligner for genomes with high sequence diversity), agc (efficient FASTA genome compression), and the Breeding API (BrAPI) for easy exchange of genotype data between plant breeding applications.