PHGv2 Terminology
General terms
Term | Definition |
---|---|
Reference genome | genome used for initial alignment and base coordinates |
Reference range | segment of the reference genome |
Haplotype | sequence of a segment of an individual chromosome, with its start and stop defined by a reference range |
Reference Haplotype | haplotype from the reference genome |
Alternate genome | high-quality genome used to identify alternate haplotypes |
Alternate haplotype | haplotype derived from a genome assembly |
Composite genome | inferred genome based on its composite set of alternate and reference haplotypes |
Haplotype ID | MD5 checksum for the haplotype sequence |
Sample | genotype (haploid, diploid, or higher ploidy), taxon, or individual |
Path | phased set of haplotype IDs through the pangenome graph |
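
The relationship between reference ranges, haplotypes, haplotype IDs, and paths can be sketched in a few lines of Python. This is a minimal illustration, not PHGv2's implementation: the toy sequences, the 0-based half-open range coordinates, the `haplotype_id` and `path_for` helpers, and the choice to hash the uppercased sequence with no other normalization are all assumptions made for the example. It also assumes the assembly is already expressed in reference coordinates, which in practice requires an alignment step.

```python
import hashlib
from typing import List

# Hypothetical reference ranges as 0-based, half-open (start, end) intervals.
reference_ranges = [(0, 8), (8, 16)]

# Toy sequences standing in for a reference genome and one assembly,
# both assumed to be already in reference coordinates.
reference = "ACGTACGTTTGACCAA"
assembly = "ACGTTCGTTTGACCAA"  # differs from the reference in the first range only


def haplotype_id(sequence: str) -> str:
    """Return the MD5 hex digest of a haplotype sequence (assumed hashing scheme)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def path_for(genome: str) -> List[str]:
    """Build a path: one haplotype ID per reference range, in range order."""
    return [haplotype_id(genome[start:end]) for start, end in reference_ranges]


reference_path = path_for(reference)
assembly_path = path_for(assembly)

# Report, per reference range, whether the two genomes carry the same haplotype.
for (start, end), ref_id, alt_id in zip(reference_ranges, reference_path, assembly_path):
    status = "same" if ref_id == alt_id else "different"
    print(f"{start}-{end}\t{ref_id}\t{alt_id}\t{status}")
```

Because the ID is a checksum of the sequence itself, two samples that carry identical sequence within a reference range receive the same haplotype ID, so comparing paths reduces to comparing IDs range by range.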
File types
File Type | Acronym definition | Usage |
---|---|---|
.agc | Assembled Genomes Compressor | Efficient genome sequence compression. |
.bam | Binary Alignment Map | Binary representation of a SAM file; useful for efficient processing. |
.bed | Browser Extensible Data | Genomic feature coordinate (e.g. reference ranges) storage. |
.bcf | Binary Call Format | Binary representation of a VCF file; useful for efficient processing. |
.fasta | FAST-All | Sequence representation and storage. |
.g.vcf | genomic VCF file | Variant and non-variant genomic storage. |
.h.vcf | haplotype VCF file | Haplotype information representation and storage. More information can be found here. |
.maf | Multiple Alignment Format | Multiple alignment storage; basis for gVCF and hVCF creation. |
.sam | Sequence Alignment Map | Sequence alignment to a reference sequence. |
.vcf | Variant Call Format | Genetic variant representation and storage. |
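
As an illustration of how reference ranges might be read from a BED file, the sketch below parses the first three BED columns (contig, start, end). BED coordinates are 0-based with an exclusive end, so a range's length is simply `end - start`. The `ReferenceRange` type, the `parse_bed` helper, and the contig names and coordinates are hypothetical and exist only for this example; they are not part of PHGv2.

```python
from typing import List, NamedTuple


class ReferenceRange(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(text: str) -> List[ReferenceRange]:
    """Parse the first three tab-separated BED columns into reference ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        ranges.append(ReferenceRange(fields[0], int(fields[1]), int(fields[2])))
    return ranges


# Hypothetical BED content: two adjacent ranges on one contig.
bed_text = "chr1\t0\t1000\nchr1\t1000\t2500\n"
ranges = parse_bed(bed_text)

for r in ranges:
    print(r.contig, r.start, r.end, r.length)
```

Keeping the half-open convention throughout means adjacent ranges share a boundary value without overlapping, as the two example ranges do at position 1000.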
Software
Software | Purpose |
---|---|
agc | Performant FASTA genome compression |
AnchorWave | Sensitive aligner for genomes with high sequence diversity |
bcftools | Utilities for indexing VCF data |
samtools | bgzip compression for VCF data |
TileDB | Performant storage core for array data |
TileDB-VCF | API for storing and querying VCF data |