PHGv2 Terminology
General terms
| Term | Definition |
|---|---|
| Reference genome | genome used for initial alignment and base coordinates |
| Reference range | segment of the reference genome |
| Haplotype | sequence of a portion of an individual chromosome, with its start and stop defined by the reference range |
| Reference haplotype | haplotype from the reference genome |
| Alternate genome | high-quality genome used to identify alternate haplotypes |
| Alternate haplotype | haplotype derived from a genome assembly |
| Composite genome | inferred genome based on its composite set of alternate and reference haplotypes |
| Haplotype ID | MD5 checksum for the haplotype sequence |
| Sample | genotype (haploid, diploid, or higher ploidy), taxon, or individual |
| Path | phased set of haplotype IDs through the pangenome graph |
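As an illustration of how the reference range, haplotype, and haplotype ID terms relate, here is a minimal Python sketch. It assumes 0-based, half-open coordinates and hashes the sequence exactly as given; whether PHGv2 normalizes case or whitespace before hashing is not stated here, and the function names `extract_haplotype` and `haplotype_id` are hypothetical helpers, not part of PHGv2.

```python
import hashlib


def extract_haplotype(chrom_seq: str, start: int, end: int) -> str:
    """Return the sequence inside a reference range.

    Assumption: coordinates are 0-based and half-open, i.e. [start, end).
    """
    return chrom_seq[start:end]


def haplotype_id(sequence: str) -> str:
    """Return the hexadecimal MD5 checksum of a haplotype sequence.

    Assumption: the sequence is hashed as-is, with no case or
    whitespace normalization.
    """
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


if __name__ == "__main__":
    chromosome = "ACGTACGTTTGACCA"  # toy chromosome sequence
    haplotype = extract_haplotype(chromosome, 4, 10)
    print(haplotype, haplotype_id(haplotype))
```

Because the ID is derived only from the sequence, two identical haplotype sequences get the same ID under this scheme regardless of which sample they came from.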
File types
| File Type | Acronym definition | Usage |
|---|---|---|
| .agc | Assembled Genomes Compressor | Efficient genome sequence compression. |
| .bam | Binary Alignment Map | Binary representation of a SAM file; useful for efficient processing. |
| .bed | Browser Extensible Data | Genomic feature coordinate (e.g. reference ranges) storage. |
| .bcf | Binary Call Format | Binary representation of a VCF file; useful for efficient processing. |
| .fasta | FAST-All | Sequence representation and storage. |
| .g.VCF | genomic VCF file | Variant and non-variant genomic storage. |
| .h.VCF | haplotype VCF file | Haplotype information representation and storage. More information can be found here. |
| .maf | Multiple Alignment Format | Multiple alignment storage; basis for gVCF and hVCF creation. |
| .sam | Sequence Alignment Map | Sequence alignment to a reference sequence. |
| .vcf | Variant Call Format | Genetic variant representation and storage. |
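Since BED files are listed above as the storage format for reference ranges, the following Python sketch shows one way such a file could be read. It relies only on the standard BED layout (tab-separated chromosome, 0-based start, exclusive end, optional name); the `ReferenceRange` type and `parse_bed_line` function are hypothetical names for illustration, not PHGv2 code.

```python
from typing import NamedTuple, Optional


class ReferenceRange(NamedTuple):
    """One BED record: chrom, 0-based start, exclusive end, optional name."""
    chrom: str
    start: int
    end: int
    name: Optional[str] = None

    @property
    def length(self) -> int:
        # Half-open intervals make the length a simple difference.
        return self.end - self.start


def parse_bed_line(line: str) -> Optional[ReferenceRange]:
    """Parse one BED line; return None for blank, comment, or header lines."""
    line = line.strip()
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    name = fields[3] if len(fields) > 3 else None
    return ReferenceRange(fields[0], int(fields[1]), int(fields[2]), name)


if __name__ == "__main__":
    example = "chr1\t0\t1000\tgene1"
    print(parse_bed_line(example))
```

Only the first four columns are read here; any further BED columns (score, strand, and so on) are ignored in this sketch.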
Software
| Software | Purpose |
|---|---|
| agc | Performant FASTA genome compression |
| AnchorWave | Sensitive aligner for genomes with high sequence diversity |
| bcftools | Utilities for indexing VCF data |
| samtools | bgzip compression for VCF data |
| TileDB | Performant storage core for array data |
| TileDB-VCF | API for storing and querying VCF data |