Convenience commands
In addition to the primary commands for the build, imputation, and resequencing pipelines, PHGv2 provides a suite of "convenience commands" for miscellaneous quality-of-life (QoL) improvements. This document covers the currently available commands for frequently performed tasks.
Convert gVCF files to hVCF files
Create hVCF files from existing gVCF files created by the PHG
Command - gvcf2hvcf
Example
phg gvcf2hvcf \
--bed my/bed/file.bed \
--reference-file my/updated/ref/fasta.fa \
--gvcf-dir gvcf/directory \
--db-path my/phg/db
Parameters
Parameter name | Description | Default value | Required? |
---|---|---|---|
--bed | BED file with entries that define the haplotype boundaries. | "" | |
--gvcf-dir | Directory containing bgzipped and CSI indexed gVCF files. | "" | |
--reference-file | Path to local Reference FASTA file. | "" | |
--conda-env-prefix | Prefix for the Conda environment to use. If provided, this should be the full path to the Conda environment. | Current active Conda environment | |
--db-path | Folder name where TileDB datasets and AGC record are stored. If not provided, the current working directory is used. | Current working dir | |
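The --gvcf-dir parameter expects bgzipped, CSI indexed files. If some of your gVCF files are still uncompressed, the following is a minimal sketch of one way to prepare them. It assumes that bgzip (from HTSlib) and bcftools are installed and on your PATH, and that uncompressed files end in .g.vcf; the function name prepare_gvcfs and the directory name are hypothetical.

```shell
# Compress and index every uncompressed gVCF in a directory.
# Assumption: bgzip (HTSlib) and bcftools are on the PATH, and
# uncompressed files use the .g.vcf extension.
prepare_gvcfs() {
    dir="$1"
    for f in "$dir"/*.g.vcf; do
        # Skip the literal pattern when no files match
        [ -e "$f" ] || continue
        bgzip "$f"                      # writes $f.gz and removes $f
        bcftools index --csi "$f.gz"    # writes $f.gz.csi
    done
}

# Usage (hypothetical directory):
#   prepare_gvcfs gvcf/directory
```

Files that are already compressed and indexed are left untouched, because the loop only matches names ending in .g.vcf.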
Convert hVCF files to gVCF files
Create gVCF files from existing hVCF files created by the PHG
Command - hvcf2gvcf
Example
phg hvcf2gvcf \
--reference-file my/updated/ref/fasta.fa \
--hvcf-dir hvcf/directory \
--db-path my/phg/db \
--output-dir output/directory/for/gvcfs
Parameters
Parameter name | Description | Default value | Required? |
---|---|---|---|
--hvcf-dir | Path to directory holding hVCF files. Data will be pulled directly from these files instead of querying TileDB. | "" | |
--reference-file | Path to local Reference FASTA file. | "" | |
--conda-env-prefix | Prefix for the Conda environment to use. If provided, this should be the full path to the Conda environment. | Current active Conda environment | |
--db-path | Folder name where TileDB datasets and AGC record are stored. If not provided, the current working directory is used. | Current working dir | |
--output-dir | Output directory for the gVCF files. If not provided, the current working directory is used. | Current working dir | |
Merge gVCF files
Merge multiple gVCF files into a single gVCF file
Command - merge-gvcfs
Example
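A minimal sketch of an invocation, assuming the flag names listed in the parameter table for this command and hypothetical input and output paths. The table lists --output-dir while describing it as a path and/or filename, so it is worth confirming the exact flag with phg merge-gvcfs --help before use. The snippet prints the command rather than running it; remove the leading echo to execute it.

```shell
# Hypothetical paths; flag names are taken from the parameter table.
INPUT_DIR="my/gvcf/directory"
OUTPUT="output/merged_gvcfs.g.vcf"

# Dry run: print the command that would be executed.
echo phg merge-gvcfs \
    --input-dir "$INPUT_DIR" \
    --output-dir "$OUTPUT"
```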
Parameters
Parameter name | Description | Default value | Required? |
---|---|---|---|
--input-dir | Path to input gVCF file directory. | "" | |
--output-dir | Path and/or filename for merged gVCF file. | "" | |
Merge hVCF files
Merge multiple hVCF files into a single hVCF file
Command - merge-hvcfs
Example
phg merge-hvcfs \
--input-dir my/hvcf/directory \
--output-file output/merged_hvcfs.h.vcf \
--id-format CHECKSUM \
--reference-file my/updated/ref/fasta.fa \
--range-bedfile my/bed/file.bed
Parameters
Parameter name | Description | Default value | Required? |
---|---|---|---|
--input-dir | Path to input hVCF file directory. | "" | |
--output-file | Path and/or filename for merged hVCF file. | "" | |
--id-format | ID format for hVCF files. Options are: CHECKSUM or RANGE_SAMPLE_GAMETE (see notes for further details). | RANGE_SAMPLE_GAMETE | |
--reference-file | Path to reference FASTA file. | "" | |
--range-bedfile | Path to reference range BED file. | "" | |
Note - id-format
If you select CHECKSUM for the --id-format parameter, the ID values will be MD5 checksums in the ##ALT header.
If you select RANGE_SAMPLE_GAMETE, the ID values will instead use a reference range/sample/gamete ID format.
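To give a sense of what a CHECKSUM-style ID looks like, the sketch below computes an MD5 digest of a short string with the standard md5sum utility. The input string is hypothetical, and which exact data PHGv2 hashes to produce each ID is an assumption here (shown as a haplotype sequence), not something this page confirms. The point is only that the resulting ID is a 32-character hexadecimal string.

```shell
# Hypothetical haplotype sequence; any string works for illustration.
# Assumption: the checksum is taken over the sequence text.
SEQ="ACGTACGTTAGC"

# md5sum prints "<digest>  -"; keep only the digest field.
HAP_ID=$(printf '%s' "$SEQ" | md5sum | cut -d' ' -f1)
echo "$HAP_ID"
```

On macOS, where md5sum may not be installed, md5 -q offers similar functionality.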