Francesco Musacchia1,2, Marianthi Karali1,3, Annalaura Torella4, Steve Laurie5, Valeria Policastro6,7, Mariateresa Pizzo1, Sergi Beltran5,8,9, Giorgio Casari1,10, Vincenzo Nigro1,4, Sandro Banfi1,4.
Abstract
Homozygous deletions (HDs) may cause rare diseases and cancer, and discovering them in targeted sequencing data is a challenging task. Several tools have been developed for HD discovery, but a sensitive caller is still lacking. We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth of coverage to detect rare homozygous and hemizygous single-exon deletions. To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project, obtaining higher sensitivity than state-of-the-art algorithms, each of which missed at least one event. We then applied our tool to targeted sequencing data from patients with inherited retinal dystrophies (IRDs) and solved five cases that still lacked a genetic diagnosis. VarGenius-HZD is available either as a stand-alone tool or integrated within our recently developed VarGenius software, which enables the automated selection of samples from its internal database. It could therefore be extremely useful for both diagnostic and research purposes.
Keywords: copy-number variation; homozygous deletion; rare diseases
Year: 2021 PMID: 34946927 PMCID: PMC8701221 DOI: 10.3390/genes12121979
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Precision, recall and specificity obtained by each tool on the 50 samples from the 1000 Genomes Project (1KGP) dataset. TP, true positives; TN, true negatives; FN, false negatives; FP, false positives.
| Algorithm | Total putative HDs | TP | TN | FN | FP | Recall | Specificity | Precision | New TP |
|---|---|---|---|---|---|---|---|---|---|
| ExomeDepth | 274 | 1 | 1 | 3 | 273 | 0.25 | 0.0036 | 0.0036 | 2 |
| VarGenius-HZD | 51 | 3 | 1 | 1 | 48 | 0.75 | 0.0204 | 0.0588 | 2 |
| HMZDelFinder | 10 | 1 | 1 | 3 | 9 | 0.25 | 0.10 | 0.10 | 1 |
| DECoN | 267 | 1 | 1 | 3 | 266 | 0.25 | 0.0037 | 0.0037 | 1 |
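The Recall, Specificity and Precision columns follow the standard confusion-matrix definitions, so they can be recomputed from the TP/TN/FN/FP counts. The sketch below is our own check, not the authors' evaluation code; the `Confusion` class and `TABLE` dictionary are hypothetical names. It returns NaN when a denominator is zero, as happens for HMZDelFinder in the synthetic test (TN + FP = 0), where the table reports a specificity of 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one caller (illustrative helper)."""
    tp: int
    tn: int
    fn: int
    fp: int

    def recall(self) -> float:
        # Sensitivity: share of true HDs that were called, TP / (TP + FN)
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # TN / (TN + FP)
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        # Share of calls that are true HDs, TP / (TP + FP)
        return _ratio(self.tp, self.tp + self.fp)


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (zero denominator) are returned as NaN rather than 0
    return num / den if den else float("nan")


# Counts copied from the 1KGP table above
TABLE = {
    "ExomeDepth": Confusion(tp=1, tn=1, fn=3, fp=273),
    "VarGenius-HZD": Confusion(tp=3, tn=1, fn=1, fp=48),
    "HMZDelFinder": Confusion(tp=1, tn=1, fn=3, fp=9),
    "DECoN": Confusion(tp=1, tn=1, fn=3, fp=266),
}

if __name__ == "__main__":
    for name, c in TABLE.items():
        print(f"{name:14s} recall={c.recall():.2f} "
              f"specificity={c.specificity():.4f} precision={c.precision():.4f}")
```

Running it reproduces the rounded values in the table, e.g. 3/51 ≈ 0.0588 precision for VarGenius-HZD.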
Precision, recall and specificity obtained by each tool in the synthetic HD test.
| Algorithm | Total calls | Total filtered | TP | TN | FP | FN | Recall | Specificity | Precision |
|---|---|---|---|---|---|---|---|---|---|
| HMZDelFinder | NA | 4 | 4 | 0 | 0 | 1 | 0.8 | 0 | 1 |
| VarGenius-HZD | 4201 | 6 | 5 | 0 | 1 | 0 | 1 | 0 | 0.83 |
| DECoN | 38234 | 45 | 0 | 0 | 45 | 5 | 0 | 0 | 0 |
| ExomeDepth | 3949 | 3 | 0 | 0 | 3 | 5 | 0 | 0 | 0 |
Figure 1. Experimental validation of the HD detected with our algorithm in the RAX2 gene. (A) Coverage heatmap of retinopathy genes in the WES data of IRD patients. Patient samples are shown on the x-axis and gene names on the y-axis. The extent of coverage is plotted according to the reported color scale. The RAX2 gene is well covered in all individuals except A392 (asterisk in the framed column), where its coverage is poor. (B) IGV coverage tracks for the alignment file from patient A392 (upper track) and a control patient (lower track). The lack of reads spanning exon 2 of RAX2 in A392 (green box) suggested that the corresponding region was deleted on both alleles in the proband. (C) PCR amplification of the genomic region spanning the identified deletion in the proband’s genomic DNA (‘Patient DNA’) and in a control DNA sample. The difference in size between the two amplicons (red arrowheads) indicates the presence of an extensive HD in the proband.
Availability of SNV/CNV analysis automation in existing open-source software.
| Software | SNV/Indel Calling | CNV Calling | Scalability | Automated Dataset Creation |
|---|---|---|---|---|
| bcbio | yes | yes | yes | no |
| Nf-sarek | yes | yes | yes | no |
| Hpexome | yes | no | yes | no |
| HemoMIPs | yes | no | yes | no |
| Swift/T | yes | no | yes | no |
VRCIRD cases resolved by detection of an HD.
| Sample | Gene | Region | XHMM | ExomeDepth (BF) | VarGenius-HZD | HMZDelFinder |
|---|---|---|---|---|---|---|
| ID_A739 | RAX2 | 19:3772155-3772224 | NO | 7.4 | YES | YES |
| CREv1_A392 | RAX2 | 19:3771519-3772224 | NO | 11 | YES | YES |
| CREv1_A348 | | X:46719422-46719537 | NO | 7.8 | YES | YES |
| CREv1_ARRP129 | | X:46719424-46719537 | NO | 9 | YES | YES |
| ID_A860 | | X:38186587-38186793 | NO | 9 | YES | NO |
Summary of samples used from the VRCIRD cohort.
| Platform | CREv1 | CCP | ID | Total | Solved Cases | Examined |
|---|---|---|---|---|---|---|
| NextSeq500 | 14 | 51 | 123 | 188 | 124 | 64 |
Figure 2. VarGenius-HZD workflow. The workflow of our algorithm consists of three steps: (1) sample selection, which is automated in the VarGenius software and manual in the stand-alone version; (2) pre-processing, which generates the NCE files and raw depth-of-coverage (DoC) information; and (3) rare-HD detection, which computes NCE frequencies, detects putative HDs, and annotates these regions for variant prioritization.
Figure 3. VarGenius-HZD algorithm and results illustration. VarGenius-HZD leverages breadth of coverage (BoC) together with depth of coverage (DoC) as follows: (A) the target BED file used for sequencing is intersected with UCSC exon intervals to obtain an exon-on-target file, which is used to compute BoC and DoC with bedtools coverage. (B) NCEs are identified in each sample and their frequencies are counted; only those with a frequency lower than or equal to 2 are retained as putative HDs (e.g., exon 4 in (B)). (C) The tabular output contains statistics for each putative HD: chromosome, start and end, the BoC for the subject sample, the DoC for the parents (FDoC and MDoC), and the average exon DoC across the whole dataset.
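The rarity filter in panel (B) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the VarGenius-HZD code: we assume an NCE is an exon-on-target interval with zero BoC in a sample (BoC taken as the fraction of exon bases with depth ≥ 1), and that an NCE's frequency is the number of samples in the dataset in which that exon is non-covered. Only the threshold of 2 comes from the caption; the data layout and function names are hypothetical, and in practice the coverage values would come from `bedtools coverage` output rather than Python lists.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical exon key: (chromosome, start, end)
Exon = Tuple[str, int, int]
# Hypothetical per-sample layout: exon -> per-base read depths
SampleCoverage = Dict[Exon, List[int]]


def breadth_of_coverage(depths: List[int], min_depth: int = 1) -> float:
    """Fraction of exon bases covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def non_covered_exons(coverage: SampleCoverage) -> Set[Exon]:
    """Exons with zero breadth of coverage in one sample (assumed NCE definition)."""
    return {exon for exon, depths in coverage.items()
            if breadth_of_coverage(depths) == 0.0}


def putative_hds(dataset: Dict[str, SampleCoverage],
                 max_freq: int = 2) -> Dict[str, List[Exon]]:
    """Keep, per sample, the NCEs that occur in at most `max_freq` samples."""
    nces = {sample: non_covered_exons(cov) for sample, cov in dataset.items()}
    # NCE frequency: number of samples in which each exon is non-covered
    freq = Counter(exon for sample_nces in nces.values() for exon in sample_nces)
    result: Dict[str, List[Exon]] = {}
    for sample, sample_nces in nces.items():
        rare = sorted(e for e in sample_nces if freq[e] <= max_freq)
        if rare:
            result[sample] = rare
    return result


# Toy dataset: exon A is lost only in S1; exon B is uncovered in every sample
EXON_A: Exon = ("chr1", 100, 104)
EXON_B: Exon = ("chr1", 200, 204)
TOY = {
    "S1": {EXON_A: [0, 0, 0, 0], EXON_B: [0, 0, 0, 0]},
    "S2": {EXON_A: [30, 31, 29, 28], EXON_B: [0, 0, 0, 0]},
    "S3": {EXON_A: [25, 26, 27, 25], EXON_B: [0, 0, 0, 0]},
    "S4": {EXON_A: [40, 41, 39, 38], EXON_B: [0, 0, 0, 0]},
}

if __name__ == "__main__":
    print(putative_hds(TOY))  # {'S1': [('chr1', 100, 104)]}
```

Exon B behaves like a poorly captured target: it is non-covered in all four toy samples, so its frequency of 4 exceeds the threshold and it is discarded, while exon A is non-covered only in S1 and survives as a putative HD.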
The five simulated HDs inserted into samples of the 1KGP dataset.
| Sample | Chr | Start | End |
|---|---|---|---|
| NA06989 | 21 | 48063447 | 48063551 |
| NA07347 | 21 | 27326904 | 27327003 |
| NA12058 | 21 | 35091133 | 35091161 |
| NA12748 | 21 | 10906904 | 10907040 |
| NA12830 | 21 | 40188932 | 40189015 |
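Scoring a caller against these simulated events requires a rule for matching calls to events. The source does not state that rule, so this sketch assumes that a call is a hit when it is in the same sample and chromosome and its closed interval shares at least one base with a simulated HD; TP is counted once per detected event, and FP once per call that matches no event. The `Region` type, the `score` function and the example calls are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int


# The five simulated HDs from the table above
SIMULATED: List[Region] = [
    Region("NA06989", "21", 48063447, 48063551),
    Region("NA07347", "21", 27326904, 27327003),
    Region("NA12058", "21", 35091133, 35091161),
    Region("NA12748", "21", 10906904, 10907040),
    Region("NA12830", "21", 40188932, 40189015),
]


def overlaps(a: Region, b: Region) -> bool:
    # Assumed rule: same sample and chromosome, closed intervals share >= 1 base
    return (a.sample == b.sample and a.chrom == b.chrom
            and a.start <= b.end and b.start <= a.end)


def score(calls: List[Region], truth: List[Region]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN), counting TP once per simulated event detected."""
    detected = [t for t in truth if any(overlaps(c, t) for c in calls)]
    false_calls = [c for c in calls if not any(overlaps(c, t) for t in truth)]
    tp = len(detected)
    return tp, len(false_calls), len(truth) - tp


# Hypothetical calls, invented for this example
CALLS = [
    Region("NA06989", "21", 48063400, 48063500),  # partial overlap with an HD
    Region("NA12058", "21", 35091133, 35091161),  # exact match
    Region("NA07347", "21", 30000000, 30000100),  # right sample, wrong locus
]

if __name__ == "__main__":
    print(score(CALLS, SIMULATED))  # (2, 1, 3)
```

Two of the five events are hit and one call matches nothing, giving TP = 2, FP = 1 and FN = 3; a stricter rule (e.g., requiring a minimum reciprocal overlap) would be a drop-in change to `overlaps`.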
Figure 4. Flowchart of the CNV detection and annotation pipeline in VarGenius, which uses XHMM, ExomeDepth, and the VarGenius-HZD algorithm. These analyses require several unrelated samples; VarGenius therefore collects sample identifiers by querying its database for samples sequenced with the same target, taking kinship into account. XHMM requires GATK DepthOfCoverage to be run with specific parameters; DepthOfCoverage is run on all samples in parallel on the cluster. Once all tools have produced their calls, the results are merged into a single tabular output and annotated with AnnotSV.
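The automated sample selection can be pictured as a database query. This is a sketch with Python's built-in `sqlite3`, not VarGenius's actual query: the schema `samples(sample_id, target, family_id)`, the function names, and the kinship rule are all hypothetical. We assume that "taking kinship into account" can be approximated by keeping the subject plus one sample from every other family sequenced with the same target, so that relatives who may share a deletion are not counted as independent samples.

```python
import sqlite3
from typing import List


def select_reference_samples(conn: sqlite3.Connection, subject_id: str) -> List[str]:
    """Return the subject plus one sample per other family sharing its target.

    Hypothetical schema: samples(sample_id TEXT, target TEXT, family_id TEXT).
    """
    row = conn.execute(
        "SELECT target, family_id FROM samples WHERE sample_id = ?",
        (subject_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown sample: {subject_id}")
    target, family_id = row
    # One representative (smallest sample ID) per unrelated family on the same target
    others = conn.execute(
        "SELECT MIN(sample_id) FROM samples "
        "WHERE target = ? AND family_id != ? "
        "GROUP BY family_id ORDER BY 1",
        (target, family_id),
    ).fetchall()
    return [subject_id] + [r[0] for r in others]


def demo() -> List[str]:
    """Build a toy in-memory database and select references for sample P1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (sample_id TEXT PRIMARY KEY, "
        "target TEXT NOT NULL, family_id TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [
            ("P1", "kitA", "F1"), ("P1_father", "kitA", "F1"),
            ("P1_mother", "kitA", "F1"), ("P2", "kitA", "F2"),
            ("P2_sib", "kitA", "F2"), ("P3", "kitA", "F3"),
            ("P4", "kitB", "F4"),
        ],
    )
    try:
        return select_reference_samples(conn, "P1")
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())  # ['P1', 'P2', 'P3']
```

In the toy database, P1's parents are skipped because they belong to its family, P2_sib is dropped in favor of P2, and P4 is excluded because it was sequenced with a different target.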