Aaron C Noll1,2,3, Neil A Miller1, Laurie D Smith1,2,4, Byunggil Yoo1, Stephanie Fiedler5, Linda D Cooley4,5, Laurel K Willig1,3,4, Josh E Petrikin1,3,4, Julie Cakici6, John Lesko1, Angela Newton1, Kali Detherage1, Isabelle Thiffault1,4,5, Carol J Saunders1,4,5, Emily G Farrow1,3,4, Stephen F Kingsmore2,6.
Abstract
Optimal management of acutely ill infants with monogenetic diseases requires rapid identification of causative haplotypes. Whole-genome sequencing (WGS) has been shown to identify pathogenic nucleotide variants in such infants. Deletion structural variants (DSVs, >50 nt) are implicated in many genetic diseases, and tools have been designed to identify DSVs using short-read WGS. Optimisation and integration of these tools into a WGS pipeline could improve diagnostic sensitivity and specificity of WGS. In addition, it may improve turnaround time when compared with current CNV assays, enhancing utility in acute settings. Here we describe DSV detection methods for use in WGS for rapid diagnosis in acutely ill infants: SKALD (Screening Konsensus and Annotation of Large Deletions) combines calls from two tools (Breakdancer and GenomeStrip) with calibrated filters and clinical interpretation rules. In four WGS runs, the average analytic precision (positive predictive value) of SKALD was 78%, and recall (sensitivity) was 27%, when compared with validated reference DSV calls. When retrospectively applied to a cohort of 36 families with acutely ill infants, SKALD identified causative DSVs in two. The first was a heterozygous deletion of exons 1-3 of MMP21 in trans with a heterozygous frame-shift deletion in two siblings with transposition of the great arteries and heterotaxy. The second, in a newborn female with dysmorphic features, ventricular septal defect and persistent pulmonary hypertension, was a heterozygous, de novo 1p36.32p36.13 deletion whose breakpoints SKALD identified. In summary, consensus DSV calling, implemented in an 8-h computational pipeline with parameterised filtering, has the potential to increase the diagnostic yield of WGS in acutely ill neonates and discover novel disease genes.
Year: 2016 PMID: 29263817 PMCID: PMC5685307 DOI: 10.1038/npjgenmed.2016.26
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Software tools evaluated for performance in detection of DSVs in WGS
| Tool | Detection method | | |
| Breakdancer | PEM | PASS | PASS |
| Clever | Read alignment graph and max cliques | PASS | |
| Cn.MOPS | DOC and Poisson distribution | | |
| Control-freec | SNP B allele frequencies and DOC | PASS | |
| Dindel | Realignment with probabilistic indel calls | | |
| ERDS | DOC and paired Hidden Markov Model | PASS | |
| GasvPRO | DOC, PEM and probabilistic model | | |
| GenomeStrip | DOC, PEM and SRM | PASS | PASS |
| Lumpy | PEM and DOC (SRM with special aligner) | PASS | |
| SVDetect | DOC, PEM | | |
Abbreviations: BD, breakdancer; DOC, depth of coverage; ERDS, estimation by read depth with single-nucleotide variants; GS, GenomeStrip; PEM, paired end mapping; SNP, single nucleotide polymorphism; SRM, split read mapping; TP, true positive; WGS, whole-genome sequence.
Wide differences in the performance of the five SV detection tools that detected a true positive in a simulated Chr 1 DSV data set
| Tool | TP | FP | FN | Sensitivity | PPV | F2 measure |
| Breakdancer | 102 | 24 | 168 | 37.78% | 81.0% | 42.3% |
| Clever | 32 | 1,683 | 238 | 11.9% | 1.9% | 5.7% |
| Control-freec | 5 | 449 | 265 | 1.9% | 1.1% | 1.6% |
| ERDS | 149 | 1,204 | 121 | 55.2% | 11.0% | 30.6% |
| GenomeStrip | 146 | 673 | 124 | 54.1% | 17.8% | 38.4% |
| Lumpy | 247 | 526,524 | 23 | 91.5% | 0.05% | 0.2% |
Abbreviations: DSV, deletion structural variant; ERDS, estimation by read depth with single-nucleotide variants; FN, false negative; FP, false positive; TP, true positive.
TP: DSV predictions that overlapped a DSV by >1 nt.
Sensitivity: TP/(TP+FN).
PPV: TP/(TP+FP).
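The footnote formulas can be checked against the table with a few lines of Python. The sketch below is illustrative only: the function name `precision_recall_fbeta` is an assumption, and the F-measure uses the standard F-beta definition with beta = 2, which reproduces the last column of the Breakdancer row.

```python
def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Return (sensitivity, ppv, f_beta) computed from raw counts.

    sensitivity = TP / (TP + FN), ppv = TP / (TP + FP), and
    f_beta = (1 + beta^2) * ppv * sensitivity / (beta^2 * ppv + sensitivity).
    Empty denominators yield 0.0 rather than raising.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    b2 = beta * beta
    denom = b2 * ppv + sensitivity
    f_beta = (1 + b2) * ppv * sensitivity / denom if denom else 0.0
    return sensitivity, ppv, f_beta


# Breakdancer row of the table: TP=102, FP=24, FN=168
sens, ppv, f2 = precision_recall_fbeta(102, 24, 168)
print(f"{sens:.2%} {ppv:.1%} {f2:.1%}")  # 37.78% 81.0% 42.3%
```

With beta = 2 the measure weights recall more heavily than precision, which is why Lumpy's very high sensitivity still gives a near-zero score once its false positives are counted.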
Figure 1. Performance of five DSV detection tools as determined by reciprocal overlap of predictions from three iterations on one of three WGS simulations. Shown are true positives (TP, a), false positives (b), false negatives (c), sensitivity (recall, d), positive predictive value (precision, e) and F2 measure (f). ERDS did not yield any TP DSVs in this simulation. Similar results were observed for two other WGS simulations (data not shown).
Figure 2. Development of a DSV prediction classifier with reference data. (a) Concordance of confirmed DSVs among three published NA12878 reference DSV sets (Mills et al.,[40] Layer et al.[41] and Zook et al.[42]). DSVs were considered concordant if their chromosomal coordinates had a reciprocal overlap of at least 50%. About 33% of DSV calls were common to all three sets; 49% were common to more than two sets. Rates of concordance were lower if the overlap requirement was increased. (b) Process employed to develop a DSV prediction classifier. FN, false negative; FP, false positive; TP, true positive. Numbers shown represent a 50% reciprocal intersection between the union of DSVs from Mills et al.,[13] Layer et al.[41] and Zook et al.,[42] and the union of GS and BD calls for NA12878 technical replicates. TPs were defined as calls that had a ⩾50% reciprocal overlap between Mills U Layer U Zook and GS U BD. FNs were defined as Mills U Layer U Zook DSVs that did not have a ⩾50% reciprocal overlap with any GS U BD call. FPs were defined as GS U BD calls that did not have a ⩾50% reciprocal overlap with any Mills U Layer U Zook DSV.
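The TP, FP and FN definitions in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(chrom, start, end)` tuple layout with half-open coordinates, and the toy intervals are all assumptions.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b share at least min_frac of EACH interval's length.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return shared >= min_frac * (a[2] - a[1]) and shared >= min_frac * (b[2] - b[1])


def classify(calls, reference, min_frac=0.5):
    """Split calls into TP/FP and list reference DSVs that no call matched (FN)."""
    tp, fp = [], []
    for call in calls:
        matched = any(reciprocal_overlap(call, ref, min_frac) for ref in reference)
        (tp if matched else fp).append(call)
    fn = [ref for ref in reference
          if not any(reciprocal_overlap(call, ref, min_frac) for call in calls)]
    return tp, fp, fn


# Toy data (hypothetical coordinates)
reference = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
calls = [("chr1", 1100, 2100),   # 900 nt shared, 90% of both intervals
         ("chr1", 5000, 9000),   # covers the reference, but only 25% of the call
         ("chr2", 1000, 2000)]   # different chromosome
tp, fp, fn = classify(calls, reference)
print(len(tp), len(fp), len(fn))  # 1 2 1
```

The second toy call shows why the overlap must be reciprocal: a large predicted deletion that merely contains a small reference DSV does not count as a match.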
Performance of the random forest classifier on NA12878 replicates
| WGS run, call set | TP | FP | FN | Precision | Recall | F measure |
| 1, BD U GS unfiltered | 1,780 | 2,861 | 7,958 | 0.38 | 0.18 | 0.31 |
| 1, Random forest filtered | 1,262 | 292 | 8,635 | 0.81 | 0.13 | 0.39 |
| 2, BD U GS unfiltered | 2,409 | 4,212 | 7,525 | 0.36 | 0.24 | 0.33 |
| 2, Random forest filtered | 1,743 | 405 | 8,133 | 0.81 | 0.18 | 0.47 |
| 3, BD U GS unfiltered | 3,808 | 18,272 | 4,553 | 0.17 | 0.46 | 0.20 |
| 3, Random forest filtered | 3,198 | 2,867 | 5,161 | 0.53 | 0.38 | 0.49 |
| 4, BD U GS unfiltered | 3,521 | 3,852 | 5,452 | 0.48 | 0.39 | 0.46 |
| 4, Random forest filtered | 3,400 | 67 | 5,535 | 0.98 | 0.38 | 0.75 |
| Average, BD U GS unfiltered | 2,880 | 7,299 | 6,372 | 0.35 | 0.32 | 0.32 |
| Average, filtered | 2,401 | 908 | 6,866 | 0.78 | 0.27 | 0.52 |
Abbreviations: BD, breakdancer; GS, GenomeStrip; WGS, whole-genome sequence.
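The "Average, filtered" row can be reproduced from the four filtered rows. The sketch below (variable names are illustrative assumptions) shows that the reported 0.78 precision and 0.27 recall match the mean of the per-run ratios, and differ slightly from the values obtained by pooling the counts of all four runs first.

```python
# (TP, FP, FN) for the random-forest-filtered rows of runs 1-4, as tabulated
filtered = [
    (1262, 292, 8635),
    (1743, 405, 8133),
    (3198, 2867, 5161),
    (3400, 67, 5535),
]

# Mean of per-run ratios
per_run_precision = [tp / (tp + fp) for tp, fp, fn in filtered]
per_run_recall = [tp / (tp + fn) for tp, fp, fn in filtered]
mean_precision = sum(per_run_precision) / len(filtered)
mean_recall = sum(per_run_recall) / len(filtered)

# Ratios of pooled counts, for comparison
tp_sum = sum(tp for tp, fp, fn in filtered)
fp_sum = sum(fp for tp, fp, fn in filtered)
fn_sum = sum(fn for tp, fp, fn in filtered)
pooled_precision = tp_sum / (tp_sum + fp_sum)
pooled_recall = tp_sum / (tp_sum + fn_sum)

print(f"mean of runs: precision={mean_precision:.2f} recall={mean_recall:.2f}")
print(f"pooled:       precision={pooled_precision:.2f} recall={pooled_recall:.2f}")
```

Run 3 contributes most of the false positives, so it pulls the pooled precision down more than it pulls the per-run mean.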
Overlap in BD and GS DSV predictions in three sets of WGS and samples U173 and pg96, showing that filtering increased the r1∩50%r2∩50%r3 proportion
| Run 1 | 10,139 | 1,384 | 6,340 | 1,298 |
| Run 2 | 11,335 | 1,542 | 24,033 | 2,459 |
| Run 3 | 6,813 | 988 | 2,581 | 1,587 |
| r1∩50%r2 | 12,433 | 537 | 2,664 | 1,674 |
| r2∩50%r3 | 444 | 341 | 784 | 193 |
| r1∩50%r3 | 1,097 | 1,154 | 925 | 544 |
| r1∩50%r2∩50%r3 | 237,733 | 94,335 | 8,259 | 3,468 |
| Total DSV calls | 279,994 | 100,281 | 45,586 | 11,223 |
| r1∩50%r2∩50%r3 as % of total | 85% | 94% | 18% | 31% |
Abbreviations: BD, breakdancer; DSV, deletion structural variant; GS, GenomeStrip; WGS, whole-genome sequence.
r1∩50%r2: DSVs called by BD and GS in run 1 and run 2 with >50% overlap in chromosomal coordinates.
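The bottom two rows of the table follow from the seven category counts above them. As a worked check, assuming only that the categories are mutually exclusive (which the totals confirm), the sketch below recomputes the total and the three-way share for the first and last numeric columns; the dictionary keys and function name are illustrative.

```python
def three_way_share(counts):
    """Return (total calls, fraction of calls concordant across all three runs)."""
    total = sum(counts.values())
    return total, counts["r1_r2_r3"] / total


# First numeric column of the table
first_column = {
    "run1_only": 10139, "run2_only": 11335, "run3_only": 6813,
    "r1_r2": 12433, "r2_r3": 444, "r1_r3": 1097,
    "r1_r2_r3": 237733,
}
# Last numeric column of the table
last_column = {
    "run1_only": 1298, "run2_only": 2459, "run3_only": 1587,
    "r1_r2": 1674, "r2_r3": 193, "r1_r3": 544,
    "r1_r2_r3": 3468,
}

for name, column in (("first", first_column), ("last", last_column)):
    total, share = three_way_share(column)
    print(f"{name}: total={total:,} three-way share={share:.0%}")
```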
Figure 3. A flow diagram of the SKALD pipeline and downstream analysis for detection of likely disease-causative DSVs in WGS. After reads were aligned, GS and BD were executed concurrently on BAM files from parent–child trios. Filter attributes, overlap % and annotations were obtained for each BD U GS DSV prediction. Since genes that were commonly deleted were unlikely to be deleterious, the population frequency of DSVs overlapping genes helped determine whether the DSV was likely to cause a rare genetic disease. Finally, to identify likely pathogenic compound heterozygote states, any SNVs or indels overlapping a DSV were included as part of the SKALD output in the form of a tab-separated text file.
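The last two steps of the caption (using population frequency to deprioritise common deletions, and reporting small variants that overlap a DSV) can be sketched as below. This is not SKALD's code: the record layout, the function names, the output column names and the hard 1% frequency cut-off are all assumptions made for illustration, and the coordinates are invented.

```python
import csv
import io


def small_variants_in(dsv, small_variants):
    """Return labels of SNVs/indels whose position lies inside the DSV interval."""
    return [
        f"{chrom}:{pos}{ref}>{alt}"
        for chrom, pos, ref, alt in small_variants
        if chrom == dsv["chrom"] and dsv["start"] <= pos <= dsv["end"]
    ]


def build_rows(dsvs, small_variants, max_pop_freq=0.01):
    """Drop commonly deleted regions and annotate the rest with overlapping small variants."""
    rows = []
    for dsv in dsvs:
        if dsv["pop_freq"] > max_pop_freq:  # assumed hard threshold
            continue
        hits = small_variants_in(dsv, small_variants)
        rows.append([dsv["chrom"], dsv["start"], dsv["end"],
                     ",".join(dsv["callers"]), dsv["pop_freq"],
                     ";".join(hits) or "."])
    return rows


def to_tsv(rows):
    """Serialise rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "start", "end", "callers", "pop_freq",
                     "overlapping_small_variants"])
    writer.writerows(rows)
    return buf.getvalue()


# Toy input (hypothetical records)
dsvs = [
    {"chrom": "chr1", "start": 1000, "end": 5000, "callers": ["BD", "GS"], "pop_freq": 0.0},
    {"chrom": "chr2", "start": 200, "end": 900, "callers": ["GS"], "pop_freq": 0.30},
]
small_variants = [("chr1", 1500, "AT", "A"), ("chr1", 9000, "C", "T")]

rows = build_rows(dsvs, small_variants)
print(to_tsv(rows), end="")
```

A small variant reported alongside a rare heterozygous deletion is a candidate for the compound heterozygous state described in the abstract, where a deletion on one allele sits in trans with a frame-shift variant on the other.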