Xin Zhou, Serafim Batzoglou, Arend Sidow, Lu Zhang.
Abstract
BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo has been developed to detect DNMs, but at good sensitivity these tools still produce many false positive calls.
Keywords: De novo mutation; Haploid genotype; Linked read sequencing; Phasing
Year: 2018 PMID: 29914369 PMCID: PMC6006847 DOI: 10.1186/s12864-018-4867-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Workflow of HAPDeNovo. Software is in brackets. HP: haplotype
The performance of FreeBayes, TrioDeNovo, GATK, and DeNovoGear with sequencing depth threshold 15 and their optimal parameters (GL = −50 for FreeBayes, DQ = 7 for TrioDeNovo, PL = 450 for GATK, and PP = 1E-4 for DeNovoGear)
| Tool | Optimal-Paras TP | Optimal-Paras FP | HAPDeNovo TP | HAPDeNovo FP | HAPDeNovo HC TP | HAPDeNovo HC FP | HAPDeNovo LC TP | HAPDeNovo LC FP |
|---|---|---|---|---|---|---|---|---|
| FreeBayes | 44 | 5785 | 44 | 1083 | 33 | 219 | 11 | 864 |
| TrioDeNovo | 44 | 3673 | 44 | 674 | 33 | 124 | 11 | 550 |
| GATK | 43 | 242,487 | 43 | 1396 | 32 | 258 | 11 | 1138 |
| DeNovoGear | 43 | 89,187 | 43 | 1241 | 32 | 246 | 11 | 995 |
After further applying HAPDeNovo, the number of false positives decreases substantially for all four inputs. HAPDeNovo also calculates the confidence of DNMs. A high proportion of TPs (33/44, 32/43) comes from high-confidence DNMs. TP (True Positive): number of de novo mutations in both the candidate set and the gold standard. FP (False Positive): number of mutations in the candidate set but not in the gold standard. HC (High Confidence): high-confidence DNMs. LC (Low Confidence): low-confidence DNMs
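The TP and FP definitions above amount to a set intersection and a set difference. The following is a minimal sketch of that bookkeeping, not the evaluation code used in the paper. The function name `count_tp_fp`, the `(chrom, pos, ref, alt)` tuple representation, and the toy variants are all assumptions made for illustration.

```python
def count_tp_fp(candidates, gold_standard):
    """Count true and false positives for a set of candidate DNM calls.

    Each variant is assumed to be a hashable key such as
    (chrom, pos, ref, alt); this representation is hypothetical.
    """
    candidates = set(candidates)
    gold_standard = set(gold_standard)
    tp = len(candidates & gold_standard)  # in both candidate set and gold standard
    fp = len(candidates - gold_standard)  # in candidate set but not in gold standard
    return tp, fp


# Toy data, invented purely for illustration.
gold = {("chr1", 1000, "A", "G"), ("chr2", 2500, "C", "T")}
calls = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A"), ("chr5", 777, "T", "C")}

tp, fp = count_tp_fp(calls, gold)
print(f"TP={tp}, FP={fp}")
```

Note that the high- and low-confidence columns in the table partition the HAPDeNovo calls, so their FP counts sum to the HAPDeNovo FP total (for FreeBayes, 219 + 864 = 1083).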
Fig. 2 ROC curves of DNMs called by FreeBayes, TrioDeNovo, GATK, and DeNovoGear at optimal parameter settings, and the improved ROC curves after applying HAPDeNovo (red line). Sequencing depth threshold is varied from 10 (start of each plot line, leftmost point) to 30 (end of each plot line, rightmost point). FP (False Positive): Number of false positive DNMs. Sensitivity: Number of true positive DNMs divided by the total number of true positive plus false negative DNMs. GL: Genotype Likelihood; DQ: De Novo Quality; PL: Posterior Likelihood; PP: Posterior Probability. Blue curves show the sensitivity and number of FPs at default settings (no GL and no PL thresholds for FreeBayes and GATK, respectively). Green curves show the sensitivity and number of FPs at optimal parameter settings (GL = −50 for FreeBayes, DQ = 7 for TrioDeNovo, PL = 450 for GATK, and PP = 3E-5 for DeNovoGear). Red curves show the performance after applying HAPDeNovo
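The sensitivity definition in the caption, TP divided by TP plus FN, can be written as a small helper. This is an illustrative sketch only: the function name `sensitivity` is an assumption, and the counts in the usage line are hypothetical rather than taken from the paper.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN), as defined in the figure caption."""
    total = tp + fn
    if total == 0:
        raise ValueError("sensitivity is undefined when TP + FN is zero")
    return tp / total


# Hypothetical counts, chosen only to show the calculation.
example = sensitivity(tp=9, fn=1)
print(f"sensitivity = {example:.2f}")
```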
Comparing the performance of TrioDeNovo, Long Ranger and HAPDeNovo (using TrioDeNovo as input) as a function of sequencing depth ranging from 10 to 20. DQ = 7 was used as the quality threshold for TrioDeNovo
| Method | Metric | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TrioDeNovo | TP | 44 | 44 | 44 | 44 | 44 | 44 | 43 | 42 | 41 | 39 | 36 |
| TrioDeNovo | FP | 3932 | 3926 | 3923 | 3862 | 3789 | 3673 | 3532 | 3410 | 3250 | 3106 | 2969 |
| Long Ranger | TP | 44 | 44 | 44 | 44 | 44 | 44 | 43 | 42 | 41 | 39 | 36 |
| Long Ranger | FP | 3911 | 3907 | 3904 | 3844 | 3771 | 3655 | 3515 | 3394 | 3235 | 3092 | 2956 |
| HAPDeNovo | TP | 44 | 44 | 44 | 44 | 44 | 44 | 43 | 42 | 41 | 39 | 36 |
| HAPDeNovo | FP | 768 | 766 | 765 | 744 | 715 | 674 | 626 | 593 | 558 | 525 | 496 |
TP (True Positive): Number of DNMs in both the candidate set and the gold standard. FP (False Positive): Number of DNMs in the candidate set but not in the gold standard
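To make the comparison in the table above concrete, the sketch below computes the fraction of TrioDeNovo false positives that HAPDeNovo removes at each depth threshold. The FP counts are copied from the table; the variable names and the derived `fp_reduction` metric are introduced here for illustration and do not appear in the paper.

```python
# Depth thresholds and FP counts copied from the table above.
depths = list(range(10, 21))
triodenovo_fp = [3932, 3926, 3923, 3862, 3789, 3673, 3532, 3410, 3250, 3106, 2969]
hapdenovo_fp = [768, 766, 765, 744, 715, 674, 626, 593, 558, 525, 496]

# Fraction of TrioDeNovo false positives removed by HAPDeNovo at each depth.
fp_reduction = {
    depth: 1 - hap / trio
    for depth, trio, hap in zip(depths, triodenovo_fp, hapdenovo_fp)
}

for depth in depths:
    print(f"depth {depth}: {fp_reduction[depth]:.1%} fewer false positives")
```

With these counts the reduction stays between roughly 80% and 83% across all thresholds, while the TP rows for TrioDeNovo and HAPDeNovo are identical.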
Comparing the performance of HAPDeNovo using HP1 and HP2 only versus HP0, HP1 and HP2 (both with TrioDeNovo as input), with DQ = 7 and sequencing depth changing from 10 to 20
| Method | Metric | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TrioDeNovo | TP | 44 | 44 | 44 | 44 | 44 | 44 | 43 | 42 | 41 | 39 | 36 |
| TrioDeNovo | FP | 3932 | 3926 | 3923 | 3862 | 3789 | 3673 | 3532 | 3410 | 3250 | 3106 | 2969 |
| HAPDeNovo by HP1, HP2 | TP | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 39 | 39 | 37 | 35 |
| HAPDeNovo by HP1, HP2 | FP | 605 | 604 | 603 | 584 | 557 | 521 | 481 | 453 | 423 | 394 | 372 |
| HAPDeNovo by HP1, HP2, HP0 | TP | 44 | 44 | 44 | 44 | 44 | 44 | 43 | 42 | 41 | 39 | 36 |
| HAPDeNovo by HP1, HP2, HP0 | FP | 768 | 766 | 765 | 744 | 715 | 674 | 626 | 593 | 558 | 525 | 496 |
TP (True Positive): Number of DNMs in both the candidate set and the gold standard. FP (False Positive): Number of DNMs in the candidate set but not in the gold standard