Charlotte A Darby, James R Fitch, Patrick J Brennan, Benjamin J Kelly, Natalie Bir, Vincent Magrini, Jeffrey Leonard, Catherine E Cottrell, Julie M Gastier-Foster, Richard K Wilson, Elaine R Mardis, Peter White, Ben Langmead, Michael C Schatz.
Abstract
Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable to that which previously required trios or matched tumor/normal pairs, and it outperforms single-sample mosaic variant callers at minor allele frequency 5%-50% with at least 30X coverage. In both tumor and normal whole-genome sequencing from 13 pediatric cancer cases, Samovar finds somatic variants that can be corroborated with high recall by whole exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.
Keywords: Bioinformatics; Biological Sciences; Genomics
Year: 2019 PMID: 31271967 PMCID: PMC6609817 DOI: 10.1016/j.isci.2019.05.037
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Schematic Representation of Somatic Mutations within a Phased Sample
(A) A mosaic mutation occurs on haplotype H2.
(B) Therefore, in linked-read sequencing, where short reads can be phased when linked reads overlap phased heterozygous variants, mosaic mutations manifest on reads from only one haplotype, here H2. Adapted from Figure 3 of Dou et al., 2018.
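
The pattern described in panel (B) can be illustrated with a minimal Python sketch. It assumes every read covering a candidate site has already been assigned to a haplotype (1, 2, or `None` when unphased) and that the base it shows at the site is known. The `Read` record, the function names, and the toy pileup are hypothetical illustrations, not Samovar's actual implementation or feature set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record at one candidate site."""
    haplotype: Optional[int]  # 1, 2, or None when the read could not be phased
    base: str                 # base observed at the candidate site


def haplotype_allele_counts(reads, ref_base):
    """Count reference and alternate observations per haplotype."""
    counts = {1: Counter(), 2: Counter()}
    for read in reads:
        if read.haplotype in counts:  # unphased reads are skipped
            key = "ref" if read.base == ref_base else "alt"
            counts[read.haplotype][key] += 1
    return counts


def looks_mosaic(reads, ref_base):
    """True when alternate bases occur on exactly one haplotype and that
    haplotype also carries reference-supporting reads (haplotype-discordant)."""
    counts = haplotype_allele_counts(reads, ref_base)
    alt_haps = [h for h in (1, 2) if counts[h]["alt"] > 0]
    if len(alt_haps) != 1:
        return False
    return counts[alt_haps[0]]["ref"] > 0


# Toy pileup: the alternate base G appears on only some H2 reads.
mosaic_reads = [Read(1, "A"), Read(1, "A"), Read(1, "A"),
                Read(2, "A"), Read(2, "A"), Read(2, "G"), Read(None, "G")]
# Germline heterozygous site: every H2 read carries the alternate base.
germline_reads = [Read(1, "A"), Read(1, "A"), Read(2, "G"), Read(2, "G")]

print(looks_mosaic(mosaic_reads, "A"))    # True
print(looks_mosaic(germline_reads, "A"))  # False
```

A real caller would weigh such counts together with read quality and linked-read evidence rather than apply a hard rule; the boolean here only makes the haplotype-discordance idea concrete.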
Figure 3. Precision and Recall Calculated for Samovar, MuTect2, and MosaicHunter Variant Calls Stratified by Mosaic Allele Fraction (MAF) in the Whole-Genome Sequencing Data (WGS)
(A–D) (A) 30X coverage, precision; (B) 60X coverage, precision; (C) 30X coverage, recall; (D) 60X coverage, recall.
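
As a rough illustration of what stratifying precision and recall by MAF involves, the following Python sketch bins a truth set and a call set by allele fraction. It is a sketch under stated assumptions: the function name, the bin edges, and the toy sites are hypothetical, and the choice to bin true variants by their simulated MAF and false positives by their estimated MAF is an assumption, not necessarily how the figure was produced.

```python
def in_bin(value, lo, hi):
    """Half-open interval test: lo <= value < hi."""
    return lo <= value < hi


def stratified_precision_recall(truth, calls, bin_edges):
    """truth: site -> simulated MAF; calls: site -> estimated MAF.

    Returns a list of ((lo, hi), precision, recall), with None where a
    metric is undefined because its denominator is zero.
    """
    results = []
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        # True variants are assigned to bins by their simulated MAF.
        tp = sum(1 for s, maf in truth.items() if in_bin(maf, lo, hi) and s in calls)
        fn = sum(1 for s, maf in truth.items() if in_bin(maf, lo, hi) and s not in calls)
        # False positives have no true MAF, so the estimated MAF is used (assumption).
        fp = sum(1 for s, maf in calls.items() if s not in truth and in_bin(maf, lo, hi))
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        results.append(((lo, hi), precision, recall))
    return results


# Toy data, keyed by (chromosome, position).
truth = {("chr1", 100): 0.10, ("chr1", 200): 0.12,
         ("chr2", 50): 0.30, ("chr2", 90): 0.40}
calls = {("chr1", 100): 0.09, ("chr2", 50): 0.28,
         ("chr2", 90): 0.35, ("chr3", 10): 0.11}

for (lo, hi), precision, recall in stratified_precision_recall(truth, calls, [0.05, 0.25, 0.50]):
    print(f"MAF [{lo:.2f}, {hi:.2f}): precision={precision}, recall={recall}")
```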
Figure 2. Samovar Workflow
Precision (Prec), Recall (Rec), and F Score of Each Tool for the Synthetic Mosaic Variants Inserted by Bamsurgeon
| 30X coverage | Samovar Prec | Samovar Rec | Samovar F | MuTect2 Tumor-Only Prec | MuTect2 Tumor-Only Rec | MuTect2 Tumor-Only F | MuTect2 Paired Prec | MuTect2 Paired Rec | MuTect2 Paired F | MosaicHunter Tumor-Only Prec | MosaicHunter Tumor-Only Rec | MosaicHunter Tumor-Only F | MosaicHunter Paired Prec | MosaicHunter Paired Rec | MosaicHunter Paired F | MosaicHunter Trio Prec | MosaicHunter Trio Rec | MosaicHunter Trio F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autosomes | 84.0 | 30.1 | 44.4 | 3.0 | 83.2 | 5.7 | 60.8 | 91.4 | 73.0 | 31.5 | 5.1 | 8.8 | 79.2 | 20.7 | 32.8 | 70.4 | 20.7 | 32.0 |
| Exons | 84.0 | 28.3 | 42.4 | 3.6 | 85.3 | 7.0 | 60.1 | 92.0 | 72.7 | 35.0 | 7.1 | 11.8 | 82.1 | 30.8 | 44.8 | 73.7 | 30.8 | 43.4 |
| Genes | 84.9 | 30.1 | 44.4 | 3.2 | 84.4 | 6.2 | 63.0 | 92.0 | 74.8 | 32.6 | 5.7 | 9.7 | 79.9 | 22.7 | 35.4 | 71.2 | 22.7 | 34.5 |
| Enhancer | 88.5 | 31.0 | 45.9 | 3.9 | 86.7 | 7.5 | 72.9 | 92.3 | 81.4 | 37.8 | 5.9 | 10.1 | 85.5 | 29.5 | 43.8 | 80.2 | 29.5 | 43.1 |
| Promoter | 83.3 | 26.1 | 39.8 | 3.0 | 83.2 | 5.8 | 59.4 | 90.9 | 71.9 | 35.3 | 6.1 | 10.4 | 80.5 | 25.1 | 38.3 | 73.7 | 25.1 | 37.5 |
| Alu | 82.0 | 28.6 | 42.4 | 2.3 | 78.2 | 4.5 | 54.5 | 88.4 | 67.4 | 8.6 | 0.0 | 0.1 | 56.5 | 0.3 | 0.6 | 53.1 | 0.3 | 0.6 |
| RepeatMasker | 84.2 | 29.6 | 43.9 | 2.8 | 81.5 | 5.3 | 58.9 | 90.1 | 71.2 | 20.2 | 0.3 | 0.6 | 72.3 | 1.4 | 2.7 | 61.3 | 1.4 | 2.7 |
| Seg. Dup. | 25.6 | 10.4 | 14.8 | 1.3 | 56.9 | 2.5 | 18.4 | 62.8 | 28.5 | 6.6 | 0.5 | 0.9 | 39.3 | 1.7 | 3.2 | 29.1 | 1.7 | 3.2 |

| 60X coverage | Prec | Rec | F | Prec | Rec | F | Prec | Rec | F | Prec | Rec | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autosomes | 84.6 | 43.0 | 57.1 | 3.6 | 76.0 | 7.0 | 32.4 | 15.5 | 20.9 | 46.8 | 27.2 | 34.4 |
| Exons | 84.3 | 41.8 | 55.9 | 4.7 | 79.6 | 8.8 | 38.5 | 25.3 | 30.5 | 54.0 | 45.5 | 49.4 |
| Genes | 85.6 | 43.4 | 57.6 | 3.9 | 77.2 | 7.5 | 33.1 | 17.0 | 22.4 | 47.7 | 30.0 | 36.8 |
| Enhancer | 90.8 | 47.8 | 62.6 | 4.8 | 77.9 | 9.0 | 36.9 | 22.7 | 28.1 | 51.6 | 40.0 | 45.1 |
| Promoter | 85.4 | 40.7 | 55.2 | 4.0 | 76.8 | 7.6 | 38.5 | 21.1 | 27.3 | 56.4 | 40.5 | 47.2 |
| Alu | 81.1 | 42.9 | 56.1 | 3.0 | 68.0 | 5.7 | 16.5 | 0.2 | 0.5 | 31.7 | 0.5 | 1.0 |
| RepeatMasker | 84.2 | 42.2 | 56.2 | 3.4 | 74.1 | 6.4 | 24.7 | 1.0 | 1.9 | 38.3 | 1.8 | 3.4 |
| Seg. Dup. | 28.0 | 13.1 | 17.8 | 1.6 | 48.5 | 3.1 | 9.8 | 1.5 | 2.6 | 18.5 | 2.7 | 4.7 |
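
The relationship between the three metrics in the table above can be checked with a short Python sketch. It assumes the F score is the balanced F1 score (the harmonic mean of precision and recall), which the table title does not state explicitly; the function name `f_score` is hypothetical. The autosome values 60.8 and 91.4 from the 30X rows are used as a worked example.

```python
def f_score(precision, recall):
    """Balanced F score (F1): harmonic mean of precision and recall.

    Inputs may be fractions or percentages, as long as both use the
    same scale; the result is on that scale too.
    """
    if precision + recall == 0:
        return 0.0  # convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# 30X autosomes, precision 60.8 and recall 91.4 (percent).
print(round(f_score(60.8, 91.4), 1))  # 73.0
```

Because the tabulated precision and recall are themselves rounded, recomputed F scores can differ from the tabulated ones in the last digit.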
Number of Variant Calls in the Exome Capture Regions and Precision (Prec) Based on Supporting Reads Found in WES
| Case | Samovar Full Model Calls | Samovar Full Model Prec | MuTect2 Tumor-Only Calls | MuTect2 Tumor-Only Prec | MuTect2 Paired Calls | MuTect2 Paired Prec | MosaicHunter Tumor-Only Calls | MosaicHunter Tumor-Only Prec | MosaicHunter Paired Calls | MosaicHunter Paired Prec |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 22 | | 23,216 | 0.03 | 406 | 0.45 | 202 | 0.63 | 144 | 0.62 |
| 2 | 23 | | 23,960 | 0.02 | 341 | 0.20 | 258 | 0.25 | 124 | 0.27 |
| 3 | 42 | | 23,866 | 0.02 | 359 | 0.34 | 177 | 0.45 | 68 | 0.66 |
| 4 | 37 | | 24,317 | 0.02 | 285 | 0.28 | 159 | 0.46 | 81 | 0.59 |
| 5 | 21 | | 24,036 | 0.01 | 321 | 0.33 | 170 | 0.45 | 69 | 0.70 |
| 6 | 50 | | 23,978 | 0.01 | 265 | 0.36 | 234 | 0.41 | 108 | 0.56 |
| 7 | 23 | | 23,905 | 0.02 | 245 | 0.29 | 88 | 0.63 | 58 | 0.78 |
| 8 | 28 | | 23,949 | 0.02 | 322 | 0.24 | 187 | 0.44 | 86 | 0.47 |
| 9 | 25 | | 24,893 | 0.02 | 276 | 0.31 | 185 | 0.46 | 78 | 0.56 |
| 10 | 29 | | 25,290 | 0.01 | 313 | 0.28 | 344 | 0.33 | 144 | 0.49 |
| 11 | 22 | 0.70 | 24,043 | 0.02 | 284 | 0.41 | 105 | 0.75 | 83 | |
| 12 | 21 | 0.58 | 23,875 | 0.02 | 278 | 0.48 | 178 | 0.58 | 72 | |
| 13 | 15 | 0.71 | 23,663 | 0.02 | 268 | 0.35 | 112 | 0.76 | 66 | |
| Total | 358 | | 312,991 | | 3,963 | | 2,399 | | 1,181 | |
Samovar has the highest validation rate in 10 of the 13 cases. Bold indicates the highest precision for each pediatric case.
Figure 4. WES Support for Pediatric Cancer Somatic Variant Calls
Plots show fraction of variant calls in exome capture region supported by WES data (black line, left axis ticks) and number of variant calls (gray bars, right axis ticks) stratified by mosaic allele fraction (MAF), combined for the 13 pediatric cancer cases studied. The panels show results for (A) MuTect2 Paired, (B) MuTect2 tumor-only, (C) MosaicHunter paired, (D) MosaicHunter tumor-only, and (E) Samovar.