Yifan Wang1,2, Taejeong Bae3, Jeremy Thorpe4, Maxwell A Sherman5,6, Attila G Jones7,8, Sean Cho9,10,11, Kenneth Daily12, Yanmei Dou5, Javier Ganz13,14,15, Alon Galor5, Irene Lobon16,17, Reenal Pattni18,19, Chaggai Rosenbluh7, Simone Tomasi20, Livia Tomasini20, Xiaoxu Yang21,22, Bo Zhou18,19, Schahram Akbarian23,24, Laurel L Ball21,22, Sara Bizzotto13,14,15, Sarah B Emery1, Ryan Doan13,14,15, Liana Fasching20, Yeongjun Jang3, David Juan16, Esther Lizano16, Lovelace J Luquette5, John B Moldovan1, Rujuta Narurkar25, Matthew T Oetjens1, Rachel E Rodin13,14,15, Shobana Sekar3, Joo Heon Shin25,26, Eduardo Soriano17,27,28,29, Richard E Straub25, Weichen Zhou2, Andrew Chess7,8,23,30, Joseph G Gleeson21,22, Tomas Marquès-Bonet16,31,32,33, Peter J Park5, Mette A Peters12, Jonathan Pevsner9,10, Christopher A Walsh13,14,15, Daniel R Weinberger10,25,26,34,35, Flora M Vaccarino20,36, John V Moran1,37, Alexander E Urban18,19,38, Jeffrey M Kidd1,2, Ryan E Mills1,2, Alexej Abyzov39.
Abstract
BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present in only a small fraction of cells.
Year: 2021 PMID: 33781308 PMCID: PMC8006362 DOI: 10.1186/s13059-021-02285-3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Assessment of existing tools to detect simulated mosaic SNVs in DNA mixing experiments (a, b) or candidate somatic SNVs in the common reference brain sample (c, d). a Genomic DNAs from four commonly used human lymphoblastoid cell lines were mixed at various proportions (x-axis) and subjected to WGS; germline SNPs from the cell lines are present at a range of allele frequencies (y-axis) in the different mixes and act as a proxy for mosaic SNVs. b VAF of simulated SNVs (x-axis) vs. sensitivity of detection (y-axis) for the three described SNV callers. c A Venn diagram demonstrating that existing tools are widely discordant in their ability to call mosaic SNVs present in the common brain sample. d The distribution of candidate SNV VAFs (x-axis) and the numbers of candidate mosaic SNV calls (y-axis) detected by existing tools.
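Panel a relies on simple mixing arithmetic: a germline SNP carried only by a minor cell line appears at a low VAF in the pooled DNA, much as a mosaic SNV would. The sketch below computes that expected VAF, assuming every line is diploid at the site and contributes DNA in proportion to its mixing fraction; the function name `expected_vaf` and the mixing proportions are hypothetical, not the study's design.

```python
from typing import Sequence


def expected_vaf(proportions: Sequence[float], alt_copies: Sequence[int]) -> float:
    """Expected alt-allele fraction of one germline SNP in a pooled DNA sample.

    Assumes (hypothetically) that each cell line is diploid at the site and
    contributes DNA in proportion to its mixing fraction.
    proportions: fraction of total DNA from each cell line (must sum to 1).
    alt_copies: alt-allele copies each line carries at the SNP (0, 1, or 2).
    """
    if len(proportions) != len(alt_copies):
        raise ValueError("need one genotype per cell line")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    if any(g not in (0, 1, 2) for g in alt_copies):
        raise ValueError("alt_copies must be 0, 1, or 2 for a diploid site")
    # Each line supplies p_i of the DNA, of which g_i / 2 carries the alt allele.
    return sum(p * g / 2 for p, g in zip(proportions, alt_copies))


# Hypothetical four-line mix (illustrative only, not the study's proportions).
mix = [0.01, 0.04, 0.15, 0.80]
# A SNP heterozygous only in the 1% line mimics a mosaic SNV at 0.5% VAF.
print(expected_vaf(mix, [1, 0, 0, 0]))  # 0.005
```

Sweeping the proportion of the minor line, or choosing SNPs private to different lines, yields the spread of allele frequencies plotted on the y-axis of panel a.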
Fig. 2 Overview of the mosaic SNV discovery and validation pipeline. a WGS or WES datasets were generated by six BSMN working groups using a commonly shared, homogenized DLPFC sample from a neurotypical individual and isogenic dural fibroblasts. Six different analytical methods were initially used to call mosaic SNVs. WGS data were also generated from sorted NeuN+ and NeuN− cells from DLPFC, cerebellum, and dura mater samples. Chromium 10X linked-read sequencing data were generated from DLPFC and dural fibroblast samples. Single-cell WGS was conducted on twelve NeuN+ neurons from the DLPFC. These datasets were used to validate mosaic SNVs. b Overlap of putative mosaic SNV calls made using different analytical approaches. Indicated are the numbers of mosaic SNV calls (x-axis) and the numbers of mosaic SNV calls identified using different analytical approaches (y-axis; circles with connecting lines indicate candidate SNVs identified by multiple approaches). c Candidate SNVs were subjected to validation experiments using four complementary approaches. d Rationale of the empirical substitution error model applied to validate mosaic SNVs in PCR amplicon-based sequencing experiments. e An example of the empirical nucleotide error profiles encountered in a PCR amplicon-based sequencing experiment. Shown are the cumulative fraction of sites (x-axis) and per-site noise levels (y-axis).
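Panels d and e describe judging a candidate against the noise actually observed in the amplicon data rather than against a fixed threshold. One simple way to use such an empirical profile is sketched below: the candidate's alt-allele fraction is ranked against the frequencies of the same substitution type at positions assumed to be non-variant, giving an empirical p-value. This is an illustrative assumption, not necessarily the exact procedure behind panel d; the function names, input format, and the 0.01 cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def background_by_substitution(
    sites: Iterable[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], List[float]]:
    """Group per-site noise levels by substitution type, e.g. ('C', 'T').

    sites: (ref_base, alt_base, alt_fraction) observed at positions assumed
    to carry no true variant in the same PCR amplicon experiment.
    """
    profile: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for ref, alt, af in sites:
        profile[(ref, alt)].append(af)
    return dict(profile)


def empirical_p_value(observed_af: float, background_af: Sequence[float]) -> float:
    """Share of background sites at least as noisy as the candidate (+1 smoothing)."""
    exceed = sum(1 for af in background_af if af >= observed_af)
    return (exceed + 1) / (len(background_af) + 1)


def is_supported(observed_af: float, background_af: Sequence[float],
                 alpha: float = 0.01) -> bool:
    """Hypothetical rule: the signal must be rarer than alpha under the noise profile."""
    return empirical_p_value(observed_af, background_af) < alpha


# Hypothetical noise profile: 199 quiet C>T sites plus one noisier C>T site.
profile = background_by_substitution(
    [("C", "T", 0.001)] * 199 + [("C", "T", 0.004), ("A", "G", 0.002)]
)
print(is_supported(0.02, profile[("C", "T")]))   # True
print(is_supported(0.001, profile[("C", "T")]))  # False
```

Keeping a separate background per substitution type matters because amplicon error rates can differ between base changes, so a single pooled noise level could be too lenient for some changes and too strict for others.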
Categories of validated mosaic SNVs
| Categories | Definition | Number of mosaic SNVs selected for validation (out of total) | Number of validated mosaic SNVs |
|---|---|---|---|
| Multi-calls | Identified by multiple approaches, supportive evidence from multiple data sources | 45 out of 45 | 33 |
| Approach singletons | Identified by one approach, supportive evidence from multiple data sources | 311 out of 1101 | 10 |
| Data source singletons | Identified by multiple approaches, supportive evidence from one data source | 4 out of 4 | 0 |
| Absolute singletons | Identified by one approach, supportive evidence from one data source | 40 out of 148 | 0 |
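The table's counts make the contrast between categories concrete. The short script below, using only the numbers in the table, computes the validation rate among the candidates actually tested in each category and checks that the tested candidates add up to the 400 summarized in Fig. 3. The variable names are illustrative.

```python
# Counts copied from the table: (selected for validation, total candidates, validated).
categories = {
    "Multi-calls": (45, 45, 33),
    "Approach singletons": (311, 1101, 10),
    "Data source singletons": (4, 4, 0),
    "Absolute singletons": (40, 148, 0),
}

# Validation rate is computed over tested candidates, not over all candidates.
rates = {name: validated / tested
         for name, (tested, _, validated) in categories.items()}

for name, (tested, total, validated) in categories.items():
    print(f"{name}: {validated}/{tested} validated ({rates[name]:.1%}); "
          f"{tested}/{total} candidates tested")

tested_all = sum(t for t, _, _ in categories.values())
total_all = sum(n for _, n, _ in categories.values())
validated_all = sum(v for _, _, v in categories.values())
print(tested_all, validated_all, total_all)  # 400 43 1298
```

Roughly three in four tested multi-calls validated, compared with about 3% of tested approach singletons and none of the tested data-source or absolute singletons.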
Fig. 3 Summary of validation results for 400 candidate mosaic SNVs. Vertical lines represent candidate mosaic SNVs. Shaded rectangles to the right of the figure provide the keys for interpreting the shading of each candidate SNV. True-positive mosaic SNV calls (PASS; green rectangle at bottom of figure) were concordant across multiple datasets and secondary validation experiments. Chromium linked-read haplotype phasing and single-cell sequencing datasets also supported a subset of bona fide mosaic SNV calls. By comparison, the VAFs of false-positive calls (red rectangle) are inconsistent across different datasets, and these calls often occur within or near insertion/deletion (indel) mutations, short tandem repeats (STRs), homopolymeric nucleotide stretches, or copy number variants (CNVs). Importantly, the panel-of-normals (PON) filter, but not comparison to WGS data from a control sample (i.e., NA12878), was highly effective at identifying false-positive SNV calls arising from contamination (orange rectangle) and germline SNPs (gray rectangle). We lacked sufficient data to evaluate a subset of candidate SNVs (purple rectangle; NED, not enough data). The two green triangles at the top of the figure denote mosaic SNVs that validation experiments deemed to be false-positive calls; however, cell lineage analyses indicated that they are likely bona fide mosaic SNVs (see text and Fig. 4).
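A panel-of-normals filter rejects a candidate when the same allele recurs in unrelated normal samples, a pattern expected of systematic artifacts, contamination, and common germline variants rather than of a mosaic SNV private to one individual. A minimal sketch follows; the input format (alt-read counts per panel sample), the names `Candidate` and `panel_of_normals_filter`, and both thresholds are hypothetical, and a production pipeline would more likely rely on its variant caller's own PON support.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass
class Candidate:
    variant: Variant
    vaf: float


def panel_of_normals_filter(
    candidates: List[Candidate],
    normal_alt_reads: Dict[Variant, List[int]],
    min_alt_reads: int = 2,
    max_supporting_normals: int = 0,
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (kept, rejected) using a panel of unrelated normals.

    normal_alt_reads: hypothetical map from variant to the alt-read count seen
    in each panel sample. A normal "supports" the allele if it has at least
    min_alt_reads alt reads; candidates supported by more than
    max_supporting_normals normals are rejected as likely artifacts,
    contamination, or germline polymorphisms.
    """
    kept: List[Candidate] = []
    rejected: List[Candidate] = []
    for cand in candidates:
        counts = normal_alt_reads.get(cand.variant, [])
        n_support = sum(1 for c in counts if c >= min_alt_reads)
        if n_support > max_supporting_normals:
            rejected.append(cand)
        else:
            kept.append(cand)
    return kept, rejected


# Hypothetical example: the chr7 allele recurs in three unrelated normals.
cands = [Candidate(("chr1", 1_000_000, "C", "T"), 0.04),
         Candidate(("chr7", 2_500_000, "G", "A"), 0.06)]
panel = {("chr7", 2_500_000, "G", "A"): [5, 0, 3, 4, 1]}
kept, rejected = panel_of_normals_filter(cands, panel)
print([c.variant[0] for c in kept], [c.variant[0] for c in rejected])  # ['chr1'] ['chr7']
```

Because the panel comes from many unrelated individuals, recurrence across it is informative in a way that a single control genome (such as NA12878) cannot be, which is consistent with the contrast described in the caption.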
Fig. 4 Best-practices workflow to call mosaic variants. a Schematic of the filtering strategies used to call mosaic SNVs from WGS and WES data. b Sensitivity (y-axis) of GATK (at different ploidy settings) and Mutect2 for detecting simulated mosaic SNVs at different VAFs (x-axis) in DNA mixing experiments. c Cell lineage trees reconstructed from a set of mosaic SNVs (Table S2) present in eleven single-cell datasets from the common brain sample. Indicated are the name of each SNV (SNV1, SNV2, etc.) and the estimated SNV VAFs (from 250× WGS data). d SNVs marking the L1 (x-axis) and L2 (y-axis) lineages show anti-correlated VAFs across multiple brain and tissue samples, suggesting that these SNVs distinguish the earliest cell lineages in this sample. Solid line, linear regression of the anti-correlated SNV VAFs across all samples; shaded area, the corresponding 95% confidence interval. Dashed line, linear regression of the anti-correlated SNV VAFs when only brain samples are included in the analysis.
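Panel d rests on a simple expectation. Assuming the L1 and L2 lineages together make up all cells in a sample and each marker SNV is heterozygous in the cells that carry it, a marker's VAF is half its lineage's cell fraction, so VAF(L1) + VAF(L2) ≈ 0.5, and regressing one on the other should give a slope near −1 and an intercept near 0.5. The sketch below fits that ordinary least-squares line and computes the Pearson correlation; the function name and the VAF values are hypothetical, invented for illustration rather than taken from the study.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols_with_correlation(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, Pearson r) of the least-squares fit y ~ x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                # covariance over variance of x
    intercept = my - slope * mx      # line passes through the means
    r = sxy / (sxx * syy) ** 0.5     # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical per-sample VAFs of one L1 marker and one L2 marker.
vaf_l1 = [0.10, 0.20, 0.25, 0.30, 0.35]
vaf_l2 = [0.39, 0.31, 0.24, 0.21, 0.14]
slope, intercept, r = ols_with_correlation(vaf_l1, vaf_l2)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

Fitting only the brain-derived samples with the same function would correspond to the dashed line; the shaded confidence band in the figure requires an additional standard-error calculation not shown here.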