Qingbo Wang, Emma Pierce-Hoffman, Beryl B Cummings, Jessica Alföldi, Laurent C Francioli, Laura D Gauthier, Andrew J Hill, Anne H O'Donnell-Luria, Konrad J Karczewski, Daniel G MacArthur.
Abstract
Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.
Year: 2020 PMID: 32461613 PMCID: PMC7253413 DOI: 10.1038/s41467-019-12438-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Definition and an example of MNVs, and validation of phasing sensitivity. a Definition and an example of an MNV. In this paper, an MNV is defined as two or more nearby variants existing on the same haplotype in the same individual. b Impact of MNVs in coding regions. The amino acid change caused by an MNV can differ from that of either individual single-nucleotide variant, which creates the potential for misannotation of the functional consequence of variants. c Graphical overview of the analysis of phasing sensitivity and specificity using trio samples from our gnomAD callset. We identified all heterozygous variant pairs that pass quality control (see the Methods section) and compared the phase information assigned by read-based phasing with that of trio-based phasing.
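The effect described in panel b can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a hypothetical reference codon with two substitutions on the same haplotype; the function names and the example codon are illustrative and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for pos, alt in snvs:
        bases[pos] = alt
    return "".join(bases)


def consequences(ref_codon, snv_a, snv_b):
    """Translate the codon with each SNV alone and with both together."""
    return {
        "ref": CODON_TABLE[ref_codon],
        "snv_a_only": CODON_TABLE[apply_snvs(ref_codon, [snv_a])],
        "snv_b_only": CODON_TABLE[apply_snvs(ref_codon, [snv_b])],
        "mnv": CODON_TABLE[apply_snvs(ref_codon, [snv_a, snv_b])],
    }


# Hypothetical example: CAG (Gln) with C>T at position 0 and G>C at position 2.
result = consequences("CAG", (0, "T"), (2, "C"))
print(result)
# {'ref': 'Q', 'snv_a_only': '*', 'snv_b_only': 'H', 'mnv': 'Y'}
```

In this example, annotating the two variants separately would report a stop codon and a histidine substitution, whereas the haplotype carrying both variants encodes tyrosine.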
Fig. 2 Functional impact of MNVs. a The number of MNVs in the gnomAD exome data set per MNV category. Of the 1821 rescued nonsense mutations, 1538 are rescued in all individuals that harbor the original nonsense mutation and are used for the analysis in (b) and (c). Gained and rescued nonsense MNVs were further filtered to high-confidence (HC) predicted loss-of-function (pLoF) variants in (b) and (c). b The number of gained/rescued nonsense mutations per gene, and examples of disease-associated genes with two or more gained/rescued nonsense mutations. c The fraction of each category of MNV found in a set of 3941 constrained genes (top two deciles of constraint[22]).
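As a rough illustration of the gained and rescued nonsense categories, the sketch below labels a within-codon variant pair by comparing stop-codon status for each SNV alone against the combined MNV. It assumes that the reference codon is not a stop codon and that the two SNVs hit different positions; the function names and the catch-all "other" label are hypothetical and do not reproduce the paper's full category scheme or its pLoF filtering.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def substitute(codon, pos, alt):
    """Return the codon with the base at `pos` replaced by `alt`."""
    return codon[:pos] + alt + codon[pos + 1:]


def nonsense_category(ref_codon, snv_a, snv_b):
    """Classify a pair of same-haplotype SNVs, each given as (pos, alt).

    gained nonsense:  neither SNV alone creates a stop, but the MNV does.
    rescued nonsense: one SNV alone creates a stop, but the MNV does not.
    """
    a_only = substitute(ref_codon, *snv_a)
    b_only = substitute(ref_codon, *snv_b)
    both = substitute(a_only, *snv_b)

    single_is_stop = a_only in STOP_CODONS or b_only in STOP_CODONS
    mnv_is_stop = both in STOP_CODONS

    if mnv_is_stop and not single_is_stop:
        return "gained nonsense"
    if single_is_stop and not mnv_is_stop:
        return "rescued nonsense"
    return "other"


# Hypothetical examples.
print(nonsense_category("TCC", (1, "A"), (2, "A")))  # TAC, TCA -> TAA: gained nonsense
print(nonsense_category("CAG", (0, "T"), (2, "C")))  # TAG, CAC -> TAC: rescued nonsense
print(nonsense_category("GCT", (0, "A"), (1, "T")))  # ACT, GTT -> ATT: other
```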
Fig. 3 Mutational origins of MNVs. a Three major categories of the mutational origin of MNVs. (Left) A combination of single-nucleotide mutational events. Since the baseline global mutation rate differs greatly between transversions, CpG transitions, and non-CpG transitions, even a simple combination of single-nucleotide mutational events could result in a highly skewed distribution of MNVs. (Center) One-step mutation caused by error-prone DNA polymerases. For this class of MNVs, since the two mutations occur at once during DNA replication, the allele frequencies of the two constituent SNVs of the MNV are more likely to be equal. (Right) Polymerase slippage at repeat junctions. Mutation rates are highly elevated in repeat regions, and are therefore likely to cause various complex patterns of mutations, occasionally resulting in MNVs. b The log-scaled number of MNVs per substitution pattern. c The fraction of one-step MNVs per substitution pattern. Error bars represent standard error of the mean (often smaller than the dot size). d The fraction of MNVs that are in repetitive contexts, and bits representation[63] of sequence contexts. Error bars represent standard error of the mean. Colors in the bars in panels b–d represent the predicted major mechanism of MNVs for each substitution pattern.
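The quantity plotted in panel c can be sketched as a fraction with a standard error. The caption states only that one-step MNVs are more likely to have equal allele frequencies for the two constituent SNVs; the rule used below (both SNV allele counts equal to the MNV allele count) is an assumed proxy for illustration, and the standard error uses the usual binomial approximation for the mean of a 0/1 indicator. The input tuples are made-up data.

```python
import math


def one_step_fraction(mnvs):
    """Estimate the fraction of putative one-step MNVs and its standard error.

    `mnvs` is a list of (ac_snv1, ac_snv2, ac_mnv) allele counts.
    Assumption: an MNV is called one-step when all three counts are equal.
    """
    n = len(mnvs)
    if n == 0:
        raise ValueError("need at least one MNV")
    k = sum(1 for ac1, ac2, ac_mnv in mnvs if ac1 == ac2 == ac_mnv)
    p = k / n
    # Binomial approximation to the standard error of the mean of an indicator.
    se = math.sqrt(p * (1 - p) / n)
    return p, se


# Made-up allele counts for four MNVs of one substitution pattern.
example = [(3, 3, 3), (10, 4, 4), (1, 1, 1), (7, 7, 2)]
fraction, std_err = one_step_fraction(example)
print(fraction, std_err)  # 0.5 0.25
```

With thousands of MNVs per substitution pattern, the standard error shrinks roughly with the square root of the count, which is consistent with error bars smaller than the plotted dots.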
Fig. 4 Distribution of MNVs across the genome. a The number and the fraction of MNVs per origin, per substitution pattern. Gray shows the estimated fraction of MNVs originating from two single-nucleotide substitution events, brown shows polymerase slippage at repeat contexts, and purple shows the others (presumably mainly replication error by pol-zeta). The colors along the bottom represent the estimated biological origins that dominate MNVs of that specific substitution pattern. b, c MNV density, defined as the number of MNVs per functional annotation divided by the base pair length in the annotation (relative to the whole-genome region), ordered by the methylation level of the functional category. d Estimated fraction of MNVs by different origins, per functional category around the coding region.
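The density definition in panels b and c can be written out as a short calculation. This is a minimal sketch that reads "relative to the whole-genome region" as a ratio of the per-base density in an annotation to the genome-wide per-base density; that reading, the function name, and the numbers are assumptions for illustration only.

```python
def relative_mnv_density(mnv_count, region_bp, genome_mnv_count, genome_bp):
    """MNVs per base pair in an annotation, divided by the genome-wide rate."""
    if region_bp <= 0 or genome_bp <= 0 or genome_mnv_count <= 0:
        raise ValueError("lengths and genome-wide count must be positive")
    region_density = mnv_count / region_bp
    genome_density = genome_mnv_count / genome_bp
    return region_density / genome_density


# Made-up numbers: 50 MNVs in a 100 kb annotation versus 1,000 MNVs in 10 Mb.
ratio = relative_mnv_density(50, 100_000, 1_000, 10_000_000)
print(round(ratio, 6))  # about 5: the annotation is enriched roughly fivefold
```

A value above 1 indicates enrichment of MNVs in the annotation relative to the genome-wide background, and a value below 1 indicates depletion.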