Tao Wang, Chun Hui Bu, Sara Hildebrand, Gaoxiang Jia, Owen M Siggs, Stephen Lyon, David Pratt, Lindsay Scott, Jamie Russell, Sara Ludwig, Anne R Murray, Eva Marie Y Moresco, Bruce Beutler.
Abstract
Computational inference of mutation effects is necessary for genetic studies in which many mutations must be considered as etiologic candidates. Programs such as PolyPhen-2 predict the relative severity of damage caused by missense mutations, but not the actual probability that a mutation will reduce or eliminate protein function. Based on genotype and phenotype data for 116,330 ENU-induced mutations in the Mutagenetix database, we calculate that putative null mutations, and PolyPhen-2-classified "probably damaging", "possibly damaging", or "probably benign" mutations have, respectively, 61%, 17%, 9.8%, and 4.5% probabilities of causing phenotypically detectable damage in the homozygous state. We use these probabilities in the estimation of genome saturation and the probability that individual proteins have been adequately tested for function in specific genetic screens. We estimate the proportion of essential autosomal genes in Mus musculus (C57BL/6J) and show that viable mutations in essential genes are more likely to induce a phenotype than mutations in non-essential genes.
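As an illustration of how per-class probabilities like those quoted above might feed into a saturation-style estimate, the following minimal sketch computes the chance that at least one of several homozygous mutations in a gene is damaging. It assumes the mutations act independently, which the abstract does not state; the function name `prob_gene_damaged` and the class keys are hypothetical, and this is not presented as the authors' actual calculation.

```python
from math import prod

# Per-class probabilities of phenotypically detectable damage in the
# homozygous state, taken from the abstract.
P_DAMAGE = {
    "null": 0.61,
    "probably_damaging": 0.17,
    "possibly_damaging": 0.098,
    "probably_benign": 0.045,
}


def prob_gene_damaged(classes):
    """Probability that at least one listed mutation is damaging.

    Assumption (not from the source): mutations are independent, so the
    probability that none is damaging is the product of (1 - p_i).
    """
    return 1.0 - prod(1.0 - P_DAMAGE[c] for c in classes)


if __name__ == "__main__":
    # A gene carrying one "probably damaging" and one "probably benign" allele.
    print(prob_gene_damaged(["probably_damaging", "probably_benign"]))
```

Under this independence assumption, a gene with no tested mutations has probability 0 of having been damaged, and each additional homozygous mutation can only raise the estimate.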
Year: 2018 PMID: 29382827 PMCID: PMC5789985 DOI: 10.1038/s41467-017-02806-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Characteristics of mice and mutations analyzed. (a) Number of G3 mice descended from the G1 male founder of each pedigree. (b) Number of heterozygous or homozygous exomic mutations in each G3 mouse. (c) Number of mutations of each mutation classification bred to homozygosity in 0, 1, or >1 mouse. Mutations were non-synonymous coding and potential splicing changes (b, c)
Mutations analyzed in this study
| Mutation classification | All mutations | Isolated mutations (>100 Mb from nearest neighbor) | Mutations from pedigrees with ≥3 G3 mice | Mutations in known essential genes |
|---|---|---|---|---|
| Probably benign (score ≤0.45) | 26,004 | 2,406 | 2,311 | 477 |
| Possibly damaging (score 0.45–0.95) | 14,412 | 1,314 | 1,270 | 281 |
| Probably damaging (score 0.95–1.0) | 32,669 | 3,077 | 2,982 | 690 |
| Probably null class I | 5,170 | 462 | 441 | 78 |
| Probably null class II | 2,618 | 273 | 268 | 60 |
| Total | 80,873 | 7,532 | 7,272 | 1,586 |
The columns represent progressive filtering steps from left to right
Fig. 2 Distribution of homozygous mutant mouse frequencies among G3 mice produced by heterozygous G1 × heterozygous G2 matings. (a–e) For mutations of the indicated PP2 categories and putative null mutations, the proportions of homozygous mutant G3 mice resulting from heterozygous G2 matings were plotted
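For a heterozygous × heterozygous mating, Mendelian inheritance predicts that about 25% of offspring are homozygous mutant, so a shortfall of homozygotes can hint at reduced viability. The sketch below computes the binomial lower-tail probability of seeing at most `k` homozygotes among `n` G3 mice under that expectation. It is only an illustration of the reasoning behind such a plot: the function name is hypothetical, and the source does not say that the authors used this particular test.

```python
from math import comb


def prob_at_most_k_homozygotes(n, k, p=0.25):
    """Binomial probability of observing k or fewer homozygous mutants.

    n: number of G3 offspring from heterozygous x heterozygous matings
    k: number of homozygous mutant offspring observed
    p: expected homozygous fraction (0.25 under Mendelian inheritance)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Chance of seeing no homozygotes among 20 offspring if the
    # mutation had no effect on viability.
    print(prob_at_most_k_homozygotes(20, 0))
```

A small value suggests the observed deficit of homozygotes is unlikely under simple Mendelian segregation, though small pedigrees give little power to detect this.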
Fig. 3 Determination of the percentage of essential genes by comparison of real and simulated lethality data. (a) Boxplot of simulated data showing the number of genes carrying any type of mutation for which at least one homozygous mutant G3 mouse existed, for varying percentages of essential genes. For each essential gene percentage, sampling was performed five times, and linear regression was used to fit all the sampled data as a function of essential gene percentage. Red box represents the interquartile range, and whiskers extend to the most extreme data point that lies no more than 1.5 times the interquartile range from the box; thick black line represents the median. Blue line indicates the true number of genes carrying any type of mutation for which at least one homozygous mutant G3 mouse existed. Outliers are shown as individual data points. n = 1,105,575 mutations analyzed. (b) Cumulative plot of the number of genes carrying any type of non-synonymous coding and potential splicing mutation for which at least one homozygous mutant G3 mouse existed vs. number of mutations, using real data (blue) and simulated data assuming essential gene percentages of 0% (red) or 34% (yellow). All five simulations were averaged for plotting curves
Fig. 4 Genome saturation by 119,452 ENU-induced mutations. The estimated probability of damage for each PP2 mutation class was incorporated into the calculation of genome saturation. Cumulative plot of genome saturation percentage vs. mutation number is shown for each specified cutoff number of G3 mice carrying truly damaging homozygous mutations. Mutations were non-synonymous coding and potential splicing changes