Slavé Petrovski, Quanli Wang, Erin L Heinzen, Andrew S Allen, David B Goldstein.
Abstract
A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
Year: 2013 PMID: 23990802 PMCID: PMC3749936 DOI: 10.1371/journal.pgen.1003709
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. A regression plot illustrating the regression of Y on X.
The plot is annotated for the 2% extremes: red = 2% most intolerant, blue = 2% most tolerant. Five outlier genes with >140 common functional variant sites (y-axis) are not shown.
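The regression behind Figure 1 can be illustrated with a minimal sketch. It assumes that Y is the per-gene count of common functional variant sites (as the y-axis note above indicates) and that X is the per-gene total number of variant sites; the latter, the toy counts, and the function name `residual_intolerance_scores` are assumptions made for illustration, not details taken from this record. The sketch standardizes residuals by the residual standard deviation, which may differ from the exact standardization the authors used.

```python
import math

def residual_intolerance_scores(x, y):
    """Regress y on x by ordinary least squares; return standardized residuals.

    x: per-gene predictor (ASSUMED here: total variant sites in the gene)
    y: per-gene count of common functional variant sites
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / sd for r in residuals]

# Hypothetical toy data: five genes with made-up variant counts.
genes = ["G1", "G2", "G3", "G4", "G5"]
total_sites = [10, 20, 30, 40, 50]
common_functional = [2, 3, 9, 8, 10]

scores = residual_intolerance_scores(total_sites, common_functional)
# The most negative residual marks the gene with the fewest common
# functional variants relative to expectation (the most intolerant).
most_intolerant = genes[scores.index(min(scores))]
most_tolerant = genes[scores.index(max(scores))]
```

Under this reading, a gene below the regression line carries less common functional variation than its overall variation predicts, which corresponds to the "intolerant" extreme highlighted in red in the figure.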
Summarizing the RVIS and RVIS-PP2 performance in settings of Online Mendelian Inheritance in Man (OMIM) and Mouse Genome Informatics (MGI) gene lists.
| | OMIM disease genes | “recessive” | “HI” | “dominant negative” | “de novo” | “HI” and “de novo” | MGI ortholog “lethality” | MGI ortholog “seizure” | Essential Gene List |
|---|---|---|---|---|---|---|---|---|---|
| Keyword search | 2329 | 881 | 202 | 387 | 507 | N/A | 99 | 99 | 2472 |
| No CCDS transcript | 46 | 15 | 2 | 4 | 10 | N/A | 1 | 1 | 28 |
| <70% NHLBI-ESP gene capture | 152 | 49 | 25 | 19 | 30 | N/A | 7 | 3 | 156 |
| Total (%) | 2131 (91.5%) | 817 (92.7%) | 175 (86.6%) | 364 (94.1%) | 467 (92.1%) | 108 | 91 (91.9%) | 95 (96.0%) | 2288 (92.6%) |
The essential gene list was extracted from Georgi et al. (2013).
The OMIM disease and “haploinsufficient (HI)” gene lists were further filtered.
To obtain the presented levels of significance, we used a logistic regression model to regress the presence or absence of a gene, within the corresponding gene list, on the residual variation intolerance score. We provide the corresponding RVIS beta coefficient and its 95% CI.
A Mann-Whitney U test compares the RVIS of the corresponding gene list with that of the remaining non-OMIM genes (n = 14,712 genes). The non-OMIM control list was revised for MGI lethality (n = 14,672), MGI seizure (n = 14,660), and the Essential gene list (n = 13,179) to exclude genes overlapping with the corresponding gene list.
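The logistic regression mentioned in the table notes, which regresses gene-list membership on the intolerance score and reports a beta coefficient with its 95% CI, can be sketched as follows. This is a minimal Newton-Raphson fit in plain Python, assuming a single predictor and a Wald interval; the function name `fit_logistic` and the toy data are hypothetical, and the authors' actual software and interval method are not stated in this record.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b1, lower, upper): the slope and a Wald 95% confidence interval.
    Assumes the data are not perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Slope variance from the inverse information matrix, evaluated at the
    # estimates entering the final (negligible) update.
    se_b1 = math.sqrt(h00 / det)
    return b1, b1 - 1.96 * se_b1, b1 + 1.96 * se_b1

# Hypothetical toy data: intolerance scores and list membership (1 = in list).
toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
in_list = [1, 1, 0, 1, 1, 0, 0, 1, 0]

beta, ci_low, ci_high = fit_logistic(toy_scores, in_list)
```

A negative beta in this setup means that lower (more intolerant) scores are associated with higher odds of belonging to the gene list.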
Figure 2. [A] Cumulative percentage plots for the residual variation intolerance scores among six OMIM lists. [B] ROC curves of the residual variation intolerance scores' capacity to predict the corresponding OMIM list.
Figure 3. ROC curves of the residual variation intolerance scores' capacity to predict the corresponding independent gene-list.
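The ROC curves in Figures 2 and 3 summarize how well the score separates list genes from control genes. As a rough illustration, the area under such a curve can be computed directly from pairwise comparisons; it equals the Mann-Whitney U statistic divided by the number of (list gene, control gene) pairs. The function name `roc_auc` and the toy scores are hypothetical, and the sketch assumes that lower scores predict list membership.

```python
def roc_auc(list_scores, control_scores):
    """AUC for 'a lower score predicts list membership'.

    Counts (list gene, control gene) pairs in which the list gene has the
    lower score, with ties counting one half, then divides by the number
    of pairs. The pair count is the Mann-Whitney U statistic.
    """
    wins = 0.0
    for a in list_scores:
        for b in control_scores:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(list_scores) * len(control_scores))

# Hypothetical toy scores, not data from the study.
disease = [-1.8, -0.9, -0.4, 0.3]
control = [-0.5, 0.1, 0.6, 1.2, 1.7]
auc = roc_auc(disease, control)
```

An AUC of 0.5 indicates no separation and 1.0 indicates perfect separation. The double loop is quadratic in the list sizes, which is acceptable for a sketch; a rank-based computation would be preferable for genome-wide lists.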
Figure 4. The proportion of genes explained by each of the 25-percentile bins (RVIS) for the human disease network disorder classes with the lowest (“Developmental Disorders”) and highest (“Immunological Disorders”) average residual variation intolerance score.
Figure 5. The percentage of de novo mutations occurring in the most intolerant quartile (25th percentile) across the severe ID, autistic, epileptic encephalopathy, and control siblings, for the different variant effect types.
LGD = Likely Gene Disrupting (including nonsense, coding indels, and splice acceptor/donor site mutations). *Taking the CCDS of RVIS genes, 38% reflects the total real estate occupied by the 25th percentile most intolerant genes. P-values reflect binomial exact tests where the probability of success is adjusted to 0.38, accounting for the gene sizes of the 25% most intolerant genes.
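The binomial exact test described above, with the probability of success set to 0.38, can be sketched in a few lines. The example assumes a one-sided (upper-tail) test, which is not stated in this record, and the counts are hypothetical; the function name `binomial_upper_tail` is likewise made up for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts: 30 of 50 de novo mutations fall in the most
# intolerant quartile of genes. Because those genes occupy 38% of the
# coding sequence, the null probability is 0.38 rather than 0.25.
p_value = binomial_upper_tail(k=30, n=50, p=0.38)
```

Using 0.38 as the null probability reflects that mutations land in proportion to sequence length, so a gene set covering 38% of the coding sequence is expected to absorb about 38% of mutations by chance.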
Figure 6. 2D plots illustrating possible utility of RVIS in conjunction with a variant-level quantitative score (PolyPhen-2) across cohorts with proposed de novo mutation genetic architectures.
Plots reflect the single most damaging de novo missense mutation in individuals with at least one de novo missense mutation: [A] Controls (n = 247); [B] Severe ID (n = 67); [C] Epileptic Encephalopathies (n = 134); [D] Autism Spectrum Disorders (n = 412). Full lists of missense de novo mutations in the “hot zone” are available in Dataset S3, including loss of function SNV mutations (not plotted).