István Bartha, Antonio Rausell, Paul J McLaren, Pejman Mohammadi, Manuel Tardaguila, Nimisha Chaturvedi, Jacques Fellay, Amalio Telenti.
Abstract
Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene's tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 autosomal protein-coding genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, significantly fewer than the 12,916 genes expected under a model of neutral de novo mutation (p < 10⁻⁴). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants (PTVs) delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.
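The comparison between observed and expected gene counts can be illustrated with a small calculation. The sketch below is not the authors' mutation model: it assumes each sampled variant falls independently on a gene with probability proportional to a per-gene weight, and the function name `expected_genes_hit` and the weights are hypothetical. Under that assumption, the expected number of genes with at least one variant after N draws is the sum over genes of 1 − (1 − p_g)^N.

```python
def expected_genes_hit(gene_weights, n_variants):
    """Expected number of genes carrying at least one variant.

    Assumption (not from the paper): each of the n_variants lands
    independently on gene g with probability p_g = w_g / sum(w).
    """
    total = float(sum(gene_weights))
    if total <= 0:
        raise ValueError("gene_weights must have a positive sum")
    expected = 0.0
    for w in gene_weights:
        p = w / total
        # Probability that gene g is hit at least once in n_variants draws.
        expected += 1.0 - (1.0 - p) ** n_variants
    return expected


# Hypothetical usage: uniform weights over the 16,260 genes and the
# 39,893 observed variants from the abstract.
uniform_estimate = expected_genes_hit([1.0] * 16260, 39893)
```

For a fixed number of variants, uniform weights maximize this quantity, so realistic gene-specific mutation rates (which depend on gene length and sequence context) would give a lower expectation than the uniform case.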
Year: 2015 PMID: 26642228 PMCID: PMC4671652 DOI: 10.1371/journal.pcbi.1004647
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Observed and expected PTVs in the study population.
A: Fraction of genes with at least one stop-gain or frameshift variant as a function of the number of sampled PTVs. The gray curve shows the number of genes expected under a model of neutral de novo mutation rate [12], representing the null hypothesis (no deleterious effects). The green curve shows the number of genes observed with at least one PTV. The orange curve restricts the observed genes to those hosting highly damaging variants [13]. The purple curve shows the predicted number of genes with at least one PTV under the best-fit parameters of model A; bootstrap replicates of this fit are shown in pale gray (see Methods). B: Extrapolation of the observed number of genes with at least one PTV, assuming a model that allows PTVs to be found because of biological and technical noise. The purple curve shows the predicted number of genes with at least one PTV under the estimated best-fit parameters, while the green curve shows the observed data. The observed and predicted numbers of genes with at least one PTV are decomposed into two classes: non-haploinsufficient genes (blue), which saturate early, and haploinsufficient genes (red), which continue to accumulate PTVs because of the constant contribution of biological and technical noise.
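The decomposition in panel B can be sketched with a toy two-class calculation. This is not the paper's fitted model, whose exact form is given in its Methods and not reproduced here. The sketch assumes that all genes share the same neutral weight and that a hypothetical fraction `frac_intolerant` of genes retains PTVs only at a reduced relative rate `noise_rate`, standing in for biological and technical noise. All names and parameter values are illustrative.

```python
def decompose_genes_hit(n_genes, frac_intolerant, noise_rate, n_variants):
    """Expected genes with >= 1 PTV, split into (tolerant, intolerant).

    Toy assumptions (not from the paper): tolerant genes have relative
    weight 1, intolerant genes have relative weight noise_rate, and
    variants are sampled independently in proportion to these weights.
    """
    n_intol = n_genes * frac_intolerant
    n_tol = n_genes - n_intol
    total_weight = n_tol + n_intol * noise_rate
    if total_weight <= 0:
        raise ValueError("no gene can receive a variant")
    p_tol = 1.0 / total_weight           # per-draw probability, tolerant gene
    p_intol = noise_rate / total_weight  # per-draw probability, intolerant gene
    tol_hit = n_tol * (1.0 - (1.0 - p_tol) ** n_variants)
    intol_hit = n_intol * (1.0 - (1.0 - p_intol) ** n_variants)
    return tol_hit, intol_hit


# Hypothetical usage: how each class grows as more variants are sampled.
curve = [decompose_genes_hit(16260, 0.108, 0.05, n)
         for n in (10_000, 40_000, 160_000)]
```

In this toy setting the tolerant class approaches its ceiling quickly, while the intolerant class keeps rising slowly as long as `noise_rate` is above zero, which mirrors the qualitative behavior described for the blue and red curves.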
Characteristics of the subset of genes (n = 4,204) observed without PTVs after sequencing 16,260 protein coding autosomal genes in 11,546 individuals.
Tests compare genes with and without heterozygous PTVs.
| Annotation | Effect in non-truncated genes | P-value | Test | Data Source |
|---|---|---|---|---|
| dN/dS | Lower (conservation) | 1E-295 | Rank-sum test | Ensembl primate genomes |
| Paralog count | Lower | 4E-94 | Poisson regression | Ensembl Biomart |
| Loss of cell viability (CRISPR-Cas9) | Enrichment | 3E-16 | Logistic regression | Shalem et al. 2014 |
| Part of a protein complex | Enrichment | 3E-29 | Logistic regression | Gene Ontology term “Protein complex” GO:0043234 |
| Essentiality | Higher | 4E-34 | Logistic regression | OGEE |
| Connectivity in protein-protein interaction network | Higher | 5E-52 | Linear regression | OGEE |
| Predicted haploinsufficiency | Higher | 1E-162 | Linear regression | Huang et al. 2010 |
| OMIM ‘haploinsufficient’ and ‘dominant negative’ subset | Enrichment | 5E-12 | Logistic regression | Petrovski et al. 2013 |
| Mouse knock-out mortality phenotype | Enrichment | 5E-63 | Logistic regression | Mouse/Human Orthology with Phenotype Annotations |