Nicholas J Marini, Paul D Thomas, Jasper Rine.
Abstract
Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well-conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Quite surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages such that inclusion of distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and infer amino acid conservation among only direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR, but also in an exhaustive set of E. coli LacI mutants.
Year: 2010 PMID: 20523748 PMCID: PMC2877731 DOI: 10.1371/journal.pgen.1000968
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Example growth curves from which rate metrics were calculated.
Shown are two examples (major MTHFR allele (open triangle); Y189H substitution variant (closed circle)) where growth in liquid culture was tracked over time, according to Methods. The upper panel shows absorbance (OD595) values and the lower panel shows the log10 transformation of the same absorbance reads. Log10-transformed data were used to calculate maximum slopes that served as growth-rate metrics.
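The growth-rate metric described in this legend can be illustrated with a minimal sketch. It assumes the maximum slope is taken as the steepest least-squares slope of log10(OD) over a sliding window of consecutive reads; the function name `max_log_slope`, the window size, and the sample values are hypothetical and are not taken from the study's Methods.

```python
import math


def max_log_slope(times, od_values, window=3):
    """Steepest least-squares slope of log10(OD) over any run of
    `window` consecutive reads (window size is an assumption)."""
    if len(times) != len(od_values):
        raise ValueError("times and od_values must have the same length")
    if window < 2 or len(times) < window:
        raise ValueError("need at least `window` >= 2 reads")

    # Absorbance reads must be positive for the log transform.
    logs = [math.log10(v) for v in od_values]

    best = float("-inf")
    for i in range(len(times) - window + 1):
        t = times[i:i + window]
        y = logs[i:i + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((a - t_mean) ** 2 for a in t)
        sxy = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
        best = max(best, sxy / sxx)
    return best


# Hypothetical reads: lag phase, exponential phase, plateau.
hours = [0, 1, 2, 3, 4]
od595 = [0.01, 0.01, 0.1, 1.0, 1.0]
rate = max_log_slope(hours, od595)
```

A windowed fit is one way to keep a single noisy read from dominating the slope; a different window or smoothing scheme would change the metric's absolute value but not the idea.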
Figure 2. Activities of MTHFR mutants.
The average maximum slope (growth-rate metric) and standard deviation for each of the 36 MTHFR variants tested as in Methods. Replicate sets (N = 5) were compared against a positive control (major MTHFR allele) and a negative control (A222V allele) using 2 different statistical criteria as described in Methods. Green circles indicate changes not significantly different from the positive control and significantly better than the A222V control and indicate functionality. Red squares indicate changes significantly less active than the positive control and not significantly better than the A222V control and indicate impaired alleles. Pink triangles are classified as equivocal due to disagreement in the statistical methods. The raw replicate data and statistical metrics are in Table S1.
Figure 3. Phylogenetic tree and ancestral allele determination from orthologs of human MTHFR.
Tree: MTHFR sequences from modern-day species are indicated. Database identifiers for these entries are listed in Table S2. Gene duplication events are shown with orange circles, and speciation events with green circles. Nodes numbered in red correspond to ancestral branch points in the human MTHFR lineage. Longer branch lengths indicate faster evolutionary rate. The chicken sequence was given an arbitrary, long branch length because it is a sequence fragment and the actual branch length could not be accurately determined. Ancestral allele determinations: The right columns show the amino acids found in the modern-day sequences corresponding to positions 134, 240 and 294 in human MTHFR. These are shown to illustrate how ancestral sites are determined and, consequently, how long the identity of the site in the human enzyme has been preserved in the human lineage (see text for details).
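The idea of measuring how long a site has been preserved in the human lineage can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the inferred residue at each ancestral node of the human lineage is already available as a list ordered from the most recent ancestor to the most ancient, and the function name and example residues are hypothetical.

```python
def ancestral_site_preservation(human_residue, lineage_residues):
    """Count how many consecutive ancestors, starting from the most
    recent one, carry the same residue as the human protein.

    `lineage_residues` is assumed to be ordered from the most recent
    ancestral node to the most ancient one.
    """
    depth = 0
    for residue in lineage_residues:
        if residue != human_residue:
            break  # preservation of the human identity ends here
        depth += 1
    return depth


# Hypothetical site: human residue V, retained in the three most
# recent ancestors and different in the two most ancient ones.
depth = ancestral_site_preservation("V", ["V", "V", "V", "I", "I"])
```

A larger return value means the human identity at that site traces back to a more ancient ancestor, which corresponds to the increasing node numbers on the human lineage in the tree.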
Table 1. Amino acid substitutions from human MTHFR orthologs tested for functional impact.
| Change | Human Site Conserved in Clade | Substitution Accepted in Ortholog | Human Lineage Preservation | Grantham score | SIFT score |
| Y119Q | Deuterostomes, Nematodes, | | Bilaterian | 99 | 0.21 |
| R134S | | | Rodent-Primate | 110 | 0.66 |
| Y174R | Deuterostomes, Fungi, | | Unikont | 77 | 0.12 |
| E300H | Amniotes, Nematodes, Fungi | | Tetrapod | 40 | 0.24 |
| E203Q | Vertebrates, Fungi, Bacteria | | Vertebrate | 29 | 0.11 |
| I77R | Chordates | | Vertebrate | 97 | 0.1 |
| M110I | Deuterostomes, | | Deuterostome | 10 | 0.19 |
| R295Q | Deuterostomes, Fungi | | Deuterostome | 54 | 0.06 |
| P202S | Deuterostomes, Nematodes, Fungi, Plants, Bacteria | | Last Universal | 74 | 0.08 |
| Q267S | Chordates | | Chordate/Last Universal | 80 | 0.02 |
| D223N | All Clades | | Last Universal | 23 | 0.13 |
| I294V | Deuterostomes | | Deuterostome/Last Universal | 29 | 1 |
| Q267R | Chordates | | Chordate/Last Universal | 54 | 1 |
| N152S | Deuterostomes, Fungi, Bacteria | | Last Universal | 46 | 0.04 |
| W165F | Deuterostomes, Fungi, Bacteria | | Unikont | 40 | 0.3 |
| F237L | Placental Mammals | | Mammalian/Eukaryote | 22 | 1 |
| E123T | Vertebrates, Fungi, Plants, | | Vertebrate | 65 | 0.16 |
| Y189H | Vertebrates, Nematodes, Fungi, Plants | | Eukaryote | 83 | 0.56 |
| D92T | Deuterostomes, Nematodes, Fungi, Plants | | Eukaryote | 85 | 0.08 |
| A222V | All Clades | | Last Universal | 64 | 0.06 |
| S264G | Bilaterians, Fungi | | Bilaterian | 56 | 0.41 |
| W165E | Deuterostomes, Fungi, Bacteria | | Unikont | 152 | 0.11 |
| E285V | All Clades | | Last Universal | 121 | 0.02 |
| T69F | Deuterostomes, Fungi, Plants, Bacteria | | Last Universal | 103 | 0.04 |
| C306S | Deuterostomes, Nematodes, Fungi, Plants, Bacteria | | Last Universal | 112 | 0.02 |
| V240E | Deuterostomes, Nematodes, Fungi, Plants, Bacteria | | Last Universal | 121 | 0.08 |
| F237E | Placental Mammals | | Mammalian/Eukaryote | 140 | 0.18 |
| C306V | Deuterostomes, Nematodes, Fungi, Plants, Bacteria | | Last Universal | 192 | 0.09 |
| G247P | All Clades | | Last Universal | 42 | 0.03 |
| L336S | All Clades | | Last Universal | 145 | 0.03 |
*Human-Macaque ancestral reconstruction and substitutions not found in any ortholog (P67V, R134C, R134F, D291N) are not listed.
**Boldface species are present in Figure 3.
***Ancestral reconstruction, >90% probability using PAML.
#The identities of these residues are conserved in the human lineage from the more recent of the two ancestors listed, at which point a longer-standing pattern of preservation (dating back to the more ancient of the two) was broken.
“Human Site Conserved in Clade” lists clades in which the identity of the human enzyme is seen multiple times. The sequence alignments for all positions interrogated in this study are in Table S2. “Human Lineage Preservation” is the most ancient MTHFR ancestor in which the identity of that site in the human enzyme has been preserved. “Substitution Accepted in Ortholog” lists the MTHFR ortholog(s) in which the substitution can be found. Grantham score [5] estimates the physico-chemical dissimilarity between the parent and substituting amino acids. SIFT score [32] estimates whether the amino acid change can be tolerated based on the variability at that site in homologous proteins (see text for details). The order is presented as in Figure 2.
Figure 4. Accuracy of discrimination between functional and impaired variants by different methods.
Growth-rate metrics for the 30 variants in Table 1 plotted against scores/classifications from various methods that estimate functional impact. The accuracy of each method was calculated as the number of mutations correctly called as functional or impaired divided by the number of mutations unambiguously classified by experimental data. Mutations were binned using a threshold empirically defined by the functional alleles (dashed vertical line in each panel), which separates the functional (left of line) and impaired (right of line) bins. (A) SIFT score; note that the graph plots (1–score) to facilitate comparison with the other methods. All functional variants have a SIFT score >0.09 which, when used as a threshold, results in a classification accuracy of 62%. The recommended threshold of 0.05 (solid vertical line) results in a lower classification accuracy (42%). (B) Grantham scale of amino acid dissimilarity between wild-type and substituted amino acid. (C) Ancestral Site Preservation (ASP) measure, using inferred ancestral sequences of human MTHFR. Numbers on the x-axis correspond to increasingly ancient ancestors of human MTHFR as defined by the nodes in Figure 3. (D) Ancestral Site Preservation Extended (ASP extended) measure. If a site was preserved in the ancestral lineage for a long period before being substituted by the current-day amino acid, the more ancient ancestor is used to define preservation at this site. The preservation measure for 5 variants is shifted by this criterion (see Table 1).
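The accuracy calculation in this legend can be sketched in a few lines. The sketch assumes each variant carries a predictor score oriented so that lower means more likely functional (as with the plotted 1–SIFT score), takes the threshold as the largest score among experimentally functional variants, and treats a score equal to the threshold as functional. The function names, labels, and sample scores are hypothetical.

```python
def empirical_threshold(variants):
    """Largest score among variants experimentally labeled functional,
    so that every functional variant falls at or left of the line."""
    return max(score for score, label in variants if label == "functional")


def classification_accuracy(variants, threshold):
    """Fraction of unambiguously classified variants whose bin
    (score <= threshold means predicted functional) matches the
    experimental label. Equivocal variants are excluded."""
    scored = [(s, lab) for s, lab in variants
              if lab in ("functional", "impaired")]
    if not scored:
        raise ValueError("no unambiguously classified variants")
    correct = sum(
        1 for score, label in scored
        if (score <= threshold) == (label == "functional")
    )
    return correct / len(scored)


# Hypothetical (score, experimental label) pairs, not data from the study.
variants = [
    (0.1, "functional"),
    (0.4, "functional"),
    (0.3, "impaired"),
    (0.8, "impaired"),
    (0.5, "equivocal"),
]
threshold = empirical_threshold(variants)
accuracy = classification_accuracy(variants, threshold)
```

Because the threshold is set from the functional variants themselves, every functional variant is called correctly by construction, and the accuracy reflects only how many impaired variants fall to the right of the line.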
Prediction accuracy of different algorithms on the LacI dataset.
| Algorithm | Accuracy | Reference |
| SIFT (v.1) | 68.3% | 35 |
| SIFT (v.2) | 68.1% | 9 |
| MAPP (including paralogs) | 69.2% | 11 |
| MAPP (only orthologs) | 70.7% | 11 |
| ASP | 72.0% | This study |
| ASP extended | 72.0% | This study |
*Only studies with published prediction results on essentially all of the >4000 mutations in the LacI dataset are listed.