Fan Yang, Song Sun, Guihong Tan, Michael Costanzo, David E Hill, Marc Vidal, Brenda J Andrews, Charles Boone, Frederick P Roth.
Abstract
To better understand the health implications of personal genomes, we now face a largely unmet challenge to identify functional variants within disease-associated genes. Functional variants can be identified by trans-species complementation, e.g., by failure to rescue a yeast strain bearing a mutation in an orthologous human gene. Although orthologous complementation assays are powerful predictors of pathogenic variation, they are available for only a few percent of human disease genes. Here we systematically examine the question of whether complementation assays based on paralogy relationships can expand the number of human disease genes with functional variant detection assays. We tested over 1,000 paralogous human-yeast gene pairs for complementation, yielding 34 complementation relationships, of which 33 (97%) were novel. We found that paralog-based assays identified disease variants with success on par with that of orthology-based assays. Combining all homology-based assay results, we found that complementation can often identify pathogenic variants outside the homologous sequence region, presumably because of global effects on protein folding or stability. Within our search space, paralogy-based complementation more than doubled the number of human disease genes with a yeast-based complementation assay for disease variation.
Year: 2017 PMID: 28542158 PMCID: PMC5466341 DOI: 10.1371/journal.pgen.1006779
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Schematic overview of the process for assessing the functional effect of human disease-associated variants via complementation testing.
(A) We selected paralog pairs in which a human disease protein has a yeast paralog whose protein domains are all also found in the human protein. Homologous pairs of domains are connected by solid lines, and non-homologous domain pairs by dashed lines. (B) For the subset of these paralog pairs for which we identified complementation relationships, we used the resulting assays to assess whether variant functionality predicted variant pathogenicity.
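The selection criterion in panel A (a yeast paralog qualifies only if each of its Pfam domains also occurs in the human disease protein) is a set-inclusion test. The Python sketch below illustrates it under stated assumptions: it is not the authors' pipeline, Kin28 is assumed to carry a single Pkinase (PF00069) domain, AKT2's domains are taken from the Kin28 complementation table in this entry, and `HUMAN_X` is a hypothetical protein.

```python
from typing import Dict, List, Set, Tuple


def select_paralog_pairs(
    human_domains: Dict[str, Set[str]],
    yeast_domains: Dict[str, Set[str]],
    candidate_pairs: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep (human, yeast) pairs whose yeast Pfam domains all occur in the human protein."""
    selected = []
    for human, yeast in candidate_pairs:
        yeast_set = yeast_domains.get(yeast, set())
        human_set = human_domains.get(human, set())
        # An empty yeast domain set would pass the subset test trivially, so skip it.
        if yeast_set and yeast_set <= human_set:
            selected.append((human, yeast))
    return selected


# Assumed Pfam assignments, for illustration only.
yeast = {"KIN28": {"PF00069"}}  # assumption: Pkinase domain only
human = {
    "AKT2": {"PF00069", "PF00169", "PF00433"},  # from the Kin28 table
    "HUMAN_X": {"PF07714"},                     # hypothetical Pkinase_Tyr-only protein
}
pairs = [("AKT2", "KIN28"), ("HUMAN_X", "KIN28")]
print(select_paralog_pairs(human, yeast, pairs))  # [('AKT2', 'KIN28')]
```

The subset direction matters: the human protein may carry extra domains (AKT2 has three), but the yeast protein may not have any domain the human protein lacks.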
Fig 2. Protein domain architecture of yeast Kin28 and its human paralogs.
Shown are yeast Kin28 (red text) and the human paralogs tested for complementation (blue text if complementation was found, black text otherwise). The Pfam domains Pkinase_Tyr (PF07714) and Pkinase (PF00069) are shown in light and dark blue, respectively.
Seven human genes can complement yeast Kin28.
| Human Gene Name | Gene Symbol | Pfam Domains |
|---|---|---|
| Ribosomal Protein S6 Kinase-Like 1 | RPS6KL1 | PF00069,PF04212 |
| G Protein-Coupled Receptor Kinase 4 | GRK4 | PF00069 |
| Cyclin-Dependent Kinase-Like 3 | CDKL3 | PF00069 |
| Bone Morphogenetic Protein Receptor | BMPR1B | PF00069,PF01064,PF08515 |
| V-Akt Murine Thymoma Viral Oncogene Homolog 2 | AKT2 | PF00069,PF00169,PF00433 |
| Activin Receptor Type-2B | ACVR2B | PF00069,PF01064 |
| Activin A Receptor | ACVR1C | PF00069,PF01064,PF08515 |
Fig 3. Functional assay and protein domain architecture of yeast Cak1 and its complementing human paralogs.
(A) Functional complementation assay results showing that expression of either human TBK1 or CDK7 complements defects in a strain (YFL029C_tsa650) encoding a temperature-sensitive variant of Cak1 (labeled “cak1-ts” in the panel). (B) Pkinase domains are shown in dark blue; complementing paralogs are indicated in blue text.
Fig 4. Relating sequence similarity to the ability of a paralog to complement.
The distribution of average percent identity (PID) scores is shown for human-yeast pairs in which multiple human paralogs were tested for a given yeast protein (A), and for pairs in which multiple yeast paralogs were tested for a given human protein (B). In each panel, complementing and non-complementing pairs are shown separately. Each bin height is the number of human or yeast genes with a PID in that bin's range. The complementing and non-complementing distributions are shifted relative to one another yet highly overlapping, suggesting that sequence similarity is an informative but imperfect predictor of complementation.
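Panels A and B are histograms of average PID drawn separately for complementing and non-complementing pairs. A minimal sketch of that binning follows, with hypothetical PID values and an assumed bin width of 10 percentage points (the figure's actual bin width is not given here); it shows how two distributions can be shifted yet still populate the same bins.

```python
from collections import Counter
from typing import Dict, List

WIDTH = 10  # assumed bin width, in percentage points of identity


def bin_counts(pids: List[float], width: int = WIDTH) -> Dict[int, int]:
    """Count PID values per bin, keyed by the bin's lower edge (20 means 20 <= PID < 30)."""
    counts = Counter(int(p // width) * width for p in pids)
    return dict(sorted(counts.items()))


# Hypothetical average-PID values, for illustration only.
complementing = [28.5, 31.0, 34.2, 36.8, 41.5]
non_complementing = [18.0, 22.4, 25.9, 29.3, 31.7, 33.0]

comp_bins = bin_counts(complementing)
non_bins = bin_counts(non_complementing)
for edge in sorted(set(comp_bins) | set(non_bins)):
    print(f"{edge}-{edge + WIDTH}%: complementing={comp_bins.get(edge, 0)}, "
          f"non-complementing={non_bins.get(edge, 0)}")

# Bins populated by both classes are where PID alone cannot separate them.
shared = [edge for edge in comp_bins if edge in non_bins]
print("shared bins:", shared)
```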
Deleteriousness predictions from functional complementation (FC), PolyPhen-2 (PPH2) and PROVEAN.
| Gene Symbol | Entrez | Variant | Disease Assoc? | FC Score | FC Prediction | FC Correct? | PPH2 Score | PPH2 Prediction | PPH2 Correct? | Provean Score | Provean Prediction | Provean Correct? | Within Aligned Region? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 8573 | T573I | No | 0.6 | Damaging | No | 0.021 | Neutral | Yes | -2.35 | Neutral | Yes | No |
|  | 8573 | D471N | No | 0.4 | Neutral | **Yes** | 0.005 | Neutral | Yes | -1.48 | Neutral | Yes | No |
|  | 8573 | M438L | No | 0.4 | Neutral | **Yes** | 0 | Neutral | Yes | -1.24 | Neutral | Yes | Yes |
|  | 8573 | R430C | No | 0.4 | Neutral | **Yes** | 0.035 | Neutral | Yes | -2.51 | Damaging | No | Yes |
|  | 8573 | R28L | Yes | 0.8 | Damaging | **Yes** | 1 | Damaging | Yes | -3.59 | Damaging | Yes | Yes |
|  | 1588 | M21T | No | 0.6 | Damaging | No | 0.01 | Neutral | Yes | -0.65 | Neutral | Yes | No |
|  | 1588 | M85R | Yes | 0.8 | Damaging | **Yes** | 0.128 | Neutral | No | -2.77 | Damaging | Yes | Yes |
|  | 1588 | W39R | No | 0.4 | Neutral | **Yes** | 0.343 | Neutral | Yes | -5.16 | Damaging | No | Yes |
|  | 1588 | M127R | Yes | 0.8 | Damaging | **Yes** | 1 | Damaging | Yes | -4.87 | Damaging | Yes | Yes |
|  | 1588 | Y81C | Yes | 0.8 | Damaging | **Yes** | 1 | Damaging | Yes | -6.87 | Damaging | Yes | Yes |
|  | 79947 | K42E | Yes | 0 | Neutral | No | 0.786 | Damaging | Yes | -3.65 | Damaging | Yes | Yes |
|  | 10436 | D86G | Yes | 0.6 | Damaging | **Yes** | 1 | Damaging | Yes | -6.99 | Damaging | Yes | Yes |
|  | 55764 | G51A | No | 0.2 | Neutral | **Yes** | 0.016 | Neutral | Yes | -4.11 | Damaging | No | No |
|  | 55764 | T91I | No | 0.2 | Neutral | **Yes** | 0.953 | Damaging | No | -3.99 | Damaging | No | No |
|  | 55764 | S373F | Yes | 0.6 | Damaging | **Yes** | 0.951 | Damaging | Yes | -5.038 | Damaging | Yes | No |
|  | 55764 | L99W | No | 0.4 | Neutral | **Yes** | 0.861 | Damaging | No | -0.178 | Neutral | Yes | No |
|  | 55764 | R328W | No | 0.2 | Neutral | **Yes** | 0.994 | Damaging | No | -6.168 | Damaging | No | No |
|  | 83452 | N148K | Yes | 0.8 | Damaging | **Yes** | 0.005 | Neutral | No | 0.6 | Neutral | No | No |
|  | 83452 | K46Q | Yes | 0.8 | Damaging | **Yes** | 1 | Damaging | Yes | -3.55 | Damaging | Yes | Yes |
|  | 83452 | P142L | No | 0.6 | Damaging | No | 1 | Damaging | No | -9.99 | Damaging | No | Yes |
|  | 83452 | T177M | No | 0.6 | Damaging | No | 1 | Damaging | No | -5.21 | Damaging | No | Yes |
|  | 7415 | A232G | Yes | 0.6 | Damaging | **Yes** | 0.005 | Neutral | No | -1.87 | Neutral | No | No |
|  | 7415 | I151V | Yes | 0.4 | Neutral | No | 0 | Neutral | No | -0.51 | Neutral | No | Yes |
|  | 7415 | I27V | No | 0.2 | Neutral | **Yes** | 0 | Neutral | Yes | -0.43 | Neutral | Yes | Yes |
|  | 7415 | Q19R | No | 0.4 | Neutral | **Yes** | 0 | Neutral | Yes | 0.61 | Neutral | Yes | Yes |
|  | 7415 | S171N | No | 0 | Neutral | **Yes** | 0.004 | Neutral | Yes | -1.18 | Neutral | Yes | No |
|  | 7415 | T436I | No | 0.4 | Neutral | **Yes** | 0.236 | Neutral | Yes | -3.76 | Damaging | No | No |
|  | 7415 | I206F | Yes | 0.6 | Damaging | **Yes** | 0.983 | Damaging | Yes | -3.7 | Damaging | Yes | Yes |
|  | 7415 | L198W | Yes | 0.6 | Damaging | **Yes** | 1 | Damaging | Yes | -4.71 | Damaging | Yes | Yes |
|  | 7415 | R159G | Yes | 0.6 | Damaging | **Yes** | 1 | Damaging | Yes | -6.56 | Damaging | Yes | No |
|  | 7415 | R159C | Yes | 0.8 | Damaging | **Yes** | 1 | Damaging | Yes | -6.31 | Damaging | Yes | No |
|  | 7415 | R159H | Yes | 0.8 | Damaging | **Yes** | 0.517 | Damaging | Yes | -2.97 | Damaging | Yes | No |
|  | 7415 | R191G | Yes | 0.6 | Damaging | **Yes** | 0.999 | Damaging | Yes | -6.49 | Damaging | Yes | Yes |
|  | 7415 | P137L | Yes | 0.4 | Neutral | No | 1 | Damaging | Yes | -9.31 | Damaging | Yes | Yes |
|  | 7415 | R155G | Yes | 0.4 | Neutral | No | 0.998 | Damaging | Yes | -5.18 | Damaging | Yes | No |
The “FC Correct?”, “PPH2 Correct?” and “Provean Correct?” columns indicate whether the deleteriousness calls from FC, PPH2 or PROVEAN agree with current pathogenicity annotations (HGMD “DM”). FC predictions that are correct according to HGMD “DM” are marked with a bold “Yes”.
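The “Correct?” columns reduce to an agreement check: a call is correct when “Damaging” coincides with a disease-associated (HGMD “DM”) variant or “Neutral” with a non-disease-associated one. The sketch below applies that rule to four rows copied from the table. The `fc_call` helper and its 0.5 cutoff are our assumption: the table's FC labels are consistent with scores of 0.6 and above being called Damaging and 0.4 and below Neutral, but the cutoff itself is not stated in this entry.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    entrez: int
    variant: str
    disease_assoc: bool  # HGMD "DM" annotation
    fc_score: float


def fc_call(score: float, cutoff: float = 0.5) -> str:
    """Hypothetical thresholding of an FC score into a deleteriousness label."""
    return "Damaging" if score >= cutoff else "Neutral"


def is_correct(prediction: str, disease_assoc: bool) -> bool:
    """A call is correct when 'Damaging' matches disease association, 'Neutral' its absence."""
    return (prediction == "Damaging") == disease_assoc


rows = [
    Variant(8573, "T573I", False, 0.6),
    Variant(8573, "D471N", False, 0.4),
    Variant(8573, "R28L", True, 0.8),
    Variant(79947, "K42E", True, 0.0),
]
for v in rows:
    call = fc_call(v.fc_score)
    print(v.entrez, v.variant, call, "Yes" if is_correct(call, v.disease_assoc) else "No")
```

The same `is_correct` rule applies unchanged to the PPH2 and PROVEAN columns once their scores have been converted to Damaging/Neutral calls.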
Pathogenicity prediction performance for the human disease gene paralog test set.
| Method | MCC | AUPRC | AUROC | REC90 |
|---|---|---|---|---|
| PolyPhen-2 | 0.48 | 0.76 | 0.74 | |
| PROVEAN | 0.37 | 0.7 | 0.52 | 0.71 |
| Paralog-based FC | | | | |
(MCC) Matthews correlation coefficient;
(AUPRC) area under the precision-recall curve;
(AUROC) area under the receiver-operating characteristic curve;
(REC90) recall at 90% precision.
Performance estimates for the best-performing methods are underlined.
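For readers reproducing metrics of this kind, the sketch below computes MCC from thresholded calls, AUROC through its rank (Mann-Whitney) formulation, and REC90 as the largest recall reached at any score threshold whose precision is at least 0.9. It is an illustration on made-up toy data, not the authors' evaluation code; AUPRC is left out because several estimators exist and the one used is not specified in this entry.

```python
import math
from typing import Sequence


def mcc(labels: Sequence[bool], calls: Sequence[bool]) -> float:
    """Matthews correlation coefficient; True means pathogenic (label) or damaging (call)."""
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y and c)
    tn = sum(1 for y, c in pairs if not y and not c)
    fp = sum(1 for y, c in pairs if not y and c)
    fn = sum(1 for y, c in pairs if y and not c)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auroc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Chance that a pathogenic variant outscores a benign one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_precision(labels: Sequence[bool], scores: Sequence[float],
                        min_precision: float = 0.9) -> float:
    """Largest recall over score thresholds with precision >= min_precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    best = 0.0
    # Thresholds at distinct scores: tied variants are always called together.
    for t in sorted(set(scores), reverse=True):
        called = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(called)
        if tp / len(called) >= min_precision:
            best = max(best, tp / n_pos)
    return best


# Toy data (not from the study): True = pathogenic.
labels = [True, True, True, False, True, False, False, False]
scores = [0.9, 0.8, 0.8, 0.7, 0.6, 0.4, 0.2, 0.2]
calls = [s >= 0.5 for s in scores]  # assumed cutoff for the MCC example
print(f"MCC={mcc(labels, calls):.3f}  AUROC={auroc(labels, scores):.3f}  "
      f"REC90={recall_at_precision(labels, scores):.2f}")
```

MCC depends on a single cutoff, whereas AUROC and REC90 are computed from the full score ranking, which is why the table can rank methods differently across columns.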
Fig 5. Ability of functional complementation to predict pathogenicity.
(A) Distribution of FC scores for disease-associated (red line) and non-disease-associated (blue line) variants. FC scores from paralog-based complementation assays are significantly higher for disease-associated than for non-disease-associated variants (P-value, Wilcoxon test). (B) Precision vs. recall performance for functional complementation scores (both paralog- and ortholog-based), PolyPhen-2 scores, and various options for combining the two approaches (see Methods).
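The comparison in panel A uses a Wilcoxon rank-sum test. A self-contained sketch of the two-sided test with a normal approximation and tie correction (no continuity correction) follows; the FC scores are hypothetical. Library implementations such as `scipy.stats.mannwhitneyu` may apply a continuity correction or exact methods, so their p-values can differ slightly from this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (U for x, p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie-corrected variance of U.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# Hypothetical FC scores, for illustration only.
disease = [0.8, 0.8, 0.6, 0.6, 0.4]
non_disease = [0.4, 0.2, 0.2, 0.0, 0.6]
u, p = rank_sum_test(disease, non_disease)
print(f"U = {u}, two-sided p = {p:.3f}")
```

Because FC scores take only a few discrete values, ties are common, which is why the tie correction is included.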
Fig 6. Performance of pathogenic variant identification does not strongly depend on whether the variant is in the aligned region.
Precision vs. recall performance is shown for variants that either do (‘aligned’) or do not (‘non-aligned’) fall within the sequence region that can be aligned between human and yeast homologs.
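The stratification behind this figure can be sketched with eight rows copied from the deleteriousness table above (variant, HGMD disease association, FC prediction, aligned flag). Fig 6 reports full precision-recall curves computed from scores; this sketch computes only a single thresholded operating point per stratum, and eight variants are far too few for a meaningful comparison. It illustrates the bookkeeping only.

```python
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    variant: str
    disease_assoc: bool  # HGMD "DM"
    fc_prediction: str   # "Damaging" or "Neutral"
    aligned: bool        # within the human-yeast aligned region


ROWS: List[Row] = [
    Row("R28L", True, "Damaging", True),
    Row("M438L", False, "Neutral", True),
    Row("P142L", False, "Damaging", True),
    Row("P137L", True, "Neutral", True),
    Row("T573I", False, "Damaging", False),
    Row("S373F", True, "Damaging", False),
    Row("R159C", True, "Damaging", False),
    Row("L99W", False, "Neutral", False),
]


def precision_recall(rows: List[Row]) -> Tuple[float, float]:
    """Precision and recall of Damaging calls against disease association."""
    tp = sum(1 for r in rows if r.disease_assoc and r.fc_prediction == "Damaging")
    fp = sum(1 for r in rows if not r.disease_assoc and r.fc_prediction == "Damaging")
    fn = sum(1 for r in rows if r.disease_assoc and r.fc_prediction != "Damaging")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def stratify_by_alignment(rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Evaluate aligned and non-aligned variants separately."""
    groups = {
        "aligned": [r for r in rows if r.aligned],
        "non-aligned": [r for r in rows if not r.aligned],
    }
    return {name: precision_recall(group) for name, group in groups.items()}


for name, (prec, rec) in stratify_by_alignment(ROWS).items():
    print(f"{name}: precision={prec:.2f}, recall={rec:.2f}")
```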
Fig 7. The kinome tree of yeast Kin28 and the kinase paralogs tested here.
Kinases that can complement yeast Kin28 are colored pink; other kinases tested for the ability to complement yeast Kin28 are colored cyan. (The image was generated with the Kinome-Render Tool [49] hosted at Cell Signaling, Inc.)
Numbers of human disease-associated genes with orthologs and paralogs in five model species.
| Organism | Disease genes with orthologs | Disease genes with paralogs |
|---|---|---|
|  | 6648 | 3019 |
|  | 5547 | 256 |
|  | 5492 | 265 |
|  | 4619 | 231 |
|  | 3021 | 384 |
|  | 2665 | 169 |
*This figure is conservative, in that the HGMD source for this information used a more stringent criterion for paralogy (elsewhere in this study homologs without annotated orthology are referred to as paralogs).