Stephanie Portelli1,2,3, Lucy Barr1,2,3, Alex G C de Sá1,2,3,4, Douglas E V Pires1,2,3,5, David B Ascher1,2,3,4,6.
Abstract
Phosphatase and tensin homolog on chromosome ten (PTEN) germline mutations are associated with an overarching condition known as PTEN hamartoma tumor syndrome (PHTS). Clinical phenotypes associated with this syndrome range from macrocephaly and autism spectrum disorder (ASD) to Cowden syndrome, which manifests as multiple noncancerous tumor-like growths (hamartomas) and an increased predisposition to certain cancers. The basis by which mutations lead to these very diverse phenotypic outcomes is, however, unclear. Here we show that, by considering the molecular consequences of mutations in PTEN on protein structure and function, we can accurately distinguish PTEN mutations exhibiting different phenotypes. Changes in phosphatase activity, protein stability, and intramolecular interactions appeared to be major drivers of clinical phenotype, with cancer-associated variants leading to the most drastic changes, while ASD and non-pathogenic variants were associated with milder and neutral changes, respectively. Importantly, we show via saturation mutagenesis that more than half of variants of unknown significance could be associated with disease phenotypes, while over half of Cowden syndrome mutations likely lead to cancer. These insights can assist in exploring potentially important clinical outcomes delineated by PTEN variation.
Keywords: Genotype-phenotype correlations; Machine learning; Mutation analysis; PHTS; PTEN
Year: 2021 PMID: 34141133 PMCID: PMC8180946 DOI: 10.1016/j.csbj.2021.05.028
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Main domains and subdomains present in PTEN. PTEN is primarily made up of two domains (A): the phosphatase domain (light orange), which comprises the P-, TI- and WPD-loops, and the C2 domain (green), which comprises the membrane-binding CBR3 tip and cα2 helix basic patch. The phosphatase domain is the site for PIP3 binding, shown in (B) bound to a tartrate ion (black). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Data curation and in silico analyses. Curating data from different sources identified subsets of pathogenic mutations in addition to ASD and Cancer, which were the main pathogenic classes of interest in this study. To best characterize the biological effects mediated by these mutations, every class and subclass was used in at least one of our analyses, which consisted of qualitative structural analysis, statistical t-tests, data visualization techniques and supervised machine learning (ML). The table below summarizes how each subset was used.
| Class | n | Description | Analyses |
|---|---|---|---|
| Cancer | 59 | Mutations present in cancer cases, irrespective of ‘mild PHTS’ and which have not been identified in ASD | Qualitative structural, statistical, data visualization, supervised ML |
| ASD | 65 | Mutations present in ASD patients, irrespective of ‘mild PHTS’ and which have not been identified in Cancer | Qualitative structural, statistical, data visualization, supervised ML |
| PHTS | 26 | Mutations which either presented with overall PHTS symptoms, including CS and BRRS and no cancer/ASD (“mild PHTS”), or mutations manifesting in both diseases within the same patient (“severe PHTS”) | Qualitative structural, statistical, data visualization |
| Both | 31 | Mutations causing both Cancer and ASD, identified from separate patients | Qualitative structural, statistical, data visualization |
| CS | 26 | ‘Mild PHTS’ mutations identified in CS/BRRS patients with no other phenotype identified | Qualitative structural, statistical, data visualization, supervised ML: identification of mutations increasing cancer risk |
| VUS | 294 | ClinVar classified ‘variants of unknown significance’ or ‘conflicting interpretations of pathogenicity’ which have not been identified in the pathogenic classes | Supervised ML: suggesting reclassification of VUS |
| Non-Pathogenic | 22 | Mutations present in the general population which have not been identified in the pathogenic datasets | Qualitative structural, statistical, data visualization, supervised ML |
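The statistical comparison between classes can be illustrated with a two-sample t statistic on a single feature. This is a minimal sketch only: the source states that t-tests were used but not which variant, so the Welch (unequal-variance) form is an assumption, as are the function name `welch_t` and the toy feature values, which are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Assumption: the Welch variant is used here for illustration;
    the source does not specify which t-test was applied.
    """
    # Squared standard error of each sample mean.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy values for one hypothetical feature in two classes (not study data).
class_a = [1.0, 2.0, 3.0, 4.0, 5.0]
class_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(class_a, class_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Converting the statistic to a p-value requires the t-distribution CDF, which is left to a statistics library.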
Fig. 2. Methodology pipeline followed in this study. Initial mutation curation was carried out to obtain Cancer (n = 59), ASD (n = 65) and Non-Pathogenic (n = 22; labelled as Benign in figure) mutations from five different sources. Data curation also involved processing of the experimental crystal structure to fill in missing residues and model the missing loop (286–309). In silico biophysical tools were then used to measure the effects of mutations on protein structure and function (Feature Generation), which was followed by different structural and statistical analyses and the development of a three-class prediction model.
Fig. 3. Mutation distribution. Differences in distribution of mutations across the three main phenotypes: ASD (yellow), Cancer (red) and non-pathogenic (blue) across protein structure (A) and gene (B). Cancer mutations are observed in higher concentrations within the phosphatase region, suggesting a direct effect on PIP3 dephosphorylation and subsequent tumor suppression. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Principal component analysis plot on all features. When considering the main phenotypes (A), the pathogenic classes ASD (yellow) and Cancer (red) were observed to overlap, while Non-pathogenic mutations (blue) mapped to distinct regions of the plot. A comparison of the interim classes (B) shows slight distinctions between Both (purple) and PHTS (green), while CS (grey) mutations lay in between. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 5. Metrics for the chosen model. The model was obtained after greedy feature selection. The confusion matrices (A) summarize the three-class classifier, built using the OneVsOne method on the AdaBoost algorithm and validated through 10-fold cross-validation, with prediction performance of up to 0.68 MCC. Correctly predicted data points per class lie along the diagonal of each confusion matrix. (B) Observing the contribution of each feature within the estimators of our final model shows that MTR score and RSA are important in the identification of disease variants, while changes in lipid phosphatase activity play an important role in distinguishing between disease outcomes.
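The one-vs-one decomposition named in the caption can be sketched as follows: one binary model is trained per pair of classes, and the final label is chosen by majority vote. This is a sketch under stated assumptions, not the study's implementation. The binary learner `CentroidBinary` is a deliberately simple nearest-centroid stand-in and is not AdaBoost, the class names `OneVsOne` and `CentroidBinary` are hypothetical, and the two-feature toy points are illustrative rather than study data.

```python
from collections import Counter
from itertools import combinations


class CentroidBinary:
    """Stand-in binary learner (NOT AdaBoost): nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of the rows belonging to this label.
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist2(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist2)


class OneVsOne:
    """Train one binary model per class pair; predict by majority vote."""

    def __init__(self, make_learner):
        self.make_learner = make_learner

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.models = []
        for a, b in combinations(self.classes, 2):
            # Keep only the samples belonging to the current pair.
            idx = [i for i, t in enumerate(y) if t in (a, b)]
            learner = self.make_learner()
            learner.fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(learner)
        return self

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        top = max(votes.values())
        # Ties are broken by sorted class order (an arbitrary choice).
        return next(c for c in self.classes if votes[c] == top)


# Toy two-feature points for three classes (illustrative, not study data).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [10, 0], [10, 1], [11, 0]]
y = ["ASD"] * 3 + ["Cancer"] * 3 + ["NP"] * 3
model = OneVsOne(CentroidBinary).fit(X, y)
print(model.predict_one([5, 5]))
```

With three classes the decomposition trains three pairwise models, which is why one-vs-one stays cheap for a three-class problem such as ASD, Cancer and Non-Pathogenic.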
Balanced metrics observed in final model. The final model performed similarly between cross validation and blind test, suggesting there is no inherent bias underlying predictions.
| Validation method | MCC | B. acc | F1 (micro) | F1 (macro) | F1 (weighted) | Recall ASD | Recall cancer | Recall NP |
|---|---|---|---|---|---|---|---|---|
| 10-fold CV | 0.68 | 0.77 | 0.78 | 0.76 | 0.77 | 0.37 | 0.54 | 0.77 |
| Blind test | 0.68 | 0.77 | 0.81 | 0.77 | 0.81 | 0.91 | 0.41 | 0.60 |
| ASD test | – | 0.32 | 0.32 | 0.16 | 0.48 | 0.32 | – | – |
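The column headings in the table above (MCC, balanced accuracy, micro, macro and weighted F1, per-class recall) can all be derived from a single confusion matrix. The sketch below shows one way to do so, assuming rows are true classes and columns are predicted classes and using the standard multiclass generalization of MCC. The function name `multiclass_metrics` and the toy matrix are hypothetical and are not the study's results.

```python
from math import sqrt


def multiclass_metrics(cm):
    """Metrics from a confusion matrix with cm[i][j] = true i, predicted j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_n = [sum(cm[i]) for i in range(k)]
    pred_n = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    recall = [cm[i][i] / true_n[i] if true_n[i] else 0.0 for i in range(k)]
    precision = [cm[i][i] / pred_n[i] if pred_n[i] else 0.0 for i in range(k)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0
          for p, r in zip(precision, recall)]

    # Multiclass MCC computed from the confusion-matrix marginals.
    num = correct * total - sum(t * p for t, p in zip(true_n, pred_n))
    den = sqrt((total ** 2 - sum(t * t for t in true_n))
               * (total ** 2 - sum(p * p for p in pred_n)))

    return {
        "mcc": num / den if den else 0.0,
        "balanced_accuracy": sum(recall) / k,  # mean per-class recall
        "f1_micro": correct / total,  # equals accuracy for single-label data
        "f1_macro": sum(f1) / k,
        "f1_weighted": sum(f * t for f, t in zip(f1, true_n)) / total,
        "recall": recall,
    }


# Toy 3-class confusion matrix (illustrative only, not the study's results).
toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]
for name, value in multiclass_metrics(toy_cm).items():
    print(name, value)
```

Balanced accuracy and macro F1 weight every class equally, whereas micro and weighted F1 follow class frequencies, so comparing them is a quick check on how class imbalance affects a reported score.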