Malancha Karmakar¹,², Vittoria Cicaloni¹,²,³, Carlos H M Rodrigues¹,²,⁴, Ottavia Spiga³, Annalisa Santucci³, David B Ascher¹,²,⁴.
Abstract
Alkaptonuria (AKU), a rare genetic disorder, is characterized by the accumulation of homogentisic acid (HGA) in the body. Affected individuals lack functional levels of an enzyme required to break down HGA. Mutations in the homogentisate 1,2-dioxygenase (HGD) gene cause AKU by reducing the levels of functional HGD, which in turn leads to excess HGA. Although the kidneys clear HGA rapidly, over the long term it accumulates in various tissues, especially cartilage, and eventually (rarely before adulthood) turns the affected tissue slate blue or black. Here we report a comprehensive analysis of 111 pathogenic and 190 non-pathogenic HGD missense mutations using protein structural information. Using mCSM, our comprehensive suite of graph-based signature methods, complemented with sequence-based tools, we studied the functional and molecular consequences of each mutation on protein stability, interactions and evolutionary conservation. The scores generated by the structure- and sequence-based tools were used to train a supervised machine learning classifier with 89% accuracy. This empirical classifier was then used to predict the phenotype of novel HGD missense mutations. All of this information is available through a user-friendly, freely available web server called HGDiscovery (https://biosig.lab.uq.edu.au/hgdiscovery/).
Keywords: Alkaptonuria; Machine learning; Precision medicine; Rare genetic disorder; Structural bioinformatics
Year: 2022 PMID: 36118553 PMCID: PMC9471331 DOI: 10.1016/j.crstbi.2022.08.001
Source DB: PubMed Journal: Curr Res Struct Biol ISSN: 2665-928X
Fig. 1. HGDiscovery workflow. The first step involves scoping the published literature and clinical databases to prepare a curated list of non-synonymous HGD mutations. The second step generates structure- and sequence-based features for the curated missense mutations. In the third step, these features are used to train a supervised machine learning algorithm as a binary classifier that distinguishes pathogenic from non-pathogenic missense mutations. Finally, we developed a freely available, user-friendly web server containing phenotypic information on all HGD variants.
Fig. 2. Boxplot representation of features. A) Structural features. B) Sequence-based features. C) Wild-type environment features. Non-pathogenic mutations (NP) are shown in sea green and pathogenic mutations (P) in dark orange (∗∗∗p < 0.0001, ∗∗p < 0.001; Welch two-sample t-test).
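The Welch two-sample t-test in Fig. 2 compares feature means between the pathogenic and non-pathogenic groups without assuming that the two groups share a variance. The following is a minimal Python sketch of the test statistic, not the authors' code; the function name `welch_t` is hypothetical, and it returns only the t statistic and the Welch-Satterthwaite degrees of freedom. Turning these into a p-value additionally needs the t-distribution CDF, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to have equal
    variances, which suits feature scores with different spreads per class.
    """
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group's mean (sample variance, n - 1).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

For example, `welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])` gives t = -√3 ≈ -1.732 with about 4.41 degrees of freedom.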
Performance metrics for the training and blind test datasets.
| Dataset | AUROC | MCC | Precision | Recall | F-score |
|---|---|---|---|---|---|
| Training | 0.89 | 0.58 | 0.79 | 0.79 | 0.79 |
| Blind test | 0.79 | 0.65 | 0.86 | 0.78 | 0.79 |
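MCC, precision, recall and F-score can all be derived from a confusion matrix. The sketch below computes them for a single positive class, assuming pathogenic is the positive label; the table does not state whether its precision, recall and F-score refer to the pathogenic class alone or are averaged over both classes, so the sketch is illustrative only. `binary_metrics` is a hypothetical helper name, not part of HGDiscovery.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall, F-score and MCC from confusion-matrix counts.

    Pathogenic is assumed to be the positive class (illustrative choice).
    Zero denominators yield 0.0 instead of raising.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score (F1): harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # Matthews correlation coefficient uses all four cells, so it remains
    # informative when the classes are imbalanced (e.g. 111 vs 190 variants).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "mcc": mcc}
```

With 8 true positives, 2 false positives, 9 true negatives and 1 false negative, it returns precision 0.8, F-score 16/19 ≈ 0.842 and MCC ≈ 0.704. AUROC is not included because it needs continuous scores rather than counts.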
Fig. 3. Empirical model performance when trained on individual classes of features. The Extra Trees algorithm was trained with stratified 10-fold cross-validation on each of eight distinct feature classes (first eight bars from left to right; dark blue) and on the combination of all features (red bar). AUROC is low when a single feature class is used to train the binary classifier, but improves markedly when all eight feature classes are combined to build the model.
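Stratified k-fold cross-validation keeps the pathogenic to non-pathogenic ratio (111:190 in this dataset) roughly constant in every fold, so each held-out fold reflects the overall class imbalance. Below is a stdlib-only Python sketch of the fold assignment; it is not the authors' pipeline, `stratified_folds` is a hypothetical name, and the Extra Trees model itself is omitted (scikit-learn's `StratifiedKFold` and `ExtraTreesClassifier` are common off-the-shelf choices for these two steps).

```python
import random
from collections import defaultdict


def stratified_folds(labels: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that preserve the class ratio.

    Indices of each class are shuffled, then dealt round-robin across the
    folds, so every fold receives about len(class) / k members of that class.
    """
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: list[list[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            # Continue the round-robin where the previous class stopped,
            # which keeps overall fold sizes balanced.
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds
```

With `labels = [1] * 111 + [0] * 190` and k = 10, each fold receives 19 non-pathogenic and 11 or 12 pathogenic examples. Each fold is then held out once while the model is trained on the other nine, and the per-fold AUROC values are averaged.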
Fig. 4. Area under the receiver operating characteristic curve (AUROC) of the HGD classifier. ROC curves and AUROC are shown for the training and test datasets. The model is robust and outperforms existing variant effect predictors such as SIFT, PolyPhen-2 (PPH2) and PMut.
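AUROC can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen non-pathogenic one. A minimal Python sketch of that pairwise (Mann-Whitney) formulation follows; `auroc` is a hypothetical helper, not the authors' evaluation code. When comparing against other tools, scores must first be oriented so that higher means more likely pathogenic: SIFT, for instance, assigns low scores to predicted deleterious substitutions, so its scores would need to be negated.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the fraction of (positive, negative) pairs in which the positive
    (label 1) scores higher than the negative (label 0); ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied pair: counted as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For instance, `auroc([0.9, 0.4, 0.5, 0.3], [1, 1, 0, 0])` returns 0.75 because three of the four positive-negative pairs are ranked correctly. The pairwise loop is O(n·m), which is adequate for a few hundred variants; rank-based formulas scale better for larger sets.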