Chang Woo Ko, June Huh, Jong-Wan Park.
Abstract
Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are unsuitable for poorly characterized proteins because they require diverse information about the target proteins. We designed a binary-classification deep learning program that requires only sequence information and named it 'FUTUSA' (function teller using sequence alone). During sequence feature extraction by a convolutional neural network, FUTUSA applies sequence segmentation so that the model learns regional sequence patterns and the relationships among them. This segmentation improved predictive performance by 49% compared with the full-length process. Compared with a baseline method, our approach achieved higher performance in predicting oxidoreductase activity. FUTUSA also clearly outperformed the baseline in predicting acetyltransferase and demethylase activities. Next, we tested whether FUTUSA can predict the functional consequences of point mutations. After being trained for monooxygenase activity, FUTUSA successfully predicted the impact of point mutations on phenylalanine hydroxylase (PAH), whose dysfunction causes the inherited metabolic disease phenylketonuria (PKU). This deep learning program can be used as a first-step tool for characterizing newly identified or poorly studied proteins.
•We proposed a new deep learning program for in silico protein function prediction that requires nothing more than the protein sequence.
•Applying sequence segmentation improves the efficiency of prediction.
•The method makes it possible to predict the clinical impact of mutations or polymorphisms.
Keywords: Deep learning; Point mutation; Protein functions; Sequence segmentation
Year: 2022 PMID: 35111575 PMCID: PMC8790617 DOI: 10.1016/j.mex.2022.101622
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
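The abstract describes segmenting each sequence before CNN feature extraction, and segment sizes of 16, 32, 64 and 128 are compared in the results. The Python sketch below shows one plausible way to prepare such input. It assumes non-overlapping windows, right-padding of the last window with a hypothetical '-' symbol, and one-hot encoding over the 20 standard amino acids; the function names are illustrative, and the paper's actual padding, encoding, and the way FUTUSA combines per-segment features are not described in this summary.

```python
# Sketch of fixed-size sequence segmentation (assumed scheme, not FUTUSA's code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> list[int]:
    """Encode one residue as a 20-dim one-hot vector; pad/unknown -> all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def segment_sequence(seq: str, size: int, pad: str = "-") -> list[str]:
    """Split seq into consecutive non-overlapping windows of `size` residues,
    right-padding the last window so every segment has the same length."""
    if size <= 0:
        raise ValueError("size must be positive")
    segments = [seq[i:i + size] for i in range(0, len(seq), size)]
    if segments and len(segments[-1]) < size:
        segments[-1] = segments[-1].ljust(size, pad)
    return segments


def encode_segments(seq: str, size: int) -> list[list[list[int]]]:
    """Return a (num_segments, size, 20) nested list, one matrix per segment."""
    return [[one_hot(r) for r in seg] for seg in segment_sequence(seq, size)]


if __name__ == "__main__":
    demo = AMINO_ACIDS * 5  # arbitrary 100-residue toy sequence
    for size in (16, 32, 64, 128):
        # yields 7, 4, 2 and 1 segments respectively
        print(size, len(segment_sequence(demo, size)))
```

Non-overlapping windows keep the number of segments small; an overlapping (strided) window would be an equally reasonable assumption if regional patterns span segment boundaries.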
Fig. 1. MCC-F1, ROC, and P-R curves for all tested sequence segmentation sizes: (a) MCC-F1, (b) ROC, and (c) P-R curves. FL, full-length model (blue); 16, segmentation size 16 model (green); 32, segmentation size 32 model (orange); 64, segmentation size 64 model (red); 128, segmentation size 128 model (black).
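An MCC-F1 curve plots unit-normalized MCC, (MCC + 1)/2, against F1 as the decision threshold is swept. The sketch below computes those points, assuming the model emits a continuous score per protein and a protein is called positive when its score is at or above the threshold. The helper names are hypothetical, and returning MCC = 0 when its denominator is zero is a common convention rather than something stated in the paper.

```python
import math


def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def mcc_f1_points(labels, scores):
    """(F1, normalized MCC) at each distinct score used as a threshold, high to low."""
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(labels, scores, t)
        points.append((f1(tp, fp, fn), (mcc(tp, fp, tn, fn) + 1) / 2))
    return points


if __name__ == "__main__":
    print(mcc_f1_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```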
The overall evaluation results for all tested sequence segmentation sizes.
| Model | AP | MCC | F1 | AUPR | AUROC |
|---|---|---|---|---|---|
| FUTUSA_FL | 0.3089 | 0.3863 | 0.3421 | 0.3058 | 0.7525 |
| FUTUSA_16 | 0.4604 | 0.4764 | 0.4533 | 0.4576 | 0.8754 |
| FUTUSA_32 | 0.4661 | 0.4413 | 0.4494 | 0.4631 | 0.8872 |
| FUTUSA_64 | 0.5158 | 0.5186 | 0.5319 | 0.5129 | 0.8835 |
| FUTUSA_128 | 0.4612 | 0.5406 | 0.5135 | 0.4592 | 0.7671 |
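The table reports both AP and AUPR, which are close but not identical. Two common summaries of a P-R curve illustrate why: step-wise average precision, AP = sum over thresholds n of (R_n - R_{n-1}) * P_n, and trapezoidal integration of the curve. The sketch below computes both on a toy example; which definitions the authors actually used is not stated here, and starting the trapezoid at (recall 0, precision 1) is an assumed convention.

```python
def pr_points(labels, scores):
    """(recall, precision) at each distinct score threshold, from high to low."""
    pos = sum(labels)
    if pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), reverse=True)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(pairs):
        t = pairs[i][0]
        # consume every sample tied at this threshold before emitting a point
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / pos, tp / (tp + fp)))
    return points


def average_precision(labels, scores):
    """Step-wise AP: each recall increment weighted by the precision reached."""
    ap, prev_r = 0.0, 0.0
    for r, p in pr_points(labels, scores):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def aupr_trapezoid(labels, scores):
    """Trapezoidal area under the P-R curve, starting from (0, 1)."""
    area, prev_r, prev_p = 0.0, 0.0, 1.0
    for r, p in pr_points(labels, scores):
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area


if __name__ == "__main__":
    y, s = [1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]
    print(average_precision(y, s), aupr_trapezoid(y, s))
```

On this toy input AP is 5/6 while the trapezoidal area is 19/24, because linear interpolation between P-R points credits partial precision that the step-wise sum does not.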
The performance comparison of the competing models on the oxidoreductase activity (GO:0016491), acetyltransferase activity (GO:0016407) and demethylase activity (GO:0032451) datasets.
| Dataset | Program | AP | MCC | F1 | AUPR | AUROC |
|---|---|---|---|---|---|---|
| Oxidoreductase | BLAST | 0.1509 | 0.3386 | 0.3014 | - | - |
| | FUTUSA | 0.4319 | 0.4508 | 0.4528 | 0.4272 | 0.8136 |
| Acetyltransferase | BLAST | 0.0649 | 0.1818 | 0.2374 | - | - |
| | FUTUSA | 0.3212 | 0.4444 | 0.5331 | 0.3166 | 0.7587 |
| Demethylase | BLAST | 0.1521 | 0.3529 | 0.3826 | - | - |
| | FUTUSA | 0.3486 | 0.5000 | 0.5145 | 0.3297 | 0.6906 |
AUPR and AUROC are not reported for BLAST: as a binary predictor, it yields a single operating point rather than a ranked score, so areas under the ROC and P-R curves do not assess its performance.
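The reason area-under-curve metrics say little about a binary predictor can be checked directly. AUROC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counted as one half). With only 0/1 outputs, that quantity collapses to balanced accuracy, (TPR + TNR)/2, which carries no ranking information. The sketch below is illustrative only; the function names are hypothetical and not part of FUTUSA or BLAST.

```python
def auroc_pairwise(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a correct ranking
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """(TPR + TNR) / 2 for hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    print(auroc_pairwise(y, [0.9, 0.4, 0.6, 0.1]))  # continuous scores
    print(auroc_pairwise(y, [1, 0, 1, 0]), balanced_accuracy(y, [1, 0, 1, 0]))
```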
Fig. 2. Heatmap visualization of the predicted functional contribution score of individual amino acids. The FUTUSA scores are also overlaid onto the crystal structures of full-length human PAH (residues 21–446; PDB: 6N1K) and the catalytic domain of human PAH (residues 117–428; PDB: 1MMK). (a) Heatmap colored from green (low predicted score) to red (high predicted score). (b) The iron ion (cyan) is shown as a sphere. The substrate analogue β-2-thienylalanine (THA; yellow) and the binding-pocket residues (gray and brown) are shown as sticks. Predictions were made for phenylalanine-4-hydroxylase with the model trained for monooxygenase activity.
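The green-to-red coloring in Fig. 2 can be sketched as min-max normalization followed by linear interpolation between two RGB colors, keyed by residue number so the colors can be transferred onto a structure such as the 1MMK catalytic domain (residues 117–428). This assumes the scores are listed contiguously from `first_residue` with no gaps in the structure's numbering; the function name and the exact interpolation are illustrative assumptions, not the authors' plotting code.

```python
def score_to_colors(scores: list[float], first_residue: int = 1) -> dict[int, tuple[int, int, int]]:
    """Map per-residue scores to RGB tuples keyed by residue number.

    Scores are min-max normalized to [0, 1], then interpolated linearly from
    green (0, 255, 0) for the lowest score to red (255, 0, 0) for the highest.
    If all scores are equal, every residue is colored green.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    colors = {}
    for offset, s in enumerate(scores):
        t = 0.0 if span == 0 else (s - lo) / span
        colors[first_residue + offset] = (round(255 * t), round(255 * (1 - t)), 0)
    return colors


if __name__ == "__main__":
    # hypothetical scores for three residues starting at residue 117
    print(score_to_colors([0.2, 0.5, 0.8], first_residue=117))
```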
Fig. 3. Heatmap of score changes caused by single amino acid mutations. Changes in protein function caused by point mutations were predicted using FUTUSA_FL (a), FUTUSA_16 (b), and FUTUSA_64 (c). Color indicates the score change after mutation: blue, decreased score; red, increased score. Each column represents an amino acid position, and each row represents the substituted amino acid. The first row, del, indicates deletion of the amino acid. Predictions were made for phenylalanine-4-hydroxylase with the model trained for monooxygenase activity.
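The grid in Fig. 3 (one column per position, one row per substituted amino acid, plus a "del" row) can be produced by an in silico saturation-mutagenesis loop: score every single substitution and deletion and subtract the wild-type score. In the sketch below, `score_fn` is a hypothetical stand-in for a trained model's predicted score for monooxygenase activity; the toy function in the usage example simply counts tryptophans and has no biological meaning.

```python
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scan(seq: str, score_fn: Callable[[str], float]) -> dict[str, list[float]]:
    """Return {row_label: [score change at each position]}.

    Row labels are 'del' followed by the 20 amino acids, mirroring the heatmap
    layout. Each entry is score(mutant) - score(wild type); substituting a
    residue with itself is recorded as 0.0.
    """
    wt = score_fn(seq)
    rows: dict[str, list[float]] = {"del": []}
    rows.update({aa: [] for aa in AMINO_ACIDS})
    for i, orig in enumerate(seq):
        rows["del"].append(score_fn(seq[:i] + seq[i + 1:]) - wt)
        for aa in AMINO_ACIDS:
            if aa == orig:
                rows[aa].append(0.0)
            else:
                rows[aa].append(score_fn(seq[:i] + aa + seq[i + 1:]) - wt)
    return rows


if __name__ == "__main__":
    toy_score = lambda s: float(s.count("W"))  # hypothetical scoring function
    grid = mutation_scan("AWG", toy_score)
    print(grid["del"], grid["W"])
```

Each position costs 21 model evaluations, so a full scan of a protein of length L needs 21 × L forward passes; batching the mutant sequences would be the natural optimization for a real model.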
| Subject Area: | Biochemistry, Genetics and Molecular Biology |
| More specific subject area: | Protein Function Prediction |
| Method name: | FUTUSA (function teller using sequence alone) |
| Name and reference of original method: | There is no original method |
| Resource availability: |