Sagarika Dash1, Tanmaya Kumar Sahu2, Subhrajit Satpathy2, Prabina Kumar Meher2,3, Sukanta Kumar Pradhan1.
Abstract
In plants, the GIGANTEA (GI) protein performs diverse biological functions, including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, GI proteins have mostly been identified with resource-intensive experimental methods. In this study, we therefore attempted to develop a computational model for fast and accurate prediction of GI proteins. Ten supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, with amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties used as numerical inputs for the learning algorithms. Higher accuracies, i.e., 96.75% AUC-ROC and 86.7% AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination when evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved, respectively. When the model was evaluated on an independent dataset of 18 GI sequences, 17 were correctly predicted. We also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server, "GIpred", is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6. © Prof. H.S. Srivastava Foundation for Science and Society 2022.
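The abstract names amino acid composition (AAC) as one of the numerical inputs but does not define it. The following minimal Python sketch assumes the common definition, in which AAC is the fraction of each of the 20 standard amino acids in a sequence. The function name `amino_acid_composition` and the toy peptide are hypothetical and are not taken from the GIpred implementation, which may scale or encode the feature differently.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: AAC is defined as count / sequence length, counting only
    the 20 standard residues; ambiguous symbols such as 'X' are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide for illustration only, not a real GI sequence.
features = amino_acid_composition("MASSAAKW")
```

The resulting 20-element vector could then be concatenated with other descriptors (for example physico-chemical properties) before being passed to a classifier such as an SVM.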
Keywords: Circadian gene; Computational biology; Machine learning; Proteome; Support vector machine
Year: 2022 PMID: 35221569 PMCID: PMC8847649 DOI: 10.1007/s12298-022-01130-6
Source DB: PubMed Journal: Physiol Mol Biol Plants ISSN: 0974-0430