Prabina Kumar Meher1, Ansuman Mohapatra2, Subhrajit Satpathy3, Anuj Sharma4, Isha Saini3, Sukanta Kumar Pradhan2, Anil Rai5. 1. ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India. meherprabin@yahoo.com. 2. Orissa University of Agriculture and Technology, Bhubaneswar, Odisha, India. 3. ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India. 4. Uttarakhand Council for Biotechnology, Pantnagar, Uttarakhand, India. 5. ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India. anil.rai@icar.gov.in.
Abstract
BACKGROUND: Circadian rhythms regulate several physiological and developmental processes of plants. Hence, identifying genes with underlying circadian rhythmic features is pivotal. Computational methods have been developed for identifying circadian genes, but all of them are based on gene expression datasets. We could not find any sequence-based model, and that motivated us to develop the present computational method for identifying proteins encoded by circadian genes. RESULTS: A support vector machine (SVM) with seven kernels (linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace) was used for prediction, with compositional, transitional and physico-chemical features as input. The highest accuracy, 62.48%, was achieved with the Laplace kernel under fivefold cross-validation. The developed model further achieved 62.96% accuracy on an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, namely Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, Oryza sativa and Sorghum bicolor, followed by functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R package, PredCRG ( https://cran.r-project.org/web/packages/PredCRG/index.html ), that the scientific community can use for proteome-wide identification of circadian genes. The present study complements existing computational methods as well as wet-lab experiments for the recognition of circadian genes.
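The abstract names compositional features and a Laplace-kernel SVM but gives no formulas. The sketch below is a minimal illustration under two assumptions that the abstract does not confirm. First, the compositional features are taken to be the standard amino acid composition, meaning the fraction of each of the 20 residues. Second, the Laplace kernel is taken to be exp(-σ‖x − y‖) with a Euclidean norm; some libraries use the L1 norm instead. The function names and the default σ = 1 are hypothetical, and this is not PredCRG's implementation.

```python
import math
from collections import Counter

# The 20 standard amino acids in a fixed order, so every feature vector
# has the same length and the same column meaning.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(seq: str) -> list[float]:
    """Hypothetical helper: amino acid composition of a protein sequence.

    Returns the fraction of each standard residue. Non-standard symbols
    (e.g. 'X') are ignored, and an empty sequence yields all zeros.
    """
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def laplace_kernel(x: list[float], y: list[float], sigma: float = 1.0) -> float:
    """Hypothetical helper: Laplace kernel exp(-sigma * ||x - y||), Euclidean norm."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


if __name__ == "__main__":
    x = aac_features("MKTAYIAKQR")
    y = aac_features("MSTNPKPQRK")
    # Similarity between two sequences in composition space; 1.0 means identical composition.
    print(laplace_kernel(x, y))
```

In an SVM, a kernel like this would be evaluated between every pair of training feature vectors. The trained model would then score new proteins for proteome-wide prediction.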