Yunchuan Wang1, Xiuru Dai1, Daohong Fu1, Pinghua Li1, Baijuan Du2.
Abstract
BACKGROUND: Photosynthetic capacity is the primary determinant of crop yield and is controlled by photosynthesis-related genes, so identifying these genes is important for photosynthesis research. MapMan Mercator 4 is a powerful annotation tool that assigns genes to functional categories; in maize, however, the functions of approximately 22.15% (9520) of genes remain unclear and are labeled "not assigned", and this set may include photosynthesis-related genes that have not yet been identified. The rapidly growing use of machine learning to solve biological problems offers a new opportunity to identify novel photosynthetic genes among the "not assigned" genes in maize.
Keywords: Ensemble learning; Functional category; Machine learning; Photosynthesis; RNA-Seq
Year: 2022 PMID: 35581553 PMCID: PMC9112524 DOI: 10.1186/s12859-022-04722-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Overview of dataset
| Category | Gene number | Example gene |
|---|---|---|
| Photosynthesis-related genes | 220 | |
| Genes not related to photosynthesis | 405 | |
| Unannotated genes | 9520 | |
Fig. 2 Performance of the four sub-models. a Fivefold cross-validation. b AUC of the sub-models under fivefold cross-validation. c Recall of the sub-models under fivefold cross-validation
Fig. 1 Overview of the PGD framework
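The fivefold cross-validation named in the Fig. 2 caption can be sketched in plain Python. This is only an illustration: the stratified split, the `stratified_kfold` helper, and the random seed are assumptions, not details taken from the paper; the class sizes (220 and 405) come from the dataset overview table.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class ratios in each fold.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Labels sized like the dataset table: 220 positive and 405 negative genes.
labels = [1] * 220 + [0] * 405
splits = list(stratified_kfold(labels))
fold_sizes = [len(test) for _, test in splits]
positives_per_fold = [sum(labels[i] for i in test) for _, test in splits]
print(fold_sizes, positives_per_fold)
```

With these class sizes every held-out fold contains 125 genes, 44 of them photosynthesis-related, so each sub-model is trained on 500 genes and scored on the remaining 125.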
Performance comparison of different methods by AUC-ROC
| Method | Fold1 | Fold2 | Fold3 | Fold4 | Fold5 |
|---|---|---|---|---|---|
| RF | 0.941 | 0.924 | 0.948 | 0.896 | |
| CAT | 0.931 | 0.862 | 0.906 | 0.953 | |
| XGB | 0.936 | 0.948 | 0.940 | 0.942 | 0.918 |
| GBDT | 0.942 | 0.959 | 0.931 | | |
| VOTE | 0.942 | 0.940 | | | |
Bold font indicates the highest value in each fold
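For readers unfamiliar with the metric, AUC-ROC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score. The sketch below uses that pairwise definition on made-up toy scores (the `auc_roc` helper and the toy data are illustrative assumptions), then averages the five XGB fold values listed in the table above.

```python
def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented scores, not data from the paper).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = auc_roc(y_true, scores)  # 8 of 9 pairs are ordered correctly

# Per-fold AUC values for XGB as listed in the table.
xgb_auc = [0.936, 0.948, 0.940, 0.942, 0.918]
xgb_mean_auc = sum(xgb_auc) / len(xgb_auc)
print(round(toy_auc, 3), round(xgb_mean_auc, 4))
```

The pairwise loop is quadratic in the number of samples, which is fine for folds of this size; a rank-based formula gives the same value more efficiently.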
Performance comparison of different methods by Recall
| Method | Fold1 | Fold2 | Fold3 | Fold4 | Fold5 |
|---|---|---|---|---|---|
| RF | 0.944 | 0.928 | 0.960 | 0.912 | |
| CAT | 0.944 | 0.888 | 0.912 | 0.960 | |
| XGB | 0.944 | 0.936 | 0.952 | | |
| GBDT | 0.952 | 0.944 | | | |
| VOTE | 0.952 | 0.936 | | | |
Bold font indicates the highest value in each fold
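The VOTE row combines the four sub-models. How the votes are combined is not stated here, so the sketch below assumes soft voting (averaging predicted probabilities and thresholding at 0.5) and the standard binary definition of recall, TP / (TP + FN). The probabilities, the `soft_vote` and `recall` helpers, and the threshold are all hypothetical.

```python
def recall(y_true, y_pred):
    """Standard binary recall: true positives over all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive labels")
    return tp / (tp + fn)


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities per sample, then apply a threshold.

    Assumed combination rule; the paper's voting scheme may differ.
    """
    n_models = len(prob_lists)
    averaged = [sum(col) / n_models for col in zip(*prob_lists)]
    return [1 if a >= threshold else 0 for a in averaged]


# Invented probabilities for six genes from four sub-models.
y_true = [1, 1, 1, 0, 0, 1]
model_probs = {
    "RF":   [0.9, 0.4, 0.7, 0.2, 0.6, 0.8],
    "CAT":  [0.8, 0.6, 0.3, 0.1, 0.4, 0.9],
    "XGB":  [0.7, 0.7, 0.6, 0.3, 0.3, 0.4],
    "GBDT": [0.9, 0.5, 0.2, 0.2, 0.5, 0.7],
}
vote_pred = soft_vote(list(model_probs.values()))
vote_recall = recall(y_true, vote_pred)
print(vote_pred, vote_recall)
```

In this toy case the ensemble recovers three of the four positive genes, giving a recall of 0.75; the third gene is missed because only two of the four models score it above 0.5 and the average falls below the threshold.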
Fig. 3 Performance comparison of the PGD model using single and multiple photosynthetic mutants
Fig. 4 Expression changes of predicted photosynthesis-related genes along the maize leaf gradient. a Hierarchical clustering showing the expression changes of 716 genes along maize leaf sections from base to tip. b Expression similarity between predicted genes in cluster 1 and classical photosynthesis-related genes. c Expression similarity between predicted genes in cluster 2 and classical photosynthesis-related genes. d Expression similarity between predicted genes in cluster 4 and classical photosynthesis-related genes