Lei Chen, Yu-Hang Zhang, ShaoPeng Wang, YunHua Zhang, Tao Huang, Yu-Dong Cai.
Abstract
Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, uncovering the links between core functions or pathways and these essential genes will provide deeper insight into their key roles. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Using the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways; it yielded nearly perfect performance, with a Matthews correlation coefficient (MCC) of 0.951, for distinguishing essential from non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes that our prediction model predicted to be essential are reported in this study. We suggest that this study provides more functional and pathway information on the essential genes and offers a new way to investigate related problems.
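The encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the enrichment formula, so this assumes a common definition: the score of a term is the negative base-10 logarithm of the hypergeometric probability of seeing at least the observed overlap between a gene set associated with the gene and the genes annotated to that term. The function names, the `gene_set` input, and the toy identifiers are all hypothetical.

```python
from math import comb, log10

def hypergeom_tail(N, M, n, m):
    """P(X >= m) when n genes are drawn from N, of which M carry the term."""
    total = comb(N, n)
    hits = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, min(n, M) + 1))
    return hits / total

def enrichment_score(N, M, n, m, floor=1e-300):
    """Assumed definition: -log10 of the hypergeometric tail probability."""
    p = hypergeom_tail(N, M, n, m)
    return 0.0 - log10(max(p, floor))  # floor guards against log10(0)

def encode_gene(gene_set, term_to_genes, population_size):
    """One vector component per GO term or KEGG pathway, in sorted term order."""
    vector = []
    for term in sorted(term_to_genes):
        annotated = term_to_genes[term]
        overlap = len(gene_set & annotated)
        vector.append(
            enrichment_score(population_size, len(annotated), len(gene_set), overlap)
        )
    return vector

# Toy, made-up annotations (not data from the study).
toy_terms = {"GO:A": {"g1", "g2", "g3"}, "KEGG:B": {"g9"}}
toy_vector = encode_gene({"g1", "g2"}, toy_terms, population_size=10)
```

A term with no overlap gets a score of zero under this definition, while a rare term that covers most of the gene set gets a large score.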
Year: 2017 PMID: 28873455 PMCID: PMC5584762 DOI: 10.1371/journal.pone.0184129
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flow chart of the whole procedure for investigating essential and non-essential genes.
Fig 2. The IFS curve using the MCC as its Y-axis and the number of features participating in classification as its X-axis.
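How an IFS curve of this kind is produced can be sketched as follows: for each k, a classifier is evaluated on the top k features of a ranked list, and the k with the highest MCC is kept. This is an illustration under stated assumptions, not the study's pipeline. A nearest-centroid classifier stands in for the SVM, leave-one-out evaluation stands in for whatever validation scheme was used, and the data and ranking are made up.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def centroid(rows, cols):
    """Mean of the selected columns over a list of samples."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]

def predict(train_X, train_y, x, cols):
    """Nearest-centroid stand-in classifier (NOT an SVM)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        # Assumes every class keeps at least one training sample.
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centre = centroid(rows, cols)
        dist = sum((x[c] - m) ** 2 for c, m in zip(cols, centre))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_mcc(X, y, cols):
    """Leave-one-out MCC using only the feature columns in cols."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        pred = predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], cols)
        if pred == 1 and y[i] == 1:
            tp += 1
        elif pred == 0 and y[i] == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)

def incremental_feature_selection(X, y, ranked):
    """Return the best k and the full curve of (k, MCC) points."""
    curve = [(k, loo_mcc(X, y, ranked[:k])) for k in range(1, len(ranked) + 1)]
    best_k, _ = max(curve, key=lambda point: point[1])  # first maximum wins ties
    return best_k, curve

# Made-up data: column 0 separates the classes, column 1 is noise.
X = [[0.9, 0.1], [0.8, 0.7], [1.0, 0.4], [0.1, 0.2], [0.2, 0.8], [0.0, 0.5]]
y = [1, 1, 1, 0, 0, 0]
ranked = [0, 1]  # hypothetical ranking, e.g. as produced by mRMR
best_k, curve = incremental_feature_selection(X, y, ranked)
```

Plotting the `curve` points gives a graph with the number of features on the X-axis and the MCC on the Y-axis.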
The sensitivity (SN), specificity (SP), accuracy (ACC), and MCC yielded by the optimal SVM prediction model and by the model using only KEGG enrichment score features.
| Model | Number of features | SN | SP | ACC | MCC |
|---|---|---|---|---|---|
| Optimal SVM prediction model | 345 | 0.927 | 0.999 | 0.985 | 0.951 |
| Model using features of KEGG enrichment scores | 279 | 0.873 | 0.989 | 0.966 | 0.891 |
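The four measures in the table follow from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions, with essential genes treated as the positive class. The counts are hypothetical and are not taken from the study.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """SN, SP, ACC and MCC from confusion-matrix counts.

    Assumes both classes are present, so no denominator below is zero
    except possibly the MCC one, which is guarded.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=380, fp=20, fn=10)
```

Because the MCC uses all four counts, it is less flattering than ACC when the classes are imbalanced, which is one reason it suits a setting with more non-essential than essential genes.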
Fig 3. Distribution across the three groups of the GO terms corresponding to the optimal features and to the top 21 features in the mRMR feature list.
Fig 4. A part of the IFS curve shown in Fig 2.
Twenty-one GO terms corresponding to the top 21 features in the mRMR feature list.
| Rank in mRMR feature list | GO term ID | GO term | Cluster |
|---|---|---|---|
| 1 | GO:0032991 | macromolecular complex | Cellular component |
| 2 | GO:0021888 | hypothalamus gonadotrophin-releasing hormone neuron development | Biological process |
| 3 | GO:0071008 | U2-type post-mRNA release spliceosomal complex | Cellular component |
| 4 | GO:0044424 | intracellular part | Cellular component |
| 5 | GO:0000154 | rRNA modification | Biological process |
| 6 | GO:0043226 | organelle | Cellular component |
| 7 | GO:0016071 | mRNA metabolic process | Biological process |
| 8 | GO:0071146 | SMAD3-SMAD4 protein complex | Cellular component |
| 9 | GO:0072669 | tRNA-splicing ligase complex | Cellular component |
| 10 | GO:0044422 | organelle part | Cellular component |
| 11 | GO:0021886 | hypothalamus gonadotrophin-releasing hormone neuron differentiation | Biological process |
| 12 | GO:0002183 | cytoplasmic translational initiation | Biological process |
| 13 | GO:0005622 | intracellular | Cellular component |
| 14 | GO:0015030 | Cajal body | Cellular component |
| 15 | GO:0030874 | nucleolar chromatin | Cellular component |
| 16 | GO:0044446 | intracellular organelle part | Cellular component |
| 17 | GO:0010467 | gene expression | Biological process |
| 18 | GO:0043227 | membrane-bounded organelle | Cellular component |
| 19 | GO:1902369 | negative regulation of RNA catabolic process | Biological process |
| 20 | GO:0044822 | poly(A) RNA binding | Molecular function |
| 21 | GO:0005737 | cytoplasm | Cellular component |