Sima Sazegari, Ali Niazi, Zahra Zinati, Mohammad Hadi Eskandari.
Abstract
Saccharomyces cerevisiae is known for its outstanding ability to produce ethanol in industry. Understanding the dynamics of gene expression in S. cerevisiae in response to fermentation could provide informative results required for establishing any program to improve ethanol production. Thus, as a new approach, this study was conducted to identify the genes that discriminate between improved and repressed ethanol production and to clarify the molecular responses to this process by mining transcriptomic data. Significantly differentially expressed probe sets were extracted from available microarray datasets related to yeast fermentation performance. To identify the probe sets that contribute most to discriminating ethanol content, 11 machine learning algorithms from RapidMiner were employed. Further analyses, including pathway enrichment and regulatory analysis, were performed on the discriminative probe sets. In addition, decision tree models were constructed, the performance of each model was evaluated, and the roots were identified. Based on the results, 171 probe sets were identified by at least 5 attribute weighting algorithms (AWAs), and 17 roots were recognized with 100% performance. Some of the top-ranked probe sets were found to be involved in carbohydrate metabolism, oxidative phosphorylation, and ethanol fermentation. Principal component analysis (PCA) and heatmap clustering validated the top-ranked selective probe sets. Moreover, the top-ranked genes were validated using the GSE78759 and GSE5185 datasets. Among all discriminative probe sets, OLI1 and CYC3 were identified as the roots with the best performance, were supported by the largest number of weighting algorithms, and were linked to the two most significantly enriched pathways, porphyrin biosynthesis and oxidative phosphorylation. ADH5 and PDA1 were also recognized as differential top-ranked genes that contribute to ethanol production. 
According to the regulatory clustering analysis, Tup1 has a significant effect on the top-ranked target genes CYC3 and ADH5. This study provides a basic understanding of the molecular mechanisms and responses of S. cerevisiae cells to two different medium conditions (Mg2+ and Cu2+) during the fermentation process.
Year: 2022 PMID: 35881609 PMCID: PMC9321456 DOI: 10.1371/journal.pone.0259476
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Some of the informative probe sets identified by at least four AWAs.
| Probe set | Standard gene name | AWA names | Number of AWAs | Gene description |
|---|---|---|---|---|
| A_06_P4554 | MRP8 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Relief, Rule, SVM, Uncertainty | 10 | Uncharacterized, response to stress |
| A_06_P1016 | OLI1 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Relief, Rule, SVM, Uncertainty | 10 | ATP synthase subunit 9, mitochondrial;OLI1;ortholog |
| A_06_P1397 | ADH5 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM, Uncertainty | 9 | Alcohol dehydrogenase 5;ADH5;ortholog |
| A_06_P1238 | PKC1 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM | 8 | Protein serine/threonine kinase |
| A_06_P3384 | GTR2 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM, Uncertainty | 9 | GTP-binding protein |
| A_06_P1063 | CYC3 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Relief, Rule, SVM, Uncertainty | 9 | Cytochrome c heme lyase |
| A_06_P1003 | COX1 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Rule, SVM, Uncertainty | 8 | cytochrome c oxidase |
| A_06_P2810 | PDA1 | Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Rule, SVM | 7 | Pyruvate dehydrogenase E1 component subunit alpha, mitochondrial;PDA1;ortholog |
| A_06_P2931 | QCR6 | Chi Square Statistic, Correlation, PCA, Rule, SVM | 5 | Cytochrome b-c1 complex subunit 6;QCR6;ortholog |
| A_06_P6820 | ALD6 | Chi Square Statistic, PCA, Rule, Uncertainty | 4 | Magnesium-activated aldehyde dehydrogenase, cytosolic;ALD6;ortholog |
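The consensus rule described in the abstract (keep a probe set when at least 5 AWAs select it) can be illustrated with the rows of this table. The following is a minimal sketch, not the authors' RapidMiner workflow: the AWA lists are copied from the table above, while the function names `awa_counts` and `consensus` are hypothetical helpers introduced only for illustration.

```python
# AWA lists copied from the table above, keyed by standard gene name.
AWA_TABLE = {
    "MRP8": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Relief, Rule, SVM, Uncertainty",
    "OLI1": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Relief, Rule, SVM, Uncertainty",
    "ADH5": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM, Uncertainty",
    "PKC1": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM",
    "GTR2": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, PCA, Rule, SVM, Uncertainty",
    "CYC3": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Relief, Rule, SVM, Uncertainty",
    "COX1": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Rule, SVM, Uncertainty",
    "PDA1": "Chi Square Statistic, Correlation, Gini Index, Information Gain, Information Gain ratio, Rule, SVM",
    "QCR6": "Chi Square Statistic, Correlation, PCA, Rule, SVM",
    "ALD6": "Chi Square Statistic, PCA, Rule, Uncertainty",
}


def awa_counts(table):
    """Count the distinct weighting algorithms that selected each gene."""
    return {
        gene: len({name.strip() for name in names.split(",")})
        for gene, names in table.items()
    }


def consensus(table, min_awas=5):
    """Return genes selected by at least `min_awas` algorithms, best supported first."""
    counts = awa_counts(table)
    kept = [gene for gene, n in counts.items() if n >= min_awas]
    return sorted(kept, key=lambda gene: (-counts[gene], gene))


counts = awa_counts(AWA_TABLE)
selected = consensus(AWA_TABLE, min_awas=5)
print(selected)
```

Under the 5-AWA threshold, ALD6 (4 AWAs) is the only gene in this excerpt that would be excluded; it appears here because the table uses the looser threshold of four.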
Roots of the decision tree models that exhibited 100% performance.
| Probe set | Gene symbol | Number of AWAs | Description |
|---|---|---|---|
| A_06_P1016 | OLI1 | 10 | BioProcess = ATP synthesis coupled proton transport |
| A_06_P2475 | CWC21 | 9 | BioProcess = biological_process unknown |
| A_06_P1002 | ORF:Q0017 | 9 | BioProcess = biological_process unknown |
| A_06_P1034 | CYS3 | 9 | BioProcess = sulfur amino acid metabolism* |
| A_06_P1131 | HTB2 | 8 | BioProcess = chromatin assembly/disassembly |
| A_06_P1068 | KRE23 | 9 | BioProcess = biological_process unknown |
| A_06_P2984 | CGR1 | 9 | BioProcess = rRNA processing* |
| A_06_P1298 | ORF:YBR051W | 9 | BioProcess = biological_process unknown |
| A_06_P1524 | ORF:YBR270C | 8 | BioProcess = biological_process unknown |
| A_06_P3287 | ORF:YGR067C | 9 | BioProcess = biological_process unknown |
| A_06_P1063 | CYC3 | 9 | BioProcess = not yet annotated |
| A_06_P2051 | RPS13 | 9 | BioProcess = protein biosynthesis |
| A_06_P1967 | OST4 | 10 | BioProcess = not yet annotated |
| A_06_P1049 | ORF:YAL027W | 9 | BioProcess = biological_process unknown |
| A_06_P1003 | COX1 | 8 | BioProcess = aerobic respiration |
| A_06_P2023 | ORF:YDR036C | 7 | BioProcess = biological_process unknown |
| A_06_P1023 | ORF:Q0297 | 8 | BioProcess = biological_process unknown |
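To make the notion of a decision tree "root" concrete, the sketch below picks the single feature whose best threshold split gives the highest information gain, which is how a tree induced with an information-gain criterion would choose its first node. It is only an illustration under stated assumptions: the study built its trees in RapidMiner and the exact criteria are not given here, the expression values are synthetic toy numbers (not data from the study), and `probe_A`, `probe_B`, `best_split` and `choose_root` are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (information gain, threshold) of the best binary split on one feature."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_thr = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        thr = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - weighted > best_gain:
            best_gain, best_thr = base - weighted, thr
    return best_gain, best_thr


def choose_root(expression, labels):
    """Pick the feature with the highest information gain as the tree root."""
    scored = {name: best_split(vals, labels) for name, vals in expression.items()}
    root = max(scored, key=lambda name: scored[name][0])
    return root, scored[root]


# Synthetic toy values: four Mg2+ samples followed by four Cu2+ samples.
labels = ["Mg"] * 4 + ["Cu"] * 4
expression = {
    "probe_A": [9.1, 8.7, 9.4, 8.9, 5.2, 5.9, 5.5, 6.1],  # separates the classes
    "probe_B": [7.0, 5.1, 6.8, 5.3, 6.9, 5.0, 7.1, 5.2],  # does not
}
root, (gain, threshold) = choose_root(expression, labels)
print(root, gain, threshold)
```

A root whose single split separates every sample of the two conditions classifies the training data perfectly, which is one plausible reading of the "100% performance" reported for the 17 roots above.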
Fig 1. Two-dimensional plot of the first two principal components.
GSM1968101, GSM1968110, GSM1968100 and GSM1968108 are samples related to Mg2+ supplementation. GSM1968106, GSM1968114, GSM1968103 and GSM1968112 are samples related to Cu2+ supplementation.
Fig 2. Heatmap of the 171 probe sets recognized by at least 5 attribute weighting algorithms (AWAs).
Each row corresponds to a sample, from either Mg2+ supplementation (high ethanol production) or Cu2+ supplementation (repressed ethanol production). Columns show the hierarchically clustered probe sets. The normalized expression intensities of the probe sets are shown on a color scale, with up- and down-regulated expression in red and blue, respectively.
KEGG enrichment analysis of 171 probe sets.
The significant pathways with adjusted p-value < 0.1 are represented.
| Term | Adjusted p-value | Genes |
|---|---|---|
| Porphyrin and chlorophyll metabolism | 0.000582985 | HEM2;HEM12;CYC3;YFH1 |
| Oxidative phosphorylation | 0.007415084 | OLI1;QCR6;ATP6;COX1;ATP2 |
| Endocytosis | 0.007415084 | CAP1;APL3;LAS17;ARC15;VPS25 |
| RNA degradation | 0.025376563 | POP2;RRP42;SSQ1;CCR4 |
| Meiosis | 0.049279656 | CLN3;HMRA2;MSN4;APC9;TPD3 |
| Autophagy | 0.049279656 | KCS1;VPS8;MSN4;PEP4 |
| Ubiquitin mediated proteolysis | 0.049279656 | UBC13;UBC6;APC9 |
| Protein processing in endoplasmic reticulum | 0.049279656 | OST4;UBC6;PDI1;SSE2 |
| Glycolysis / Gluconeogenesis | 0.064176371 | PDA1;PGM2;ADH5 |
| Galactose metabolism | 0.072088565 | GAL7;PGM2 |
| Phosphatidylinositol signaling system | 0.072088565 | KCS1;PKC1 |
| Amino sugar and nucleotide sugar metabolism | 0.075323668 | GAL7;PGM2 |
| Spliceosome | 0.075323668 | PRP43;ECM2;PRP8 |
| MAPK signaling pathway | 0.072088565 | TUP1;MKC7;MSN4;PKC1 |
| Pentose phosphate pathway | 0.075323668 | SOL4;PGM2 |
| Alanine, aspartate and glutamate metabolism | 0.075323668 | GDH3;NIT3 |
| Cell cycle | 0.075323668 | CLN3;TUP1;APC9;TPD3 |
| Citrate cycle (TCA cycle) | 0.075323668 | PDA1;LSC2 |
| Ribosome biogenesis in eukaryotes | 0.075323668 | UTP15;CKB1;RIO1 |
| Glycine, serine and threonine metabolism | 0.075323668 | SER1;CYS3 |
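The table can also be read programmatically, for example to see which terms survive a stricter cutoff than the 0.1 used here and which genes recur across several pathways. The sketch below copies the rows of the table above verbatim; the helper names `significant_terms` and `pathways_per_gene` and the 0.05 cutoff are illustrative assumptions, not part of the original analysis.

```python
# (term, adjusted p-value, genes) rows copied from the KEGG table above.
KEGG_ROWS = [
    ("Porphyrin and chlorophyll metabolism", 0.000582985, "HEM2;HEM12;CYC3;YFH1"),
    ("Oxidative phosphorylation", 0.007415084, "OLI1;QCR6;ATP6;COX1;ATP2"),
    ("Endocytosis", 0.007415084, "CAP1;APL3;LAS17;ARC15;VPS25"),
    ("RNA degradation", 0.025376563, "POP2;RRP42;SSQ1;CCR4"),
    ("Meiosis", 0.049279656, "CLN3;HMRA2;MSN4;APC9;TPD3"),
    ("Autophagy", 0.049279656, "KCS1;VPS8;MSN4;PEP4"),
    ("Ubiquitin mediated proteolysis", 0.049279656, "UBC13;UBC6;APC9"),
    ("Protein processing in endoplasmic reticulum", 0.049279656, "OST4;UBC6;PDI1;SSE2"),
    ("Glycolysis / Gluconeogenesis", 0.064176371, "PDA1;PGM2;ADH5"),
    ("Galactose metabolism", 0.072088565, "GAL7;PGM2"),
    ("Phosphatidylinositol signaling system", 0.072088565, "KCS1;PKC1"),
    ("Amino sugar and nucleotide sugar metabolism", 0.075323668, "GAL7;PGM2"),
    ("Spliceosome", 0.075323668, "PRP43;ECM2;PRP8"),
    ("MAPK signaling pathway", 0.072088565, "TUP1;MKC7;MSN4;PKC1"),
    ("Pentose phosphate pathway", 0.075323668, "SOL4;PGM2"),
    ("Alanine, aspartate and glutamate metabolism", 0.075323668, "GDH3;NIT3"),
    ("Cell cycle", 0.075323668, "CLN3;TUP1;APC9;TPD3"),
    ("Citrate cycle (TCA cycle)", 0.075323668, "PDA1;LSC2"),
    ("Ribosome biogenesis in eukaryotes", 0.075323668, "UTP15;CKB1;RIO1"),
    ("Glycine, serine and threonine metabolism", 0.075323668, "SER1;CYS3"),
]


def significant_terms(rows, alpha):
    """Return the terms whose adjusted p-value is below `alpha`."""
    return [term for term, padj, _ in rows if padj < alpha]


def pathways_per_gene(rows):
    """Map each gene to the list of enriched terms it appears in."""
    index = {}
    for term, _, genes in rows:
        for gene in genes.split(";"):
            index.setdefault(gene, []).append(term)
    return index


strict = significant_terms(KEGG_ROWS, alpha=0.05)   # illustrative stricter cutoff
lenient = significant_terms(KEGG_ROWS, alpha=0.1)   # cutoff used for the table
gene_index = pathways_per_gene(KEGG_ROWS)
shared = {gene: terms for gene, terms in gene_index.items() if len(terms) >= 3}
print(len(strict), len(lenient), sorted(shared))
```

With the rows as listed, 8 of the 20 terms also pass the 0.05 cutoff, and PGM2 appears in four terms (glycolysis/gluconeogenesis, galactose metabolism, amino sugar and nucleotide sugar metabolism, and the pentose phosphate pathway).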
Fig 3. Regulatory clustering heatmap of the genes targeted by the identified transcription factors Hap4p, Tup1 and TOS8.
The cluster is represented as the log mRNA ratio of each target gene in each regulator mutant.
Fig 4. Schematic illustrating the methodology of the study with summarized results.