Dongwon Kang, Hongryul Ahn, Sangseon Lee, Chai-Jin Lee, Jihye Hur, Woosuk Jung, Sun Kim.
Abstract
BACKGROUND: Recently, a number of studies have investigated how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, a set of time-series gene expression data for the stress response is available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. To analyze such data, a machine learning model needs to be built.
Keywords: Arabidopsis; Machine learning; Stress; Time-series; Transcriptome
Year: 2019 PMID: 31856731 PMCID: PMC6923958 DOI: 10.1186/s12864-019-6283-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Dataset statistic summary. The number of stress types (left) and the frequency of time points (right) in the 138 sample time-series gene expression data of four stress types
Fig. 2 StressGenePred’s twin neural network model architecture. The StressGenePred model consists of two submodels: a biomarker gene discovery model (left) and a stress type prediction model (right). The two submodels share a “single NN layer”. Two gray boxes on the left and right models output the predicted results, biomarker gene and stress type, respectively
Fig. 3 Biomarker gene discovery model. This model predicts biomarker genes from a label vector of stress type. It generates an observed biomarker gene vector from gene expression data (left side of the figure) and a predicted biomarker gene vector from stress type (right side of the figure), and adjusts the weights of the model by minimizing the difference (“output loss” at the top of the figure)
Fig. 4 Stress type prediction model. This model predicts stress types from a vector of gene expression profile. It generates a predicted stress type vector (left side of the figure) and compares it with a stress label vector (right side of the figure) to adjust the weights of the model by minimizing the CMCL loss (“output loss” at the top of the figure)
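The idea of two submodels sharing a single layer can be pictured with a toy example. The sketch below is only an illustration of weight sharing and is not the StressGenePred architecture: the layer sizes, the function names `predict_stress` and `predict_biomarkers`, the softmax output, and the reuse of one weight matrix in both directions are all assumptions made for the example, and the losses described in the captions are not implemented.

```python
import math
import random

# Toy sizes, chosen only for illustration (hypothetical).
N_GENES, N_STRESS = 6, 4

random.seed(0)
# One shared weight matrix: a row per gene, a column per stress type.
W = [[random.gauss(0, 0.1) for _ in range(N_STRESS)] for _ in range(N_GENES)]


def predict_stress(expr):
    """Map a gene expression vector to probabilities over stress types."""
    logits = [sum(expr[g] * W[g][s] for g in range(N_GENES))
              for s in range(N_STRESS)]
    m = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_biomarkers(label):
    """Map a stress label vector to per-gene scores using the same W."""
    return [sum(W[g][s] * label[s] for s in range(N_STRESS))
            for g in range(N_GENES)]


probs = predict_stress([1.0] * N_GENES)    # one probability per stress type
scores = predict_biomarkers([0, 1, 0, 0])  # one score per gene
print(len(probs), len(scores))
```

Because both functions read the same `W`, any update driven by one task would also change the output of the other, which is the point of sharing a layer.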
Result of stress type prediction
| Methods | Accuracy |
|---|---|
| StressGenePred+FC | 0.963 |
| RF+FC | 0.961 |
| SVM+FC | 0.945 |
| StressGenePred+limma | 0.821 |
| RF+limma | 0.853 |
| SVM+limma | 0.813 |
Three stress type prediction models, StressGenePred (our model), random forest (RF), and support vector machine (SVM), are compared, each combined with one of two feature embedding models, fold change (FC) and limma
Fig. 5 Stress type prediction result. Samples above GSE64575-NT are cold stress samples and the rest are heat stress samples. The E-MEXP-3714-ahk2ahk3 and E-MEXP-3714-NT samples are misclassified by our model, but the predictions are not entirely wrong because these samples were treated with both salt and cold stress [14]
Gene rank comparison
| Stress type | Gene name | Gene symbol | Our method | Fisher method |
|---|---|---|---|---|
| Heat | | ATHSP101 | 2 | 11 |
| | | NAD4L | 5 | 44 |
| | AT4G10250 | ATHSP22.0 | 11 | 9 |
| | AT4G27670 | HSP21 | 12 | 7 |
| | | Hsp70b | 14 | 16 |
| | | HSP70T-2 | 16 | 22 |
| | AT4G25200 | ATHSP23.6-MI | 17 | 3 |
| | | NAD9 | 19 | 54 |
| | | MTHSC70-2 | 26 | 51 |
| | AT5G12020 | HSP17.6II | 34 | 1 |
| | AT5G37670 | | 36 | 21 |
| | AT2G26150 | HSFA2 | 40 | 27 |
| Cold | AT1G09350 | AtGolS3 | 3 | 1 |
| | | RAP2.1 | 4 | 29 |
| | AT2G16890 | | 18 | 16 |
| | | UGT78D3 | 21 | 68 |
| | | FP6 | 28 | 35 |
| | | ADS2 | 35 | 116 |
| | | FRO3 | 38 | 195 |
| Salt | | | 1 | 2 |
| | AT1G52690 | LEA7 | 2 | 1 |
| | AT5G59220 | HAI7 | 4 | 4 |
| | | AtLEA4-5 | 6 | 11 |
| | | RAP2.6 | 10 | 12 |
| | | AtMYB74 | 18 | 50 |
| | | ALDH7B4 | 21 | 62 |
| | | ABI2 | 23 | 77 |
| | | Rap2.6L | 26 | 36 |
| | AT1G52890 | NAC019 | 28 | 15 |
| | | NAC047 | 29 | 73 |
| | AT3G48520 | CYP94B3 | 31 | 27 |
| | | CYP707A1 | 33 | 75 |
| | AT1G07430 | HAI2 | 36 | 16 |
| Drought | AT2G46680 | ATHB-7 | 3 | 1 |
| | | NAC019 | 4 | 15 |
| | | CYP89A9 | 11 | 271 |
| | | HIS1-3 | 12 | 21 |
| | | SAUR63 | 13 | 53 |
| | | AGL109 | 21 | 2002 |
| | | GAMMA-VPE | 23 | 426 |
| | | PDCB3 | 25 | 778 |
| | | GolS2 | 31 | 33 |
| | | MEE3 | 38 | 855 |
| | | BRS1 | 39 | 468 |
The 44 known biomarker genes with high EST profiles were collected. Comparing our method (StressGenePred) with the Fisher method, 30 of the 44 known biomarker genes (bold) are ranked higher by our method than by the Fisher method
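The count in this caption can be checked directly against the ranks in the table. The short sketch below copies each (our method, Fisher method) rank pair from the table and counts the genes for which our method gives the smaller rank number; the variable names are illustrative, and treating a tie as "not higher" is an assumption.

```python
# (our_rank, fisher_rank) pairs copied from the gene rank comparison table.
ranks = {
    "Heat": [(2, 11), (5, 44), (11, 9), (12, 7), (14, 16), (16, 22),
             (17, 3), (19, 54), (26, 51), (34, 1), (36, 21), (40, 27)],
    "Cold": [(3, 1), (4, 29), (18, 16), (21, 68), (28, 35), (35, 116),
             (38, 195)],
    "Salt": [(1, 2), (2, 1), (4, 4), (6, 11), (10, 12), (18, 50), (21, 62),
             (23, 77), (26, 36), (28, 15), (29, 73), (31, 27), (33, 75),
             (36, 16)],
    "Drought": [(3, 1), (4, 15), (11, 271), (12, 21), (13, 53), (21, 2002),
                (23, 426), (25, 778), (31, 33), (38, 855), (39, 468)],
}

# A smaller rank number means the gene is ranked higher; ties do not count.
higher_by_stress = {
    stress: sum(ours < fisher for ours, fisher in pairs)
    for stress, pairs in ranks.items()
}
total_genes = sum(len(pairs) for pairs in ranks.values())
total_higher = sum(higher_by_stress.values())

print(higher_by_stress)  # {'Heat': 6, 'Cold': 5, 'Salt': 9, 'Drought': 10}
print(f"{total_higher} of {total_genes}")  # 30 of 44
```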
Rank comparison of multiple stress-responsive genes
| Gene name | GO term | Rank of our model | Rank of Fisher method |
|---|---|---|---|
| AT2G47180 | heat,cold | heat(243), cold(500) | heat(39), cold(164) |
| AT5G37770 | heat,cold | heat(2007), cold(3414) | heat(1878), cold(2510) |
| AT5G57560 | heat,cold | heat(1357), cold(1428) | heat(235), cold(627) |
| AT5G58070 | heat,cold | heat(693), cold(111) | heat(258), cold(167) |
| AT5G59820 | heat,cold | heat(1069), cold(512) | heat(234), cold(128) |
| AT2G47180 | heat,salt | heat(243), salt(842) | heat(39), salt(722) |
| AT3G09350 | heat,salt | heat(61), salt(1341) | heat(35), salt(1712) |
| AT1G01060 | cold,salt | salt(1762), cold(1342) | salt(1578), cold(298) |
| AT2G17840 | cold,salt | salt(120), cold(247) | salt(279), cold(34) |
| AT2G19450 | cold,salt | salt(1201), cold(86) | salt(700), cold(162) |
| AT2G38470 | cold,salt | salt(234), cold(4958) | salt(142), cold(3504) |
| AT2G42540 | cold,salt | salt(257), cold(79) | salt(538), cold(23) |
| AT2G46830 | cold,salt | salt(506), cold(267) | salt(338), cold(31) |
| AT2G47180 | cold,salt | salt(842), cold(500) | salt(722), cold(1642) |
| AT3G23830 | cold,salt | salt(2516), cold(3530) | salt(1590), cold(2493) |
| AT3G48360 | cold,salt | salt(1007), cold(1968) | salt(111), cold(447) |
| AT5G23860 | cold,salt | salt(1280), cold(320) | salt(2527), cold(449) |
| AT5G52300 | cold,salt | salt(43), cold(2982) | salt(38), cold(1327) |
| AT5G52310 | cold,salt | salt(10), cold(333) | salt(6), cold(4) |
| AT5G58670 | cold,salt | salt(291), cold(2148) | salt(634), cold(1284) |
| AT4G02380 | cold,drought | drought(1013), cold(416) | drought(136), cold(278) |
To investigate whether StressGenePred excludes genes that respond to more than one stress, 21 genes known to respond to more than one stress were collected. Of these 21 genes, 13 rank lower in the StressGenePred result than in the Fisher method result (Table 3)
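The caption does not say how "rank lower" is decided when a gene has a rank for each of two stresses. One reading that reproduces the stated count is to require a larger rank number under StressGenePred for both stresses in the row. The sketch below applies that assumed criterion to the values as they appear in the table, counting each table row separately (AT2G47180 appears in three rows).

```python
# (gene, our ranks, Fisher ranks); each pair follows the stress order
# used in that table row.
rows = [
    ("AT2G47180", (243, 500), (39, 164)),
    ("AT5G37770", (2007, 3414), (1878, 2510)),
    ("AT5G57560", (1357, 1428), (235, 627)),
    ("AT5G58070", (693, 111), (258, 167)),
    ("AT5G59820", (1069, 512), (234, 128)),
    ("AT2G47180", (243, 842), (39, 722)),
    ("AT3G09350", (61, 1341), (35, 1712)),
    ("AT1G01060", (1762, 1342), (1578, 298)),
    ("AT2G17840", (120, 247), (279, 34)),
    ("AT2G19450", (1201, 86), (700, 162)),
    ("AT2G38470", (234, 4958), (142, 3504)),
    ("AT2G42540", (257, 79), (538, 23)),
    ("AT2G46830", (506, 267), (338, 31)),
    ("AT2G47180", (842, 500), (722, 1642)),
    ("AT3G23830", (2516, 3530), (1590, 2493)),
    ("AT3G48360", (1007, 1968), (111, 447)),
    ("AT5G23860", (1280, 320), (2527, 449)),
    ("AT5G52300", (43, 2982), (38, 1327)),
    ("AT5G52310", (10, 333), (6, 4)),
    ("AT5G58670", (291, 2148), (634, 1284)),
    ("AT4G02380", (1013, 416), (136, 278)),
]

# Assumed criterion: ranked lower (larger number) by our model for BOTH stresses.
lower_in_both = [
    gene for gene, ours, fisher in rows
    if all(o > f for o, f in zip(ours, fisher))
]
print(len(lower_in_both), "of", len(rows))  # 13 of 21
```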
Fig. 6 Visualization of gene expression for genes associated with multiple stresses. The genes shown were examined for responsiveness to multiple stresses. In the visualization, these genes respond to multiple stresses and are therefore not suitable as biomarker genes for a single stress