Gabriela Czibula, Iuliana M Bocicor, Istvan-Gergely Czibula.
Abstract
Temporal modeling and analysis, and more specifically temporal ordering, are very important problems within the fields of bioinformatics and computational biology, as the temporal analysis of the events characterizing a certain biological process could provide significant insights into its development and progression. In particular, in the case of cancer, understanding the dynamics and the evolution of this disease could lead to better methods for prediction and treatment. In this paper we tackle, from a computational perspective, the temporal ordering problem, which refers to constructing a sorted collection of multi-dimensional biological data that reflects an accurate temporal evolution of biological systems. We introduce a novel approach, based on reinforcement learning, more precisely on Q-learning, for the biological temporal ordering problem. The experimental evaluation is performed using several DNA microarray data sets, two of which contain cancer gene expression data. The obtained solutions are correlated either to the given correct ordering (in the cases where this is provided for validation), or to the overall survival time of the patients (in the case of the cancer data sets), thus confirming a good performance of the proposed model and indicating the potential of our proposal.
Year: 2013 PMID: 23565283 PMCID: PMC3614992 DOI: 10.1371/journal.pone.0060883
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The artificially generated survival times associated to the samples.
| 400 | 650 | 60 | 532 | 125 | 21 | 200 | 480 | 310 | 100 |
Randomly generated survival times (in days) associated to the samples from the synthetic data set.
The similarity scores for the samples from the synthetic data set.
| similarity |
| – | 4 | 3.67 | 3.34 | 2.68 | 3.67 | 5 | 5 | 6 | 3.67 |
| 4 | – | 2.67 | 5 | 2.34 | 2.67 | 3.34 | 7 | 4.01 | 3.01 |
| 3.67 | 2.67 | – | 1.67 | 5 | 7 | 4.34 | 3.67 | 3.34 | 7 |
| 3.34 | 5 | 1.67 | – | 1.67 | 1.68 | 3.68 | 5.01 | 3.34 | 2.35 |
| 2.68 | 2.34 | 5 | 1.67 | – | 4 | 5 | 2.34 | 4 | 6 |
| 3.67 | 2.67 | 7 | 1.68 | 4 | – | 3.34 | 3.67 | 3 | 6 |
| 5 | 3.34 | 4.34 | 3.68 | 5 | 3.34 | – | 3.68 | 7 | 3 |
| 5 | 5 | 3.67 | 5.01 | 2.34 | 3.67 | 3.68 | – | 3.34 | 5.01 |
| 6 | 4.01 | 3.34 | 3.34 | 4 | 3 | 7 | 3.34 | – | 3.34 |
| 3.67 | 3.01 | 7 | 2.35 | 6 | 6 | 3 | 5.01 | 3.34 | – |
Figure 1. Synthetic data set results: the learning process.
Illustration of the overall similarity of the solutions obtained during the training process, from 100 to 13000 training epochs. It can be seen how, during the training, the learned solution converges to the optimal one.
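The training behaviour described in this caption can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' exact formulation. The state encoding (set of placed samples plus the last placed sample), the reward (similarity between consecutive samples), the definition of overall similarity as the sum of adjacent similarities, and all function names are assumptions made for illustration. The similarity values are the scores for the first five synthetic samples from the table above.

```python
import itertools
import random

# Similarity scores for the first five synthetic samples (from the table above).
S = [
    [0.0, 4.0, 3.67, 3.34, 2.68],
    [4.0, 0.0, 2.67, 5.0, 2.34],
    [3.67, 2.67, 0.0, 1.67, 5.0],
    [3.34, 5.0, 1.67, 0.0, 1.67],
    [2.68, 2.34, 5.0, 1.67, 0.0],
]


def overall_similarity(order, sim):
    """Assumed objective: sum of similarities between adjacent samples."""
    return sum(sim[a][b] for a, b in zip(order, order[1:]))


def q_learning_order(sim, episodes=5000, alpha=1.0, gamma=1.0, seed=0):
    """Learn an ordering with tabular Q-learning (hypothetical design).

    State: (frozenset of placed samples, last placed sample).
    Action: the next sample to place. Reward: similarity to the last sample.
    """
    rng = random.Random(seed)
    n = len(sim)
    q = {}  # (placed, last, action) -> estimated return

    def best_future(placed, last):
        remaining = [a for a in range(n) if a not in placed]
        return max((q.get((placed, last, a), 0.0) for a in remaining), default=0.0)

    for _ in range(episodes):
        placed, last = frozenset(), None
        while len(placed) < n:
            # Purely random exploration; Q-learning is off-policy, so this is valid.
            action = rng.choice([a for a in range(n) if a not in placed])
            reward = 0.0 if last is None else sim[last][action]
            next_placed = placed | {action}
            target = reward + gamma * best_future(next_placed, action)
            key = (placed, last, action)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (target - old)
            placed, last = next_placed, action

    # Greedy rollout of the learned Q-values gives the recovered ordering.
    placed, last, order = frozenset(), None, []
    while len(placed) < n:
        remaining = [a for a in range(n) if a not in placed]
        action = max(remaining, key=lambda a: q.get((placed, last, a), 0.0))
        order.append(action)
        placed, last = placed | {action}, action
    return order


def best_by_brute_force(sim):
    """Exhaustive optimum, feasible only for very small inputs."""
    n = len(sim)
    return max(overall_similarity(p, sim) for p in itertools.permutations(range(n)))


learned_order = q_learning_order(S)
learned_score = overall_similarity(learned_order, S)
```

For five samples the learned score can be checked against `best_by_brute_force(S)`. For realistic data set sizes an exhaustive check is infeasible, and a state space of this form grows exponentially, so this sketch only shows the principle on a toy input.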
Figure 2. Synthetic data set results: the recovered ordering and the corresponding survival times.
The ordering recovered by our algorithm agreed well with the survival time following the point when samples were taken.
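One simple way to quantify this kind of agreement is a rank correlation between a sample's position in the ordering and its survival time. The sketch below computes Spearman's rho without ties. It is an illustration only and not necessarily the measure the authors used. The survival times are the synthetic values listed above, while `hypothetical_order` is an invented ordering (0-based sample indices), not the one reported in the paper.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    by_value = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(by_value, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Synthetic survival times (days) from the table above.
survival_days = [400, 650, 60, 532, 125, 21, 200, 480, 310, 100]

# Hypothetical ordering, invented for illustration only.
hypothetical_order = [1, 3, 0, 7, 8, 6, 4, 9, 2, 5]

positions = list(range(len(hypothetical_order)))
times_along_order = [survival_days[i] for i in hypothetical_order]
rho = spearman_rho(positions, times_along_order)
```

A value of `rho` close to -1 means survival time decreases almost monotonically along the ordering, and a value close to +1 means it increases. Either case indicates that the ordering tracks survival time.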
Time series data sets.
| Condition | Number of time points | Sampling times |
| Heat shock | 8 | 5, 10, 15, 20, 30, 40, 60, 80 (min) |
| DTT exposure | 8 | 5, 15, 30, 45, 60, 90, 120, 180 (min) |
| Amino acid starvation | 5 | 0.5, 1, 2, 4, 6 (h) |
| Nitrogen depletion | 10 | 0.5, 1, 2, 4, 8, 12, 24, 48, 72, 120 (h) |
| Diauxic shift | 7 | 9.5, 11.5, 13.5, 15.5, 18.5, 20.5 (h) |
| | 18 | 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112, 119, 126 (min) |
| Wild type 1 | 6 | 0, 30, 60, 120, 240, 480 (min) |
| Wild type 2 | 6 | 0, 30, 60, 120, 240, 480 (min) |
| Mutant 1 | 6 | 0, 30, 60, 120, 240, 480 (min) |
| Mutant 2 | 6 | 0, 30, 60, 120, 240, 480 (min) |
Illustration of the yeast and human time series data sets: the condition the cells were exposed to, the number of time points and the sampling period, in minutes (min) or hours (h).
Results for the time series data sets.
| Data set | Our ordering | SMD | Time (s) | Ordering from the literature | SMD | Imprv. |
| Heat shock | 1, 2, 3, 4, 5, 6, 7, 8 | 0 | | 1, 8, 7, 6, 5, 4, 3, 2 | 2 | Yes |
| DTT exposure | 1, 2, 3, 4, 5, 6, 7, 8 | 0 | | 1, 2, 3, 4, 5, 6, 7, 8 | 0 | Same |
| Amino acid starvation | 1, 2, 3, 4, 5 | 0 | | 1, 2, 3, 4, 5 | 0 | Same |
| Nitrogen depletion | 4, 3, 2, 1, 5, 6, 7, 8, 9, 10 | 2 | | 4, 3, 2, 1, 5, 6, 7, 8, 9, 10 | 2 | Same |
| Diauxic shift | 1, 2, 3, 4, 5, 6, 7 | 0 | | 1, 2, 3, 4, 5, 6, 7 | 0 | Same |
| | 1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 14, 15, 16, 18, 10, 11, 12, 13 | 5 | | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 17, 16, 15, 14, 13, 12, 11 | 2 | No |
| Wild type 1 | 1, 2, 3, 4, 5, 6 | 0 | | 1, 2, 3, 4, 5, 6 | 0 | Same |
| Wild type 2 | 1, 2, 3, 4, 5, 6 | 0 | | 3, 2, 1, 5, 4, 6 | 4 | Yes |
| Mutant 1 | 1, 4, 2, 3, 5, 6 | 2 | | 1, 3, 2, 6, 5, 4 | 4 | Yes |
| Mutant 2 | 1, 2, 3, 4, 5, 6 | 0 | | 1, 2, 3, 4, 6, 5 | 2 | Yes |
Presentation of the results obtained by our RL-based temporal ordering algorithm, the value of the evaluation measure SMD of the ordering, the computational time, other orderings obtained in the literature for the same data sets and their corresponding evaluation measures. The last column specifies whether our method leads to better solutions (in terms of correct known ordering, or of lower values of the evaluation measure), compared to those that have already been reported in the literature.
Computational time of our algorithm, in seconds.
“Imprv.” is the abbreviation for “Improvement”, specifying whether our method obtained an improvement compared to the other methods existing in the literature.
Figure 3. Recovered temporal orderings and survival times for the high-grade glioma data set.
The figure on the left corresponds to the glioblastomas data set, while the one on the right illustrates the results for the anaplastic oligodendroglioma data set. It can be observed that, in both cases, the samples in the right half belong to patients whose survival times are lower, while the ones in the left half belong to patients having higher survival times.
Results for the high-grade glioma data sets.
| Data set | Number of samples (time points) | Average of the left half of the ordering | Average of the right half of the ordering | Computational time (min) |
| Glioblastomas | 28 | 696.64 | 288.14 | ∼30 |
| Anaplastic oligodendrogliomas | 22 | 1057.00 | 418.90 | ∼20 |
For both the glioblastomas and the anaplastic oligodendroglioma data sets the average survival time value of the left half is significantly higher than the average value of the right half. The last column of this table indicates the computational times of our RL algorithm.
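The left-half versus right-half comparison reported in this table can be sketched as follows. The function name, the handling of odd-length orderings (the extra sample goes to the right half) and the example ordering are assumptions. The survival times are the synthetic values from earlier in this document, since the per-patient glioma survival times are not listed here.

```python
def half_averages(order, survival):
    """Average survival time of the left and right halves of an ordering.

    `order` holds 0-based sample indices. For an odd number of samples the
    middle sample is assigned to the right half (an arbitrary convention).
    """
    times = [survival[i] for i in order]
    mid = len(times) // 2
    left, right = times[:mid], times[mid:]
    return sum(left) / len(left), sum(right) / len(right)


# Synthetic survival times (days) from the synthetic data set above.
survival_days = [400, 650, 60, 532, 125, 21, 200, 480, 310, 100]

# Hypothetical ordering, invented for illustration only.
example_order = [1, 3, 7, 0, 8, 6, 4, 9, 2, 5]

left_avg, right_avg = half_averages(example_order, survival_days)
```

A left average well above the right average, as in the table, suggests that the ordering runs from longer to shorter survival. The function does not assess statistical significance.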