Arika Fukushima, Masahiro Sugimoto, Satoru Hiwa, Tomoyuki Hiroyasu.
Abstract
INF-β has been widely used to treat patients with multiple sclerosis (MS) in relapse. Accurate prediction of treatment response is important for effective personalization of treatment. Microarray data have frequently been used to discover new genes and to predict treatment responses. However, conventional analytical methods face three difficulties: the high dimensionality of the datasets; a high degree of multicollinearity; and the identification of genes in time-course data. Elastic net, a sparse modelling method, mitigates the first two issues; however, it currently cannot address all three simultaneously. Here, we improved Elastic net to accommodate time-course data analyses. Numerical experiments were conducted using two time-course microarray datasets derived from peripheral blood mononuclear cells collected from patients with MS. The proposed methods successfully identified genes showing a high predictive ability for INF-β treatment response. Bootstrap sampling yielded accuracies of 81% and 78% for the two datasets, respectively, which were significantly higher than the 71% and 73% obtained using conventional methods. Our methods selected genes showing consistent differentiation across all time-points. These genes are expected to provide new predictive biomarkers that can influence INF-β treatment for MS patients.
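For orientation, the penalty behind Elastic net can be written in its widely used glmnet-style form. This is the standard textbook formulation, not necessarily the exact one used in the study: the loss $\ell$, the mixing parameter $\alpha$ and the penalty strength $\lambda$ are generic symbols assumed here, and the time-course extension is not shown because the abstract does not specify it.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\; \ell(\beta; X, y)
  \;+\; \lambda \left[ \alpha \,\lVert \beta \rVert_1
  \;+\; \frac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \right],
  \qquad 0 \le \alpha \le 1 .
```

The L1 term drives most coefficients to exactly zero, which addresses high dimensionality, while the squared L2 term tends to keep groups of correlated genes together, which addresses multicollinearity.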
Year: 2019 PMID: 30755676 PMCID: PMC6372673 DOI: 10.1038/s41598-018-38441-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The concepts of prediction using gene expression data. (a) Genes identified by single time-point data. (b) Genes showing inconsistent differentiation between the current and the future time-points. (c) Genes showing consistent differentiation throughout the data across multiple time-points.
Figure 2. The concept of the proposed method. (a) Creating the gene pool by SS using gene expression data at each time-point. (b) Gene selection (GS) using candidate genes and calculating the selected probability (SP). GS used Elastic net with SS, assigning weights (w1~t) to gene expression data at each time-point. (c) Identifying the genes from gene lists (GLs) with SP. (d) The flow of the third step in the proposed method. In this step, prediction models were evaluated by leave-one-out (LOO) when the time-point for model construction equals the time-point for prediction, and by using different data when the time-point for model construction ≠ the time-point for prediction.
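The idea of a selected probability in panel (b) can be illustrated with a minimal sketch. It assumes that SP is the fraction of resampling runs in which a gene is selected, which the caption does not state explicitly. The selector `top_k_by_mean_difference` is a hypothetical stand-in, not Elastic net, and the function names, subsample fraction and toy data are all illustrative assumptions.

```python
import random
from statistics import mean

def top_k_by_mean_difference(X, y, k):
    """Stand-in selector (NOT Elastic net): keep the k genes with the largest
    absolute difference in mean expression between the two response groups."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, label in zip(X, y) if label == 1]
        poor = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(good) - mean(poor)))
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])

def selection_probability(X, y, k, n_runs=200, frac=0.8, seed=0):
    """Fraction of subsampling runs in which each gene is selected (assumed SP)."""
    if len(set(y)) < 2:
        raise ValueError("both response groups are required")
    rng = random.Random(seed)
    size = max(2, int(frac * len(X)))
    counts = [0] * len(X[0])
    done = 0
    while done < n_runs:
        idx = rng.sample(range(len(X)), size)
        sub_y = [y[i] for i in idx]
        if len(set(sub_y)) < 2:
            continue  # redraw: the selector needs both groups
        sub_X = [X[i] for i in idx]
        for g in top_k_by_mean_difference(sub_X, sub_y, k):
            counts[g] += 1
        done += 1
    return [c / n_runs for c in counts]

# Toy data (made up): 6 patients x 3 genes; 1 = good responder, 0 = poor responder.
X = [[10.0, 1.0, 0.50], [9.5, 0.8, 0.40], [10.2, 1.1, 0.60],
     [0.1, 0.9, 0.50], [0.3, 1.0, 0.45], [0.2, 1.2, 0.55]]
y = [1, 1, 1, 0, 0, 0]
sp = selection_probability(X, y, k=1)
print(sp)
```

In this toy example only the first gene separates the groups, so it is the one selected in every run; with real data, genes would typically be kept when their SP exceeds a chosen threshold.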
Summary of gene expression datasets of INF-β treatments for MS patients.
| Name of dataset | Dataset A | Dataset B |
|---|---|---|
| GEO ID | GSE19285 | GSE24427 |
| Type of INF-β | Intramuscular Interferon beta 1a | Subcutaneous Interferon beta 1a |
| Time-points | First (t1), second (t2), fifth (t3) | First (t1), second (t2), 1 month (t3) |
| Number of good responders | 15 | 16 |
| Number of poor responders | 9 | 9 |
| Number of genes | 11220 | 13513 |
| Gene expression | Peripheral blood mononuclear cells | Peripheral blood mononuclear cells |
| Platform | Affymetrix Human Genome U133A Array | Affymetrix Human Genome U133A Array |
| Preprocessing for microarray | MAS5.0 | MAS5.0 |
In this paper, time-points are denoted “t1”, “t2”, “t3”, etc.
Accuracy of prediction models by the proposed method and conventional methods with dataset A.
| Method | Accuracy [%] | |||
|---|---|---|---|---|
| t1 | t2 | t3 | Mean (ACCmean) | |
| Proposed method | (88) |
|
|
|
| Conventional method | (100) |
| 79 | 83 |
|
| (100) |
|
| |
|
| 83 | (100) | 79 | |
Values in () were calculated by leave-one-out at the time-point of the data used by the prediction model. Bold indicates the top accuracy at each time-point; the top accuracy at t1 is not shown because gene expression data at t1 were used by the proposed method. “Accuracy” indicates the minimum accuracy (ACCmin) of each method.
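The two evaluation modes described in the footnote and in Figure 2(d) can be sketched as follows: leave-one-out when the model is built and tested at the same time-point, and direct application when the time-points differ. The nearest-centroid classifier is a hypothetical stand-in for the study's prediction model, and the function names and toy data are assumptions made for illustration.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean profile."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda l: dist2(x, centroids[l]))

def leave_one_out_accuracy(X, y):
    """Same time-point: hold out each patient once, train on the rest."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return 100.0 * hits / len(X)

def cross_time_accuracy(train_X, train_y, test_X, test_y):
    """Different time-points: train on one time-point, predict at another."""
    hits = sum(nearest_centroid_predict(train_X, train_y, x) == t
               for x, t in zip(test_X, test_y))
    return 100.0 * hits / len(test_X)

# Toy data (made up): the same 6 patients measured at two time-points.
y = [1, 1, 1, 0, 0, 0]  # 1 = good responder, 0 = poor responder
X_t1 = [[5.0, 1.0], [5.2, 0.9], [4.8, 1.1], [1.0, 1.0], [1.2, 1.1], [0.9, 0.8]]
X_t2 = [[4.5, 1.0], [4.9, 1.2], [5.1, 0.8], [1.5, 0.9], [1.1, 1.0], [0.8, 1.1]]

acc_loo = leave_one_out_accuracy(X_t1, y)            # reported in () in the tables
acc_cross = cross_time_accuracy(X_t1, y, X_t2, y)    # reported without ()
print(acc_loo, acc_cross)
```

Leave-one-out is needed in the same-time-point case because testing on the training data itself would overstate accuracy; when the time-points differ, the test measurements were not used to fit the model.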
Accuracy of prediction models by the proposed method and conventional methods with dataset B.
| Method | Accuracy [%] | |||||
|---|---|---|---|---|---|---|
| t1 | t2 | t3 | t4 | t5 | Mean (ACCmean) | |
| Proposed method | (96) |
|
|
|
|
|
| Conventional method | (92) | 68 | 84 | 76 |
| 77 |
| 72 | (84) | 76 |
| 64 | 71 | |
| 72 |
| (96) | 80 |
| 81 | |
| 72 | 64 | 68 | (100) |
| 69 | |
| 68 |
| 76 | 72 | (96) | 74 | |
Values in () were calculated by leave-one-out at the time-point of the data used by the prediction model. Bold indicates the top accuracy at each time-point; the top accuracy at t1 is not shown because gene expression data at t1 were used by the proposed method. “Accuracy” indicates the minimum accuracy (ACCmin) of each method.
Figure 3. Prediction accuracies and ROC curves obtained by bootstrap sampling. (a) Prediction accuracies for dataset A. The accuracies are the mean accuracies of different time-points (TPs) obtained without using the prediction model. (b) Prediction accuracies for dataset B. As with (a), the accuracies are the mean accuracies. (c) ROC curve generated by the proposed method (PM) at each time-point of dataset A. The AUC and 95% confidence interval (CI) were calculated from the ROC curves at each time-point. (d) ROC curve generated by PM at each time-point of dataset B. As with (c), the AUC and 95% CI were calculated.
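An AUC with a 95% CI, as reported in panels (c) and (d), can be obtained in several ways; the caption does not say which was used. The sketch below assumes a percentile bootstrap over patients and computes the AUC as the probability that a good responder receives a higher score than a poor responder. The function names, scores and labels are illustrative assumptions.

```python
import random

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI (an assumed method, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if len(set(lab)) < 2:
            continue  # redraw: AUC is undefined without both classes
        values.append(auc([scores[i] for i in idx], lab))
    values.sort()
    tail = (1 - level) / 2
    lower = values[int(tail * n_boot)]
    upper = values[int((1 - tail) * n_boot) - 1]
    return lower, upper

# Made-up prediction scores; 1 = good responder, 0 = poor responder.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
point = auc(scores, labels)
lower, upper = bootstrap_auc_ci(scores, labels)
print(point, lower, upper)
```

With cohorts of about 25 patients, as in the two datasets, such intervals are expected to be wide, which is worth keeping in mind when comparing time-points.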
Identified genes of dataset A by the proposed method.
| Gene symbol | Gene name | P value | Higher GE levels at all time-points | ||
|---|---|---|---|---|---|
| t1 | t2 | t3 | |||
|
| Zinc Finger and BTB Domain Containing 16 | 0.064 | 0.137 | good | |
|
| ZFP37 Zinc Finger Protein | 0.070 | 0.220 | — | |
|
| HPS5, Biogenesis of Lysosomal Organelles Complex 2 Subunit 2 | 0.084 | poor | ||
|
| HOP Homeobox | 0.105 | 0.090 | good | |
|
| ADP Ribosylation Factor GTPase Activating Protein 3 | 0.162 | 0.105 | good | |
|
| Calmodulin Like 5 | 0.077 | 0.126 | good | |
|
| VPS26, Retromer Complex Component A | 0.090 | 0.205 | good | |
|
| Solute Carrier Family 5 Member 4 | 0.190 | 0.190 | good | |
|
| Mannose Binding Lectin 2 | 0.149 | 0.640 | — | |
|
| DLG Associated Protein 4 | 0.115 | 0.390 | good | |
|
| Calcium Voltage-Gated Channel Subunit Alpha1 C | 0.064 | 0.382 | 0.390 | poor |
P values were adjusted using the BH method; values in bold indicate significantly different gene expression (GE) levels between good and poor responders (p < 0.05). If the GE levels of a gene were higher in good responders than in poor responders at all time-points (TPs), “good” is shown in the final column.
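The BH (Benjamini-Hochberg) adjustment mentioned in the footnote can be sketched in a few lines. The function name and the input P values are made up for illustration and are not taken from the table.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P values
adj = bh_adjust(raw)
print(adj)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.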
Identified gene list of dataset B by the proposed method.
| Gene symbol | Gene name | P value | Higher GE levels of all TPs | ||||
|---|---|---|---|---|---|---|---|
| t1 | t2 | t3 | t4 | t5 | |||
|
| Survival of Motor Neuron 1, Telomeric | 0.072 | 0.250 | 0.082 | 0.082 | good | |
|
| MicroRNA 7114/NMDA Receptor Synaptonuclear Signaling and Neuronal Migration Factor | 0.072 | 0.082 | 0.130 | 0.314 | good | |
|
| LSM8 Homolog, U6 Small Nuclear RNA Associated | 0.452 | 0.082 | 0.082 | 0.441 | — | |
|
| Flavin Adenine Dinucleotide Synthetase 1 | 0.071 | 0.344 | 0.056 | 0.072 | poor | |
|
| RRN3 Homolog, RNA Polymerase I Transcription Factor Pseudogene 1 | 0.419 | 0.179 | 0.082 | 0.082 | poor | |
|
| RAS Like Family 10 Member A | 0.334 | 0.344 | 0.452 | 0.314 | — | |
|
| Immediate Early Response 3 Interacting Protein 1 | 0.115 | 0.072 | 0.216 | 0.082 | poor | |
|
| Cadherin 2 | 0.250 | 0.397 | 0.082 | good | ||
P values were adjusted using the BH method; values in bold indicate significantly different gene expression (GE) levels between good and poor responders (p < 0.05). If the GE levels of a gene were higher in good responders than in poor responders at all time-points (TPs), “good” is shown in the final column.
Figure 4. Gene expression levels of good and poor responders at each time-point. Expression levels of HPS5 in dataset A (a) and CDH2 in dataset B (b). Wilcoxon rank sum test. *FDR-corrected p < 0.05.
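The Wilcoxon rank sum test named in the caption can be sketched as below. This version uses the normal approximation with midranks and a tie correction but no continuity correction. Whether the study used an exact or an approximate test is not stated, so this is an assumption, as are the function name and the example values. For groups as small as those in the datasets, an exact test may be preferable.

```python
from statistics import NormalDist

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).
    Returns the U statistic of group a and the approximate P value."""
    values = sorted(list(a) + list(b))
    n = len(values)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[values[i]] = (i + 1 + j) / 2   # midrank of ranks i+1..j
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(a), len(b)
    u = sum(rank_of[v] for v in a) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mu) / var ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Made-up expression levels for poor and good responders at one time-point.
poor = [1.0, 2.0, 3.0]
good = [4.0, 5.0, 6.0]
u_stat, p_value = rank_sum_test(poor, good)
print(u_stat, p_value)
```

When the test is run once per gene and time-point, the resulting P values would then be corrected for multiple testing, which matches the FDR-corrected threshold marked with an asterisk in the figure.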