Qing-Xia Yang¹,², Yun-Xia Wang¹, Feng-Cheng Li¹, Song Zhang¹, Yong-Chao Luo¹, Yi Li¹, Jing Tang¹,², Bo Li², Yu-Zong Chen³, Wei-Wei Xue², Feng Zhu¹,².
Abstract
AIMS: As one of the most fundamental questions in modern science, "what causes schizophrenia (SZ)" remains a profound mystery, largely because objective gene markers are lacking. The gene signatures identified by independent studies show extremely low reproducibility, owing to the limitations of available feature selection methods and the lack of measures for validating the robustness of the signatures. These irreproducible results have significantly limited our understanding of the etiology of SZ.
Keywords: reproducibility; schizophrenia; significance analysis of microarrays; Student's t test; transcriptomics
Year: 2019 PMID: 31350824 PMCID: PMC6698965 DOI: 10.1111/cns.13196
Source DB: PubMed Journal: CNS Neurosci Ther ISSN: 1755-5930 Impact factor: 5.243
Table 1. Datasets collected from nine independent microarray studies (sorted by sample size)
| ID | Dataset reference | Brodmann's area code | Sample size (SZ:HEA) | Platform ID | Platform description |
|---|---|---|---|---|---|
| A | | 46 | 65 (34:31) | GPL96 | Affymetrix Human Genome U133A Array (HG‐U133A) |
| B | | 10/46 | 60 (31:29) | GPL96 | Affymetrix Human Genome U133A Array (HG‐U133A) |
| C | | 46 | 59 (29:30) | GPL4133 | Whole Human Genome Microarray 4x44K G4112F (Agilent‐014850) |
| D | | 46 | 54 (25:29) | GPL570 | Affymetrix Human Genome U133 Plus 2.0 Array (HG‐U133 Plus 2) |
| E | | 10 | 47 (26:21) | GPL570 | Affymetrix Human Genome U133 Plus 2.0 Array (HG‐U133 Plus 2) |
| F | | 9 | 45 (19:26) | GPL96 | Affymetrix Human Genome U133A Array (HG‐U133A) |
| G | | 46 | 32 (13:19) | GPL570 | Affymetrix Human Genome U133 Plus 2.0 Array (HG‐U133 Plus 2) |
| H | | 10/46 | 20 (9:11) | GPL96 | Affymetrix Human Genome U133A Array (HG‐U133A) |
| I | | 46 | 15 (9:6) | GPL96 | Affymetrix Human Genome U133A Array (HG‐U133A) |
All nine studies profiled postmortem brain tissue from the prefrontal cortex. Each dataset contained one cohort of schizophrenia patients (SZ) and one cohort of healthy individuals (HEA). The study IDs assigned in this table are used to refer to the nine datasets throughout the manuscript.
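Both feature selection methods compared in this study (Student's t test and SAM) score each gene by contrasting its expression in the SZ and HEA cohorts of a dataset. The Python sketch below is illustrative only: it computes the per-gene pooled-variance Student's t statistic and a SAM-style relative difference d = (mean_SZ - mean_HEA) / (s + s0). The function names, the dictionary input layout, and the use of the median gene-wise standard error as the fudge factor s0 are hypothetical simplifications; SAM as published also estimates the false discovery rate by permutation, which is omitted here.

```python
import math
from statistics import mean, median


def pooled_se(a, b):
    """Standard error of mean(a) - mean(b) under Student's equal-variance assumption."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    pooled_var = ss / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def t_and_sam_scores(expr_sz, expr_hea, s0=None):
    """Score every gene by Student's t and by a SAM-style relative difference d.

    expr_sz, expr_hea: hypothetical layout, a dict mapping gene -> list of
    expression values in the SZ and HEA cohorts. Assumes every gene has
    non-zero within-cohort variance (otherwise t is undefined).
    Returns a dict gene -> (t, d).
    """
    genes = list(expr_sz)
    diffs = {g: mean(expr_sz[g]) - mean(expr_hea[g]) for g in genes}
    ses = {g: pooled_se(expr_sz[g], expr_hea[g]) for g in genes}
    if s0 is None:
        # Simplification (assumption): use the median gene-wise SE as the fudge
        # factor s0; the published SAM procedure tunes s0 differently.
        s0 = median(ses.values())
    return {g: (diffs[g] / ses[g], diffs[g] / (ses[g] + s0)) for g in genes}


# Toy data for two hypothetical genes (not from the study).
expr_sz = {"GENE1": [5.1, 5.3, 5.0, 5.4], "GENE2": [7.0, 6.2, 7.8, 6.4]}
expr_hea = {"GENE1": [4.0, 4.2, 4.1, 3.9], "GENE2": [6.9, 7.1, 6.3, 7.5]}
scores = t_and_sam_scores(expr_sz, expr_hea)
# Rank genes by |t|; a signature would keep the top-ranked genes.
for gene, (t, d) in sorted(scores.items(), key=lambda kv: -abs(kv[1][0])):
    print(f"{gene}: t = {t:.2f}, d = {d:.2f}")
```

Because s0 is non-negative, |d| never exceeds |t|; the fudge factor mainly demotes genes whose large t comes from an unusually small variance.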
Reproducibility of two popular feature selection methods (Student's t test and SAM) and of the new strategy proposed in this study
| Test dataset | Measure | This study | Student's t test | SAM |
|---|---|---|---|---|
| – | Consistency score (CS) among the nine signatures discovered by each method | 429 | 50 | 82 |
| B | ACC (%) | 77.4 | 56.7 | 60.0 |
| B | MCC | 0.53 | 0.15 | 0.21 |
| C | ACC (%) | 64.4 | 69.5 | 64.4 |
| C | MCC | 0.36 | 0.45 | 0.36 |
| D | ACC (%) | 75.9 | 63.0 | 61.1 |
| D | MCC | 0.52 | 0.28 | 0.25 |
| E | ACC (%) | 66.0 | 68.1 | 59.6 |
| E | MCC | 0.38 | 0.43 | 0.23 |
| F | ACC (%) | 64.4 | 51.1 | 53.3 |
| F | MCC | 0.35 | 0.16 | 0.24 |
| G | ACC (%) | 87.5 | 68.8 | 62.5 |
| G | MCC | 0.76 | 0.46 | 0.38 |
| H | ACC (%) | 85.0 | 65.0 | 65.0 |
| H | MCC | 0.72 | 0.45 | 0.45 |
| I | ACC (%) | 73.3 | 66.7 | 66.7 |
| I | MCC | 0.44 | 0.39 | 0.29 |
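Consistency was assessed using the consistency score (CS) among the gene signatures discovered from the nine independent datasets. Reproducibility was assessed using the accuracy (ACC) and Matthews correlation coefficient (MCC) obtained when the signature from study A (the largest sample size) was applied to each of the remaining eight datasets (IDs as defined in Table 1).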
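ACC and MCC summarize how well a signature classifies the samples of a test dataset as SZ or HEA. A minimal Python sketch of the standard formulas, with MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), follows. The function name and the convention of returning MCC = 0 when a marginal count is zero are assumptions, and the classifier that produces the predictions is not specified here.

```python
import math


def acc_and_mcc(y_true, y_pred, positive="SZ"):
    """Return (accuracy, Matthews correlation coefficient) for binary labels.

    y_true, y_pred: sequences of labels such as "SZ" / "HEA"; any label other
    than `positive` is treated as the negative class (HEA).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc


# Toy example (not study data): 5 SZ and 5 HEA samples, one error in each class.
y_true = ["SZ"] * 5 + ["HEA"] * 5
y_pred = ["SZ", "SZ", "SZ", "SZ", "HEA", "HEA", "HEA", "HEA", "HEA", "SZ"]
acc, mcc = acc_and_mcc(y_true, y_pred)
print(f"ACC = {acc:.1%}, MCC = {mcc:.2f}")  # ACC = 80.0%, MCC = 0.60
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so a classifier that always predicts one class scores an MCC of 0 even when its accuracy looks respectable.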
Figure 1. The effect of feature selection methods on reproducibility. Comparisons are shown between (A) the newly proposed strategy of this study and SAM; (B) this study and Student's t test (t test); and (C) t test and SAM. The size of each square indicates the relative weight assigned to the corresponding study in this analysis. Error bars represent the 95% confidence interval of the effect. The analysis revealed a significant increase in reproducibility when the new strategy was employed compared with the traditional methods (A and B), whereas no significant difference in reproducibility was observed between t test and SAM (C).
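The forest-plot elements described for Figure 1 (a per-study effect, a square sized by relative weight, and a 95% confidence interval) match a standard inverse-variance meta-analysis. The caption does not state which model was used, so the Python sketch below assumes a fixed-effect inverse-variance model; the function name, the 1.96 normal quantile, and the toy per-study effects (for example, hypothetical MCC differences between two methods) are illustrative assumptions. A random-effects model such as DerSimonian-Laird would add a between-study variance term to each weight.

```python
import math


def fixed_effect_pool(effects, ses, z=1.96):
    """Inverse-variance fixed-effect pooling (assumed model, see lead-in).

    effects: per-study effect estimates; ses: their standard errors (> 0).
    Returns (pooled effect, (ci_low, ci_high), relative weights).
    The relative weights correspond to the square sizes in a forest plot.
    """
    if len(effects) != len(ses) or not effects:
        raise ValueError("effects and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)
    rel_weights = [w / total for w in weights]
    return pooled, ci, rel_weights


# Toy input (not study data): three studies' effects and standard errors.
pooled, (lo, hi), rel = fixed_effect_pool([0.20, 0.35, 0.10], [0.10, 0.15, 0.20])
print(f"pooled = {pooled:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print("CI excludes zero:", lo > 0 or hi < 0)
```

Under this assumed setup, a pooled difference whose 95% CI excludes zero would be read as a significant change in reproducibility, and one whose CI spans zero as no significant difference.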
Figure 2. Reproducibility, assessed by MCC, of the signature from each of the nine studies (A‐I) applied to the remaining eight. The statistical significance of the differences among the three methods (this study, t test, and SAM) was calculated, and significant differences were observed (*P < .05; **P < .01). The IDs of the nine studies (A‐I) are defined in Table 1.