Gregor Stiglic, Mateja Bajgot, Peter Kokol.
Abstract
BACKGROUND: Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. With the recent introduction of next-generation sequencing (NGS) technology alongside established microarrays, researchers can choose between two completely different platforms for gene expression measurements. This study introduces a novel methodology for gene-ranking stability analysis and applies it to the evaluation of gene-ranking reproducibility on NGS and microarray data.
Year: 2010 PMID: 20377890 PMCID: PMC3098055 DOI: 10.1186/1471-2105-11-176
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Basic information on MAQC sample A vs B data sets
| Name | Platform | Number of A/B samples | Expression measurements | Common mapped genes |
|---|---|---|---|---|
| AFX 1 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| AFX 2 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| AFX 3 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| AFX 4 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| AFX 5 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| AFX 6 | Affymetrix HG-U133 Plus 2.0 | 5/5 | 54,675 | 15,578 |
| TSEQ | Roche 454 Genome Sequencer | 5/7 | 24,655 | 15,578 |
| ODT | Roche 454 Genome Sequencer | 5/5 | 24,655 | 15,578 |
Gene selection methods used in calculating percentage of overlapping genes
| Selection method | Short name | Reference |
|---|---|---|
| T-statistic | TTest | Boulesteix and Slawski, 2009 |
| Fold change | FC | Boulesteix and Slawski, 2009 |
| Wilcoxon statistic | Wilcoxon | Boulesteix and Slawski, 2009 |
| Welch T-statistic | WelchT | Boulesteix and Slawski, 2009 |
| Bayesian t-statistic 1 | BaldiLong | Baldi and Long, 2001 |
| Bayesian t-statistic 2 | FoxDimmic | Fox and Dimmic, 2006 |
| Shrinkage t-statistic | ShrinkageT | Opgen-Rhein and Strimmer, 2007 |
| Soft-threshold t-statistic | SoftthresholdT | Wu, 2005 |
| Parametric empirical Bayes | Limma | Smyth, 2004 |
| Nonparametric empirical Bayes | Ebam | Efron et al., 2001 |
| Permutation test | Permutation | |
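The percentage of overlapping genes (POG) that these selection methods feed into can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common definition of POG as the share of genes that two top-k lists have in common, and it uses fold change (here, the absolute difference of mean log2 expression) as the ranking method. The function names `fold_change_ranking` and `pog`, and the toy genes `g1` to `g3`, are hypothetical.

```python
from statistics import mean


def fold_change_ranking(expr_a, expr_b):
    """Rank genes by absolute difference of mean log2 expression (A minus B).

    expr_a, expr_b: dicts mapping gene name -> list of log2 expression values
    for sample A and sample B replicates (assumed input layout).
    Returns gene names ordered from largest to smallest absolute difference.
    """
    scores = {g: abs(mean(expr_a[g]) - mean(expr_b[g])) for g in expr_a}
    # Sort by score descending; break ties by gene name so output is stable.
    return sorted(scores, key=lambda g: (-scores[g], g))


def pog(ranking_1, ranking_2, k):
    """Percentage of genes shared by the top-k entries of two rankings."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_1 = set(ranking_1[:k])
    top_2 = set(ranking_2[:k])
    return 100.0 * len(top_1 & top_2) / k


# Toy data (hypothetical values, not taken from the MAQC data sets).
toy_a = {"g1": [8.0, 8.2], "g2": [5.0, 5.1], "g3": [6.0, 6.0]}
toy_b = {"g1": [4.0, 4.2], "g2": [5.0, 5.0], "g3": [7.0, 7.0]}
toy_ranking = fold_change_ranking(toy_a, toy_b)
toy_pog = pog(toy_ranking, ["g3", "g2", "g1"], k=2)
```

Evaluating `pog` over a range of k values for two data sets gives a curve of the kind compared across platforms; any of the other selection methods in the table could replace the fold-change ranking step.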
Figure 1. GSE-MLA workflow. Workflow of the GSE-MLA procedure describing the process from the initial data sets (e.g. next-generation sequencing vs. microarrays) to the final decision tree model.
Figure 2. POG scores. Comparison of POG scores for microarray AFX1 (red), ODT (green), and TSEQ (blue) data sets.
Figure 3. Heat map of top ranked genes. Heat map of the top 50 ranked genes using fold change gene selection where similarity of gene ranks is observed.
Top five gene sets from TSEQ with corresponding ranks for enrichment in sample A
| Gene set (phenotype A) | TSEQ | AFX1 | AFX2 | AFX3 | AFX4 | AFX5 | AFX6 | ODT |
|---|---|---|---|---|---|---|---|---|
| PENG_GLUTAMINE_DN | 1 | 3 | 3 | 3 | 3 | 3 | 5 | 5 |
| PENG_LEUCINE_DN | 2 | 11 | 11 | 9 | 11 | 9 | 12 | 13 |
| TARTE_PLASMA_BLASTIC | 3 | 6 | 4 | 4 | 4 | 5 | 3 | 22 |
| CHANG_SERUM_RESPONSE_UP | 4 | 8 | 8 | 8 | 8 | 8 | 13 | 28 |
| BHATTACHARYA_ESC_UP | 5 | 5 | 7 | 7 | 6 | 7 | 7 | 7 |
Top five gene sets from TSEQ with corresponding ranks for enrichment in sample B
| Gene set (phenotype B) | TSEQ | AFX1 | AFX2 | AFX3 | AFX4 | AFX5 | AFX6 | ODT |
|---|---|---|---|---|---|---|---|---|
| CALCIUM_REGULATION_IN_CARDIAC_CELLS | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
| HSA04020_CALCIUM_SIGNALING_PATHWAY | 2 | 7 | 5 | 4 | 3 | 3 | 4 | 16 |
| HSA04912_GNRH_SIGNALING_PATHWAY | 3 | 4 | 4 | 3 | 6 | 4 | 7 | 20 |
| HSA04740_OLFACTORY_TRANSDUCTION | 4 | 5 | 3 | 6 | 5 | 7 | 5 | 2 |
| HDACPATHWAY | 5 | 10 | 7 | 7 | 8 | 8 | 10 | 3 |
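One simple way to read the table above is to ask how far each data set's ranks drift from the TSEQ ranks. The sketch below computes a mean absolute rank shift for two of the columns. This summary measure is an illustrative assumption and is not a statistic reported by the authors; the name `mean_abs_rank_shift` is hypothetical, and the rank values are copied from the phenotype B table.

```python
# Ranks of the top five TSEQ gene sets (phenotype B), copied from the table.
ranks_b = {
    "TSEQ": [1, 2, 3, 4, 5],
    "AFX1": [2, 7, 4, 5, 10],
    "ODT": [1, 16, 20, 2, 3],
}


def mean_abs_rank_shift(reference, other):
    """Average absolute difference between two equally long rank lists."""
    if len(reference) != len(other):
        raise ValueError("rank lists must have the same length")
    return sum(abs(r - o) for r, o in zip(reference, other)) / len(reference)


# Shift of each data set relative to TSEQ (illustrative summary only).
shifts = {
    name: mean_abs_rank_shift(ranks_b["TSEQ"], values)
    for name, values in ranks_b.items()
    if name != "TSEQ"
}
```

For these five gene sets the shift is 2.6 for AFX1 and 7.0 for ODT, driven mainly by the two signaling-pathway gene sets that ODT ranks at 16 and 20.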
Results of GSE-MLA performance on three pairwise comparisons
| GSE-MLA comparison | J48 ACC | J48 AUC | SimpleCART ACC | SimpleCART AUC | ADTree ACC | ADTree AUC |
|---|---|---|---|---|---|---|
| ODT vs. AFX1 | 89.50 | 92.31 | 91.00 | 88.98 | 100.00 | 100.00 |
| TSEQ vs. AFX1 | 90.50 | 91.72 | 90.50 | 94.27 | 100.00 | 100.00 |
| TSEQ vs. ODT | 66.50 | 73.92 | 83.00 | 83.28 | 99.00 | 99.94 |
Figure 4. Representation of GSE-MLA results using ADTree. ADTree explaining the significant differences in gene set enrichment between ODT and TSEQ sample preparation.