Ning Jiang, Lindsey J Leach, Xiaohua Hu, Elena Potokina, Tianye Jia, Arnis Druka, Robbie Waugh, Michael J Kearsey, Zewei W Luo.
Abstract
BACKGROUND: Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated.
Year: 2008 PMID: 18559105 PMCID: PMC2442103 DOI: 10.1186/1471-2105-9-284
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistical analyses involved in the seven different methods for calculating gene expression.
| Methods | Background Correction | Normalization | Core Statistical Analysis | References |
| AD | None | Invariant Set | Average difference | Affymetrix [2] |
| MAS5.0 | Spatial effect and MM subtracted | Constant | Robust average (Tukey bi-weight) | Affymetrix [2] |
| MBEI (PM only) | None | Invariant Set | Multiplicative model | Li and Wong [5] |
| MBEI (PM-MM) | MM intensities are subtracted | Invariant Set | Multiplicative model | Li and Wong [5] |
| RMA | Global correction | Quantile | Robust linear model (median polish) | Irizarry et al. [4] |
| PDNN | Model is fitted accounting for background and specific signal | Quantile | Specific and non-specific binding effects are estimated using free energy model | Zhang et al. [7] |
| GCRMA | Based on probe sequence | Quantile | Robust linear model (median polish) | Wu et al. [8] |
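Four of the seven methods in the table above (RMA, PDNN, GCRMA) list quantile normalization as their normalization step. The following is a minimal sketch of the general quantile normalization idea, not the implementation used by any of these packages. The function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken by position, which is a simplification.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Assumption: ties are broken by probe position; real implementations
    typically handle tied ranks more carefully.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each intensity by the reference value of its rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy intensities (hypothetical, not from the paper): 3 arrays x 4 probes.
toy = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After this step every array has the same empirical distribution of intensities, while each probe keeps its rank within its own array.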
Pearson's Product Moment Correlation Coefficients among barley gene expression indices calculated from seven different methods.
| Method | AD | MAS5.0 | MBEI1 | MBEI2 | RMA | GCRMA | PDNN |
| AD | | 0.975 ± 0.004 | 0.988 ± 0.001 | 0.985 ± 0.001 | 0.647 ± 0.007 | 0.791 ± 0.008 | 0.615 ± 0.007 |
| MAS5.0 | 0.973 | | 0.961 ± 0.005 | 0.965 ± 0.003 | 0.619 ± 0.026 | 0.748 ± 0.024 | 0.583 ± 0.026 |
| MBEI1 | 0.987 | 0.958 | | 0.988 ± 0.001 | 0.664 ± 0.011 | 0.797 ± 0.009 | 0.629 ± 0.008 |
| MBEI2 | 0.985 | 0.963 | 0.988 | | 0.643 ± 0.006 | 0.774 ± 0.006 | 0.605 ± 0.006 |
| RMA | 0.647 | 0.616 | 0.662 | 0.643 | | 0.914 ± 0.002 | 0.939 ± 0.008 |
| GCRMA | 0.791 | 0.744 | 0.797 | 0.774 | 0.914 | | 0.923 ± 0.004 |
| PDNN | 0.614 | 0.581 | 0.628 | 0.604 | 0.940 | 0.923 | |
The upper triangle shows the mean and corresponding standard deviation of 24 correlation coefficients, $r_{ijk}$ (k = 1, 2, ..., 24), where $r_{ijk}$ is the correlation coefficient between the 22,840 corresponding pairs of gene expression indices calculated by methods i and j from the kth microarray sample. The diagonal cells show the means and standard deviations of 24 correlation coefficients, $r_{inm}$ (n = 1, 2, 3 and m = 1, 2, ..., 8): for n = 1, 2, 3, $r_{inm}$ corresponds to the three correlation coefficients calculated from the three possible pairs of replicates for the mth cultivar (m = 1, 2, ..., 8) using method i. The lower triangle shows the correlation coefficients between all pairs of 22,840 gene expression indices calculated from methods i and j across all 24 samples.
1 MBEI PM only model
2 MBEI PM-MM model
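The two kinds of off-diagonal entries described in the caption can be sketched as follows. This is an illustrative sketch only: the function names and the toy expression values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the record does not say which estimator was used.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle_cell(expr_i, expr_j):
    """Mean and SD of the per-sample correlations between two methods.

    expr_i[k] and expr_j[k] hold the gene-level indices of sample k.
    Assumption: SD is the sample standard deviation.
    """
    per_sample = [pearson(a, b) for a, b in zip(expr_i, expr_j)]
    return mean(per_sample), stdev(per_sample)


def lower_triangle_cell(expr_i, expr_j):
    """Single correlation after pooling all genes from all samples."""
    flat_i = [v for sample in expr_i for v in sample]
    flat_j = [v for sample in expr_j for v in sample]
    return pearson(flat_i, flat_j)


# Toy data (hypothetical): 3 samples x 4 genes; the study has 24 x 22,840.
method_a = [[7.1, 8.3, 5.2, 9.0], [6.9, 8.0, 5.5, 9.4], [7.3, 8.1, 5.0, 9.1]]
method_b = [[6.8, 8.9, 5.9, 9.2], [7.2, 7.7, 5.1, 9.9], [7.0, 8.6, 5.6, 8.8]]

mean_r, sd_r = upper_triangle_cell(method_a, method_b)
pooled_r = lower_triangle_cell(method_a, method_b)
```

The upper-triangle summary reflects how consistently two methods agree within each array, whereas the pooled value also absorbs between-sample differences, which is why the two triangles need not match exactly.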
Figure 1. Statistical properties of estimated barley gene expression indices from seven data extraction methods. (a) Intraclass correlation coefficients between biological replicates of the estimated expression indices for 22,840 genes; (b) Sensitivity for detecting differentially expressed genes; (c) Calibration p-values across FDR levels; and (d) The number of differentially expressed SFP genes (red segment) and non-SFP genes (black segment). For each method the three columns from left to right correspond to FDR levels 0.0001, 0.001 and 0.01. The proportion of genes declared differentially expressed that showed SFP is illustrated for FDR level 0.01.
Mutual predictability of the number of barley genes declared differentially expressed from seven data extraction methods.
| Methods | AD | MAS5.0 | MBEI1 | MBEI2 | RMA | PDNN | GCRMA |
| AD | 10066 | 5984(89%) | 7951(87%) | 8030(93%) | 7566(86%) | 9269(82%) | 6610(90%) |
| MAS5.0 | 5984(59%) | 6716 | 5257(57%) | 5289(61%) | 5614(64%) | 6243(55%) | 5067(69%) |
| MBEI1 | 7951(79%) | 5257(78%) | 9185 | 7674(89%) | 6969(79%) | 8206(72%) | 6048(83%) |
| MBEI2 | 8030(80%) | 5289(79%) | 7674(84%) | 8650 | 6744(76%) | 7830(69%) | 5946(81%) |
| RMA | 7566(75%) | 5614(84%) | 6969(76%) | 6744(78%) | 8824 | 8419(74%) | 6736(92%) |
| PDNN | 9269(92%) | 6243(93%) | 8206(89%) | 7830(91%) | 8419(91%) | 11339 | 6994(96%) |
| GCRMA | 6610(66%) | 5067(75%) | 6048(66%) | 5946(69%) | 6736(76%) | 6994(62%) | 7310 |
The diagonal cells show the number of genes declared differentially expressed by each method at FDR = 0.01. The upper and lower triangles show the number of genes declared by both method j (j = 1st, ..., 7th column) and method i (i = 1st, ..., 7th row, i ≠ j), with that number expressed in parentheses as a percentage of the genes declared by the column method j. For example, the 5984 genes in common to AD and MAS5.0 represent 89% of those detected by MAS5.0 but only 59% of those detected by AD.
1 MBEI PM only model
2 MBEI PM-MM model
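The arithmetic behind one pair of cells can be checked with a short sketch. The function name `overlap_cell` and the integer gene identifiers are hypothetical; only the three counts (10066, 6716 and 5984) are taken from the table, and the sets are constructed artificially to reproduce them.

```python
def overlap_cell(declared_row, declared_col):
    """Return (shared count, shared count as % of the column method's genes)."""
    shared = len(declared_row & declared_col)
    return shared, round(100 * shared / len(declared_col))


# Synthetic gene IDs built so that the set sizes match the table:
# AD declares 10066 genes, MAS5.0 declares 6716, and 5984 are shared.
ad_genes = set(range(10066))
mas5_genes = set(range(5984)) | set(range(20000, 20000 + (6716 - 5984)))

row_ad_col_mas5 = overlap_cell(ad_genes, mas5_genes)   # (5984, 89)
row_mas5_col_ad = overlap_cell(mas5_genes, ad_genes)   # (5984, 59)
```

The same shared count therefore yields two different percentages, depending on which method's total is used as the denominator: 5984 / 6716 ≈ 89% and 5984 / 10066 ≈ 59%.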