Joaquim Casellas, Luis Varona.
Abstract
Gene expression data are influenced by multiple biological and technological factors leading to a wide range of dispersion scenarios, although skewed patterns are not commonly addressed in microarray analyses. In this study, the distribution pattern of several human transcriptomes has been studied on free-access microarray gene expression data. Our results showed that, even in previously normalized gene expression data, probe and differential expression within probe effects suffer from substantial departures from the commonly assumed symmetric Gaussian distribution. We developed a flexible mixed model for non-competitive microarray data analysis that accounted for asymmetric and heavy-tailed (Student's t distribution) dispersion processes. Random effects for gene expression data were modeled under asymmetric Student's t distributions where the asymmetry parameter (λ) took values from perfect symmetry (λ = 0) to right- (λ > 0) or left-side (λ < 0) over-expression patterns. This approach was applied to four free-access human data sets and revealed clearly better model performance compared with standard approaches that assume traditional symmetric Gaussian distribution patterns. Our analyses on human gene expression data revealed a substantial degree of right-hand asymmetry for probe effects, whereas differential gene expression showed both symmetric and left-hand asymmetric patterns. Although these results cannot be extrapolated to all microarray experiments, they highlight the incidence of skewed dispersion patterns in the human transcriptome; moreover, we provide a new analytical approach to appropriately address this biological phenomenon. The source code of the program accommodating these analytical developments and additional information about practical aspects of running the program are freely available on request from the corresponding author of this article.
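To make the role of the asymmetry parameter λ concrete, the following is a minimal simulation sketch, not the authors' program. It assumes one hierarchical representation often used for skewed Student's t variables (a half-normal term scaled by λ plus a normal error, both divided by the square root of a shared gamma mixing variable); whether the article's model uses exactly this parameterization is an assumption. The function names, the scale `sigma`, and the chosen values of `nu` and `lam` are hypothetical.

```python
import numpy as np


def sample_skew_t(n, nu, lam, sigma=1.0, seed=0):
    """Draw n values from a skewed Student's t-type distribution.

    Assumed representation (illustrative only):
        w ~ Gamma(nu/2, rate = nu/2)        # shared mixing variable, mean 1
        x = lam * |z| / sqrt(w) + sigma * e / sqrt(w),  z, e ~ N(0, 1)
    lam = 0 gives a symmetric Student's t; lam > 0 skews right, lam < 0 left.
    The draws are not centered: the mean shifts with lam.
    """
    rng = np.random.default_rng(seed)
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    half_normal = np.abs(rng.standard_normal(n))
    error = rng.standard_normal(n)
    return (lam * half_normal + sigma * error) / np.sqrt(w)


def sample_skewness(x):
    """Moment-based sample skewness (third standardized moment)."""
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


if __name__ == "__main__":
    for lam in (-2.0, 0.0, 2.0):
        draws = sample_skew_t(200_000, nu=10.0, lam=lam)
        print(f"lambda = {lam:+.1f}  sample skewness = {sample_skewness(draws):+.3f}")
```

Under these assumptions, the sign of the sample skewness follows the sign of λ, which mirrors the right- and left-side patterns described in the abstract.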
Year: 2012 PMID: 22701729 PMCID: PMC3372486 DOI: 10.1371/journal.pone.0038919
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the free-access data sets analyzed.
| | Platform | Tissue | Groups of comparison (number of samples per group) | Reference | GEO |
| Dataset 1 | Affymetrix GeneChip Human Full Length Array HuGeneFL | Mononuclear cell layer | Non-pulmonary arterial hypertension (6) | Bull et al. | GSE703 |
| Dataset 2 | Affymetrix GeneChip Human Full Length Array HuGeneFL | Bronchoalveolar lavage cells | Non-smoker (5) | Heguy et al. | GSE3212 |
| Dataset 3 | Affymetrix GeneChip Human Genome U133 Plus 2.0 Array | Spermatozoa | Normal (12) | Platts et al. | GSE6969 |
| Dataset 4 | Illumina humanRef-8 v2.0 expression beadchip | Carotid endarterectomy samples | Carotid artery stenosis treated with mycophenolate (9) | Unpublished | GSE13922 |
The approximate numbers of interrogated transcripts were 5,000, 47,000 and 16,000 for the Affymetrix GeneChip Human Full Length Array HuGeneFL (Affymetrix, Inc., Santa Clara, CA), the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array (Affymetrix, Inc., Santa Clara, CA) and the Illumina humanRef-8 v2.0 expression beadchip (Illumina, Inc., San Diego, CA), respectively.
Gene Expression Omnibus accession number (http://www.ncbi.nlm.nih.gov/geo/).
Model comparison and characterization of the dispersion pattern of probe and differential expression within probe effects under Model AT.
| | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 |
| DIC | | | | |
| Model SG (c) | 284,161 (189) | 231,581 (5) | 3,692,344 (702) | 1,756,122 (12) |
| Model ST (d) | 247,509 (31) | 224,741 (1) | 3,614,823 (692) | 1,734,053 (4) |
| Model AT (e) | 242,831 (2) | 188,835 (0) | 3,554,488 (639) | 1,724,667 (2) |
| Parameters | | | | |
| ν_p | 8.95 (4.21 to 26.61) | 5.62 (4.16 to 11.05) | 6.77 (4.15 to 16.0) | 8.87 (4.40 to 30.09) |
| λ_p | 0.38 (0.04 to 0.66) | 0.13 (0.01 to 0.32) | 1.84 (1.61 to 1.93) | 2.03 (1.98 to 2.09) |
| ν_d | 7.36 (4.18 to 18.05) | 6.90 (4.15 to 19.76) | 5.99 (4.38 to 11.51) | 8.48 (4.66 to 23.90) |
| λ_d | 0.01 (−0.04 to 0.06) | −0.00 (−0.04 to 0.04) | −1.88 (−1.96 to −1.81) | −0.00 (−0.01 to 0.01) |
Deviance information criterion.
Differentially expressed genes after Bonferroni [29]-like correction (α = 0.05). The adjusted significance threshold for posterior probabilities was calculated as α/π, where π was the number of probes included in each analysis.
Random effects g and d(g) were assumed to be symmetric Gaussian (c), symmetric Student’s t (d) or asymmetric Student’s t (e) distributed, following Sahu et al. [18].
Degrees of freedom (ν) and asymmetry parameter (λ) for probe (p) and differential expression within probe (d) effects.
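The Bonferroni-like threshold in the table notes (α/π) is simple arithmetic, sketched below. The probe counts are the approximate numbers of interrogated transcripts quoted for each platform, used purely for illustration; the number of probes actually included in each analysis may differ. The function names and the decision rule in `is_flagged` (comparing a posterior tail probability against the threshold) are assumptions, not the authors' code.

```python
def adjusted_threshold(alpha, n_probes):
    """Bonferroni-like threshold: alpha divided by the number of probes."""
    if n_probes <= 0:
        raise ValueError("n_probes must be a positive integer")
    return alpha / n_probes


def is_flagged(tail_probability, alpha, n_probes):
    """Hypothetical rule: flag a probe when its posterior tail probability
    falls below the adjusted threshold."""
    return tail_probability < adjusted_threshold(alpha, n_probes)


if __name__ == "__main__":
    # Approximate transcript counts per platform (illustrative values only).
    approximate_probes = {
        "HuGeneFL": 5_000,
        "Human Genome U133 Plus 2.0": 47_000,
        "humanRef-8 v2.0": 16_000,
    }
    for platform, n_probes in approximate_probes.items():
        print(f"{platform}: threshold = {adjusted_threshold(0.05, n_probes):.3e}")
```

The threshold tightens as the number of probes grows, so larger arrays require stronger evidence per probe before a gene is declared differentially expressed.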
Figure 1. Distribution of mean estimates for probe (a) and differential expression within-probe (b) effects under Model SG (grey line) and Model AT (black line) for data set 3.