Alain Sewer, Sylvain Gubian, Ulrike Kogel, Emilija Veljkovic, Wanjiang Han, Arnd Hengstermann, Manuel C Peitsch, Julia Hoeng.
Abstract
BACKGROUND: High-quality expression data are required to investigate the biological effects of microRNAs (miRNAs). The goal of this study was, first, to assess the quality of miRNA expression data based on microarray technologies and, second, to consolidate it by applying a novel normalization method. Indeed, because of significant differences in platform designs, miRNA raw data cannot be normalized blindly with standard methods developed for gene expression. This fundamental observation motivated the development of a novel multi-array normalization method based on controllable assumptions, which uses the spike-in control probes to adjust the measured intensities across arrays.
Year: 2014 PMID: 24886675 PMCID: PMC4077261 DOI: 10.1186/1756-0500-7-302
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. The spike-in control based normalization (SCN) method. (A) Plot of the raw intensities for the 10 spike-in control probe sets “a”… “j” from the 16 arrays of the Exiqon lung dataset. The probe set intensity values were computed as the median values of the corresponding 48 probe intensities and the error bars correspond to the 2.5th–97.5th percentiles of the corresponding distributions. For a given spike-in control probe set, the “coherent deviations” mentioned in the text are estimated by the size of the between-array range of intensity values, divided by the between-array mean intensity value. (B) Heat map of the Pearson correlation matrix between all pairs of spike-in control probe sets shown in panel A. (C) Ratio between the variance of the raw and normalized intensities for the 10 spike-in control probe sets shown in panels A and B. (D) Raw intensity dependence of the spike-in controls based normalization intensity correction function ΔE computed for the 16 arrays of the lung dataset. The continuous curves represent the intensity correction function ΔE(x,k) defined for the continuous raw intensity values x given by the horizontal axis and the 16 discrete array labels k depicted in the color legend, while the points correspond to normalization intensity corrections ΔE for the 10 spike-in control probe sets j and the 16 array labels k (see the “Spike-in controls based normalization method” section in “Methods”).
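The "coherent deviation" estimate described for panel A of Figure 1 (between-array range divided by between-array mean, applied to per-array median probe set intensities) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy intensity values are hypothetical, and only four replicate probes per array are used instead of 48 to keep the example short.

```python
from statistics import mean, median


def probe_set_intensity(probe_intensities):
    """Summarize the replicate probes of one probe set on one array by their median."""
    return median(probe_intensities)


def coherent_deviation(intensities_across_arrays):
    """Between-array range divided by between-array mean for one spike-in probe set."""
    values = list(intensities_across_arrays)
    return (max(values) - min(values)) / mean(values)


# Hypothetical raw intensities: one spike-in probe set measured on three arrays,
# four replicate probes per array (the caption describes 48 probes and 16 arrays).
arrays = [
    [10.0, 11.0, 9.0, 10.0],
    [12.0, 12.5, 11.5, 12.0],
    [8.0, 8.5, 7.5, 8.0],
]

per_array = [probe_set_intensity(a) for a in arrays]  # one median per array
deviation = coherent_deviation(per_array)             # range / mean across arrays
```

A large value indicates that the same spike-in control is read out at noticeably different intensities on different arrays, which is the kind of between-array effect a normalization is meant to reduce.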
Figure 2. Quality control metrics using the coefficient of variation (CV). (A) Boxplot of the CVs between the raw intensity values for the four probes mapping to a given probe set (CVwithin), computed for all 595 common mouse miRNA probe sets and for all 16 arrays of the lung dataset, excluding the “absent” detection calls (see Additional file 2 “Supplementary Results”). (B) Boxplot of the CVs between the probe set normalized intensity values from the four biological replicates of a given treatment group (CVbetween), computed for all 595 common mouse miRNA probe sets and for all four treatment groups of the lung dataset, excluding the “absent” detection calls. (C) Boxplot of the ratio between the probe set residual variance and the corresponding modeled treatment response (CVtreat), computed for all 595 common mouse miRNA probe sets for the lung dataset, excluding the “absent” detection calls. (D) Boxplot of the CVs between the probe set normalized intensity values from all 16 arrays of the lung dataset, computed for the 10 Exiqon spike-in control probe sets. The boxplots for each preprocessing pipeline (described in Additional file 1: Table S1) show the values relative to the range of the corresponding CVbetween distribution, which is given by the whiskers in panel B and the interval [0,1] in panel D (relativeCVspike, see the “Coefficients of variation” section in “Methods”). Median, red line; first and third quartiles Q1 and Q3, blue box; distribution range given by the two extreme values within the interval [Q1 − 1.5×(Q3 − Q1), Q3 + 1.5×(Q3 − Q1)], black whiskers; outliers, red crosses outside the whiskers.
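The two building blocks of the Figure 2 boxplots, a coefficient of variation and the whisker rule based on [Q1 − 1.5×(Q3 − Q1), Q3 + 1.5×(Q3 − Q1)], can be sketched as below. Several choices here are assumptions rather than the paper's definitions: the CV is taken as sample standard deviation over mean, the quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles`, and the function names and input values are hypothetical.

```python
from statistics import mean, quantiles, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    return stdev(values) / mean(values)


def whisker_limits(values):
    """Return (lower whisker, upper whisker, outliers) under the 1.5 x IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    # Whiskers end at the most extreme data points that still lie inside the interval.
    return min(inside), max(inside), outliers


# Hypothetical replicate intensities for one probe set
cv = coefficient_of_variation([9.0, 10.0, 11.0])

# Hypothetical distribution containing one obvious outlier
lower, upper, outliers = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Note that the whiskers stop at observed data points, not at the interval bounds themselves, which matches the caption's wording "the two extreme values within the interval".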
Figure 3. Differential expression and comparison with RT-qPCR results. (A) Heat map for the t-statistics obtained from the linear model for the treatment response of the expression values of each miRNA. The dendrogram is based on the Euclidean distance between the t-values obtained from the various preprocessing pipelines (described in Additional file 1: Table S1). (B) Scatter plot of the miRNA differential expressions obtained from the normalized data of the SCN vs. AQN preprocessing pipelines. The miRNAs selected for the RT-qPCR experiment are indicated in red. (C) Bar chart of the Spearman’s correlation coefficients between the differential expression of the selected miRNAs obtained by RT-qPCR and those obtained from the preprocessing pipelines. The error bars are the 2.5th–97.5th percentiles of the values obtained from a simple leave-one-out re-sampling approach. (D) Scatter plot of the differential expressions of the selected miRNAs obtained by RT-qPCR and those obtained from the SCN pipeline. The error bars show the 95% confidence intervals calculated as t0.975,6 × (SCN differential expression/SCN t-statistic) and the solid circles represent statistically significant -∆∆CT values (t-test p-value < 0.05). The horizontal axes of Panels B and D are identical, so that the points of the two plots can be matched.
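The error bars in panel D of Figure 3 follow from the fact that a differential expression divided by its t-statistic gives the standard error, which is then scaled by the t-quantile t0.975,6. A minimal sketch of that arithmetic is shown below; the function name and the input values are hypothetical, and the quantile is hard-coded from a standard Student's t table (2.447 for 6 degrees of freedom) rather than computed.

```python
# 97.5th percentile of Student's t distribution with 6 degrees of freedom,
# taken from a standard t table (rounded to three decimals).
T_0975_DF6 = 2.447


def confidence_interval(diff_expr, t_stat, t_quantile=T_0975_DF6):
    """95% confidence interval for a differential expression value.

    The standard error is recovered as diff_expr / t_stat, so t_stat must be non-zero.
    """
    standard_error = diff_expr / t_stat
    half_width = t_quantile * standard_error
    return diff_expr - half_width, diff_expr + half_width


# Hypothetical miRNA: differential expression of 1.2 with a t-statistic of 4.0
low, high = confidence_interval(1.2, 4.0)
```

In this toy case the standard error is 0.3, so the interval extends roughly 0.73 on either side of 1.2 and does not cross zero.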