Brett Trost, Catherine A Moir, Zoe E Gillespie, Anthony Kusalik, Jennifer A Mitchell, Christopher H Eskiw.
Abstract
DNA microarrays and RNA sequencing (RNA-seq) are major technologies for performing high-throughput analysis of transcript abundance. Recently, concerns have been raised regarding the concordance of data derived from the two techniques. Using cDNA libraries derived from normal human foreskin fibroblasts, we measured changes in transcript abundance as cells transitioned from proliferative growth to quiescence using both DNA microarrays and RNA-seq. The internal reproducibility of the RNA-seq data was greater than that of the microarray data. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarray values were moderate. The two technologies had good agreement when considering probes with the largest (both positive and negative) fold change (FC) values. An independent technique, quantitative reverse-transcription PCR (qRT-PCR), was used to measure the FC of 76 genes between proliferative and quiescent samples, and a higher correlation was observed between the qRT-PCR data and the RNA-seq data than between the qRT-PCR data and the microarray data.
Keywords: RNA-seq; fibroblasts; gene expression; microarrays; transcriptome analysis
Year: 2015 PMID: 26473061 PMCID: PMC4593695 DOI: 10.1098/rsos.150402
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Differing reproducibility of microarray FC values. (Correlations between FC values (QUI/PRO) are shown for each pair of microarrays. Entries above the diagonal are Pearson correlations, and entries below the diagonal are Spearman correlations. Values outside parentheses are correlations between untransformed FC values, while values in parentheses are correlations between log-transformed FC values. Because log transformation does not change rank order, only one Spearman value is shown for each pair. Correlations varied substantially depending on the pair of microarrays and the correlation metric used, ranging from −0.55 to 0.74.)
| | QP1 | QP2 | QP3 | QP4 |
|---|---|---|---|---|
| QP1 | — | 0.00 (−0.46) | 0.68 (0.70) | −0.01 (−0.41) |
| QP2 | −0.34 | — | −0.01 (−0.55) | 0.71 (0.74) |
| QP3 | 0.65 | −0.43 | — | −0.01 (−0.44) |
| QP4 | −0.30 | 0.68 | −0.33 | — |
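The table's note that a single Spearman value suffices for each pair follows from the fact that Spearman correlation is Pearson correlation computed on ranks, and a log transform never changes ranks. The sketch below illustrates this with made-up FC values (the arrays `fc_a` and `fc_b` are hypothetical, not data from the study). It uses only the Python standard library, with average ranks for ties.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical QUI/PRO fold changes for the same six probes on two arrays.
fc_a = [0.25, 0.5, 1.1, 2.0, 8.0, 30.0]
fc_b = [0.4, 0.45, 1.3, 1.8, 5.0, 3.0]
log_a = [math.log2(v) for v in fc_a]
log_b = [math.log2(v) for v in fc_b]

# Pearson depends on the scale (raw vs log), so both values are reported.
print("Pearson raw:", round(pearson(fc_a, fc_b), 3))
print("Pearson log:", round(pearson(log_a, log_b), 3))
# log2 is strictly increasing, so ranks and hence Spearman are identical.
print("Spearman:", round(spearman(fc_a, fc_b), 3))
```

Large untransformed ratios (such as 30.0 here) dominate the raw Pearson value, which is one reason the raw and log-transformed Pearson correlations in the table can differ so much.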
Figure 1. Differing reproducibility of microarray FC values. The log-transformed FC values from some pairs of microarrays were consistent with one another, while negative correlations were observed for other pairs. Panel (a) shows the relationship between the log-transformed FC values from microarray QP2 and those from microarray QP4, which exhibited a moderate to strong correlation (r = 0.74). By contrast, panel (b) shows the relationship between the log-transformed FC values from microarray QP1 and those from microarray QP4, which had a negative correlation (r = −0.41).
High reproducibility of RNA-seq read counts, and moderate reproducibility of RNA-seq FC values. (The correlations between read counts (PRO1 versus PRO2 and QUI1 versus QUI2) and FC values (QUI1/PRO1 versus QUI2/PRO2) are shown. Except for the Pearson correlations between non-log-transformed values, correlations between read counts were similar in magnitude to the correlations observed between microarray intensity values (electronic supplementary material, table S1). Correlations between FC values were close to those observed in the most highly correlated pairs of microarrays.)
| correlation | read counts, QUI (QUI1 vs QUI2) | read counts, PRO (PRO1 vs PRO2) | FC values (QUI1/PRO1 vs QUI2/PRO2) |
|---|---|---|---|
| Pearson | 0.58 | 0.35 | 0.77 |
| Pearson (log) | 0.94 | 0.94 | 0.70 |
| Spearman | 0.93 | 0.93 | 0.59 |
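The replicate FC comparison in this table (QUI1/PRO1 versus QUI2/PRO2) can be sketched as follows. The source does not say how the read counts were normalized, so the counts-per-million scaling and the pseudocount of 1 below are assumptions made for illustration, and all read counts are hypothetical.

```python
import math

# Assumption: a pseudocount of 1 read keeps log2(QUI/PRO) finite for zero-count probes.
PSEUDOCOUNT = 1

def log2_fc(qui_counts, pro_counts, pseudo=PSEUDOCOUNT):
    """Per-probe log2(QUI/PRO), with each library scaled to counts per million (assumed)."""
    qui_total, pro_total = sum(qui_counts), sum(pro_counts)
    result = []
    for q, p in zip(qui_counts, pro_counts):
        q_cpm = (q + pseudo) / qui_total * 1e6
        p_cpm = (p + pseudo) / pro_total * 1e6
        result.append(math.log2(q_cpm / p_cpm))
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical read counts for five probes in each of the four libraries.
pro1 = [120, 30, 0, 500, 45]
qui1 = [60, 90, 4, 480, 10]
pro2 = [100, 25, 2, 620, 50]
qui2 = [55, 70, 3, 510, 15]

fc_rep1 = log2_fc(qui1, pro1)  # replicate 1: QUI1/PRO1
fc_rep2 = log2_fc(qui2, pro2)  # replicate 2: QUI2/PRO2
r = pearson(fc_rep1, fc_rep2)
print(f"Pearson (log) between replicate FC values: {r:.2f}")
```

A ratio of two noisy counts is noisier than either count alone, which is consistent with the table showing lower replicate agreement for FC values than for the underlying read counts.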
Figure 2. Moderate reproducibility of RNA-seq FC values. The scatterplot shows that there was a moderate to strong linear relationship between the log-transformed FC values for QUI1/PRO1 and those for QUI2/PRO2 (r = 0.70).
Low concordance between RNA-seq data and DNA microarray data. (For each cell state (PRO and QUI), reads from the two RNA-seq replicates were pooled to give a single read count for each probe. Concordance was assessed using correlations between read counts (RNA-seq data) and intensity values (microarray data), and between FC values (QUI/PRO). Correlations between read counts and intensity values were low, ranging from 0.18 to 0.41, as were correlations between the FC values of the individual microarrays and the RNA-seq data, which ranged from 0.02 to 0.23. ‘All’ represents the geometric mean of the FC values of the four microarrays. The correlations between the RNA-seq data and this mean were higher than those between the RNA-seq data and any of the individual microarrays.)
| correlation | PRO: QP1 | PRO: QP2 | PRO: QP3 | PRO: QP4 | QUI: QP1 | QUI: QP2 | QUI: QP3 | QUI: QP4 | FC: QP1 | FC: QP2 | FC: QP3 | FC: QP4 | FC: all |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pearson | 0.22 | 0.18 | 0.20 | 0.21 | 0.20 | 0.19 | 0.20 | 0.20 | 0.04 | 0.07 | 0.03 | 0.02 | 0.42 |
| Pearson (log) | 0.33 | 0.33 | 0.40 | 0.37 | 0.32 | 0.32 | 0.41 | 0.35 | 0.23 | 0.18 | 0.18 | 0.17 | 0.42 |
| Spearman | 0.30 | 0.30 | 0.38 | 0.33 | 0.29 | 0.29 | 0.40 | 0.33 | 0.21 | 0.18 | 0.17 | 0.16 | 0.34 |

PRO and QUI columns: correlation between pooled RNA-seq read counts and the intensity values of each microarray for that cell state. FC columns: correlation between QUI/PRO FC values.
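The ‘all’ column combines the four microarrays by taking, for each probe, the geometric mean of their FC values, and compares it with the FC from pooled RNA-seq replicates. The sketch below shows that pipeline on hypothetical numbers. Pooling is a simple per-probe sum of replicate reads as the caption describes; no library-size normalization is applied, which is a simplification rather than the study's documented procedure.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed in log space for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical QUI/PRO FC values: one row per probe, one column per array (QP1..QP4).
array_fc = [
    [2.0, 0.5, 4.0, 1.0],
    [0.25, 0.5, 0.125, 1.0],
    [1.5, 3.0, 2.0, 0.8],
    [8.0, 1.0, 6.0, 2.0],
]
# Hypothetical replicate read counts, pooled by summing per probe.
pro_pooled = [a + b for a, b in zip([40, 200, 30, 10], [50, 180, 35, 12])]
qui_pooled = [a + b for a, b in zip([110, 60, 70, 90], [95, 70, 60, 80])]

all_fc = [geometric_mean(row) for row in array_fc]  # the 'all' column
seq_fc = [q / p for q, p in zip(qui_pooled, pro_pooled)]

# Correlate on the log scale, as in the 'Pearson (log)' row.
r_log = pearson([math.log2(v) for v in seq_fc], [math.log2(v) for v in all_fc])
print(f"Pearson (log), RNA-seq vs geometric-mean array FC: {r_log:.2f}")
```

The geometric mean suits ratios because it is symmetric on the log scale: a 2-fold increase on one array and a 2-fold decrease on another average to an FC of 1, whereas the arithmetic mean would give 1.25.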
Figure 3. Moderate concordance between the log-transformed RNA-seq FC values and the log-transformed geometric mean of the microarray FC values. The scatterplot shows that there was a moderate linear relationship between these two variables (r = 0.42).
Moderate overlap between the probes with the highest FC values in the RNA-seq data and those with the highest FC values in the DNA microarray data. (k is the size of a given list (the 10, 50, 100, 500 or 1000 probes with the highest FC values), and n is the number of probes in common between a list from the RNA-seq data and the corresponding list from the DNA microarray data. The p-value is the proportion of 10 000 random trials that had an equal or greater level of overlap than that actually observed. Thus, a p-value of 0 means that none of the random trials had an equal or greater level of overlap. More overlapping probes than expected by chance were observed for all microarrays for k = 100, 500 and 1000, while some arrays also had statistically significant p-values for k = 10 and k = 50. ‘All’ represents the geometric mean of the FC values of the four microarrays.)
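| array | k=10: n | k=10: p | k=50: n | k=50: p | k=100: n | k=100: p | k=500: n | k=500: p | k=1000: n | k=1000: p |
|---|---|---|---|---|---|---|---|---|---|---|
| QP1 | 0 | 1 | 1 | 0.08 | 9 | 0 | 95 | 0 | 190 | 0 |
| QP2 | 1 | 0.004 | 2 | 0.003 | 4 | 0.0003 | 54 | 0 | 97 | 0 |
| QP3 | 0 | 1 | 3 | 0 | 9 | 0 | 87 | 0 | 187 | 0 |
| QP4 | 0 | 1 | 1 | 0.08 | 5 | 0 | 39 | 0 | 85 | 0 |
| all | 2 | 0 | 12 | 0 | 23 | 0 | 131 | 0 | 257 | 0 |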
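The empirical p-value in this table can be illustrated with a small randomization test. The caption does not specify how the random lists were generated, so the null model here (comparing the RNA-seq top-k list with k probes drawn uniformly at random without replacement) is an assumption, and `overlap_p_value` is a hypothetical helper rather than the study's code.

```python
import random

def top_k(scores, k):
    """Indices of the k probes with the highest FC values."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])

def overlap_p_value(fc_seq, fc_array, k, trials=10_000, seed=0):
    """Return (n, p): observed top-k overlap and the fraction of random trials with overlap >= n.

    Assumed null model: k probes drawn uniformly at random without replacement.
    """
    seq_list = top_k(fc_seq, k)
    observed = len(seq_list & top_k(fc_array, k))
    rng = random.Random(seed)
    probes = range(len(fc_seq))
    hits = 0
    for _ in range(trials):
        random_list = set(rng.sample(probes, k))
        if len(seq_list & random_list) >= observed:
            hits += 1
    return observed, hits / trials

# Hypothetical data: 2000 probes whose array FC values track the RNA-seq FC values with noise.
gen = random.Random(1)
fc_seq = [gen.lognormvariate(0, 1) for _ in range(2000)]
fc_array = [v * gen.lognormvariate(0, 0.5) for v in fc_seq]
n, p = overlap_p_value(fc_seq, fc_array, k=50, trials=1000)
print(f"k=50: n={n}, p={p}")
```

Because the test counts trials with overlap "equal or greater" than observed, an observed overlap of n = 0 always gives p = 1, which matches the k = 10 entries for QP1, QP3 and QP4.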
RNA-seq FC values correlate better with qRT-PCR FC values than do microarray FC values, although not to a statistically significant degree. Correlation coefficients are shown between the qRT-PCR FC values for 76 genes and the FC values for the corresponding probes in each individual microarray or in the combined RNA-seq replicates. ‘All’ represents the geometric mean of the FC values of the four microarrays. For all three correlation measures, the RNA-seq correlation was not significantly different (p > 0.05, Fisher's z-transformation) from the correlation of any of the microarrays.
| correlation | QP1 | QP2 | QP3 | QP4 | All (microarrays) | RNA-seq |
|---|---|---|---|---|---|---|
| Pearson | 0.31 | 0.15 | 0.34 | 0.18 | 0.25 | 0.35 |
| Pearson (log) | 0.39 | 0.43 | 0.38 | 0.35 | 0.48 | 0.56 |
| Spearman | 0.39 | 0.45 | 0.42 | 0.34 | 0.44 | 0.56 |
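The significance statement in the caption rests on Fisher's z-transformation, z = atanh(r), which makes sample correlations approximately normal with standard error 1/sqrt(n − 3). The sketch below applies the textbook test for two independent correlations to the largest Spearman gap in the table (RNA-seq 0.56 vs QP4 0.34, n = 76). The caption does not state which variant was used; because both correlations share the same qRT-PCR genes, a dependent-correlation test would be more exact, so this is a simplified illustration, and `compare_correlations` is a hypothetical helper.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return math.atanh(r)

def compare_correlations(r1, r2, n1, n2):
    """Two-sided test of H0: rho1 == rho2, treating the two samples as independent.

    For Spearman correlations, 1/sqrt(n - 3) is only an approximate standard error.
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Spearman correlations with qRT-PCR over 76 genes: RNA-seq vs microarray QP4.
z, p = compare_correlations(0.56, 0.34, 76, 76)
print(f"z = {z:.2f}, p = {p:.3f}")  # z is about 1.68 and p about 0.09, above 0.05
```

With only 76 genes, even a difference of 0.22 in correlation falls short of significance at the 0.05 level, which is consistent with the caption's conclusion.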