Ayush T Raman, Amy E Pohodich, Ying-Wooi Wan, Hari Krishna Yalamanchili, William E Lowry, Huda Y Zoghbi, Zhandong Liu.
Abstract
Recent studies have suggested that genes longer than 100 kb are more likely to be misregulated in neurological diseases associated with synaptic dysfunction, such as autism and Rett syndrome. These length-dependent transcriptional changes are modest in MeCP2-mutant samples, but, given the low sensitivity of high-throughput transcriptome profiling technology, here we re-evaluate the statistical significance of these results. We find that the apparent length-dependent trends previously observed in MeCP2 microarray and RNA-sequencing datasets disappear after estimating baseline variability from randomized control samples. This is particularly true for genes with low fold changes. We find no bias with NanoString technology, so this long gene bias seems to be particular to polymerase chain reaction amplification-based platforms. In contrast, authentic long gene effects, such as those caused by topoisomerase inhibition, can be detected even after adjustment for baseline variability. We conclude that accurate characterization of length-dependent (or other) trends requires establishing a baseline from randomized control samples.
Year: 2018 PMID: 30104565 PMCID: PMC6089998 DOI: 10.1038/s41467-018-05627-1
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
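The abstract's key step, estimating baseline variability from randomized control samples, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the mean-based fold change, the pseudocount of 1, and the seeded random split of controls into two groups are illustrative assumptions.

```python
import math
import random
from statistics import mean


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """Per-gene log2((mean(group_a) + pc) / (mean(group_b) + pc)).

    Each group is a list of samples; each sample is a list of expression
    values, one per gene, in the same gene order for every sample.
    The pseudocount (an assumption) avoids division by zero and log2(0).
    """
    n_genes = len(group_a[0])
    return [
        math.log2((mean(s[g] for s in group_a) + pseudocount)
                  / (mean(s[g] for s in group_b) + pseudocount))
        for g in range(n_genes)
    ]


def random_baseline(controls, size_a, seed=0):
    """Randomly split control samples into two disjoint groups and return the
    control-vs-control (e.g. WT/WT) log2 fold change for every gene."""
    shuffled = list(controls)
    random.Random(seed).shuffle(shuffled)
    return log2_fold_changes(shuffled[:size_a], shuffled[size_a:])
```

Comparing mutant/WT fold changes from `log2_fold_changes(mutants, controls)` with WT/WT values from `random_baseline(controls, len(controls) // 2)` shows how much apparent trend arises from sample-to-sample variability alone; repeating the split with different seeds indicates how stable that baseline is.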
Fig. 1 Establishing baseline length-dependent trends and comparison of MeCP2 microarray and RNA-seq datasets. a Topotecan datasets[10,19]. Top half of each subgraph: comparison of gene expression between vehicle-treated (V) cultured cortical neurons from C57BL/6J × CASTEi/J F1-hybrid mice and other vehicle-treated samples (V/V, blue line), and comparison of topotecan-treated cortical neurons (D) with vehicle-treated samples (D/V, red line). b–d Mecp2 datasets. Note the change in y-axis scale. b Mecp2-KO datasets. Top half of each subgraph: comparison of gene expression between two sets of WT male C57BL samples (blue line) and between male Mecp2-null (KO) mice and their WT male littermates (red line) in amygdala[24], cerebellum[21], and hypothalamus[22]. c MECP2-overexpression (Tg) datasets. Top half of each subgraph: comparison of gene expression between two sets of WT male FVB samples (blue line) and between Tg samples and their WT littermates (red line) in amygdala[24], cerebellum[21], and hypothalamus[22]. d Cortical excitatory neurons from Mecp2-heterozygous female mice. Top half of each subgraph: comparison between two sets of WT C57BL samples (blue line), and comparison between WT samples and WT neurons from R106W-heterozygous mice (left panel), between WT neurons and Mecp2-mutant neurons carrying the R106W mutation (middle panel), and between WT neurons and Mecp2-mutant neurons carrying the T158M mutation (right panel)[8]. The blue or red line represents the fold change in expression for genes binned by gene length (bin size: 200 genes; shift size: 40 genes[6]). The blue and red shaded areas correspond to one-half of one standard deviation of each bin. The bottom half of each subgraph shows the P value from a two-sample t test between D/V and V/V, KO/WT and WT/WT, Tg/WT and WT/WT, and MUT/WT and WT/WT in (a–d), respectively. Red dots: bins with FDR < 0.05. The red dashed line at the bottom of each subgraph indicates the minimum −log10(P value) that corresponds to an FDR (false discovery rate) < 0.05. See Table 1 for comparison details and sample numbers. See Supplementary Fig. 1 for our statistical approach and Supplementary Figs. 2–4 for additional analyses of published MeCP2 datasets.
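The length binning described in this caption (genes sorted by length, windows of 200 genes advancing by 40 genes, with a half-standard-deviation ribbon) can be sketched in Python as below. Using each bin's mean gene length as its x-coordinate is an assumption, as are the function name and the return layout.

```python
from statistics import mean, stdev


def sliding_bins(lengths, values, bin_size=200, shift=40):
    """Sort genes by length and summarize `values` (e.g. log2 fold changes)
    in overlapping windows of `bin_size` genes that advance by `shift` genes.

    Returns one (mean_length, mean_value, half_sd) tuple per complete window;
    half_sd is the half-standard-deviation ribbon drawn around each line.
    """
    if len(lengths) != len(values):
        raise ValueError("lengths and values must have the same size")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bins = []
    # Only windows that contain a full `bin_size` genes are kept.
    for start in range(0, len(order) - bin_size + 1, shift):
        idx = order[start:start + bin_size]
        vals = [values[i] for i in idx]
        bins.append((mean(lengths[i] for i in idx), mean(vals), 0.5 * stdev(vals)))
    return bins
```

Because consecutive windows share 160 of their 200 genes, neighboring bins are not independent measurements.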
Table 1 List of comparisons used in overlap or average plots in Figs. 1 and 2
| Brain region/cell type | Mouse strain/human samples compared | Reference |
|---|---|---|
| Fig. | BL: hybrid vehicle vs. hybrid vehicle ( | |
| Cultured cortical neurons (middle panel) | BL: hybrid vehicle vs. hybrid vehicle ( | |
| Cultured cortical neurons (right panel) | BL: hybrid vehicle vs. hybrid vehicle ( | |
| Fig. | BL: C57BL WT vs. C57BL WT ( | |
| Cerebellum (middle panel) | BL: C57BL WT vs. C57BL/6J WT ( | |
| Hypothalamus (right panel) | BL: C57BL WT vs. C57BL/6J WT ( | |
| Fig. | BL: FVB WT vs. FVB WT ( | |
| Cerebellum (middle panel) | BL: FVB WT vs. FVB WT ( | |
| Hypothalamus (right panel) | BL: FVB WT vs. FVB WT ( | |
| Fig. | BL: C57BL WT vs. C57BL WT ( | |
| Cortical excitatory neurons, R106W MUT (middle panel) | BL: C57BL WT vs. C57BL WT ( | |
| Cortical excitatory neurons, T158M MUT (right panel) | BL: C57BL WT vs. C57BL WT ( | |
| Fig. | BL: iPSC WT vs. iPSC WT ( | GSE107399 |
| NPC (middle panel) | BL: NPC WT vs. NPC WT ( | GSE107399 |
| Neuron (right panel) | BL: neuron WT vs. neuron WT ( | GSE107399 |
| Fig. | RL: postmortem RTT vs. controls ( | |
| Frontal cortex (right panel) | GL: postmortem pooled sample from 5-year-old patient vs. age-matched control ( | |
| Fig. | RTT female samples compared to age-matched controls (ages 17–20 years; | |
| Temporal cortex (right panel) | RTT female samples compared to age-matched controls (ages 17–20 years; | |
| Fig. | RTT female samples compared to age-matched controls (age 18 years; | GSE107399 |
| Frontal cortex (right panel) | RTT male samples (age 1 year) compared to age-matched (age 2 days) controls ( | GSE107399 |

Hybrid refers to C57BL/6J × CASTEi/J F1-hybrid mice. BL, RL, GL, and PL stand for blue line, red line, green line, and purple line, respectively.
Fig. 2 No bias toward long genes is detected in MECP2 human datasets. a RNA-seq analysis of isogenic human Rett syndrome in vitro models. Overlap plots were used to compare WT and KO samples: the top half of each subgraph shows the comparison of gene expression between WT samples and other WT samples (blue line) and the comparison between RTT samples and WT samples (red line) in iPSCs (left panel), neural progenitor cells (NPCs; middle panel), and neurons (right panel). The bottom half of each subgraph shows the P value from the two-sample t test between MUT/WT and WT/WT. Bins with FDR < 0.05 are shown as red dots. The red dotted line at the bottom of each subgraph indicates the minimum −log10(P value) that corresponds to an FDR (false discovery rate) < 0.05. The blue and red ribbons correspond to one-half of one standard deviation of each bin for the WT/WT and MUT/WT comparisons, respectively. b Microarray analysis of human RTT brain samples compared to age-matched controls for frontal cortex[29]. Comparison of gene expression trends vs. gene length in the pooled sample from 2- and 4-year-old patients (left panel; blue line) and in the whole dataset (left panel; red line). Observed changes in gene expression vs. gene length in the sample from the 5-year-old (right panel; green line) or the 8-year-old RTT patient (right panel; purple line). c Microarray analysis of RTT human frontal cortex samples[31] compared to controls (left panel) and of RTT human temporal cortex samples[31] compared to controls (right panel). d RNA-seq analysis of RTT human (female) frontal cortex samples compared to controls (left panel) and of an RTT human (male) frontal lobe sample compared to controls (right panel). The lines in a–d represent the fold change in expression for genes binned by gene length (bin size of 200 genes with shift size of 40 genes[6]). Please refer to Table 1 for the total number of samples used for the comparison between two random sets of WT samples and between WT and RTT samples.
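The per-bin statistics in the bottom half of each subgraph (a two-sample t test per bin, then an FDR cut-off drawn as a dashed line) can be sketched in plain Python. Assumptions: the test compares the MUT/WT and WT/WT fold changes of the genes in one bin; the P value uses a normal approximation to the t distribution, which is close for bins of 200 + 200 genes (398 degrees of freedom), whereas `scipy.stats.ttest_ind` gives exact values; and the FDR is controlled with Benjamini–Hochberg, which the captions do not name. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t(x, y):
    """Student's two-sample t statistic (pooled variance) and a two-sided
    P value from the normal approximation to the t distribution."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / se
    return t, math.erfc(abs(t) / math.sqrt(2))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def min_neg_log10_p(pvalues, fdr=0.05):
    """Smallest -log10(P) among bins whose adjusted P is below `fdr`
    (the height of the dashed line), or None if no bin passes."""
    passing = [p for p, q in zip(pvalues, bh_adjust(pvalues)) if q < fdr]
    if not passing:
        return None
    return -math.log10(max(passing))
```

In use, each bin contributes one P value, for example `two_sample_t(mut_wt_bin, wt_wt_bin)[1]`, and the list of per-bin P values is passed to `min_neg_log10_p`.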
Fig. 3 Differentially expressed genes show length-dependent misregulation in topotecan datasets but not in MeCP2 studies. a Scatter plot of log fold change in expression between topotecan- and vehicle-treated cultured cortical neurons (y-axis) against gene length (x-axis) in RNA-seq data from King et al.[10] (left panel; n = 5 each; FDR < 0.05) and Mabb et al.[19] (right panel; n = 3 each; FDR < 0.01). b Scatter plot of log fold change in expression (microarray) between C57BL KO mice and their C57BL WT littermates (y-axis) against gene length (x-axis) in hypothalamus[22] (left panel; n = 4 each; FDR < 0.05 and log2FC > 0.2) and cerebellum[21] (right panel; n = 4 each; FDR < 0.05 and log2FC > 0.2). c Scatter plot of log fold change in expression (microarray) between FVB Tg mice and their FVB WT littermates (y-axis) against gene length (x-axis) in hypothalamus[22] (n = 4 each; FDR < 0.05 and log2FC > 0.2) and cerebellum[21] (n = 4 each; FDR < 0.05 and log2FC > 0.2). d Scatter plot of log fold change in expression between KO or Tg mice and WT littermates (y-axis) against gene length (x-axis) in RNA-seq datasets: hypothalamus KO/WT comparison[23] (left panel; n = 3 each; FDR < 1e−5) and hypothalamus Tg/WT comparison[23] (right panel; n = 3 each; FDR < 1e−5). Red dots indicate long genes and blue dots indicate short genes. Differentially expressed genes were obtained from the published gene lists. See Supplementary Fig. 5 for additional analyses of published MeCP2 datasets.
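The gene selection used for panels b and c (FDR < 0.05 and a fold-change cut-off of 0.2 on the log2 scale), followed by a split into long and short genes, can be sketched as below. Two assumptions are labeled in the code: the 100 kb long-gene cut-off comes from the abstract rather than this caption, and the fold-change threshold is applied to |log2FC| so that downregulated genes are retained. The tuple layout and function names are hypothetical.

```python
LONG_GENE_BP = 100_000  # assumption: "long" means > 100 kb, as in the abstract


def classify_de_genes(genes, fdr_max=0.05, min_abs_log2fc=0.2):
    """Keep genes with FDR < fdr_max and |log2FC| > min_abs_log2fc (the
    absolute value is an assumption), then split them by gene length.

    genes: iterable of (name, length_bp, log2fc, fdr) tuples (hypothetical layout).
    Returns {"long": [...], "short": [...]} with (name, length_bp, log2fc) entries.
    """
    groups = {"long": [], "short": []}
    for name, length_bp, log2fc, fdr in genes:
        if fdr < fdr_max and abs(log2fc) > min_abs_log2fc:
            key = "long" if length_bp > LONG_GENE_BP else "short"
            groups[key].append((name, length_bp, log2fc))
    return groups


def direction_counts(groups):
    """Count up- and downregulated genes within each length class."""
    return {
        key: {"up": sum(1 for _, _, fc in rows if fc > 0),
              "down": sum(1 for _, _, fc in rows if fc < 0)}
        for key, rows in groups.items()
    }
```

A length-dependent pattern like the topotecan one would show up as a skew in `direction_counts` for long genes relative to short genes.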
Fig. 4 A long gene bias is observed in SEQC RNA-seq and microarray datasets, but not in NanoString datasets. a MDS plot using Euclidean distance on the Novartis SEQC[32] count dataset (A is Universal Human Reference RNA, B is Human Brain Reference RNA, C is a 3:1 mixture of A and B, and D is a 1:3 mixture of A and B). b Mean log2 β ratio of gene expression in the SEQC RNA-seq dataset plotted against gene length (β ratio = (B − A)/(C − A); n = 64 each). Each blue dot is a bin of 200 genes with a shift size of 40 genes[6]. c MDS plot using Euclidean distance on the SEQC microarray dataset. d Mean log2 β ratio of gene expression in the SEQC microarray dataset plotted against gene length (n = 4 each). Each blue dot is a bin of 200 genes with a shift size of 40 genes[6]. e–g Box plots of the ~680 genes common to the three platforms. The mean log2 β ratio was calculated for each of the ~680 genes, and its distributions for long and short genes are compared across the three platforms: (e) RNA-seq, (f) microarray, and (g) NanoString (n = 6 each). P values were computed using the Wilcoxon–Mann–Whitney test. In the box plots, center lines show medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to 1.5 times the interquartile range. See Supplementary Figs. 6–9 for additional analyses of the SEQC and NanoString datasets.
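The β ratio from panel b and the Wilcoxon–Mann–Whitney comparison of long and short genes in panels e–g can be sketched in plain Python. Under the idealized assumption that expression mixes linearly, C = 0.75A + 0.25B gives C − A = (B − A)/4, so every gene should have a log2 β ratio of 2; a systematic shift between long and short genes therefore points to a platform effect. Further assumptions: the ratio is computed from per-gene mean expression in samples A, B, and C, genes with a non-positive ratio are dropped because their log2 is undefined, and the P value uses the normal approximation without tie or continuity corrections (`scipy.stats.mannwhitneyu` provides exact and corrected variants). Function names are hypothetical.

```python
import math


def log2_beta_ratio(a, b, c):
    """log2((B - A) / (C - A)) for one gene from mean expression in A, B, C.
    Returns None when the ratio is undefined or not positive."""
    denom = c - a
    if denom == 0:
        return None
    ratio = (b - a) / denom
    return math.log2(ratio) if ratio > 0 else None


def _ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test with the normal approximation
    (no tie or continuity correction). Returns (U for x, P value)."""
    n1, n2 = len(x), len(y)
    ranks = _ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

For panels e–g, `x` would hold the log2 β ratios of long genes and `y` those of short genes on one platform.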
Fig. 5 Expression changes are overestimated in RNA-seq datasets. Comparison of log fold change in expression between RNA-seq (n = 3 each) and NanoString (n = 3 each) for short genes (a) and long genes (b). Genes were considered differentially expressed if FDR < 0.05. Red dots indicate genes found to be differentially expressed on both platforms. Green and blue dots indicate genes found to be differentially expressed only in the NanoString dataset or only in the RNA-seq dataset, respectively. The dashed lines on the x- and y-axes in (a, b) correspond to log2 fold changes of −log2(1.2), log2(1) (= 0), and log2(1.2). c Absolute difference in log fold change between RNA-seq and NanoString (y-axis) plotted against gene length (x-axis). Red dots are long genes, and blue dots are short genes. P values were computed using a chi-square test. See Supplementary Fig. 10 for additional analyses related to this figure.
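Panel c compares the absolute log fold-change difference between platforms for long and short genes and reports a chi-square P value. The caption does not say how the contingency table is built, so the layout below (long vs. short genes against a hypothetical difference cut-off, with long meaning > 100 kb as in the abstract) is an assumption, as are the function names. The test itself is a standard Pearson chi-square on a 2 × 2 table with one degree of freedom and no Yates correction.

```python
import math


def abs_log2fc_difference(rnaseq, nanostring):
    """|log2FC(RNA-seq) - log2FC(NanoString)| per gene (y-axis of panel c)."""
    return [abs(r - n) for r, n in zip(rnaseq, nanostring)]


def long_short_table(lengths_bp, diffs, cutoff, long_bp=100_000):
    """Hypothetical 2x2 layout: rows = long / short genes,
    columns = difference above / at-or-below `cutoff`."""
    table = [[0, 0], [0, 0]]
    for length, diff in zip(lengths_bp, diffs):
        row = 0 if length > long_bp else 1
        col = 0 if diff > cutoff else 1
        table[row][col] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square test (1 df, no Yates correction) on [[a, b], [c, d]].
    Requires every row and column total to be non-zero.
    Returns (statistic, P value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

`scipy.stats.chi2_contingency` applies Yates' correction to 2 × 2 tables by default, so its P values differ from this sketch unless `correction=False` is passed.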