Elmira Forouzmand, Nick D L Owens, Ira L Blitz, Kitt D Paraiso, Mustafa K Khokha, Michael J Gilchrist, Xiaohui Xie, Ken W Y Cho.
Abstract
Advances in RNA sequencing technologies have led to the surprising discovery that a vast number of transcripts emanate from regions of the genome that are not part of coding genes. Although some of the smaller ncRNAs such as microRNAs have well-characterized functions, the majority of long ncRNA (lncRNA) functions remain poorly understood. Understanding the significance of lncRNAs is an important challenge facing biology today. A powerful approach to uncovering the function of lncRNAs is to explore temporal and spatial expression profiling. This may be particularly useful for classes of lncRNAs that have developmentally important roles as the expression of such lncRNAs will be expected to be both spatially and temporally regulated during development. Here, we take advantage of our ultra-high frequency (temporal) sampling of Xenopus embryos to analyze gene expression trajectories of lncRNA transcripts over the first 3 days of development. We computationally identify 5689 potential single- and multi-exon lncRNAs. These lncRNAs demonstrate clear dynamic expression patterns. A subset of them displays highly correlative temporal expression profiles with respect to those of the neighboring genes. We also identified spatially localized lncRNAs in the gastrula stage embryo. These results suggest that lncRNAs have regulatory roles during early embryonic development.
Keywords: Biological timescales; Expression profile; Gaussian processes; Gene expression; Non-coding RNA; RNA-seq; Systems biology; Xenopus tropicalis
Year: 2016 PMID: 27418388 PMCID: PMC5233649 DOI: 10.1016/j.ydbio.2016.06.016
Source DB: PubMed Journal: Dev Biol ISSN: 0012-1606 Impact factor: 3.582
Fig. 1 LncRNA discovery pipeline. The output of Cuffmerge (step 1) goes through multiple filtering steps to remove unqualified lncRNA genes and any transcripts with coding potential (step 2), short transcripts (step 3), and miRNAs (step 4). These processes are performed in parallel rounds for single time points and also using pooled reads over a sliding window of 5 time points. After these commonly used filtering steps, the remaining transcripts are combined into one set, and one representative transcript model is kept among the overlapping transcripts (step 5). Then, multi-exon and single-exon lncRNA candidates are separated (step 6). After removing the lncRNA candidates with fewer than 5 consecutive time points of non-zero expression, the SNR threshold is applied (step 7). We remove any potential lncRNA candidates that could be part of the exons of a neighboring gene (steps 8 and 9). Our final lists contain 1336 multi-exon lncRNAs and 4353 single-exon lncRNAs.
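The expression-run filter that precedes the SNR threshold in step 7 can be illustrated with a minimal Python sketch. It keeps a candidate only if its profile contains at least 5 consecutive time points with non-zero expression. The function names and the toy profiles are hypothetical, and the SNR threshold itself is not shown because its value is not given in this caption.

```python
from typing import Dict, List, Sequence

MIN_RUN = 5  # from the caption: at least 5 consecutive non-zero time points


def longest_nonzero_run(profile: Sequence[float]) -> int:
    """Return the length of the longest stretch of consecutive non-zero values."""
    best = current = 0
    for value in profile:
        current = current + 1 if value > 0 else 0
        best = max(best, current)
    return best


def filter_by_expression_run(
    profiles: Dict[str, List[float]], min_run: int = MIN_RUN
) -> Dict[str, List[float]]:
    """Keep candidates whose longest non-zero run is at least min_run."""
    return {
        name: profile
        for name, profile in profiles.items()
        if longest_nonzero_run(profile) >= min_run
    }


# Toy expression profiles (hypothetical values, one entry per time point).
toy_profiles = {
    "cand_a": [0, 1.2, 0.8, 2.0, 3.1, 0.5, 0],  # longest run is 5: kept
    "cand_b": [0.4, 0, 0.9, 1.1, 0, 2.2, 0.3],  # longest run is 2: removed
}
kept = filter_by_expression_run(toy_profiles)
print(sorted(kept))
```

In this toy input only `cand_a` survives, because `cand_b` never stays above zero for 5 time points in a row.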
Fig. 3 Expression profiles of lncRNAs and the neighboring genes. A) Gene expression values in RPKM are shown for a lncRNA and a neighboring gene during the developmental time course. The blue and red solid lines represent Gaussian process medians, and the shaded areas are the 95% confidence intervals of the data. C denotes the Pearson correlation between the lncRNA and neighboring gene expression dynamics. Gene models of lncRNAs are shown in Supplementary Fig. 5. B) The left panel shows the distribution of correlations for lncRNA – neighboring gene pairs (blue) and lncRNA – random gene pairs (green). The right panel shows the distribution of correlations for lncRNA – neighboring gene pairs (blue) and antisense-strand lncRNA – neighboring gene pairs (light blue). A Pearson coefficient of 1 indicates perfect correlation, and −1 indicates perfect anti-correlation.
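As an illustration of the quantity C, the Pearson correlation between two expression profiles sampled at the same time points can be computed as in the following Python sketch. The profiles are invented toy values, and the caption does not state whether the authors computed C on raw RPKM values or on the Gaussian process medians, so this is only a generic example of the statistic.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / norm


# Hypothetical profiles over five time points.
lncrna = [0.0, 1.0, 2.0, 4.0, 3.0]
neighbor_same = [1.0, 3.0, 5.0, 9.0, 7.0]      # 2 * lncrna + 1, same shape
neighbor_opposite = [-value for value in lncrna]  # mirrored shape

print(round(pearson(lncrna, neighbor_same), 6))
print(round(pearson(lncrna, neighbor_opposite), 6))
```

The first pair rises and falls together and gives a coefficient of 1, while the mirrored pair gives −1.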
Fig. 2 Temporal expression dynamics of lncRNAs. The expression values of individual candidate lncRNAs are normalized by their maxima. These expression profiles are assigned by k-means clustering to 8 different expression clusters. A) The heatmaps show individual normalized expression patterns for all 5689 lncRNAs. B) The plots show the average expression of all genes within individual clusters. The blue bars in panel B mark the egg (E), late blastula (B), gastrula (G), neurula (N), and tailbud (T) stages.
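The two operations named in this caption, normalization of each profile by its maximum and k-means clustering, can be sketched in plain Python as follows. This is a minimal Lloyd-style k-means written for illustration, not the authors' implementation. The toy profiles, the seed, and the choice of k = 2 are assumptions made to keep the example small (the caption uses 8 clusters).

```python
import random
from typing import List, Sequence


def normalize_by_max(profile: Sequence[float]) -> List[float]:
    """Scale a profile so that its maximum value becomes 1."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile needs at least one positive value")
    return [value / peak for value in profile]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 50, seed: int = 0) -> List[int]:
    """Minimal Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical raw profiles: two peak early, two peak late.
raw_profiles = [
    [10, 5, 1, 0],
    [20, 8, 4, 0],
    [0, 1, 5, 10],
    [0, 4, 8, 20],
]
normalized = [normalize_by_max(p) for p in raw_profiles]
labels = kmeans(normalized, k=2)
print(labels)
```

Normalizing by the maximum makes profiles comparable by shape rather than by absolute expression level, so the two early-peaking profiles fall into one cluster and the two late-peaking profiles into the other.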
Fig. 4 LncRNA distribution in gastrula stage embryos. A) Spatial expression of lncRNAs in gastrula stage embryos. The scatter plot in the left panel compares vegetal and animal RPKM values of lncRNAs. The scatter plot in the right panel compares ventral and dorsal expression. Individual points represent 5689 lncRNAs expressed in gastrula embryos, and the red boxes mark differentially expressed lncRNAs. The black line denotes equal expression between vegetal and animal, or dorsal and ventral, tissue fragments. B) RT-qPCR analysis of lncRNAs using RNA isolated from the designated tissue fragments.
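The criterion used to draw the red boxes is not given in this caption. Purely as an illustrative assumption, the Python sketch below classifies a lncRNA by its log2 fold change between two tissue fragments, which corresponds to its distance from the line of equal expression in the scatter plot. The pseudocount, the 2-fold cutoff, and the function names are all hypothetical.

```python
import math

PSEUDOCOUNT = 1.0     # assumption: avoids division by zero at 0 RPKM
LOG2_FC_CUTOFF = 1.0  # assumption: a 2-fold difference, not taken from the paper


def log2_fold_change(a: float, b: float, pseudocount: float = PSEUDOCOUNT) -> float:
    """log2 ratio of two RPKM values after adding a pseudocount."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def classify(vegetal: float, animal: float, cutoff: float = LOG2_FC_CUTOFF) -> str:
    """Label a lncRNA by which side of the equal-expression line it falls on."""
    lfc = log2_fold_change(vegetal, animal)
    if lfc >= cutoff:
        return "vegetal-enriched"
    if lfc <= -cutoff:
        return "animal-enriched"
    return "similar"


# Hypothetical RPKM pairs (vegetal, animal).
print(classify(7.0, 1.0))
print(classify(1.0, 7.0))
print(classify(3.0, 3.0))
```

The same function applies to the dorsal and ventral comparison by passing those two values instead.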