LncATLAS database for subcellular localization of long noncoding RNAs.

David Mas-Ponte, Joana Carlevaro-Fita, Emilio Palumbo, Toni Hermoso Pulido, Roderic Guigó, Rory Johnson.

Abstract

The subcellular localization of long noncoding RNAs (lncRNAs) holds valuable clues to their molecular function. However, measuring localization of newly discovered lncRNAs involves time-consuming and costly experimental methods. We have created "lncATLAS," a comprehensive resource of lncRNA localization in human cells based on RNA-sequencing data sets. Altogether, 6768 GENCODE-annotated lncRNAs are represented across various compartments of 15 cell lines. We introduce relative concentration index (RCI) as a useful measure of localization derived from ensemble RNA-seq measurements. LncATLAS is accessible through an intuitive and informative webserver, from which lncRNAs of interest are accessed using identifiers or names. Localization is presented across cell types and organelles, and may be compared to the distribution of all other genes. Publication-quality figures and raw data tables are automatically generated with each query, and the entire data set is also available to download. LncATLAS makes lncRNA subcellular localization data available to the widest possible number of researchers. It is available at lncatlas.crg.eu.
© 2017 Mas-Ponte et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Keywords:  chromatin; cytoplasm; lncRNA; long noncoding RNA; nucleus; subcellular localization

Year:  2017        PMID: 28386015      PMCID: PMC5473142          DOI: 10.1261/rna.060814.117

Source DB:  PubMed          Journal:  RNA        ISSN: 1355-8382            Impact factor:   4.942


INTRODUCTION

The functions of long noncoding RNAs (lncRNAs) are intimately linked to location in the cell. The first discovered lncRNAs tended to be located in the nucleus and chromatin, and epigenetically regulate gene expression (Hutchinson et al. 2007; Rinn et al. 2007; Zhao et al. 2008; Whitehead et al. 2009; Mondal et al. 2010; Tsai et al. 2010). However, we now appreciate lncRNAs’ localization and molecular functions to be highly diverse. There exists a substantial population of lncRNAs in the cytoplasm (Ulitsky and Bartel 2013; van Heesch et al. 2014; Carlevaro-Fita et al. 2016), with evidence for roles such as translation regulation (Yoon et al. 2012; Schein et al. 2016; Zucchelli et al. 2016), miRNA decoys (Cesana et al. 2011), or protein trafficking (Willingham et al. 2005; Aoki et al. 2010; Kino et al. 2010). Consequently, ascertaining nuclear–cytoplasmic localization has become one of the primary sources of evidence when investigating the molecular role of newly discovered lncRNAs (Hutchinson et al. 2007; Ishizuka et al. 2014; Ounzain et al. 2015; Chen 2016; Chen et al. 2016; Hansji et al. 2016).

The various methods to map RNA molecules in the cell operate with trade-offs in throughput, convenience, and accuracy. Among the single-gene approaches, probably the most commonly used is qRT-PCR on RNA extracts of purified cellular compartments (Wang et al. 2006). It yields information on relative RNA concentrations between compartments, but not of absolute molecule numbers per cell. Another method is fluorescence in situ hybridization (FISH), which can in principle yield absolute counts of molecules at subcellular resolution (Raj et al. 2008; Dunagin et al. 2015). However, FISH is time-consuming and low-throughput, and requires expensive reagents (Cabili et al. 2015). More recently, the ingenious in situ sequencing method, FISSEQ, has established high-throughput subcellular RNA counting (Lee et al. 2015). But at present just one data set is available, and it is restricted to several hundred highly expressed lncRNAs (Lee et al. 2014). The only method currently capable of whole-genome localization mapping is subcellular RNA sequencing (subcRNAseq). In this process, cells are fractionated, extracted, and RNA sequenced (Djebali et al. 2012). SubcRNAseq yields high-throughput and quantitative data, although as with RT-PCR approaches mentioned above, the absolute counts of RNA molecules per cell are lost (Ulitsky and Bartel 2013).

Recently, large amounts of raw subcRNAseq data have become available, most notably from the ENCODE Consortium (Djebali et al. 2012; Dunham et al. 2012). These data remain underutilized and have not been made readily accessible. In light of the growing use of RNA localization to infer the function of newly discovered lncRNAs, and the availability of large amounts of unprocessed subcRNAseq data, we have created a resource to make lncRNA localization data available to the broader scientific community. This resource, “lncATLAS,” enables nonexpert users to rapidly access a rich variety of easily interpreted data on their lncRNA of interest.

RESULTS

A database of lncRNA localization based on human RNA-seq data

In light of growing interest in lncRNA and their functions, we decided to create a resource for accessing and visualizing lncRNA localization within human cells. We collected data from the largest data set of subcRNAseq, produced by the ENCODE Consortium (Djebali et al. 2012; Dunham et al. 2012). Raw RNA-seq data from a panel of human cell lines were used to quantify the reference GENCODE gene annotation (Derrien et al. 2012; Harrow et al. 2012). RNA-seq experiments were obtained for a total of 15 cell lines comprising 48 individual experiments (see Supplemental Table S1). These cells originate from a wide diversity of adult and embryological organ sites and comprise both transformed and normal cells (Fig. 1A). For each cell, cytoplasmic and nuclear data are available, and for the majority of these, whole-cell data were also obtained. In addition, from a single cell line, K562, subnuclear, and subcytoplasmic compartment data are also available (Fig. 1B; Supplemental Table S1). Hereafter we refer to these as “compartments.” Poly(A)+ RNA samples were available for whole cell, cytoplasm, nucleus, and subcytoplasmic compartments, and total RNA for subnuclear and nuclear compartments.
FIGURE 1.

Overview of lncATLAS data. (A) Cell lines available in lncATLAS, indicating their approximate origin. (B) Cellular compartments available. (*) Compartments with only total RNA samples available. (C) The relative concentration index (RCI), in this case calculated for the cytoplasm and nucleus (CN-RCI). The RCI can be thought of as the log-ratio, between two compartments, of the concentration of a given RNA molecule per unit mass of RNA. (D) Overview of the cell lines and cellular compartments available for lncRNA RCI calculations. “LncRNA genes” column indicates the number of lncRNA for which the RCI could be calculated in the corresponding cell line (see Materials and Methods for details). Sub-C RCI and Sub-N RCI correspond to subcytoplasmic RCI and subnuclear RCI, respectively.

Defining localization from RNA-seq data

Throughout the present study, for practical reasons, we adopt a relative scheme to define and quantify RNA localization: the “relative concentration index” (RCI). RCI is defined as the log2-transformed ratio of FPKM (fragments per kilobase per million mapped) in two samples, for example, the cytoplasm and nucleus (Fig. 1C). A similar approach has been used previously (Derrien et al. 2012; Ulitsky and Bartel 2013). It is worth commenting on exactly how these values should be interpreted: RCI is the ratio of a transcript's concentration, per unit mass of sampled RNA, between two compartments. Sampled RNA populations may be poly(A)+ RNA or total RNA, and we are careful to only compute RCI values within the same population. The mass of RNA per compartment per cell is not equal, and typically not quantified prior to RNA-seq (Djebali et al. 2012). Therefore, without knowing the total mass of poly(A)+ RNA in the nucleus and cytoplasm of a single cell, we cannot make statements about the relative “number” of RNA transcripts in cellular compartments of a single cell (Cabili et al. 2015). We here briefly digress to contrast this approach with another possible way to define relative subcellular localization. Perhaps more obvious is a “molecular” definition, in terms of numbers of molecules of a given RNA transcript X in the compartments of a single cell. For example, if one cell has 10 and five molecules of X in the cytoplasm and nucleus, respectively, then its cytoplasmic/nuclear localization would be defined as 10/5 = 2. We define this measure as relative molecules index (RMI). Such information is, in principle, directly accessible from fluorescence-based techniques and has been calculated previously (Lee et al. 2014; Cabili et al. 2015). As mentioned above, our ignorance of the total poly(A)+ RNA mass of the cell lines used here precludes the calculation of RMI in this study.
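The distinction between RCI and RMI can be made concrete with a minimal sketch. The function names `rci` and `rmi` and the input values are hypothetical and chosen only for illustration; the only parts taken from the text are the log2 ratio of FPKM values and the 10/5 molecule example.

```python
import math


def rci(fpkm_a: float, fpkm_b: float) -> float:
    """Relative concentration index: log2 ratio of FPKM in compartment A over B.

    Both values must come from the same RNA population (poly(A)+ or total).
    """
    if fpkm_a <= 0 or fpkm_b <= 0:
        raise ValueError("RCI is undefined unless both FPKM values are positive")
    return math.log2(fpkm_a / fpkm_b)


def rmi(molecules_a: int, molecules_b: int) -> float:
    """Relative molecules index: plain ratio of molecule counts in a single cell."""
    if molecules_b <= 0:
        raise ValueError("RMI is undefined when compartment B holds no molecules")
    return molecules_a / molecules_b


# Illustrative inputs only (not lncATLAS data).
cn_rci_example = rci(8.0, 2.0)   # cytoplasm FPKM 8, nucleus FPKM 2
rmi_example = rmi(10, 5)         # 10 cytoplasmic and 5 nuclear molecules
```

A positive CN-RCI indicates a higher concentration per unit mass of sampled RNA in the cytoplasm, and a negative value a higher concentration in the nucleus. Unlike RMI, it says nothing about molecule counts per cell.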

Computing localization across genes and cell types

RCI was calculated for various selected pairs of cellular compartments (Fig. 1D). In the majority of cases, we calculated the cytoplasmic/nuclear RCI, “CN-RCI” (Supplemental Table S2) (see Materials and Methods). This is a measure of the relative concentration of an RNA sequence in the cytoplasm, compared to the nucleus, in log2 units. For one cell type, K562, RCI values were also calculated for subnuclear compartments (total RNA) and subcytoplasmic compartments (poly(A)+ RNA), by reference to total RNA from the nucleus or poly(A)+ RNA from the cytoplasm, as appropriate (Supplemental Table S3) (see Materials and Methods). Altogether this yielded localization estimates in 20 compartment/cell combinations (Fig. 1D). Where available, replicate data were used to assess reliability of RCI measurements (see Supplemental Table S1 for information about availability of replicates). Silent and unreliable genes were excluded from further consideration (see Materials and Methods for details of the filtering steps). After filtering, CN-RCI could be estimated for between 582 and 3114 lncRNAs per cell line (Figs. 1D, 2A). Note that H1.hESC has a greater number of detected genes (4923 genes), because biological replicates were not available for cytoplasm or nucleus for this cell line. A total of 24,538 genes (17,770 mRNAs and 6768 lncRNAs) were quantified in at least one cell type. Of these, 31 lncRNAs were detected in all samples tested (Fig. 2B; Supplemental Table S4). LncRNAs display a highly cell-type-specific detection pattern, in contrast to mRNAs, as observed previously (Fig. 2B; Derrien et al. 2012; Guttman and Rinn 2012; Cabili et al. 2015).
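The tally of how many cell types each gene is detected in (the quantity summarized in Fig. 2B) could be computed along the following lines. This is a sketch under assumptions: the nested-dictionary layout, the function names, the placeholder gene names, and the toy values are all hypothetical and are not taken from lncATLAS.

```python
from collections import Counter
from typing import Dict, List


def detection_breadth(rci_by_cell: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Count, for each gene, the number of cell lines in which it has a CN-RCI value.

    rci_by_cell maps cell line -> {gene -> CN-RCI}, holding only "Detected" genes.
    """
    counts: Counter = Counter()
    for genes in rci_by_cell.values():
        counts.update(genes.keys())
    return dict(counts)


def detected_everywhere(rci_by_cell: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the genes detected in every cell line, sorted by name."""
    n_cells = len(rci_by_cell)
    breadth = detection_breadth(rci_by_cell)
    return sorted(gene for gene, n in breadth.items() if n == n_cells)


# Toy values with placeholder gene names (hypothetical, for illustration only).
toy = {
    "K562": {"geneA": -1.2, "geneB": 0.4},
    "HUVEC": {"geneA": -0.8, "geneB": 0.1, "geneC": 2.0},
    "H1.hESC": {"geneA": -1.5},
}
breadth = detection_breadth(toy)
ubiquitous = detected_everywhere(toy)
```

In this toy input only `geneA` is present in all three cell lines, which mirrors how a small set of lncRNAs can be singled out as detected in all samples.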
FIGURE 2.

Summary of cytoplasmic/nuclear detected genes by lncATLAS. (A) Number of genes analyzed in lncATLAS. The y-axis maximum represents the total number of annotated lncRNA genes in GENCODE. (i) Detected genes: present a reliable expression value in compartments, cytoplasm, and nucleus. Other categories comprise (ii) compartment-specific genes: present a reliable non-zero value in one compartment and zero in the other; (iii) unreliable genes: genes that did not pass the twofold cutoff (see Materials and Methods, “Data Processing” section); and (iv) silent genes: not expressed in any compartment (see full category definitions in Materials and Methods, “Cytoplasmic–Nuclear Relative Concentration Index” section). (*) No biological replicates were available for cytoplasm or nucleus. (**) No biological replicates were available for cytoplasm. (B) The number of lncRNA genes defined as “detected” in the indicated numbers of cell types. LncRNA genes in blue and mRNAs in red. (C) Coverage by lncATLAS of widely used, manually curated lncRNA databases, LncRNAdb (Quek et al. 2015), and LncRNADisease (Chen et al. 2013). The y-axis maximum represents the total number of unique human gene loci in each database. The RNALocate database (Zhang et al. 2016) is also shown for comparison. Note that these databases hold a mixture of GENCODE annotated and non-GENCODE annotated lncRNAs. The bar plot displays the total number of human lncRNA genes in each database (whole bar). Bars are colored to represent, from the total number of human lncRNAs in a database, the number of genes that are present in lncATLAS (green); are part of GENCODE annotation but are not detected in any of the 15 cell lines in lncATLAS (pink); do not have GENCODE identifiers (red).

RCI data are consistent with known cytoplasmic–nuclear localization tendencies of lncRNAs and mRNAs.
Among the top 15 most cytoplasmic measurements, 14 represent mRNAs (the remainder is a lncRNA overlapping a protein-coding gene's 3′UTR on the same strand, meaning that it is impossible to distinguish the RNA-seq of the two overlapping genes, and the cytoplasmic RCI reading may in fact arise from the protein-coding gene) (Supplemental Table S5). In contrast, 12 of the 15 most nuclear RCI values represent lncRNAs (Supplemental Table S6). The nuclear-enriched X-chromosome inactivating transcript XIST occupies the top four positions (Brown et al. 1992; Clemson et al. 1996). Manual inspection of several well-known lncRNAs showed that localization reported here tended to be consistent with literature reports (see next section). Detected lncRNA genes reported by lncATLAS cover a substantial fraction of the entries from manually curated and widely used databases, such as lncRNAdb (Quek et al. 2015) and LncRNADisease (Chen et al. 2013), and localization database RNALocate (Zhang et al. 2016). Note that these databases contain a mixture of GENCODE annotated and non-GENCODE annotated entries. 74, 128, and 150 genes from lncRNAdb, LncRNADisease, and RNALocate, respectively, are present in lncATLAS. These numbers represent 39%, 48%, and 47% of the total number of human lncRNAs and 82%, 75%, and 90% of the total number of human GENCODE annotated lncRNAs of each database, respectively (Fig. 2C). LncRNAs from these that are absent in lncATLAS are either not detected in any of the cell lines, or else do not belong to the GENCODE annotation.

LncATLAS webserver for exploring localization data

The lncATLAS data set was compiled into a relational database that is searchable through a webserver at lncatlas.crg.eu. LncRNAs of interest are accessed using official gene names or GENCODE gene identifiers. A maximum of three genes may be investigated simultaneously. Several well-known lncRNAs with known localization are also available for reference. Once a gene or genes has been selected, a series of data interpretations are presented and summarized below. As examples, results for MALAT1 (nuclear localized) (Hutchinson et al. 2007) and DANCR (cytoplasm localized) (van Heesch et al. 2014; Cabili et al. 2015) lncRNAs data are shown in Figures 3 and 4.
FIGURE 3.

Subcellular localization plots displayed by lncATLAS. MALAT1 and DANCR genes are selected as examples of nuclear and cytoplasmic lncRNAs, respectively. (A) Bars representing CN-RCI values for the selected genes across all cell lines. Expression values (FPKMs) for the genes of interest are shown for both compartments (cytoplasm on top of the bar, nucleus on the bottom). Bars are colored by their absolute nuclear expression. (B) Boxplot showing CN-RCI values distribution of all lncRNAs (blue) and mRNAs (orange) for each cell line (“n” indicates total number of genes, “m” median of CN-RCI values for lncRNAs and mRNAs separately). LncRNAs of interest are located in the distribution and a percentage indicates their percentile rank within the distribution of all lncRNAs (ranks relative to lowest value). (C) Same as in the previous plot but in this case distribution is shown as a density plot and only for a particular cell type, HUVEC. Again, genes of interest are located in the distribution and their percentile rank (relative to lowest value) and RCI are indicated. (D) Contour plot showing lncRNA and mRNA populations as a function of CN-RCI values and whole cell expression [log10(FPKMs)]. LncRNAs of interest are specifically displayed together with their whole-cell expression and RCI.

FIGURE 4.

Subcompartment data as displayed in lncATLAS. For K562 cells, subnuclear and subcytoplasmic fractions were available and RCI for all lncRNAs and mRNAs was computed (see Materials and Methods). Boxplot shows the distribution of these values for each subnuclear and subcytoplasmic compartment (“n” indicates total number of genes in a distribution). Percentile rank (relative to lowest value) of each gene of interest is displayed to contextualize the relative enrichment of these genes in a subcompartment compared to the rest of lncRNAs and mRNAs.

The following sections summarize the data presented to the users for their gene of interest (GOI).

Inspect the cytoplasmic–nuclear localization of your gene of interest (GOI)

Data are only displayed for selected gene(s). Plot 1: Cytoplasmic/nuclear localization: RCI and expression values (all cell types): In this basic summary, the cytoplasmic/nuclear RCI is shown as a bar plot across all available cell types. Bars are colored to reflect the expression level of the gene, as inferred from nuclear RNA-seq. The individual FPKM values, upon which RCI values are based, are displayed. When a gene is expressed only in one compartment, RCI cannot be computed; then, dashed bars with expression values are shown instead (Fig. 3A).

Inspect the cytoplasmic–nuclear localization of your GOI within the distribution of all genes

The aim of the second section is to understand, in terms of localization, how the genes of interest behave relative to all other genes. Three different plots show CN-RCI values distribution for all lncRNAs and mRNAs, within which the location of the GOI is indicated. Plot 2: Cytoplasmic/nuclear localization: RCI distribution (all cell types): To put RCI values in context, their percentile rank within the distribution of all lncRNAs is indicated (ranks relative to lowest value). Data are shown for all cell types (Fig. 3B). Plot 3: Cytoplasmic/nuclear localization: RCI distribution (individual cell type): The same data are shown as for Plot 2, but in the form of a density plot. The User must here specify a single cell type. When genes are not classed as “Detected,” RCI cannot be computed and no data are shown (Fig. 3C). Plot 4: Cytoplasmic/nuclear localization: comparison with expression (individual cell type): As for Plot 3, gene values are shown in the context of all other genes in the same cell, but here also indicating whole-cell expression values. As before, the data are shown for a single-cell type chosen by the User, and plots are only generated for cells where RCI values are “Detected” (Fig. 3D).
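The percentile rank shown in Plots 2 to 4 can be illustrated with a short sketch. The text states only that ranks are relative to the lowest value; the exact tie-handling used by lncATLAS is not given, so counting values at or below the gene's RCI is an assumption here, as are the function name and the toy distribution.

```python
from typing import Sequence


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Percentage of the population with an RCI at or below `value`.

    Assumption: ties count toward the rank; lncATLAS may handle them differently.
    """
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for v in population if v <= value)
    return 100.0 * at_or_below / len(population)


# Toy CN-RCI distribution (hypothetical values, not lncATLAS data).
toy_rci = [-3.0, -2.0, -1.0, 0.0, 1.0]
rank = percentile_rank(-1.0, toy_rci)
```

Under this convention a strongly nuclear gene (very negative CN-RCI) receives a low percentile and a strongly cytoplasmic gene a high one.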

Inspect the localization of your GOI at subcompartment level

The final section gives information about enrichment in the cytoplasmic and nuclear subcompartments of K562 cells. As in the previous section, RCI values for the genes of interest are indicated in the context of full lncRNA and mRNA distributions. Plot 5: Subcytoplasmic and subnuclear localization in K562 cells: Here data are shown for the subnuclear and subcytoplasmic compartments of the K562 cell line. As in Plots 2 and 3, distributions across all detected genes are shown (Fig. 4). In the examples shown, the differences in localization of MALAT1 and DANCR are clear. Their cytoplasmic–nuclear localizations are highly divergent (Figs. 3, 4) and broadly consistent across all the cell lines observed. The difference in localization is observed even in cells where their overall expression level is similar (e.g., HUVEC, Fig. 3D). All figures may be downloaded as publication-quality files in pdf format. Similarly, in the Get Raw Data tab, the underlying RCI and raw expression values for selected genes may be accessed as a batch query. Furthermore, the entire set of data tables for lncATLAS may be downloaded in the same tab using the Download All raw data button.

DISCUSSION

Subcellular localization provides important clues to the molecular function of novel lncRNAs. LncATLAS is designed to make such data available to the largest number of researchers. To our knowledge, only one other database of lncRNA localization exists: RNALocate (Zhang et al. 2016). RNALocate contains manually curated localization classifications across multiple species. Despite focusing on a single species (human), due to the limited availability of subcellular RNA-seq data, lncATLAS has two key advantages: it is quantitative, and it is based on standard GENCODE annotations, the de facto official annotation for both protein-coding and lncRNA genes (Derrien et al. 2012). These features boost the usefulness of lncATLAS data for other research groups and ensure its integration with diverse other genomics data sets. Future subcellular RNA-seq data from other cell types, or other species, will be integrated as they become available.

MATERIALS AND METHODS

Data source

Cytoplasmic and nuclear poly(A)+ RNA-seq data from 15 different cell lines were obtained from ENCODE (Djebali et al. 2012). (ENCODE RNA-seq data in BAM format were obtained from the ENCODE Data Coordination Centre [DCC] in September 2016; https://www.encodeproject.org/matrix/?type=Experiment.) For most cell lines, whole-cell data were also obtained (exceptions being HT1080, NCI.H460, SK.MEL.5, and SK.N.DZ). A full list of processed RNA-seq libraries is available in Supplemental Table S1.

Data processing

Data were mapped to human genome assembly GRCh38 using STAR software (Dobin et al. 2013) and quantified with RSEM (Li and Dewey 2011) for all GENCODE v24 genes, within the GRAPE analysis pipeline (Harrow et al. 2012; Knowles et al. 2013). Data consisted of two independent biological replicates per cell line and fraction (exceptions being H1.hESC cytoplasm and nucleus and NCI.H460 cytoplasm for which only one replicate was available) (see Supplemental Table S1 for a full list of source data sets). For subcytoplasmic RCI, instead of using poly(A)+ cytoplasmic samples from the Gingeras laboratory (used for CN-RCI), we used the corresponding sample from the laboratory where the subcytoplasmic fractionation was done (Lécuyer laboratory). This is not considered as an additional biological replicate. Throughout, RNA-seq data are processed at the level of genes, rather than transcripts. From the whole GENCODE v24 annotation, genes contained in the “Long noncoding RNA gene annotation” data set define our lncRNA gene set. The protein-coding gene set is defined by gene biotype “protein_coding” (see Supplemental Table S7). In order to remove genes with high variability between replicates, genes with a greater than twofold difference between replicates are labeled “unreliable” and excluded from further analysis. This cutoff was not possible for the samples mentioned above, for which replicate experiments were not available.
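The twofold replicate cutoff described above might be expressed as follows. This is a sketch, not the authors' pipeline code: the function name is hypothetical, and the handling of zero values (two zeros count as consistent, zero versus non-zero counts as unreliable) is an assumption, since the text does not specify it.

```python
def is_reliable(rep1_fpkm: float, rep2_fpkm: float, max_fold: float = 2.0) -> bool:
    """Return True when two replicate FPKM values differ by at most `max_fold`.

    Assumptions (not stated in the source):
    - two zero values are treated as consistent;
    - zero in one replicate and a non-zero value in the other is unreliable.
    """
    low, high = sorted((rep1_fpkm, rep2_fpkm))
    if high == 0:
        return True   # both replicates are zero
    if low == 0:
        return False  # fold difference is unbounded
    return high / low <= max_fold


# Illustrative replicate pairs (hypothetical values).
exactly_twofold = is_reliable(4.0, 8.0)
above_twofold = is_reliable(4.0, 8.1)
```

A pair that differs by exactly twofold passes, because only differences greater than twofold are labeled "unreliable".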

Cytoplasmic–nuclear relative concentration index (CN-RCI)

For cytoplasmic/nuclear localization, poly(A)+ RNA data were used. At this stage, all genes are defined in one of four categories in each cell line: (i) Detected: genes with non-zero and “reliable” values in both cellular compartments; (ii) Compartment-specific: genes considered “reliable” in both compartments but expressed >0 FPKM only in one; (iii) Silent: both compartments have FPKM = 0; (iv) Unreliable: genes that are unreliable in at least one compartment. Genes classed as “Detected” were retained, and their localization was computed as the C/N relative concentration index (CN-RCI) thus:

$$\text{CN-RCI} = \log_2\left(\frac{\text{FPKM}_{\text{cytoplasm}}}{\text{FPKM}_{\text{nucleus}}}\right)$$

Only detected genes are shown in plots of lncATLAS, with the exception of compartment-specific genes in Plot 1. For this group of genes, CN-RCI value is not available in the plot and the bar only indicates the tendency of the gene toward nucleus or cytoplasm. Color and FPKM are shown normally, to indicate the level of expression.
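The four categories and the resulting CN-RCI can be sketched as a single classification step. The function name, argument layout, and order of checks are assumptions made for illustration; the category names and the log2 cytoplasm/nucleus ratio follow the definitions given in the text.

```python
import math
from typing import Optional, Tuple


def classify_gene(
    cyto_fpkm: float,
    nuc_fpkm: float,
    cyto_reliable: bool,
    nuc_reliable: bool,
) -> Tuple[str, Optional[float]]:
    """Assign one of the four categories and return CN-RCI for detected genes."""
    # Unreliable in at least one compartment (checked first; order is an assumption).
    if not (cyto_reliable and nuc_reliable):
        return ("Unreliable", None)
    # Silent: FPKM is zero in both compartments.
    if cyto_fpkm == 0 and nuc_fpkm == 0:
        return ("Silent", None)
    # Compartment-specific: expressed in exactly one compartment, so no ratio exists.
    if cyto_fpkm == 0 or nuc_fpkm == 0:
        return ("Compartment-specific", None)
    # Detected: non-zero and reliable in both compartments.
    return ("Detected", math.log2(cyto_fpkm / nuc_fpkm))


# Illustrative input (hypothetical FPKM values).
category, cn_rci = classify_gene(2.0, 8.0, True, True)
```

Returning `None` for the three non-detected categories reflects that CN-RCI is only defined when both compartments have non-zero, reliable values.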

Subcytoplasmic and subnuclear relative concentration index for K562 (sub-N RCI, sub-C RCI)

RNA-seq data from subnuclear fractions (chromatin, nucleolus, and nucleoplasm) and from subcytoplasmic fractions (membrane and insoluble fraction) were retrieved and processed as above. These data are only available for K562 cells, and in the case of subnuclear samples correspond to total RNA [not poly(A)+-selected RNA]. Data were processed and RCI calculated as above, with the only differences being (i) the RCI was calculated with reference to the nuclear fraction for subnuclear compartments, and cytoplasmic fraction for subcytoplasmic compartments; (ii) for subnuclear compartments total RNA samples were used, instead of poly(A)+.
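The choice of reference compartment and RNA population for K562 sub-compartments can be summarized in a small lookup, sketched below. The dictionary keys, the `"total"`/`"polyA"` labels, and the function name are hypothetical; placing the sub-compartment in the numerator is also an assumption, since the text says only that RCI was calculated with reference to the parent fraction.

```python
import math
from typing import Dict, Optional, Tuple

# Sub-compartment -> (reference compartment, RNA population), following the text:
# subnuclear fractions use total RNA against the nucleus; subcytoplasmic fractions
# use poly(A)+ RNA against the cytoplasm.
PARENT: Dict[str, Tuple[str, str]] = {
    "chromatin": ("nucleus", "total"),
    "nucleolus": ("nucleus", "total"),
    "nucleoplasm": ("nucleus", "total"),
    "membrane": ("cytoplasm", "polyA"),
    "insoluble": ("cytoplasm", "polyA"),
}


def sub_rci(subcompartment: str, fpkm: Dict[Tuple[str, str], float]) -> Optional[float]:
    """log2(FPKM in sub-compartment / FPKM in its reference), same RNA population.

    `fpkm` maps (compartment, population) -> FPKM for one gene in K562.
    Returns None when either value is zero, as no ratio can be formed.
    """
    parent, population = PARENT[subcompartment]
    numerator = fpkm[(subcompartment, population)]
    denominator = fpkm[(parent, population)]
    if numerator <= 0 or denominator <= 0:
        return None
    return math.log2(numerator / denominator)


# Toy FPKM values for one hypothetical gene (not lncATLAS data).
toy_fpkm = {
    ("chromatin", "total"): 16.0,
    ("nucleus", "total"): 4.0,
    ("membrane", "polyA"): 1.0,
    ("cytoplasm", "polyA"): 2.0,
}
chromatin_rci = sub_rci("chromatin", toy_fpkm)
membrane_rci = sub_rci("membrane", toy_fpkm)
```

Keeping the population label in the lookup key makes it hard to mix total and poly(A)+ samples in one ratio, which the text is careful to avoid.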

Database design

LncATLAS is a relational database implemented in MySQL (http://www.mysql.com) and designed through the official MySQL WorkBench tool for Linux. The entity-relationship (ER) diagram extracted summarizes its structure (Supplemental Fig. S1). The tables are hierarchically organized from general information of the samples to the expression value per gene that is stored in the expression table.

Web-tool implementation

LncATLAS is constructed using the Shiny R package (version 0.13.2) (http://www.rstudio.com/shiny/). The database is connected to the application itself via the R package RMySQL (version 0.10.9). Other packages used in the implementation are the ggplot2 package (version 2.2.1) and dplyr (version 0.5.0) used to build custom plots and manipulate the data.

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.
REFERENCES (10 of 39 shown)

Review 1.  Regulation of the mammalian epigenome by long noncoding RNAs.

Authors:  Joanne Whitehead; Gaurav Kumar Pandey; Chandrasekhar Kanduri
Journal:  Biochim Biophys Acta       Date:  2008-10-30

2.  Visualization of lncRNA by single-molecule fluorescence in situ hybridization.

Authors:  Margaret Dunagin; Moran N Cabili; John Rinn; Arjun Raj
Journal:  Methods Mol Biol       Date:  2015

3.  Long noncoding RNA as modular scaffold of histone modification complexes.

Authors:  Miao-Chih Tsai; Ohad Manor; Yue Wan; Nima Mosammaparast; Jordon K Wang; Fei Lan; Yang Shi; Eran Segal; Howard Y Chang
Journal:  Science       Date:  2010-07-08       Impact factor: 47.728

Review 4.  lincRNAs: genomics, evolution, and mechanisms.

Authors:  Igor Ulitsky; David P Bartel
Journal:  Cell       Date:  2013-07-03       Impact factor: 41.582

Review 5.  Modular regulatory principles of large non-coding RNAs.

Authors:  Mitchell Guttman; John L Rinn
Journal:  Nature       Date:  2012-02-15       Impact factor: 49.962

6.  Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues.

Authors:  Je Hyuk Lee; Evan R Daugharthy; Jonathan Scheiman; Reza Kalhor; Thomas C Ferrante; Richard Terry; Brian M Turczyk; Joyce L Yang; Ho Suk Lee; John Aach; Kun Zhang; George M Church
Journal:  Nat Protoc       Date:  2015-02-12       Impact factor: 13.491

7.  Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor.

Authors:  Tomoshige Kino; Darrell E Hurt; Takamasa Ichijo; Nancy Nader; George P Chrousos
Journal:  Sci Signal       Date:  2010-02-02       Impact factor: 8.192

8.  The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

Authors:  Thomas Derrien; Rory Johnson; Giovanni Bussotti; Andrea Tanzer; Sarah Djebali; Hagen Tilgner; Gregory Guernec; David Martin; Angelika Merkel; David G Knowles; Julien Lagarde; Lavanya Veeravalli; Xiaoan Ruan; Yijun Ruan; Timo Lassmann; Piero Carninci; James B Brown; Leonard Lipovich; Jose M Gonzalez; Mark Thomas; Carrie A Davis; Ramin Shiekhattar; Thomas R Gingeras; Tim J Hubbard; Cedric Notredame; Jennifer Harrow; Roderic Guigó
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

9.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

10.  RNALocate: a resource for RNA subcellular localizations.

Authors:  Ting Zhang; Puwen Tan; Liqiang Wang; Nana Jin; Yana Li; Lin Zhang; Huan Yang; Zhenyu Hu; Lining Zhang; Chunyu Hu; Chunhua Li; Kun Qian; Changjian Zhang; Yan Huang; Kongning Li; Hao Lin; Dong Wang
Journal:  Nucleic Acids Res       Date:  2016-08-19       Impact factor: 16.971
