
Identification and comparative analysis of ncRNAs in human, mouse and zebrafish indicate a conserved role in regulation of genes expressed in brain.

Zhipeng Qu, David L Adelson.

Abstract

ncRNAs (non-coding RNAs), in particular long ncRNAs, represent a significant proportion of the vertebrate transcriptome and probably regulate many biological processes. We used publicly available ESTs (Expressed Sequence Tags) from human, mouse and zebrafish and a previously published analysis pipeline to annotate and analyze the vertebrate non-protein-coding transcriptome. Comparative analysis confirmed some previously described features of intergenic ncRNAs, such as a positionally biased distribution with respect to regulatory or development-related protein-coding genes, and weak but clear sequence conservation across species. Significantly, comparative analysis of developmental and regulatory genes proximate to long ncRNAs indicated that the only conserved relationship of these genes to neighboring long ncRNAs was with respect to genes expressed in human brain, suggesting a conserved ncRNA cis-regulatory network in vertebrate nervous system development. Most of the relationships between long ncRNAs and proximate coding genes were not conserved, providing evidence for the rapid evolution of species-specific, gene-associated long ncRNAs. We have reconstructed and annotated over 130,000 long ncRNAs in these three species, providing a significantly expanded set of candidates for functional testing by the research community.


Year:  2012        PMID: 23284966      PMCID: PMC3527520          DOI: 10.1371/journal.pone.0052275

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Protein-coding genes account for only a small proportion of vertebrate genome complexity: protein-coding sequence makes up only ∼2% of the human genome [1]. With better and more sensitive methods for studying gene expression, such as genome tiling arrays and deep RNA sequencing, we now know that vertebrate “RNA-only” transcriptomes are much more complex than their protein-coding transcriptomes [2], [3], [4], [5]. Studies of some vertebrate genomes have indicated that there are tens of thousands of ncRNAs (non-coding RNAs) [6], [7], [8], including structural RNAs such as ribosomal RNAs and transfer RNAs, and small non-coding regulatory transcripts such as siRNAs (small interfering RNAs), miRNAs (micro RNAs) and piRNAs (piwi-interacting RNAs) [9]. In addition to these well-characterized ncRNAs, there is a substantial number of long ncRNAs, only a few of which have been functionally characterized [10], [11], [12], [13], [14]. These few functionally characterized long ncRNAs have various regulatory roles, ranging from gene imprinting [15], [16] to transcriptional activation/repression of protein-coding genes [17], [18]. Specific long ncRNAs have been shown to function in neural development [19] and cell pluripotency [20], [21]. Long ncRNAs have also been implicated in pathological processes resulting from aberrant gene regulation [13], [22], [23]. But not all long ncRNAs are the same, and a number of different methods have been used to discover and annotate them. Guttman et al. identified thousands of lincRNAs (large intervening/intergenic non-coding RNAs) in mouse using chromatin signatures [10], and Khalil et al. extended the catalog of human chromatin-signature-derived lincRNAs to ∼3,300 using the chromatin-state maps of 6 human cell types [11].
Many more lincRNAs have been reconstructed from RNA-seq data from multiple sources in human, mouse and zebrafish [12], [14], [24], and over a thousand long ncRNAs, some of which showed enhancer-like activity, were characterized based on GENCODE annotation [25]. Extrapolation from the limited set of experimentally validated long ncRNAs supports the idea that long ncRNAs are a “hidden” layer of gene regulation. Two lines of evidence support this view: their (modest) level of evolutionary sequence conservation and their spatial association with regulatory genes. To determine the full extent of evolutionary conservation of ncRNAs, we used a pipeline originally built to identify bovine ncRNAs, particularly long ncRNAs, at genome scale from public EST (Expressed Sequence Tag) data [26]. By using ESTs, we were able to obtain comprehensive datasets of long ncRNAs from both sexes and from many different tissues, cell types, developmental stages and experimental treatments. In this report we have used this pipeline to analyze all publicly available human, mouse and zebrafish ESTs, and we present the first global, systematic and methodologically comparable evolutionary analysis of non-protein-coding transcriptomes across different species. We have found large numbers of novel long ncRNAs, many of which originate from the flanking regions of protein-coding genes. We also show that gene-flanking intergenic RNAs are more conserved in sequence than non-transcribed genomic regions and are preferentially found near regulatory/developmental protein-coding genes in a species-specific fashion.

Results

1 Genome-wide Exploration of ncRNAs from Human, Mouse, and Zebrafish ESTs

We used a previously described pipeline [26] to screen all publicly available human, mouse and zebrafish ESTs for non-protein-coding transcripts and identified over 130,000 ncRNAs (Table 1 and Table S1, http://share:sharingisgood@genomes.ersa.edu.au/ncRNA_pub/). These large numbers of predicted long ncRNAs from human, mouse and zebrafish, together with previously identified bovine ncRNAs, confirm and significantly extend previous reports of pervasive transcription in these four organisms [1], [27], [28].
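Table 1 lists, for each species, the number of assembled transcripts with long ORFs, which were presumably set aside as potentially protein-coding. As an illustration only, the following minimal Python sketch shows one common way to measure ORF length: it scans all six reading frames for the longest ATG-initiated, stop-terminated ORF. The function names and the 100-codon cutoff are hypothetical assumptions; the pipeline's actual ORF criterion (see [26]) may differ.

```python
# Hypothetical sketch of a long-ORF filter; the 100-codon cutoff is an
# illustrative assumption, not the published pipeline's parameter.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop codon excluded) of the longest ATG-to-stop ORF
    in any of the six reading frames. ORFs that run off the end of the
    sequence without a stop codon are ignored in this simple version."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG in the current open ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def has_long_orf(seq: str, min_codons: int = 100) -> bool:
    """True if the transcript would be flagged as potentially protein-coding."""
    return longest_orf_codons(seq) >= min_codons


if __name__ == "__main__":
    print(longest_orf_codons("CCATGAAATTTTAGCC"))  # prints 3 (ATG AAA TTT, then TAG)
```

Ignoring unterminated ORFs keeps the sketch short; for partial EST assemblies, counting ORFs that reach the end of the sequence would be the more conservative choice.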
Table 1

Summary of procedures for ncRNA identification in human, mouse and zebrafish.

Species | Number of ESTs | Number of assembled transcripts | Mapped to RefSeqs | Mapped to Swiss-Prot | With long ORFs | Putative ncRNAs | Reconstructed ncRNAs
Human* | 8,314,483 | 1,037,755* | 44,245* | 135,073 | 130,291 | 105,994 | 87,173
Mouse | 4,853,460 | 1,356,763 | 382,852 | 3,911 | 60,342 | 45,975 | 36,280
Zebrafish | 1,481,936 | 262,387 | 117,337 | 1,828 | 10,778 | 11,323 | 9,877

*Because of the large number of human ESTs, we ran BLAST for all human ESTs against human RefSeqs before assembly and removed all high-confidence matches (coverage >90% and identity >90%). This makes the human “Number of assembled transcripts” and “Mapped to RefSeqs” values smaller than expected.
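The pre-assembly filter described in the footnote can be sketched in Python as follows. This is a minimal illustration under several assumptions: BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident qstart qend qlen"`, coverage defined as the aligned query span of a single hit divided by EST length, and made-up EST and RefSeq identifiers. Only the thresholds (>90% coverage and >90% identity) come from the footnote.

```python
def high_confidence_ests(blast_lines, min_identity=90.0, min_coverage=0.90):
    """Return IDs of ESTs with at least one hit above both thresholds.

    Each line is assumed to be BLAST+ tabular output with the custom columns
    qseqid sseqid pident qstart qend qlen (an assumption for this sketch)."""
    removed = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, _sseqid, pident, qstart, qend, qlen = line.split("\t")[:6]
        q_lo, q_hi = sorted((int(qstart), int(qend)))
        coverage = (q_hi - q_lo + 1) / int(qlen)  # aligned fraction of the EST
        if float(pident) > min_identity and coverage > min_coverage:
            removed.add(qseqid)
    return removed


def ests_to_assemble(est_ids, blast_lines):
    """Keep only ESTs that were not confidently explained by a RefSeq."""
    removed = high_confidence_ests(blast_lines)
    return [est for est in est_ids if est not in removed]


hits = [
    "est1\tNM_0001\t99.5\t1\t480\t500",  # 96% coverage, 99.5% identity
    "est2\tNM_0002\t95.0\t1\t300\t500",  # only 60% coverage
    "est3\tNM_0003\t85.0\t1\t500\t500",  # identity too low
]
print(ests_to_assemble(["est1", "est2", "est3", "est4"], hits))
```

Because each hit is judged on its own, an EST covered only by several shorter hits would be kept in this sketch; merging hits per EST before computing coverage is one possible refinement.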

Our long ncRNAs fell into 3 categories based on their genomic coordinates with respect to protein-coding genes: intergenic ncRNAs, intronic ncRNAs and overlapped ncRNAs, the last of which overlapped exons of protein-coding genes by a small number of base pairs [26]. In human and mouse, more than 50% of long ncRNAs were intronic (Figure 1 and Table 2), consistent with previous studies based on other methods [8]. In zebrafish, intergenic ncRNAs were far more numerous than intronic transcripts (Figure 1), but because of the much smaller number of zebrafish intergenic ncRNAs compared with human and mouse (Table 2), it is difficult to be sure that this difference in the relative abundance of intergenic ncRNAs is real.
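The category proportions can be recomputed directly from the counts in Table 2. The Python sketch below assumes, following the subclassification described later in this section, that UTR-related ncRNAs are a subset of intergenic ncRNAs and therefore adds them to the intergenic category; Figure 1 may have been computed differently.

```python
# Counts copied from Table 2.
TABLE2 = {
    "Human": {"utr_related": 3438, "intergenic": 20268, "intronic": 55601, "overlapped": 10724},
    "Mouse": {"utr_related": 2179, "intergenic": 9490, "intronic": 21541, "overlapped": 4414},
    "Zebrafish": {"utr_related": 2031, "intergenic": 4464, "intronic": 2514, "overlapped": 1010},
}


def category_fractions(counts):
    """Share of long ncRNAs in each positional category.

    Assumption: UTR-related ncRNAs are counted as intergenic."""
    total = sum(counts.values())
    return {
        "intergenic": (counts["utr_related"] + counts["intergenic"]) / total,
        "intronic": counts["intronic"] / total,
        "overlapped": counts["overlapped"] / total,
    }


for species, counts in TABLE2.items():
    shares = category_fractions(counts)
    print(species, {name: f"{share:.1%}" for name, share in shares.items()})
```

Under this assumption, intronic ncRNAs make up about 62% of long ncRNAs in human and 57% in mouse, but about 25% in zebrafish, where intergenic ncRNAs (including UTR-related ones) account for about 65%.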
Figure 1

Percentage of intergenic, intronic and overlapped ncRNAs in human, mouse and zebrafish.

Table 2

Classification of ncRNAs.

Species | Number of UTR-related ncRNAs | Number of intergenic ncRNAs | Number of intronic ncRNAs | Number of overlapped ncRNAs
Human | 3,438 | 20,268 | 55,601 | 10,724
Mouse | 2,179 | 9,490 | 21,541 | 4,414
Zebrafish | 2,031 | 4,464 | 2,514 | 1,010

Because many intergenic ncRNAs from different species have been validated as functional elements [10], [12], [14], [25], [29], we focused our analyses on all predicted intergenic ncRNAs. The first question we addressed was how intergenic ncRNAs are distributed with respect to protein-coding genes. In all three species, intergenic ncRNAs showed a biased distribution with respect to protein-coding genes at both 5′ and 3′ ends (Figure 2). This is consistent with our previous observation in cow [26] and with previous observations in human and mouse based on tiling array and RNA-seq analyses [30], [31]. Furthermore, many functional transcripts are known to be located in these regions [8], [31].
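One way to quantify the positional bias described above is to assign each intergenic ncRNA to its nearest protein-coding gene and tally how far it lies from that gene's 5′ or 3′ end. The following Python sketch does this for a single chromosome; the `Gene` class, the 1 kb bins and the 10 kb cut-off are hypothetical choices for illustration, not the settings used to draw Figure 2.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Gene:
    start: int   # 0-based, half-open coordinates on the same chromosome
    end: int
    strand: str  # "+" or "-"


def flank_and_distance(nc_start, nc_end, gene):
    """Return ("5prime" or "3prime", distance in bp) of an intergenic ncRNA
    relative to a gene, or None if the two intervals overlap."""
    if nc_end <= gene.start:
        left, dist = True, gene.start - nc_end
    elif nc_start >= gene.end:
        left, dist = False, nc_start - gene.end
    else:
        return None
    # Left of a "+" gene is upstream (5'); left of a "-" gene is downstream (3').
    return ("5prime" if left == (gene.strand == "+") else "3prime"), dist


def positional_histogram(ncrnas, genes, bin_size=1000, max_dist=10000):
    """Count ncRNAs per (flank, distance bin) relative to their nearest gene."""
    counts = Counter()
    for nc_start, nc_end in ncrnas:
        placements = []
        for gene in genes:
            hit = flank_and_distance(nc_start, nc_end, gene)
            if hit is not None:
                placements.append(hit)
        if not placements:
            continue
        flank, dist = min(placements, key=lambda p: p[1])  # nearest gene only
        if dist < max_dist:
            counts[(flank, dist // bin_size)] += 1
    return counts


genes = [Gene(10_000, 20_000, "+"), Gene(40_000, 50_000, "-")]
ncrnas = [(8_500, 9_500), (21_500, 22_500), (50_200, 51_000)]
print(positional_histogram(ncrnas, genes))
```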
Figure 2

Biased positional distribution of intergenic ncRNAs with respect to neighbor protein-coding genes in human, mouse and zebrafish.

The top 2 panels (A & B) are from human, the middle 2 panels (C & D) are from mouse and the bottom 2 panels (E & F) are from zebrafish. A, C and E show the positional distribution of ncRNAs at the 5′ or 3′ ends of neighbor genes. B, D and F show the positional distribution of ncRNAs by transcription orientation relative to neighbor genes.

In all three species, a larger proportion of sense-strand than antisense-strand intergenic ncRNAs was transcribed near the 3′ ends of protein-coding genes (Figure 2), whereas near the 5′ ends of protein-coding genes, human and mouse showed a slightly larger proportion of antisense-strand than sense-strand intergenic ncRNAs. We considered the possibility that gene-proximate 3′ transcripts were un-annotated UTRs (untranslated regions) or alternative transcripts, so we classified these ncRNAs into two subcategories: UTR-related RNAs, which shared high sequence similarity with annotated UTRs or were located within 1 kb of protein-coding genes, and “true” intergenic ncRNAs. These results are summarized in Table 2. Some of the UTR-related ncRNAs were transcribed from the antisense strand of nearby protein-coding genes, and these may correspond to uaRNAs (UTR-associated RNAs), which are independent transcripts with potential functional significance [32].
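The subclassification above, together with the sense/antisense split shown in Figure 2, can be expressed as a small rule. In this hypothetical Python sketch, the `similar_to_utr` flag stands in for the sequence-similarity search against annotated UTRs, which is not reproduced, and ncRNAs without a known strand are labelled “unknown”; only the 1 kb window comes from the text.

```python
from collections import Counter


def classify_intergenic(nc_strand, gene_strand, distance_bp,
                        similar_to_utr=False, utr_window=1000):
    """Assign an intergenic ncRNA to a subclass and an orientation.

    similar_to_utr stands in for a sequence-similarity test against annotated
    UTRs (hypothetical here); distance_bp is the gap to the nearest gene."""
    if nc_strand not in {"+", "-"}:
        orientation = "unknown"      # e.g. ESTs archived without orientation
    elif nc_strand == gene_strand:
        orientation = "sense"
    else:
        orientation = "antisense"
    if similar_to_utr or distance_bp <= utr_window:
        subclass = "UTR-related"
    else:
        subclass = "true intergenic"
    return subclass, orientation


records = [("-", "+", 400, False), ("+", "+", 3000, False), (".", "-", 2500, True)]
print(Counter(classify_intergenic(*r) for r in records))
```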


2 Problems in the Annotation of Long ncRNA Datasets

Different methods have been used to identify several classes of long ncRNAs, especially lincRNAs, in human [10], [11], [24], [25], mouse [12] and zebrafish [14]. To determine the degree of overlap between ncRNAs identified by different methods, we compared the genomic coordinates of our long ncRNAs from all available tissues and developmental stages in human, mouse and zebrafish with those of previously annotated long ncRNA datasets. Overlap between our EST-based ncRNAs and three different human ncRNA datasets was very limited (Figure 3). Only 2,585 ncRNAs in our dataset overlapped transcripts in at least one of the three known ncRNA datasets (Figure 3A). Of these, 1,597 overlapped ∼16% (2,296 of 14,353) of RNA-seq-based lincRNAs and 1,009 overlapped ∼28% (854 of 3,011) of enhancer-like long ncRNAs, but only 435 overlapped ∼10% (508 of 4,860) of chromatin-based lincRNAs (Table 3). The intersection of all four of these long ncRNA datasets contained only 25 transcripts, but this is to be expected if previously annotated ncRNAs were present in RefSeq, which we used to screen known gene transcripts out of our EST input data. Overlap between our mouse ncRNAs and four other annotated mouse long ncRNA datasets was similarly small (Figure 3B and Table 3). To test whether this lack of overlap between our results and previously reported long ncRNAs was attributable to the RefSeq screening step, we aligned the previously reported long ncRNAs to the ESTs we used as the starting point for ncRNA identification. Depending on the dataset, between 46% and 99% of previously reported human ncRNAs were present in the EST data (Figure 4 and Table S2). We discuss this further below. Because gene models are continuously being revised, some of our non-intergenic ncRNAs overlapped with ncRNAs previously described as intergenic (Table 3).
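Overlap between datasets of this kind reduces to interval intersection on genomic coordinates. The minimal Python sketch below finds the intervals in one set that overlap at least one interval in another, assuming both sets use the same genome assembly and 0-based, half-open coordinates; the identifiers and coordinates in the example are made up. Running it in both directions gives both kinds of number reported above: how many of our ncRNAs overlap a dataset, and what fraction of that dataset's transcripts they overlap.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def overlapping_ids(queries, references):
    """IDs of query intervals that overlap at least one reference interval.

    Intervals are (id, chrom, start, end) tuples with 0-based, half-open
    coordinates on the same genome assembly."""
    by_chrom = defaultdict(list)
    for _id, chrom, start, end in references:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # running maximum of end coordinates over references sorted by start
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    hits = set()
    for qid, chrom, qstart, qend in queries:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        k = bisect_left(starts, qend)  # references [0, k) start before qend
        if k and max_ends[k - 1] > qstart:
            hits.add(qid)
    return hits


ours = [("nc1", "chr1", 100, 200), ("nc2", "chr1", 500, 600), ("nc3", "chr2", 0, 50)]
known = [("L1", "chr1", 150, 300), ("L2", "chr1", 1000, 1100), ("L3", "chr2", 50, 80)]
ours_hit = overlapping_ids(ours, known)   # our ncRNAs overlapping the dataset
known_hit = overlapping_ids(known, ours)  # dataset transcripts we overlap
print(len(ours_hit), f"{len(known_hit) / len(known):.0%}")
```

The running maximum of end coordinates turns each query into one binary search, which keeps the sketch fast enough for tens of thousands of transcripts; intervals that merely touch (end equal to start) are not counted as overlapping.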
Figure 3

Overlap of our predicted ncRNAs with known human or mouse long ncRNAs from different datasets.

A shows the overlap of our ncRNAs with three different human lincRNA datasets. B shows the overlap of our ncRNAs with mouse long ncRNA datasets. “Chromatin based”: lincRNAs identified based on chromatin-state maps [10], [11]. “Enhancer like”: long intergenic ncRNAs identified based on GENCODE [25]. “RNA-seq based”: long ncRNAs identified by reconstruction of RNA-seq data in human. “ES”, “NPC” and “MLF”: long ncRNAs identified by reconstruction of RNA-seq data from 3 different mouse cell types.

Table 3

Overlap of EST-based ncRNAs with previously identified ncRNAs*.

Dataset | Number of intronic ncRNAs | Number of overlapped ncRNAs | Number of UTR-related RNAs | Number of intergenic ncRNAs (Percentage**) | In total
Chromatin-based lincRNAs (human) | 21 | 8 | 15 | 391 (1.93%) | 435
Enhancer-like long ncRNAs (human) | 22 | 10 | 32 | 945 (4.66%) | 1,009
RNA-seq-based lincRNAs (human) | 11 | 19 | 83 | 1,484 (7.32%) | 1,597
LincRNAs from ES (mouse) | 26 | 13 | 15 | 108 (1.14%) | 162
LincRNAs from MLF (mouse) | 40 | 9 | 11 | 70 (0.74%) | 130
LincRNAs from NPC (mouse) | 30 | 14 | 15 | 125 (1.32%) | 184
Chromatin-based lincRNAs (mouse) | 27 | 87 | 59 | 293 (3.09%) | 466
RNA-seq-based long ncRNAs (zebrafish) | 16 | 12 | 28 | 105 (2.36%) | 161

*Numbers in this table are counts of our EST-based ncRNAs.

**The percentage is relative to the total number of intergenic ncRNAs shown in Table 2.

Figure 4

Comparisons of known long ncRNAs mapped by ESTs or non-repeat ESTs in human and mouse.

“Chromatin based”: lincRNAs identified based on chromatin-state maps [10], [11]. “Enhancer like”: long intergenic ncRNAs identified based on GENCODE [25]. “RNA-seq based”: long ncRNAs identified by reconstruction of RNA-seq data in human. “ES”, “NPC” and “MLF”: long ncRNAs identified by reconstruction of RNA-seq data from 3 different mouse cell types.


3 Evolutionary Conservation of ncRNAs in Human, Mouse and Zebrafish

Most protein-coding genes are strongly conserved across species, as judged by sequence alignment, and this characteristic is exploited to predict genes in newly sequenced organisms. However, simple sequence alignment is insufficient to detect conservation in ncRNAs, because they are much less conserved than protein-coding genes. To analyze the evolutionary conservation of predicted ncRNAs, we used a maximum-likelihood-based method (GERP++ score) [33]. Overall, ncRNAs were more conserved than randomly selected untranscribed genomic fragments, but less conserved than protein-coding genes (Figure 5). This result is consistent with previous observations [10], [25], [26], [34]. We also found that many ncRNAs (∼50% in human and ∼60% in mouse, based on GERP++ score) showed evidence of evolutionary constraint compared with control, randomly selected untranscribed genomic regions (Figure 5A and 5C). Comparison of specific ncRNA subclasses showed that UTR-related RNAs were more conserved than intergenic ncRNAs, which in turn were more conserved than intronic ncRNAs (Figure 5B, 5D and 5F). These observations were confirmed using two other methods, phastCons and phyloP (Figure S1 and Figure S2).
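GERP++ assigns each aligned base a rejected-substitution (RS) score, where positive values indicate fewer substitutions than expected under neutral evolution. The Python sketch below shows one simple way to summarize per-base scores into per-element scores and compare ncRNAs with control regions. Averaging per-base scores over each element and using a threshold of 0 are assumptions made for illustration, and the toy scores are invented rather than taken from the data behind Figure 5.

```python
from statistics import mean


def element_scores(elements, base_scores):
    """Mean per-base score for each (chrom, start, end) element.

    base_scores maps a chromosome to a list of per-base scores (None where
    no score exists); coordinates are 0-based, half-open."""
    scores = []
    for chrom, start, end in elements:
        values = [v for v in base_scores[chrom][start:end] if v is not None]
        if values:  # skip elements with no scored bases
            scores.append(mean(values))
    return scores


def fraction_constrained(scores, threshold=0.0):
    """Fraction of elements whose mean score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores) if scores else 0.0


# Invented per-base RS scores for one chromosome (None = no alignment).
base_scores = {"chr1": [2.0, 3.0, None, -1.0, -1.0, 0.5, 1.5, -0.5, -2.0, -1.0]}
ncrnas = [("chr1", 0, 3), ("chr1", 5, 7)]
controls = [("chr1", 3, 5), ("chr1", 7, 10)]
print(fraction_constrained(element_scores(ncrnas, base_scores)),
      fraction_constrained(element_scores(controls, base_scores)))
```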
Figure 5

GERP++ score for ncRNAs identified from human, mouse and zebrafish.

A and B are from human. C and D are from mouse. E and F are from zebrafish.

To compare the sequence conservation of our predicted ncRNAs with previously annotated long ncRNAs, we calculated GERP++, phastCons and phyloP scores for human chromatin-based, enhancer-like and RNA-seq-based long ncRNAs (Figure S3, Figure S4 and Figure S5). The cumulative conservation curves of our predicted ncRNAs were similar to, but slightly more conserved than, those of all three known ncRNA datasets.
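A cumulative conservation curve of the kind compared here can be computed as the fraction of elements whose score is at or above each threshold. This Python sketch assumes that a curve lying at or above another at every threshold indicates a more conserved set; the threshold grid and example scores are hypothetical.

```python
from bisect import bisect_left


def cumulative_curve(scores, thresholds):
    """Fraction of elements with a score >= each threshold (scores non-empty)."""
    ordered = sorted(scores)
    n = len(ordered)
    return [(n - bisect_left(ordered, t)) / n for t in thresholds]


def at_least_as_conserved(a, b, thresholds):
    """True if the curve of set a lies on or above that of set b everywhere."""
    return all(x >= y for x, y in zip(cumulative_curve(a, thresholds),
                                      cumulative_curve(b, thresholds)))


thresholds = [0.0, 1.0, 2.0, 3.0]
ours = [1.0, 2.0, 3.0]   # invented per-element scores
known = [0.5, 1.0, 2.0]
print(cumulative_curve(ours, thresholds))
print(at_least_as_conserved(ours, known, thresholds))
```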

4 Intergenic ncRNAs are Preferentially Transcribed Proximate to Regulatory or Developmental Genes

Many ncRNAs, particularly intergenic ncRNAs, can regulate gene transcription via different mechanisms [13], [20], [25], [35], including cis-regulatory mechanisms. We previously showed that intergenic ncRNAs were more likely to be close to regulatory genes [26]. We used the same methods to analyze the functional classification of human, mouse and zebrafish neighbor genes of gene-proximate intergenic ncRNAs. We defined intergenic ncRNAs located within 5 kb gene-flanking regions as “gene-proximate intergenic ncRNAs” and used GO (Gene Ontology) to functionally classify their neighbor genes in human, mouse and zebrafish [36]. Genes with regulatory roles and/or associated with development were enriched among these neighbor genes in all three species, for both 5′ end and 3′ end intergenic ncRNAs (Figure 6, Figure 7, Figure S6 and Figure S7). However, very few of these neighbor genes were shared across species, as determined by gene symbol comparison (Figure 8). Nevertheless, 12 human neighbor genes with 5′-proximate ncRNAs had sequence-conserved counterparts among mouse and zebrafish neighbor genes, as did 96 human neighbor genes with 3′-proximate ncRNAs (identity >60% and coverage >60%) (Table 4, Table S3). Significantly, the vast majority of these neighbor genes with conserved proximate ncRNAs are expressed in human brain, suggesting a conserved cis-regulatory role for ncRNAs in brain gene expression. Because many protein-coding genes lie within 5 kb of other protein-coding genes, we checked whether this alone could produce a biased functional distribution by analyzing human GO annotation for all protein-coding genes with neighbor genes within 5 kb. We found no over-representation of regulatory or developmental genes in this set, indicating that a biased distribution of protein-coding genes did not affect our finding of enriched developmental and regulatory annotation for genes neighboring intergenic ncRNAs (Figure S8).
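The two selection steps in this paragraph can be sketched in Python: keeping neighbor genes within 5 kb of an intergenic ncRNA, and then requiring a human gene to have sequence-conserved counterparts among both mouse and zebrafish neighbor genes. The input layouts, the gene names prefixed `GENE_` and the source of the similarity hits are hypothetical; only the 5 kb, >60% identity and >60% coverage thresholds come from the text.

```python
def proximate_genes(pairs, max_dist=5000):
    """Group neighbor genes by flank, keeping pairs within max_dist bp.

    pairs: iterable of (ncrna_id, gene_symbol, flank, distance_bp), where flank
    is "5prime" or "3prime" (hypothetical input layout)."""
    groups = {"5prime": set(), "3prime": set()}
    for _ncrna, gene, flank, dist in pairs:
        if dist <= max_dist:
            groups[flank].add(gene)
    return groups


def conserved_in_both(human_genes, hits, mouse_genes, zebrafish_genes,
                      min_identity=60.0, min_coverage=0.60):
    """Human neighbor genes with qualifying hits to both a mouse and a
    zebrafish neighbor gene.

    hits: iterable of (human_gene, other_gene, identity_pct, coverage_fraction)."""
    in_mouse, in_zebrafish = set(), set()
    for human, other, identity, coverage in hits:
        if human not in human_genes:
            continue
        if identity <= min_identity or coverage <= min_coverage:
            continue  # thresholds are strict: >60% identity and >60% coverage
        if other in mouse_genes:
            in_mouse.add(human)
        if other in zebrafish_genes:
            in_zebrafish.add(human)
    return in_mouse & in_zebrafish


human = proximate_genes([("nc1", "GENE_A", "5prime", 1200),
                         ("nc2", "GENE_B", "5prime", 4800),
                         ("nc3", "GENE_C", "5prime", 7000)])["5prime"]
hits = [("GENE_A", "Gene_a", 85.0, 0.90), ("GENE_A", "gene_a", 72.0, 0.75),
        ("GENE_B", "Gene_b", 90.0, 0.95), ("GENE_B", "gene_b", 58.0, 0.80),
        ("GENE_C", "Gene_c", 99.0, 0.99), ("GENE_C", "gene_c", 99.0, 0.99)]
mouse = {"Gene_a", "Gene_b", "Gene_c"}
zebrafish = {"gene_a", "gene_b", "gene_c"}
print(conserved_in_both(human, hits, mouse, zebrafish))
```

In this toy input, GENE_C is excluded because its ncRNA lies more than 5 kb away, and GENE_B fails the identity threshold in zebrafish, leaving only GENE_A.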
Figure 6

Over-represented GO terms of neighbor genes of 5′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C).

The bubble color indicates the P-value (EASE score from DAVID); bubble size indicates the frequency of the GO term in the underlying GOA database. Highly similar GO terms are linked by edges in the graph. Regulatory GO terms are highlighted in cyan-like colors, and development-associated GO terms are highlighted in gold.

Figure 7

Over-represented GO terms of neighbor genes of 3′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C).

The bubble color indicates the P-value (EASE score from DAVID); bubble size indicates the frequency of the GO term in the underlying GOA database. Highly similar GO terms are linked by edges in the graph. Regulatory GO terms are highlighted in cyan-like colors, and development-associated GO terms are highlighted in gold.

Figure 8

Venn diagrams show the conserved neighbor genes proximate to intergenic ncRNAs from human, mouse and zebrafish.

A shows the intersection of neighbor genes with ncRNAs at their 5′ end. B shows the intersection of neighbor genes with ncRNAs at their 3′ end.

Table 4

Human genes conserved in mouse and zebrafish with proximate intergenic ncRNAs at their 5′ end (<5 kb).

Official gene symbol | Expression in brain (Human)* | Aliases & Descriptions | Diseases/disorders* | Related ncRNAs
MAN1A1 | Yes | Processing alpha-1,2-mannosidase IA; MAN9; processing alpha-1,2-mannosidase IA; mannosyl-oligosaccharide 1,2-alpha-mannosidase IA; mannosidase, alpha, class 1A, member 1; Man(9)-alpha-mannosidase; man(9)-alpha-mannosidase; Mannosidase alpha class 1A member 1; HUMM3; alpha-1,2-mannosidase IA; Alpha-1,2-mannosidase IA; Man9-mannosidase; HUMM9; EC 3.2.1.113 | Mannosidase deficiency disease | N/A
MAN1A2 | Yes | mannosidase, alpha, class 1A, member 2; alpha-1,2-mannosidase IB; Mannosidase alpha class 1A member 2; mannosyl-oligosaccharide 1,2-alpha-mannosidase IB; alpha1,2-mannosidase; Processing alpha-1,2-mannosidase IB; processing alpha-1,2-mannosidase IB; MAN1B; Alpha-1,2-mannosidase IB; EC 3.2.1.113 | N/A | N/A
ONECUT2 | Yes | OC2; hepatocyte nuclear factor 6-beta; ONECUT-2 homeodomain transcription factor; HNF6B; One cut homeobox 2; HNF-6-beta; Hepatocyte nuclear factor 6-beta; onecut 2; OC-2; one cut domain, family member 2; transcription factor ONECUT-2; one cut domain family member 2; Transcription factor ONECUT-2; one cut homeobox 2 | Oral cancer | Target of miR-9
PANK2 | Yes | hPanK2; pantothenate kinase 2; FLJ11729; neurodegeneration with brain iron accumulation 1 (Hallervorden-Spatz syndrome); NBIA1; Hallervorden-Spatz syndrome; HARP; HSS; Pantothenic acid kinase 2; C20orf48; pantothenic acid kinase 2; PKAN; pantothenate kinase 2, mitochondrial; EC 2.7.1.33 | Hallervorden-Spatz syndrome; dementia; dystonia | Host of miR-103
KCNJ4 | Yes | IRK-3; hIRK2; IRK3; inward rectifier K(+) channel Kir2.3; Potassium channel, inwardly rectifying subfamily J member 4; HRK1; HIRK2; potassium channel, inwardly rectifying subfamily J member 4; hippocampal inward rectifier potassium channel; potassium inwardly-rectifying channel, subfamily J, member 4; Hippocampal inward rectifier; inward rectifier K+ channel Kir2.3; HIR; inward rectifier potassium channel 4; Kir2.3; Inward rectifier K(+) channel Kir2.3 | N/A | N/A
PDCD6IP | Yes | apoptosis-linked gene 2-interacting protein X; dopamine receptor interacting protein 4; ALIX; programmed cell death 6 interacting protein; ALG-2-interacting protein 1; programmed cell death 6-interacting protein; PDCD6-interacting protein; Hp95; KIAA1375; Alix; HP95; AIP1; ALG-2 interacting protein 1; DRIP4 | N/A | Target of miR-1225-5P
SNX14 | Yes | sorting nexin 14; RGS-PX2; sorting nexin-14 | N/A | N/A
TUBB2B | Yes | tubulin beta-2B chain; tubulin, beta polypeptide paralog; MGC8685; bA506K6.1; tubulin, beta 2B class IIb; DKFZp566F223; tubulin, beta 2B; class IIb beta-tubulin; class II beta-tubulin isotype | Lissencephaly | N/A
ZNF41 | Yes | TUBB; class IIa beta-tubulin; tubulin, beta 2A class IIa; TUBB2; tubulin, beta polypeptide 2; tubulin, beta 2; TUBB2B; dJ40E16.7; tubulin beta-2A chain; tubulin, beta polypeptide; tubulin, beta 2A | Aland Island eye disease; mental disorder; intellectual disability | N/A
ZNF595 | Yes | MRX89; MGC8941; zinc finger protein 41 | N/A | N/A
ZNF676 | Yes | FLJ31740; zinc finger protein 595 | N/A | N/A
ZNF761 | No | zinc finger protein 676 | N/A | N/A

*The expression and disease annotations were based on GeneCards V3 [57].

To determine whether the same GO terms were enriched across species, we compared all significantly over-represented GO terms (p-value <0.05) among the three species. For genes with 5′-proximate intergenic ncRNAs, we found 19 over-represented terms in common, mostly concerning regulation of different biological pathways (Table 5). The molecular function terms enriched in all three species were “transcription factor activity” and “transcription regulator activity” (Table 5). For 3′ end neighbor genes, we found 34 significantly over-represented common GO terms, the majority of which were regulation-associated, again including “transcription factor activity” and “transcription regulator activity” (Table 6).
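The p-values in Tables 5 and 6 are DAVID EASE scores. As we understand it, the EASE score is a conservative variant of the one-sided Fisher exact test in which one gene is removed from the count of list genes annotated to the term before the probability is computed. The Python sketch below implements that description with a hypergeometric tail sum and then intersects the terms with p < 0.05 across species. It has not been checked against DAVID's own implementation, and the term dictionary in the example is hypothetical apart from the two p-values per species copied from Table 5.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    if k <= 0:
        return 1.0
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def fisher_one_sided(k, n, K, N):
    """One-sided Fisher exact p-value: k of n list genes carry a term
    that K of N background genes carry."""
    return hypergeom_sf(k, N, K, n)


def ease_score(k, n, K, N):
    """EASE-style score: remove one annotated gene from the list, then test."""
    return hypergeom_sf(k - 1, N - 1, K - 1, n - 1)


def common_significant_terms(pvalues_by_species, alpha=0.05):
    """GO terms with p < alpha in every species."""
    significant = [{term for term, p in pvals.items() if p < alpha}
                   for pvals in pvalues_by_species.values()]
    return set.intersection(*significant) if significant else set()


pvalues = {
    "human": {"GO:0003700": 6.88e-07, "GO:0007275": 0.035916788, "hypothetical_term": 0.01},
    "mouse": {"GO:0003700": 0.001685935, "GO:0007275": 0.000243142, "hypothetical_term": 0.20},
    "zebrafish": {"GO:0003700": 0.002045234, "GO:0007275": 0.043621824, "hypothetical_term": 0.03},
}
print(sorted(common_significant_terms(pvalues)))  # ['GO:0003700', 'GO:0007275']
```

Removing one gene always makes the score larger than the plain Fisher p-value, which penalizes terms supported by only a few genes in the list.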
Table 5

GO terms in common from human, mouse and zebrafish neighbor genes within 5 kb of proximate ncRNAs at their 5′ end.

Category | Term* | P value (human) | P value (mouse) | P value (zebrafish)
Molecular Function | GO:0003700∼transcription factor activity | 6.88E-07 | 0.001685935 | 0.002045234
Molecular Function | GO:0030528∼transcription regulator activity | 2.80E-06 | 2.50E-05 | 0.001720193
Biological Process | GO:0006355∼regulation of transcription, DNA-dependent | 4.53E-06 | 0.000108619 | 0.02130028
Biological Process | GO:0051252∼regulation of RNA metabolic process | 7.91E-06 | 0.000178503 | 0.023870388
Biological Process | GO:0010556∼regulation of macromolecule biosynthetic process | 8.37E-06 | 4.96E-07 | 0.000915362
Biological Process | GO:0060255∼regulation of macromolecule metabolic process | 5.89E-05 | 7.41E-06 | 0.00691373
Biological Process | GO:0045449∼regulation of transcription | 6.20E-05 | 2.37E-06 | 0.001790827
Biological Process | GO:0031326∼regulation of cellular biosynthetic process | 8.41E-05 | 1.10E-06 | 0.001054761
Biological Process | GO:0009889∼regulation of biosynthetic process | 0.000119902 | 1.33E-06 | 0.001088173
Biological Process | GO:0080090∼regulation of primary metabolic process | 0.000146447 | 6.89E-07 | 0.002903755
Biological Process | GO:0010468∼regulation of gene expression | 0.000154686 | 1.42E-06 | 0.002943972
Biological Process | GO:0031323∼regulation of cellular metabolic process | 0.00015819 | 4.08E-06 | 0.002422663
Biological Process | GO:0019219∼regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 0.000321532 | 7.14E-06 | 0.002751033
Biological Process | GO:0051171∼regulation of nitrogen compound metabolic process | 0.000343647 | 6.14E-06 | 0.002831208
Biological Process | GO:0019222∼regulation of metabolic process | 0.000349372 | 1.09E-05 | 0.011044253
Biological Process | GO:0050794∼regulation of cellular process | 0.001348476 | 0.000766239 | 0.009737321
Biological Process | GO:0050789∼regulation of biological process | 0.00433817 | 0.001382295 | 0.033481278
Biological Process | GO:0065007∼biological regulation | 0.022428992 | 0.002031998 | 0.031603795
Biological Process | GO:0007275∼multicellular organismal development | 0.035916788 | 0.000243142 | 0.043621824

*The GO terms were ordered by p-value in human.

Table 6

GO terms in common from human, mouse and zebrafish neighbor genes within 5 kb of proximate ncRNAs at their 3′ end.

Category | Term* | P value (human) | P value (mouse) | P value (zebrafish)
Molecular Function | GO:0003677∼DNA binding | 2.52E-07 | 0.001016369 | 0.022517442
Biological Process | GO:0019222∼regulation of metabolic process | 5.94E-06 | 0.001833053 | 0.007240134
Biological Process | GO:0031323∼regulation of cellular metabolic process | 7.06E-06 | 0.001932015 | 0.002531781
Biological Process | GO:0080090∼regulation of primary metabolic process | 8.71E-06 | 0.000746433 | 0.001635905
Biological Process | GO:0060255∼regulation of macromolecule metabolic process | 1.52E-05 | 0.001021052 | 0.015088588
Cellular Component | GO:0044464∼cell part | 2.64E-05 | 0.005138983 | 0.021192768
Cellular Component | GO:0005623∼cell | 2.75E-05 | 0.005138983 | 0.021192768
Biological Process | GO:0009889∼regulation of biosynthetic process | 4.64E-05 | 0.00153235 | 0.001998668
Biological Process | GO:0010556∼regulation of macromolecule biosynthetic process | 5.07E-05 | 0.001133669 | 0.004636373
Biological Process | GO:0031326∼regulation of cellular biosynthetic process | 5.93E-05 | 0.001770385 | 0.002769539
Biological Process | GO:0010468∼regulation of gene expression | 6.05E-05 | 0.001153647 | 0.019089475
Biological Process | GO:0019219∼regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 7.45E-05 | 0.002835006 | 0.006403442
Biological Process | GO:0045449∼regulation of transcription | 9.02E-05 | 0.001133423 | 0.009147674
Biological Process | GO:0051171∼regulation of nitrogen compound metabolic process | 0.000115522 | 0.003953563 | 0.006560818
Molecular Function | GO:0003700∼transcription factor activity | 0.000701959 | 0.006403948 | 0.003113804
Biological Process | GO:0051252∼regulation of RNA metabolic process | 0.002751656 | 0.012593576 | 0.006423226
Biological Process | GO:0006355∼regulation of transcription, DNA-dependent | 0.002836401 | 0.008313995 | 0.007792617
Molecular Function | GO:0030528∼transcription regulator activity | 0.003105196 | 0.00782068 | 0.001014153
Biological Process | GO:0031328∼positive regulation of cellular biosynthetic process | 0.007428451 | 0.007226598 | 0.033533698
Biological Process | GO:0009891∼positive regulation of biosynthetic process | 0.007469104 | 0.008740921 | 0.033533698
Biological Process | GO:0010557∼positive regulation of macromolecule biosynthetic process | 0.009196945 | 0.003489005 | 0.028269774
Biological Process | GO:0010628∼positive regulation of gene expression | 0.010415711 | 0.009098997 | 0.021490484
Biological Process | GO:0045941∼positive regulation of transcription | 0.011143783 | 0.00569233 | 0.021490484
Molecular Function | GO:0005515∼protein binding | 0.017163574 | 0.000809527 | 1.60E-06
Biological Process | GO:0045893∼positive regulation of transcription, DNA-dependent | 0.02105859 | 0.004978895 | 0.012497621
Molecular Function | GO:0008270∼zinc ion binding | 0.022962024 | 0.003010259 | 0.036242576
Biological Process | GO:0048869∼cellular developmental process | 0.024154786 | 0.006314016 | 9.66E-07
Biological Process | GO:0051254∼positive regulation of RNA metabolic process | 0.024566919 | 0.005669422 | 0.014428949
Biological Process | GO:0030154∼cell differentiation | 0.02953709 | 0.007655265 | 1.65E-06
Biological Process | GO:0045935∼positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 0.03326329 | 0.011738803 | 0.039427105
Biological Process | GO:0048468∼cell development | 0.033319932 | 0.007737614 | 0.003006631
Biological Process | GO:0051173∼positive regulation of nitrogen compound metabolic process | 0.033319932 | 0.012196797 | 0.04261773
Biological Process | GO:0044267∼cellular protein metabolic process | 0.042639534 | 0.003735008 | 0.011732507
Biological Process | GO:0001655∼urogenital system development | 0.048304941 | 0.012438853 | 0.04591464

*The GO terms were ordered by p-value in human.

Taken together, these results indicate that many intergenic ncRNAs are transcribed proximate to regulatory or developmental genes in human, mouse and zebrafish. This positional bias, together with the functional classification of neighbor genes, suggests a potential cis-regulatory role for intergenic ncRNAs in the transcription of protein-coding genes.

Discussion

We have assembled and annotated the non-protein-coding transcriptomes of human, mouse and zebrafish in a stringent and comprehensive fashion using all publicly available ESTs. Our results increase the number of annotated ncRNAs by more than an order of magnitude, and we consider them robust for the following reasons. First, the ESTs used to assemble long ncRNAs were generated from multiple libraries covering a broad spectrum of tissues/cell types, developmental stages and biological conditions. Second, the highly stringent selection procedures used to assemble long ncRNAs enabled us to remove possible sequencing artifacts. Third, ESTs generated by traditional Sanger sequencing have longer raw reads and can be assembled into longer and more accurate consensus transcripts than is possible with the short-read sequencing technologies used in previous studies [12], [14], [24].

Despite these strengths, we acknowledge potential shortcomings of our reconstructed long ncRNAs. First, many ESTs were archived without transcription orientation, so it was difficult to deduce the transcription orientation of some reconstructed ncRNAs. Second, reconstructing transcripts from ESTs pooled across different libraries might have resulted in the loss of alternative transcripts. Third, although longer raw reads enabled us to build long consensus transcripts with high accuracy, many reconstructed transcripts are possibly still not full-length. A further limitation stemmed from our decision to exclude repetitive ESTs from the analysis because they confounded our sequence reconstructions; as a result, repeat-containing ncRNAs are not included in our results.

Intergenic ncRNAs from all three species showed the same positional bias in their distribution with respect to protein-coding genes, consistent with previous observations in cow [26]. Because this positional bias has also been reported for long intergenic ncRNAs identified using quite different methods [27], [30], [31], [37], we propose that it is a common property of intergenic ncRNAs across vertebrate species. This biased genomic distribution could be explained by two scenarios. In the first, the positional bias is a functional attribute of intergenic ncRNAs, which cis-regulate nearby protein-coding genes through a number of possible mechanisms. Many long intergenic ncRNAs, such as enhancer-like ncRNAs and promoter-associated ncRNAs, have been validated as cis-regulators of nearby protein-coding genes [25], [38], [39]. Transcription of these long intergenic ncRNAs may remodel the chromatin state of surrounding regions, including the promoters of protein-coding loci [18], [40], [41], [42]. Another possible mechanism is that long ncRNAs transcribed from the promoter regions of protein-coding genes compete with the nearby genes for the transcription complex, thereby balancing their transcription [17], [43], [44]. Although many long ncRNAs have been experimentally validated and incorporated into different gene regulation models, more functional manipulation of long ncRNAs is required to test these models.

In the second scenario, these ncRNAs are fragments of un-annotated UTRs or alternative splicing isoforms. Current ncRNA identification methods rely heavily on the available gene models, which may be incomplete. This possibility has some support, because some gene-proximate intergenic ncRNAs were similar to UTRs. For this reason, all functional classifications in our analysis were based on stringent intergenic ncRNAs (with all UTR-related RNAs removed). However, we also observed a large number of antisense transcripts among the gene-proximate intergenic ncRNAs, which cannot be categorized as possible UTRs. Moreover, many studies have identified pervasive, independent, functional non-coding transcripts from gene-proximate regions, even within the UTRs of protein-coding genes [32]. We conclude that our gene-proximate intergenic ncRNAs are most likely functional, although further experimental testing is needed to understand how they work [45]. We put forward our ncRNAs as good starting points for functional screening.

Long ncRNAs are pervasively transcribed across the genomes of different species [1], [46], [47]. However, the true number of long ncRNAs is still unknown. Previous studies using whole-genome tiling arrays demonstrated that the majority of the human genome is transcribed [2], [3], [48]. The FANTOM project also revealed thousands of long ncRNAs based on cDNAs in mouse [6]. In the past few years, different categories of long ncRNAs, particularly lincRNAs, have been annotated using a variety of methods [10], [11], [12], [14], [24], [25]. Our ncRNAs are novel because we screened out ESTs with significant similarity to RefSeqs (coding and non-coding), and this novelty is confirmed by the limited overlap of our ncRNAs with previously reported ncRNAs. To assess our methodology relative to previous methods, we aligned previously reported ncRNAs against the raw EST data used as input for our pipeline (see Material S1). In general, ncRNAs from other datasets based on transcriptome data were present in the ESTs, but this was not the case for ncRNAs predicted from chromatin state [10], [11]. When we assessed the expression of these chromatin-state-predicted ncRNAs [10], [11], we found that many of them showed no evidence of transcription in the ESTs. These ncRNAs had been validated by tiling-array-based expression analysis, with expression reported for 70% of them within single tissues/cell types [11]. Because we found no more than 46% of them in the raw human EST data (Figure 4, Table S2 and Material S1), we re-examined the tiling arrays used for that validation. Most of the chromatin-state-based predicted ncRNAs contained repeats, and about 38% of the tiling array probes used to validate them also contained repetitive sequence (Material S1). The reported tiling array validation of 70% of the chromatin-state-predicted ncRNAs is therefore likely to be an inflated estimate, because many transcripts contain repeats in their UTRs that would cross-hybridize to these probes and produce false-positive signals. On the whole, the number of ncRNAs not found in ESTs was a tiny fraction of the total number of ncRNAs included in previous publications and in the present report. We conclude that current knowledge substantially underestimates the number of ncRNAs, particularly intergenic, repeat-containing ncRNAs.

Sequence conservation is an important functional signature of genomic transcripts. Many of the ncRNAs that we identified, although clearly less conserved than protein-coding genes, show clear sequence conservation compared with randomly selected, un-transcribed genomic fragments. Furthermore, intergenic ncRNAs are more conserved than intronic ncRNAs in all three species. Weak but significant purifying selection on lincRNAs was observed in a previous study [49], and these results are also consistent with the conservation levels of ncRNAs previously identified in cow [26] and of previously reported long ncRNA datasets [10], [12], [14]. Sequence conservation is not the only benchmark for functional significance, as we also observed a small number of protein-coding genes under positive selection. Genes for ncRNAs probably evolve more rapidly than protein-coding genes, which are constrained by the triplet code to maintain the conserved functions of the translated proteins. For functional ncRNAs such as microRNAs, conserved secondary structures have been identified as functional elements required to regulate gene expression, and conserved secondary structure may be more important than conserved primary sequence for long ncRNAs [34]. Furthermore, because many long ncRNAs are transcribed in a tissue- or cell-type-specific fashion [12], [14], [24], [50], [51], we suggest that many ncRNAs might be species-specific.

The overall lack of cross-species correspondence between proximate intergenic ncRNAs and their neighbor genes supports the idea that ncRNAs evolve rapidly, generating species-specific patterns of tissue-specific, developmental regulation. ncRNAs undergoing positive selection might represent novel tissue-, cell-type- or species-specific regulatory transcripts. A notable exception to this lack of correspondence was the conservation of 108 genes with proximate ncRNAs in human, mouse and zebrafish. Of these, 97 are expressed in human brain, suggesting a conserved cis-regulatory role for ncRNAs in brain development. Previously, Chodroff et al. [52] showed that four conserved long ncRNAs also had conserved expression in brain across a range of amniotes. Our results indicate that conserved association of ncRNAs with protein-coding genes expressed in brain also occurs (Table 4, Table S3), suggesting that vertebrates possess a conserved co-expression or cis-regulatory network of ncRNA/gene pairs. As discussed above, the biased positional distribution of intergenic ncRNAs suggests cis-regulatory functions, and the functional annotation of the neighbor genes of intergenic ncRNAs supports this hypothesis: many intergenic ncRNAs are preferentially transcribed from regions adjacent to regulatory and developmental genes, as seen in this report and, on a smaller scale, by others [10], [24], [38].

In conclusion, we present a substantially expanded set of ncRNAs which suggests that ncRNAs, while exhibiting sequence conservation, evolve rapidly in terms of their association with neighboring regulatory and developmental genes. The exception to this rapid evolution appears to be a subset of genes expressed in brain. Long ncRNAs, such as intergenic ncRNAs, may act through different mechanisms as genome-wide regulatory elements in many biological pathways, including brain development [53].

Methods

1 ncRNA Identification from Human, Mouse and Zebrafish

ncRNA identification was performed using a previously built pipeline [26]. First, all available ESTs were extracted from dbEST (NCBI). After removing low-quality sequences and ESTs composed mostly of repetitive elements, all remaining ESTs were clustered and assembled into longer, unique consensus transcripts. Protein-coding genes were removed from the unique transcripts based on similarity searches against the RefSeq and Swiss-Prot databases. As a final step, transcripts were checked for ORFs to remove potential un-annotated protein-coding genes, leaving a set of long ncRNAs. To further reduce the redundancy of these long ncRNAs, we reconstructed all putative long ncRNAs based on their genomic coordinates using Inchworm [54]. ncRNAs were classified into three categories with respect to protein-coding genes (intronic, intergenic and overlapped ncRNAs) using R, as previously described [26]. Intergenic ncRNAs located within 1 kb of the 5′ or 3′ end of a protein-coding gene, or with sequence similarity to known UTRs, were further classified as UTR-related RNAs. All remaining intergenic ncRNAs were classified as bona fide intergenic ncRNAs.
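A minimal Python sketch of the classification rules above follows. It is not the authors' R code: it assumes 0-based, half-open coordinates on a single chromosome, treats an ncRNA as intronic only when it lies wholly within a gene's span without touching any exon, and models only the 1 kb distance rule for UTR-related RNAs (the UTR sequence-similarity test is omitted). `Gene`, `overlaps`, `gap` and `classify` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical protein-coding gene model (0-based, half-open coordinates)."""
    name: str
    start: int
    end: int
    exons: List[Tuple[int, int]] = field(default_factory=list)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Edge-to-edge distance in bp (0 if the intervals overlap or abut)."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def classify(nc_start: int, nc_end: int, genes: List[Gene], utr_window: int = 1_000) -> str:
    """Assign an ncRNA to intronic / overlapped / UTR-related / intergenic."""
    hits = [g for g in genes if overlaps(nc_start, nc_end, g.start, g.end)]
    if hits:
        contained = all(g.start <= nc_start and nc_end <= g.end for g in hits)
        exonic = any(overlaps(nc_start, nc_end, s, e) for g in hits for s, e in g.exons)
        return "intronic" if contained and not exonic else "overlapped"
    # Outside every gene: flag as UTR-related if within utr_window of any gene end.
    if any(gap(nc_start, nc_end, g.start, g.end) <= utr_window for g in genes):
        return "UTR-related"
    return "intergenic"


genes = [Gene("geneA", 10_000, 20_000, [(10_000, 11_000), (19_000, 20_000)])]
print(classify(12_000, 13_000, genes))  # intronic
print(classify(10_500, 12_000, genes))  # overlapped (touches the first exon)
print(classify(20_500, 21_000, genes))  # UTR-related (500 bp past the gene end)
print(classify(25_000, 26_000, genes))  # intergenic
```

Keeping the distance test after the overlap test mirrors the order of the text: the 1 kb window is applied only to ncRNAs that already fall outside every gene.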

2 Neighbor Genes and Transcription Orientation of ncRNAs with Respect to Neighbor Genes

The closest protein-coding gene to an intergenic ncRNA was chosen as the neighbor gene of that ncRNA. The transcriptional orientation of ncRNAs was determined using two criteria. First, many ESTs extracted from NCBI carry cloning and sequencing information, which was used to determine the transcription orientation of both singletons and contigs. Second, the transcription orientation of spliced long ncRNAs was deduced from their splicing information when they were mapped onto the genome. “Sense” intergenic ncRNAs were defined as those transcribed from the same strand as their neighbor gene, and “antisense” intergenic ncRNAs as those transcribed from the opposite strand.
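The neighbor-gene and orientation rules could be sketched as below, under stated assumptions: coordinates are 0-based and half-open, distance is measured edge to edge, ties go to the first gene listed, and splicing-based strand calls use only canonical GT-AG introns (non-canonical GC-AG and AT-AC introns are ignored). EST clone annotation, when present, is given priority, following the order of the two criteria above. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    """Hypothetical gene record: 0-based, half-open coordinates plus strand."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Edge-to-edge distance in bp (0 if the intervals overlap or abut)."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def neighbor_gene(nc_start: int, nc_end: int, genes: List[Gene]) -> Gene:
    """Closest protein-coding gene by edge-to-edge distance (first one wins ties)."""
    return min(genes, key=lambda g: gap(nc_start, nc_end, g.start, g.end))


def strand_from_splicing(genome: str, introns: List[Tuple[int, int]]) -> Optional[str]:
    """Infer strand from canonical splice sites read on the + strand.

    GT...AG means the intron is on '+'; CT...AC (the reverse complement of
    GT...AG) means '-'. Returns None if no intron is canonical or calls disagree.
    """
    calls = set()
    for s, e in introns:
        donor, acceptor = genome[s:s + 2].upper(), genome[e - 2:e].upper()
        if donor == "GT" and acceptor == "AG":
            calls.add("+")
        elif donor == "CT" and acceptor == "AC":
            calls.add("-")
    return calls.pop() if len(calls) == 1 else None


def orientation(est_strand: Optional[str], genome: str,
                introns: List[Tuple[int, int]], gene: Gene) -> str:
    """Sense/antisense call: EST clone annotation first, splicing second."""
    strand = est_strand or strand_from_splicing(genome, introns)
    if strand is None:
        return "unknown"
    return "sense" if strand == gene.strand else "antisense"


genes = [Gene("g1", 1_000, 2_000, "+"), Gene("g2", 5_000, 6_000, "-")]
neighbor = neighbor_gene(4_000, 4_500, genes)  # g2 is 500 bp away, g1 is 2,000 bp away
toy_genome = "A" * 10 + "GT" + "A" * 6 + "AG" + "A" * 10  # one GT...AG intron at [10, 20)
print(neighbor.name, orientation(None, toy_genome, [(10, 20)], neighbor))  # g2 antisense
```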

3 Comparisons with Known Well-characterized Long ncRNAs in Human, Mouse and Zebrafish

The sources and summary information for previously characterized ncRNAs are shown in Table 7. For the chromatin-based lincRNAs in human and mouse, we used the exons, rather than the long chromatin domains, as the known lincRNAs. The overlap of our EST-based ncRNAs with these known long ncRNA datasets was analyzed with the “GenomicFeatures” R package.
Table 7

Previously annotated long ncRNA datasets used for comparison.

Dataset | Number of ncRNAs | Source | Method | Reference
Chromatin-based lincRNAs (Human) | 4,860* | 10 cell types | Chromatin signature identification (K4–K36 domain) | Khalil AM, 2009 [11]
Enhancer-like long ncRNAs (Human) | 3,011 | Multiple | Screening from GENCODE annotation | Orom UA, 2010 [25]
RNA-seq-based lincRNAs (Human) | 8,195 | 24 tissues and cell types | Screening from assembled RNA-seq data | Cabili MN, 2011 [24]
Chromatin-based lincRNAs (Mouse) | 2,127* | 4 cell types | Chromatin signature identification (K4–K36 domain) | Guttman M, 2009 [10]
RNA-seq-based lincRNAs (Mouse) | 1,140 | 3 cell types | Screening from assembled RNA-seq data | Guttman M, 2010 [12]
RNA-seq-based long ncRNAs (Zebrafish) | 1,133 | 8 embryonic stages | Screening from assembled RNA-seq data | Pauli A, 2011 [14]

* These are the exons identified by microarray from non-coding K4–K36 domains.
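The overlap analysis itself was run with the “GenomicFeatures” R package. Purely as an illustration of the underlying interval test, the Python sketch below reports the fraction of known ncRNA exons that overlap at least one EST-based ncRNA. It assumes 0-based, half-open coordinates and counts a shared base of 1 bp or more as an overlap; the function names and toy intervals are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def build_index(intervals: Iterable[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted start coordinates and a running maximum of end coordinates."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_end = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_end)
    return index


def has_overlap(index, chrom: str, start: int, end: int) -> bool:
    """True if any indexed interval on `chrom` shares at least 1 bp with [start, end)."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    k = bisect_left(starts, end)  # intervals 0..k-1 start before the query ends
    return k > 0 and max_end[k - 1] > start


def fraction_overlapping(known: List[Interval], ours: Iterable[Interval]) -> float:
    """Share of known ncRNA exons hit by at least one of our ncRNAs."""
    if not known:
        return 0.0
    index = build_index(ours)
    hits = sum(has_overlap(index, c, s, e) for c, s, e in known)
    return hits / len(known)


ours = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 0, 50)]
known = [("chr1", 150, 160), ("chr1", 300, 400), ("chr2", 40, 60), ("chr3", 0, 10)]
print(fraction_overlapping(known, ours))  # 0.5
```

Sorting by start and keeping a running maximum of end coordinates lets each query be answered with one binary search, since any interval that could overlap the query must start before the query ends.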

4 Conservation Analyses of ncRNAs

Three different conservation scores were used to analyze the sequence conservation of ncRNAs. The GERP++ scores for human and mouse were downloaded from http://mendel.stanford.edu/SidowLab/downloads/gerp/. For zebrafish, GERP++ scores were calculated with the GERP++ tool based on multiple alignments of seven genomes (hg19/GRCh37, mm9, xenTro2, tetNig2, fr2, gasAcu1, oryLat2) with the zebrafish genome (danRer7). The phastCons and phyloP scores for human, mouse and zebrafish were downloaded from UCSC for genome assemblies hg19/GRCh37 (human), mm9 (mouse) and danRer7 (zebrafish), respectively. The mean GERP++, phastCons or phyloP score of each ncRNA, RefSeq or control sequence was calculated by dividing the sum of its per-base scores by the sequence length. All RefSeqs except “NR” and “XR” entries (non-coding transcripts) were used as the protein-coding gene dataset. For each species, a control dataset was built by randomly selecting, from un-transcribed genomic regions (no ESTs mapped), the same number of genomic fragments (500 bp to 15,000 bp) as there were ncRNAs. The cumulative frequency for each dataset was calculated and plotted using R.
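The per-sequence mean score, the random control fragments and the cumulative frequency curves described above can be sketched in Python. This is an illustration rather than the authors' pipeline: it assumes per-base scores are held in a list with 0.0 for unscored bases, that control lengths are drawn uniformly between 500 and 15,000 bp (the text gives only the range), and that un-transcribed regions are supplied as (start, end) pairs on one chromosome. Function names are hypothetical.

```python
import random
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def mean_score(per_base: Sequence[float], start: int, end: int) -> float:
    """Sum of per-base conservation scores over [start, end) divided by its length."""
    return sum(per_base[start:end]) / (end - start)


def sample_controls(untranscribed: List[Tuple[int, int]], n: int,
                    min_len: int = 500, max_len: int = 15_000,
                    seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n random fragments, each lying wholly inside one EST-free region."""
    if not any(e - s >= min_len for s, e in untranscribed):
        raise ValueError("no untranscribed region is long enough")
    rng = random.Random(seed)
    fragments = []
    while len(fragments) < n:
        length = rng.randint(min_len, max_len)  # assumption: uniform length distribution
        fits = [(lo, hi) for lo, hi in untranscribed if hi - lo >= length]
        if not fits:
            continue  # too long for every region; draw another length
        # Weight each region by the number of start positions it offers.
        weights = [hi - lo - length + 1 for lo, hi in fits]
        lo, hi = rng.choices(fits, weights=weights)[0]
        start = rng.randint(lo, hi - length)
        fragments.append((start, start + length))
    return fragments


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Empirical cumulative frequency: fraction of values <= x."""
    xs = sorted(values)
    return lambda x: bisect_right(xs, x) / len(xs)


scores = [0.0, 1.0, 2.0, 3.0, 4.0]
print(mean_score(scores, 1, 4))  # 2.0
controls = sample_controls([(0, 20_000), (30_000, 31_000)], n=3)
cdf = ecdf([0.1, 0.4, 0.4, 0.9])
print(cdf(0.4))  # 0.75
```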

5 Functional Classifications of Neighbor Genes of Gene-proximate Intergenic ncRNAs

Gene-proximate intergenic ncRNAs were selected from the stringent intergenic ncRNAs located within 5 kb of the 5′ or 3′ end of a protein-coding gene. GO classification of neighbor genes was performed on the DAVID (Database for Annotation, Visualization and Integrated Discovery) web server [55]. The thresholds for over-represented GO terms were set at gene count >5 and p-value (EASE score) <0.05. The REViGO web server was used to reduce redundancy and visualize the over-represented GO terms based on semantic similarity [56]. The gene symbols of neighbor genes with GO annotations were compared across species to find common genes, and BLAST was used to carry out sequence similarity searches for conserved neighbor genes across all three species. All protein-coding genes with neighbor genes located in their 5 kb flanking regions were analysed in the same fashion as the neighbor genes of intergenic ncRNAs.

PhastCons scores of ncRNAs identified from human (A, B), mouse (C, D) and zebrafish (E, F). (TIF)

PhyloP scores of ncRNAs identified from human (A, B), mouse (C, D) and zebrafish (E, F). (TIF)

Comparison of GERP++ scores of our ncRNAs with previously published human lincRNA datasets. (TIF)

Comparison of phastCons scores of our ncRNAs with previously published human lincRNA datasets. (TIF)

Comparison of phyloP scores of our ncRNAs with previously published human lincRNA datasets. (TIF)

The “Treemap” view of over-represented GO terms of neighbor genes with 5′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C). Each rectangle represents a single cluster. The clusters are joined into ‘superclusters’ of loosely related terms, visualized with different colors. The size of each rectangle reflects the p-value (EASE score in DAVID) of the GO term, with a larger rectangle corresponding to a smaller p-value. (TIF)

The “Treemap” view of over-represented GO terms of neighbor genes with 3′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C). Each rectangle represents a single cluster. The clusters are joined into ‘superclusters’ of loosely related terms, visualized with different colors. The size of each rectangle reflects the p-value (EASE score in DAVID) of the GO term, with a larger rectangle corresponding to a smaller p-value. (TIF)

Over-represented GO terms for all protein-coding genes with neighbor genes within 5 kb in human. (TIF)

Genomic coordinates of predicted ncRNAs in human, mouse and zebrafish. This Excel file contains the genomic coordinates of predicted ncRNAs identified by our pipeline in human (sheet 1), mouse (sheet 2) and zebrafish (sheet 3). (XLSX)

Summary of human and mouse known long ncRNAs that align to ESTs. This table summarizes human known long ncRNAs (chromatin-based, enhancer-like and RNA-seq-based) and mouse known long ncRNAs (chromatin-based, RNA-seq-based) mapped against ESTs. (DOCX)

Annotation of common protein-coding genes with proximate intergenic ncRNAs (<5 kb) in human, mouse and zebrafish. Sheet 1 of this Excel table lists 12 conserved genes with ncRNAs at the 5′ end and sheet 2 lists 96 conserved genes with ncRNAs at the 3′ end. (XLSX)

Supporting results. (DOCX)
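For the gene-proximate selection in Methods section 5 and the DAVID thresholds, a Python sketch might look like the following. It assumes 0-based, half-open coordinates, that the 5′ and 3′ ends follow the gene's strand, and that DAVID results have already been exported as (term, gene count, EASE p-value) tuples; it does not recompute EASE scores. All names and the toy terms are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    """Hypothetical gene record: 0-based, half-open coordinates plus strand."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def proximate_end(nc_start: int, nc_end: int, gene: Gene,
                  window: int = 5_000) -> Optional[str]:
    """Return "5'" or "3'" if an intergenic ncRNA lies within `window` bp of that
    end of the gene, or None if it is farther away or overlaps the gene."""
    if nc_end <= gene.start:        # ncRNA is on the lower-coordinate side
        distance, low_side = gene.start - nc_end, True
    elif gene.end <= nc_start:      # ncRNA is on the higher-coordinate side
        distance, low_side = nc_start - gene.end, False
    else:
        return None                 # overlaps the gene, so not intergenic
    if distance > window:
        return None
    # For a '+' gene the low-coordinate end is the 5' end; for a '-' gene it is the 3' end.
    return "5'" if low_side == (gene.strand == "+") else "3'"


def over_represented(terms: List[Tuple[str, int, float]], min_count: int = 5,
                     max_p: float = 0.05) -> List[Tuple[str, int, float]]:
    """Keep GO terms with gene count > min_count and EASE p-value < max_p, sorted by p."""
    kept = [t for t in terms if t[1] > min_count and t[2] < max_p]
    return sorted(kept, key=lambda t: t[2])


plus_gene = Gene("geneP", 10_000, 20_000, "+")
minus_gene = Gene("geneM", 10_000, 20_000, "-")
print(proximate_end(7_000, 8_000, plus_gene))    # 5'
print(proximate_end(23_000, 24_000, plus_gene))  # 3'
print(proximate_end(7_000, 8_000, minus_gene))   # 3'
print(proximate_end(30_000, 31_000, plus_gene))  # None (10 kb away)

terms = [("term_a", 12, 0.001), ("term_b", 4, 0.0001), ("term_c", 30, 0.2), ("term_d", 8, 0.03)]
print([t[0] for t in over_represented(terms)])  # ['term_a', 'term_d']
```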
References

1. Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments.

Authors: Jason M Johnson; Stephen Edwards; Daniel Shoemaker; Eric E Schadt
Journal: Trends Genet. Date: 2005-02.

2. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function.

Authors: Ken C Pang; Martin C Frith; John S Mattick
Journal: Trends Genet. Date: 2005-11-10.

3. Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays.

Authors: Philipp Kapranov; Jorg Drenkow; Jill Cheng; Jeffrey Long; Gregg Helt; Sujit Dike; Thomas R Gingeras
Journal: Genome Res. Date: 2005-07.

4. The amazing complexity of the human transcriptome.

Authors: Martin C Frith; Michael Pheasant; John S Mattick
Journal: Eur J Hum Genet. Date: 2005-08.

5. Non-coding RNA.

Authors: John S Mattick; Igor V Makunin
Journal: Hum Mol Genet. Date: 2006-04-15.

6. Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene, PCGEM1.

Authors: Xiaoqin Fu; Lakshmi Ravindranath; Nicholas Tran; Gyorgy Petrovics; Shiv Srivastava
Journal: DNA Cell Biol. Date: 2006-03.

7. Elongation of the Kcnq1ot1 transcript is required for genomic imprinting of neighboring genes.

Authors: Debora Mancini-Dinardo; Scott J S Steele; John M Levorse; Robert S Ingram; Shirley M Tilghman
Journal: Genes Dev. Date: 2006-05-15.

8. The Air noncoding RNA: an imprinted cis-silencing transcript.

Authors: G Braidotti; T Baubec; F Pauler; C Seidl; O Smrzka; S Stricker; I Yotova; D P Barlow
Journal: Cold Spring Harb Symp Quant Biol. Date: 2004.

9. The transcriptional landscape of the mammalian genome.

Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal: Science. Date: 2005-09-02.

10. The mammalian transcriptome and the function of non-coding DNA sequences.

Authors: Svetlana A Shabalina; Nikolay A Spiridonov
Journal: Genome Biol. Date: 2004-03-25.
