Abstract
ncRNAs (non-coding RNAs), in particular long ncRNAs, represent a significant proportion of the vertebrate transcriptome and probably regulate many biological processes. We used publicly available ESTs (Expressed Sequence Tags) from human, mouse and zebrafish and a previously published analysis pipeline to annotate and analyze the vertebrate non-protein-coding transcriptome. Comparative analysis confirmed some previously described features of intergenic ncRNAs, such as a positionally biased distribution with respect to regulatory or development-related protein-coding genes, and weak but clear sequence conservation across species. Significantly, comparative analysis of developmental and regulatory genes proximate to long ncRNAs indicated that the only conserved relationship of these genes to neighbor long ncRNAs was with respect to genes expressed in human brain, suggesting a conserved ncRNA cis-regulatory network in vertebrate nervous system development. Most of the relationships between long ncRNAs and proximate coding genes were not conserved, providing evidence for the rapid evolution of species-specific, gene-associated long ncRNAs. We have reconstructed and annotated over 130,000 long ncRNAs in these three species, providing a significantly expanded number of candidates for functional testing by the research community.
Year: 2012 PMID: 23284966 PMCID: PMC3527520 DOI: 10.1371/journal.pone.0052275
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of procedures for ncRNA identification in human, mouse and zebrafish.
| Species | Number of ESTs | Number of assembled transcripts | Mapped to RefSeqs | Mapped to Swiss-Prot | With long ORFs | Putative ncRNAs | Reconstructed ncRNAs |
| Human | 8,314,483 | 1,037,755 | 44,245 | 135,073 | 130,291 | 105,994 | 87,173 |
| Mouse | 4,853,460 | 1,356,763 | 382,852 | 3,911 | 60,342 | 45,975 | 36,280 |
| Zebrafish | 1,481,936 | 262,387 | 117,337 | 1,828 | 10,778 | 11,323 | 9,877 |
Because of the large number of human ESTs, we ran BLAST for all ESTs against human RefSeqs before assembly and removed all high-confidence ESTs (coverage >90% and identity >90%). As a result, the human values for “Number of assembled transcripts” and “Mapped to RefSeqs” are smaller than expected.
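The pre-assembly filter described in this note can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes BLAST tabular output with a query-length column appended (for example `-outfmt "6 std qlen"`), and it defines coverage as the aligned query span divided by the query length; the note itself does not state how coverage was measured. The toy hits and accession names are invented.

```python
import csv
import io

# Assumed column layout: the standard BLAST tabular fields plus qlen.
# This layout is an assumption for illustration, not taken from the paper.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qlen"]

def high_confidence_ests(handle, min_identity=90.0, min_coverage=90.0):
    """Return IDs of ESTs with a hit above both the identity and coverage cutoffs."""
    matched = set()
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        qstart, qend = int(row["qstart"]), int(row["qend"])
        # Coverage here = aligned query span / query length (assumed definition).
        coverage = 100.0 * (abs(qend - qstart) + 1) / int(row["qlen"])
        if float(row["pident"]) > min_identity and coverage > min_coverage:
            matched.add(row["qseqid"])
    return matched

# Toy hits with made-up values: only est1 passes both cutoffs.
toy = io.StringIO(
    "est1\tNM_A\t98.5\t480\t7\t0\t1\t480\t100\t579\t0.0\t850\t500\n"
    "est2\tNM_B\t99.0\t200\t2\t0\t1\t200\t50\t249\t1e-90\t350\t500\n"
    "est3\tNM_C\t85.0\t490\t73\t0\t1\t490\t10\t499\t1e-80\t300\t500\n"
)
removed = high_confidence_ests(toy)
print(sorted(removed))
```

ESTs returned by `high_confidence_ests` would be the ones dropped before assembly; everything else would go on to the assembly step.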
Figure 1. Percentage of intergenic, intronic and overlapped ncRNAs in human, mouse and zebrafish.
Classification of ncRNAs.
| Species | Number of UTR-related ncRNAs | Number of intergenic ncRNAs | Number of intronic ncRNAs | Number of overlapped ncRNAs |
| Human | 3,438 | 20,268 | 55,601 | 10,724 |
| Mouse | 2,179 | 9,490 | 21,541 | 4,414 |
| Zebrafish | 2,031 | 4,464 | 2,514 | 1,010 |
Figure 2. Biased positional distribution of intergenic ncRNAs with respect to neighbor protein-coding genes in human, mouse and zebrafish.
The top 2 panels (A & B) are from human, the middle 2 panels (C & D) are from mouse and the bottom 2 panels (E & F) are from zebrafish. A, C and E show the positional distribution of 5′ or 3′ end ncRNAs. B, D and F show the positional distribution of ncRNAs in terms of transcription orientation compared to neighbor genes.
Figure 3. Overlap of our predicted ncRNAs with known human or mouse long ncRNAs from different datasets.
A shows the overlap of our ncRNAs with three different human lincRNA datasets. B shows the overlap of our ncRNAs with mouse long ncRNA datasets. “Chromatin based”: lincRNAs identified based on chromatin-state maps [10], [11]. “Enhancer like”: long intergenic ncRNAs identified based on GENCODE [25]. “RNA-seq based”: long ncRNAs identified by reconstruction of RNA-seq data in human. “ES”, “NPC” and “MLF”: long ncRNAs identified by construction of RNA-seq data from 3 different mouse cell types.
Overlap of EST-based ncRNAs with previously identified ncRNAs*.
| Dataset | Number of intronic ncRNAs | Number of overlapped ncRNAs | Number of UTR-related RNAs | Number of intergenic ncRNAs (percentage) | In total |
| Chromatin-based lincRNAs (human) | 21 | 8 | 15 | 391 (1.93%) | 435 |
| Enhancer-like long ncRNAs (human) | 22 | 10 | 32 | 945 (4.66%) | 1,009 |
| RNA-seq-based lincRNAs (human) | 11 | 19 | 83 | 1,484 (7.32%) | 1,597 |
| LincRNAs from ES (mouse) | 26 | 13 | 15 | 108 (1.14%) | 162 |
| LincRNAs from MLF (mouse) | 40 | 9 | 11 | 70 (0.74%) | 130 |
| LincRNAs from NPC (mouse) | 30 | 14 | 15 | 125 (1.32%) | 184 |
| Chromatin-based lincRNAs (mouse) | 27 | 87 | 59 | 293 (3.09%) | 466 |
| RNA-seq-based long ncRNAs (zebrafish) | 16 | 12 | 28 | 105 (2.36%) | 161 |
Numbers in this table are counts of our EST-based ncRNAs.
The percentage is relative to the total number of intergenic ncRNAs for the corresponding species, as shown in Table 2.
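As a quick check of how the percentages in this table are derived, the sketch below divides an overlap count by the species' intergenic ncRNA total from the classification table (20,268 for human and 9,490 for mouse). The function name is hypothetical, and rounding to two decimals is an assumption made to match the table's precision.

```python
# Intergenic ncRNA totals per species, copied from the classification table.
INTERGENIC_TOTAL = {"human": 20268, "mouse": 9490}

def overlap_percentage(n_overlapping, species):
    """Percent of a species' intergenic ncRNAs that overlap a given dataset."""
    return round(100.0 * n_overlapping / INTERGENIC_TOTAL[species], 2)

# Chromatin-based lincRNAs (human) and lincRNAs from ES (mouse).
human_chromatin = overlap_percentage(391, "human")
mouse_es = overlap_percentage(108, "mouse")
print(human_chromatin, mouse_es)
```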
Figure 4. Comparisons of known long ncRNAs mapped by ESTs or non-repeat ESTs in human and mouse.
“Chromatin based”: lincRNAs identified based on chromatin-state maps [10], [11]. “Enhancer like”: long intergenic ncRNAs identified based on GENCODE [25]. “RNA-seq based”: long ncRNAs identified by reconstruction of RNA-seq data in human. “ES”, “NPC” and “MLF”: long ncRNAs identified by construction of RNA-seq data from 3 different mouse cell types.
Figure 5. GERP++ score for ncRNAs identified from human, mouse and zebrafish.
A and B are from human. C and D are from mouse. E and F are from zebrafish.
Figure 6. Over-represented GO terms of neighbor genes of 5′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C).
The bubble color indicates the P-value (EASE score from DAVID); bubble size indicates the frequency of the GO term in the underlying GOA database. Highly similar GO terms are linked by edges in the graph. Regulatory GO terms are highlighted with cyan-like colors, and development-associated GO terms are highlighted with gold colors.
Figure 7. Over-represented GO terms of neighbor genes of 3′ end gene-proximate intergenic ncRNAs in human (A), mouse (B) and zebrafish (C).
The bubble color indicates the P-value (EASE score from DAVID); bubble size indicates the frequency of the GO term in the underlying GOA database. Highly similar GO terms are linked by edges in the graph. Regulatory GO terms are highlighted with cyan-like colors, and development-associated GO terms are highlighted with gold colors.
Figure 8. Venn diagrams show the conserved neighbor genes proximate to intergenic ncRNAs from human, mouse and zebrafish.
A shows the intersection of neighbor genes with ncRNAs at their 5′ end. B shows the intersection of neighbor genes with ncRNAs at their 3′ end.
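The regions of such a Venn diagram can be computed with plain set operations. The sketch below is only illustrative: it assumes the neighbor genes of each species have already been mapped to a shared identifier space (for example, human ortholog symbols), a step the legend does not describe. The gene names are placeholders.

```python
def conserved_neighbors(human, mouse, zebrafish):
    """Split the human neighbor genes by which other species share them."""
    return {
        "all_three": human & mouse & zebrafish,
        "human_mouse_only": (human & mouse) - zebrafish,
        "human_zebrafish_only": (human & zebrafish) - mouse,
        "human_only": human - mouse - zebrafish,
    }

# Placeholder gene sets, assumed to be expressed as shared ortholog IDs.
human = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mouse = {"GENE_A", "GENE_B", "GENE_E"}
zebrafish = {"GENE_A", "GENE_C", "GENE_F"}

regions = conserved_neighbors(human, mouse, zebrafish)
print(sorted(regions["all_three"]))
```

The `all_three` region corresponds to the genes conserved across human, mouse and zebrafish, which is the set the gene table that follows reports for the 5′ end case.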
Human genes conserved in mouse and zebrafish with proximate intergenic ncRNAs at their 5′ end (<5 kb).
| Official gene symbol | Expression in brain (human) | Aliases and descriptions | Diseases and disorders | Related ncRNAs |
| MAN1A1 | Yes | Processing alpha-1,2-mannosidase IA; MAN9; processing alpha-1,2-mannosidase IA; mannosyl-oligosaccharide 1,2-alpha-mannosidase IA; mannosidase, alpha, class 1A, member 1; Man(9)-alpha-mannosidase; man(9)-alpha-mannosidase; Mannosidase alpha class 1A member 1; HUMM3; alpha-1,2-mannosidase IA; Alpha-1,2-mannosidase IA; Man9-mannosidase; HUMM9; EC 3.2.1.113 | Mannosidase deficiency disease | N/A |
| MAN1A2 | Yes | mannosidase, alpha, class 1A, member 2; alpha-1,2-mannosidase IB; Mannosidase alpha class 1A member 2; mannosyl-oligosaccharide 1,2-alpha-mannosidase IB; alpha1,2-mannosidase; Processing alpha-1,2-mannosidase IB; processing alpha-1,2-mannosidase IB; MAN1B; Alpha-1,2-mannosidase IB; EC 3.2.1.113 | N/A | N/A |
| ONECUT2 | Yes | OC2; hepatocyte nuclear factor 6-beta; ONECUT-2 homeodomain transcription factor; HNF6B; One cut homeobox 2; HNF-6-beta; Hepatocyte nuclear factor 6-beta; onecut 2; OC-2; one cut domain, family member 2; transcription factor ONECUT-2; one cut domain family member 2; Transcription factor ONECUT-2; one cut homeobox 2 | Oral cancer | Target of miR-9 |
| PANK2 | Yes | hPanK2; pantothenate kinase 2; FLJ11729; neurodegeneration with brain iron accumulation 1 (Hallervorden-Spatz syndrome); NBIA1; Hallervorden-Spatz syndrome; HARP; HSS; Pantothenic acid kinase 2; C20orf48; pantothenic acid kinase 2; PKAN; pantothenate kinase 2, mitochondrial; EC 2.7.1.33 | Hallervorden-Spatz syndrome; dementia; dystonia | Host of miR-103 |
| KCNJ4 | Yes | IRK-3; hIRK2; IRK3; inward rectifier K(+) channel Kir2.3; Potassium channel, inwardly rectifying subfamily J member 4; HRK1; HIRK2; potassium channel, inwardly rectifying subfamily J member 4; hippocampal inward rectifier potassium channel; potassium inwardly-rectifying channel, subfamily J, member 4; Hippocampal inward rectifier; inward rectifier K+ channel Kir2.3; HIR; inward rectifier potassium channel 4; Kir2.3; Inward rectifier K(+) channel Kir2.3 | N/A | N/A |
| PDCD6IP | Yes | apoptosis-linked gene 2-interacting protein X; dopamine receptor interacting protein 4; ALIX; programmed cell death 6 interacting protein; ALG-2-interacting protein 1; programmed cell death 6-interacting protein; PDCD6-interacting protein; Hp95; KIAA1375; Alix; HP95; AIP1; ALG-2 interacting protein 1; DRIP4 | N/A | Target of miR-1225-5P |
| SNX14 | Yes | sorting nexin 14; RGS-PX2; sorting nexin-14 | N/A | N/A |
| TUBB2B | Yes | tubulin beta-2B chain; tubulin, beta polypeptide paralog; MGC8685; bA506K6.1; tubulin, beta 2B class IIb; DKFZp566F223; tubulin, beta 2B; class IIb beta-tubulin; class II beta-tubulin isotype | Lissencephaly | N/A |
| ZNF41 | Yes | TUBB; class IIa beta-tubulin; tubulin, beta 2A class IIa; TUBB2; tubulin, beta polypeptide 2; tubulin, beta 2; TUBB2B; dJ40E16.7; tubulin beta-2A chain; tubulin, beta polypeptide; tubulin, beta 2A | Aland Island eye disease; mental disorder; intellectual disability | N/A |
| ZNF595 | Yes | MRX89; MGC8941; zinc finger protein 41 | N/A | N/A |
| ZNF676 | Yes | FLJ31740; zinc finger protein 595 | N/A | N/A |
| ZNF761 | No | zinc finger protein 676 | N/A | N/A |
The expression and disease annotation were based on GeneCards V3 [57].
GO terms in common from human, mouse and zebrafish neighbor genes within 5 kb of proximate ncRNAs at their 5′ end.
| Category | Term | P value (human) | P value (mouse) | P value (zebrafish) |
| Molecular Function | GO:0003700∼transcription factor activity | 6.88E-07 | 0.001685935 | 0.002045234 |
| Molecular Function | GO:0030528∼transcription regulator activity | 2.80E-06 | 2.50E-05 | 0.001720193 |
| Biological Process | GO:0006355∼regulation of transcription, DNA-dependent | 4.53E-06 | 0.000108619 | 0.02130028 |
| Biological Process | GO:0051252∼regulation of RNA metabolic process | 7.91E-06 | 0.000178503 | 0.023870388 |
| Biological Process | GO:0010556∼regulation of macromolecule biosynthetic process | 8.37E-06 | 4.96E-07 | 0.000915362 |
| Biological Process | GO:0060255∼regulation of macromolecule metabolic process | 5.89E-05 | 7.41E-06 | 0.00691373 |
| Biological Process | GO:0045449∼regulation of transcription | 6.20E-05 | 2.37E-06 | 0.001790827 |
| Biological Process | GO:0031326∼regulation of cellular biosynthetic process | 8.41E-05 | 1.10E-06 | 0.001054761 |
| Biological Process | GO:0009889∼regulation of biosynthetic process | 0.000119902 | 1.33E-06 | 0.001088173 |
| Biological Process | GO:0080090∼regulation of primary metabolic process | 0.000146447 | 6.89E-07 | 0.002903755 |
| Biological Process | GO:0010468∼regulation of gene expression | 0.000154686 | 1.42E-06 | 0.002943972 |
| Biological Process | GO:0031323∼regulation of cellular metabolic process | 0.00015819 | 4.08E-06 | 0.002422663 |
| Biological Process | GO:0019219∼regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 0.000321532 | 7.14E-06 | 0.002751033 |
| Biological Process | GO:0051171∼regulation of nitrogen compound metabolic process | 0.000343647 | 6.14E-06 | 0.002831208 |
| Biological Process | GO:0019222∼regulation of metabolic process | 0.000349372 | 1.09E-05 | 0.011044253 |
| Biological Process | GO:0050794∼regulation of cellular process | 0.001348476 | 0.000766239 | 0.009737321 |
| Biological Process | GO:0050789∼regulation of biological process | 0.00433817 | 0.001382295 | 0.033481278 |
| Biological Process | GO:0065007∼biological regulation | 0.022428992 | 0.002031998 | 0.031603795 |
| Biological Process | GO:0007275∼multicellular organismal development | 0.035916788 | 0.000243142 | 0.043621824 |
The GO terms were ordered by p-value in human.
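A table of this kind can be assembled by keeping GO terms that are enriched in all three species and sorting them by the human p-value. The sketch below is a minimal illustration under stated assumptions: the 0.05 cutoff is an assumption (the table does not state its threshold, though every listed value is below 0.05), the three p-value columns are read as human, mouse and zebrafish in that order, and `GO:XXXXXXX` is a made-up term added to show the filter. The two real rows are copied from the table above.

```python
def common_go_terms(p_human, p_mouse, p_zebrafish, alpha=0.05):
    """GO terms below alpha in all three species, ordered by human p-value."""
    shared = [
        term for term in p_human
        if term in p_mouse and term in p_zebrafish
        and max(p_human[term], p_mouse[term], p_zebrafish[term]) < alpha
    ]
    return sorted(shared, key=p_human.get)

# Two rows from the table plus one hypothetical term that fails in mouse.
p_human = {"GO:0007275": 0.035916788, "GO:0003700": 6.88e-07, "GO:XXXXXXX": 0.001}
p_mouse = {"GO:0003700": 0.001685935, "GO:0007275": 0.000243142, "GO:XXXXXXX": 0.2}
p_zebrafish = {"GO:0003700": 0.002045234, "GO:0007275": 0.043621824, "GO:XXXXXXX": 0.01}

ordered = common_go_terms(p_human, p_mouse, p_zebrafish)
print(ordered)
```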
GO terms in common from human, mouse and zebrafish neighbor genes within 5 kb of proximate ncRNAs at their 3′ end.
| Category | Term | P value (human) | P value (mouse) | P value (zebrafish) |
| Molecular Function | GO:0003677∼DNA binding | 2.52E-07 | 0.001016369 | 0.022517442 |
| Biological Process | GO:0019222∼regulation of metabolic process | 5.94E-06 | 0.001833053 | 0.007240134 |
| Biological Process | GO:0031323∼regulation of cellular metabolic process | 7.06E-06 | 0.001932015 | 0.002531781 |
| Biological Process | GO:0080090∼regulation of primary metabolic process | 8.71E-06 | 0.000746433 | 0.001635905 |
| Biological Process | GO:0060255∼regulation of macromolecule metabolic process | 1.52E-05 | 0.001021052 | 0.015088588 |
| Cellular Component | GO:0044464∼cell part | 2.64E-05 | 0.005138983 | 0.021192768 |
| Cellular Component | GO:0005623∼cell | 2.75E-05 | 0.005138983 | 0.021192768 |
| Biological Process | GO:0009889∼regulation of biosynthetic process | 4.64E-05 | 0.00153235 | 0.001998668 |
| Biological Process | GO:0010556∼regulation of macromolecule biosynthetic process | 5.07E-05 | 0.001133669 | 0.004636373 |
| Biological Process | GO:0031326∼regulation of cellular biosynthetic process | 5.93E-05 | 0.001770385 | 0.002769539 |
| Biological Process | GO:0010468∼regulation of gene expression | 6.05E-05 | 0.001153647 | 0.019089475 |
| Biological Process | GO:0019219∼regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 7.45E-05 | 0.002835006 | 0.006403442 |
| Biological Process | GO:0045449∼regulation of transcription | 9.02E-05 | 0.001133423 | 0.009147674 |
| Biological Process | GO:0051171∼regulation of nitrogen compound metabolic process | 0.000115522 | 0.003953563 | 0.006560818 |
| Molecular Function | GO:0003700∼transcription factor activity | 0.000701959 | 0.006403948 | 0.003113804 |
| Biological Process | GO:0051252∼regulation of RNA metabolic process | 0.002751656 | 0.012593576 | 0.006423226 |
| Biological Process | GO:0006355∼regulation of transcription, DNA-dependent | 0.002836401 | 0.008313995 | 0.007792617 |
| Molecular Function | GO:0030528∼transcription regulator activity | 0.003105196 | 0.00782068 | 0.001014153 |
| Biological Process | GO:0031328∼positive regulation of cellular biosynthetic process | 0.007428451 | 0.007226598 | 0.033533698 |
| Biological Process | GO:0009891∼positive regulation of biosynthetic process | 0.007469104 | 0.008740921 | 0.033533698 |
| Biological Process | GO:0010557∼positive regulation of macromolecule biosynthetic process | 0.009196945 | 0.003489005 | 0.028269774 |
| Biological Process | GO:0010628∼positive regulation of gene expression | 0.010415711 | 0.009098997 | 0.021490484 |
| Biological Process | GO:0045941∼positive regulation of transcription | 0.011143783 | 0.00569233 | 0.021490484 |
| Molecular Function | GO:0005515∼protein binding | 0.017163574 | 0.000809527 | 1.60E-06 |
| Biological Process | GO:0045893∼positive regulation of transcription, DNA-dependent | 0.02105859 | 0.004978895 | 0.012497621 |
| Molecular Function | GO:0008270∼zinc ion binding | 0.022962024 | 0.003010259 | 0.036242576 |
| Biological Process | GO:0048869∼cellular developmental process | 0.024154786 | 0.006314016 | 9.66E-07 |
| Biological Process | GO:0051254∼positive regulation of RNA metabolic process | 0.024566919 | 0.005669422 | 0.014428949 |
| Biological Process | GO:0030154∼cell differentiation | 0.02953709 | 0.007655265 | 1.65E-06 |
| Biological Process | GO:0045935∼positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 0.03326329 | 0.011738803 | 0.039427105 |
| Biological Process | GO:0048468∼cell development | 0.033319932 | 0.007737614 | 0.003006631 |
| Biological Process | GO:0051173∼positive regulation of nitrogen compound metabolic process | 0.033319932 | 0.012196797 | 0.04261773 |
| Biological Process | GO:0044267∼cellular protein metabolic process | 0.042639534 | 0.003735008 | 0.011732507 |
| Biological Process | GO:0001655∼urogenital system development | 0.048304941 | 0.012438853 | 0.04591464 |
The GO terms were ordered by p-value in human.
Previously annotated long ncRNA datasets used for comparison.
| Dataset | Number of ncRNAs | Source | Method | Reference |
| Chromatin-based lincRNAs (Human) | 4,860 | 10 cell types | Chromatin signature identification (K4–K36 domain) | Khalil AM, 2009 |
| Enhancer-like long ncRNAs (Human) | 3,011 | Multiple | Screening from GENCODE annotation | Orom UA, 2010 |
| RNA-seq-based lincRNAs (Human) | 8,195 | 24 tissues and cell types | Screening from assembled RNA-seq data | Cabili MN, 2011 |
| Chromatin-based lincRNAs (Mouse) | 2,127 | 4 cell types | Chromatin signature identification (K4–K36 domain) | Guttman M, 2009 |
| RNA-seq-based lincRNAs (Mouse) | 1,140 | 3 cell types | Screening from assembled RNA-seq data | Guttman M, 2010 |
| RNA-seq-based long ncRNAs (Zebrafish) | 1,133 | 8 embryonic stages | Screening from assembled RNA-seq data | Pauli A, 2011 |
These are the exons identified by microarray from non-coding k4-k36 domains.