Ana C Marques, Chris P Ponting.
Abstract
BACKGROUND: Despite increasing interest in the noncoding fraction of transcriptomes, the number, species-conservation and functions, if any, of many non-protein-coding transcripts remain to be discovered. Two extensive long intergenic noncoding RNA (ncRNA) transcript catalogues are now available for mouse: over 3,000 macroRNAs identified by cDNA sequencing, and 1,600 long intergenic noncoding RNA (lincRNA) intervals that are predicted from chromatin-state maps. Previously we showed that macroRNAs tend to be more highly conserved than putatively neutral sequence, although only 5% of bases are predicted as constrained. By contrast, over a thousand lincRNAs were reported as being highly conserved. This apparent difference may account for the surprisingly small fraction (11%) of transcripts that are represented in both catalogues. Here we sought to resolve the reported discrepancy between the evolutionary rates for these two sets.
Year: 2009 PMID: 19895688 PMCID: PMC3091318 DOI: 10.1186/gb-2009-10-11-r124
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Evolutionary signatures and other properties of ncRNAs
| | macroRNA | lincRNA |
|---|---|---|
| Transcripts or intervals | 3,051 | 1,675 |
| Exons | 5,893 | 2,126 |
| Mouse-human exonic alignments ≥100 bp | 3,537 | 1,784 |
| Median | 0.418 | 0.426 |
| Median G+C fraction | 0.418 | 0.430 |
| Median | 0.887 | 0.904 |
| Fold enrichment of IPSs | 1.73 | 1.77 |
| Fold enrichment of PhastCons 17-way | 2.2 | 2.6 |
| Fold enrichment of Evofold predictions | 1.4 | 0.8 |
| Fold enrichment with transcription factor genes* | 1.7 | 2.4 |
| Rare alleles/intermediate alleles | 978/2,675 | 369/1,110 |
| Constrained exons | 1,393 | 595 |
| Promoters | 1,802 | 504 |
| Mouse-human promoter alignments ≥100 bp | 1,477 | 460 |
| Median | 0.366 | 0.402 |
| Median | 0.787 | 0.857 |
*Measured as the fold-enrichment within protein-coding gene territories (see Materials and methods) associated with the Gene Ontology term 'regulation of transcription'.
Figure 1. G+C content of mouse ncRNA exons and ancestral repeats. The figure shows the cumulative distribution of G+C fraction as measured for macroRNA exons (red), lincRNA exons (black) and ancestral repeats (blue). LincRNAs tend to have higher G+C contents than macroRNAs. Ancestral repeats tend to possess a low G+C content.
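The quantity plotted in Figure 1 can be illustrated with a minimal Python sketch. It assumes that G+C fraction is computed per sequence over unambiguous bases only and that the cumulative distribution is the plain empirical one; the source does not describe how the authors computed either. The function names `gc_fraction` and `empirical_cdf` and the toy sequences are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Assumption: ambiguous symbols such as N are ignored.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def empirical_cdf(values):
    """Return a function F where F(x) is the share of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Toy sequences, invented for illustration only (not data from the paper).
toy_exons = ["ATGCGC", "AATT", "GGCC", "ATAT"]
fractions = [gc_fraction(s) for s in toy_exons]
cdf = empirical_cdf(fractions)
```

Evaluating `cdf` on a grid of G+C values between 0 and 1, once per sequence set, would give curves of the kind the caption describes; a curve shifted to the right corresponds to a set with higher G+C content.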
Figure 2. Substitution rates of ncRNA and protein-coding genes. The cumulative distributions of substitution rate for (a) exons and (b) promoters as measured for macroRNAs (red), lincRNAs (black) and protein-coding genes (blue). MacroRNA and lincRNA exons exhibit similar degrees of constraint and appear to evolve faster than protein-coding exons. Protein-coding gene promoters evolve under stronger constraint than ncRNA exons. MacroRNA promoters have lower substitution rates than lincRNA promoters.
Noncoding RNA expression properties
| | macroRNA | lincRNA |
|---|---|---|
| Conserved exons | 4,401 | 2,103 |
| Conserved transcribed exons | 641 | 446 |
| Exons with expression data [ | 1,111 | 230 |
| Median AD value | 286.4 | 311.5 |
| Tissue-specific exons | 145 | 15 |
| Median maximum | 0.056 | 0.052 |
Figure 3. Distribution of highly conserved sequence across ncRNA exon sequences. Examples of phastCons elements (as in [38]) within (a) lincRNA (located on chromosome 10, 68730506-68731547) and (b) macroRNA (located on chromosome 1, 47378880-47380310) exons. Blue histograms represent the conservation in 17 vertebrates based on a phylogenetic hidden Markov model [13]. Green histograms represent pairwise conservation to other vertebrate species. Images have been taken from the UCSC genome browser.