Diana Chernikova, David Managadze, Galina V Glazko, Wojciech Makalowski, Igor B Rogozin.
Abstract
The abundance of mammalian long intergenic non-coding RNA (lincRNA) genes is high, yet their functions remain largely unknown. One possible way to study this important question is to use large-scale comparisons of various characteristics of lincRNA with those of protein-coding genes, for which a large body of functional information is available. A prominent feature of mammalian protein-coding genes is the high evolutionary conservation of the exon-intron structure. Comparative analysis of putative intron positions in lincRNA genes from various mammalian genomes suggests that some lincRNA introns have been conserved for over 100 million years; thus, the primary and/or secondary structure of these molecules is likely to be functionally important.
Keywords: exon; genomic alignments; intron; intron gain; intron loss; lincRNA; non-coding RNA
Year: 2016 PMID: 27429005 PMCID: PMC5041003 DOI: 10.3390/life6030027
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Statistics of lincRNA datasets.
| Features of lincRNA Genes | Mouse | Human |
|---|---|---|
| Number of all lincRNAs | 2,390 | 589 |
| Number of intron-containing lincRNAs | 979 | 245 |
| Number of exons | 3,439 | 1,194 |
| Number of introns | 2,462 | 949 |
| Number of exons shorter than 15 nt | 41 | 7 |
| Number of introns per lincRNA | 2.52 | 3.86 |
| Average gene length, nt (standard error) | 11,775 (712) | 17,192 (1,921) |
| Median gene length, nt | 2,535 | 2,626 |
| Average exon length, nt (standard error) | 524 (21) | 409 (48) |
| Median exon length, nt | 464 | 356 |
| Average intron length, nt (standard error) | 9,621 (1,631) | 10,562 (4,539) |
| Median intron length, nt | 2,615 | 2,116 |
Conservation of splicing signals (pairwise comparisons between mouse or human and other vertebrates). The number of (putative) orthologs is the number of mouse/human lincRNAs that have an orthologous sequence in another species with a total alignment length ≥ 200 nucleotides. The number of mismatches is the number of dinucleotides that differ from GT/GC (donor sites) or AG (acceptor sites) at the orthologous positions of the alignments.
| Reference | Species, Common Name (Number of Orthologs) | Donor Site (GT or GC): Matches | Donor Site: Mismatches | Donor Site: Percent Matches | Acceptor Site (AG): Matches | Acceptor Site: Mismatches | Acceptor Site: Percent Matches |
|---|---|---|---|---|---|---|---|
| Mouse | Rat (2285) | 1555 | 569 | 73% | 1448 | 669 | 68% |
| Mouse | Rabbit (1522) | 518 | 258 | 67% | 419 | 306 | 58% |
| Mouse | Human (2091) | 902 | 619 | 59% | 746 | 715 | 51% |
| Mouse | Chimp (2068) | 826 | 606 | 58% | 703 | 692 | 50% |
| Mouse | Macaque (1971) | 807 | 543 | 60% | 682 | 647 | 51% |
| Mouse | Cow (1815) | 694 | 402 | 63% | 560 | 498 | 53% |
| Mouse | Dog (1897) | 714 | 512 | 58% | 627 | 581 | 52% |
| Mouse | Elephant (1485) | 499 | 247 | 67% | 428 | 312 | 58% |
| Mouse | Tenrec (1256) | 368 | 179 | 67% | 283 | 193 | 59% |
| Mouse | Fugu (203) | 36 | 28 | 56% | 24 | 28 | 46% |
| Mouse | Opossum (1068) | 249 | 169 | 60% | 162 | 150 | 52% |
| Mouse | Armadillo (1426) | 469 | 260 | 64% | 382 | 322 | 54% |
| Mouse | Chicken (472) | 113 | 36 | 76% | 75 | 43 | 64% |
| Mouse | Zebrafish (207) | 44 | 27 | 62% | 26 | 32 | 45% |
| Mouse | Tetraodon (226) | 46 | 24 | 66% | 29 | 28 | 51% |
| Mouse | Frog (312) | 74 | 37 | 67% | 51 | 40 | 56% |
| Human | Chimp (575) | 870 | 19 | 98% | 867 | 15 | 98% |
| Human | Macaque (564) | 800 | 53 | 94% | 828 | 42 | 95% |
| Human | Mouse (488) | 368 | 120 | 75% | 364 | 105 | 78% |
| Human | Rat (476) | 369 | 112 | 77% | 342 | 102 | 77% |
| Human | Rabbit (463) | 445 | 86 | 84% | 415 | 114 | 78% |
| Human | Cow (527) | 531 | 122 | 81% | 484 | 144 | 77% |
| Human | Dog (476) | 546 | 121 | 82% | 543 | 118 | 82% |
| Human | Elephant (458) | 364 | 82 | 82% | 341 | 83 | 80% |
| Human | Tenrec (419) | 196 | 59 | 77% | 175 | 68 | 72% |
| Human | Armadillo (468) | 362 | 95 | 79% | 320 | 122 | 72% |
| Human | Opossum (287) | 213 | 35 | 86% | 189 | 62 | 75% |
| Human | Chicken (131) | 33 | 10 | 77% | 23 | 18 | 56% |
| Human | Fugu (80) | 48 | 7 | 87% | 51 | 11 | 82% |
| Human | Zebrafish (79) | 43 | 7 | 86% | 44 | 9 | 83% |
| Human | Tetraodon (87) | 49 | 16 | 75% | 52 | 18 | 74% |
| Human | Frog (89) | 29 | 4 | 88% | 29 | 10 | 74% |
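
The percentages in the splice-site table appear to follow directly from the match and mismatch counts. The sketch below assumes that the percent matches value is matches / (matches + mismatches), rounded to the nearest integer; the function name `percent_matches` is ours for illustration and is not taken from the paper.

```python
def percent_matches(matches: int, mismatches: int) -> int:
    """Return the share of conserved splice-site dinucleotides, in percent.

    Assumption: the table's percent-matches column equals
    matches / (matches + mismatches), rounded to the nearest integer.
    """
    total = matches + mismatches
    if total == 0:
        raise ValueError("no aligned splice sites to compare")
    return round(100 * matches / total)


# Counts copied from the table: mouse reference vs. rat,
# and human reference vs. chimp (donor sites).
rat_donor = percent_matches(1555, 569)
rat_acceptor = percent_matches(1448, 669)
chimp_donor = percent_matches(870, 19)
```

Under this assumption the three calls reproduce the tabulated values for those rows (73%, 68% and 98%).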
Figure 1. Example of the intron presence/absence matrix. 100% conserved introns are shown in bold. Introns that were used for phylogenetic reconstructions (conserved introns) are underlined in the out-group sequence.
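
A presence/absence matrix of this kind can be held as one row per species and one column per intron position. The sketch below uses made-up toy values (not data from the paper) and hypothetical names to show how positions present in every species, i.e., 100% conserved introns, could be picked out.

```python
from typing import Dict, List

# Toy intron presence/absence matrix: 1 = intron present, 0 = absent.
# Species labels and values are illustrative only, not taken from the paper.
matrix: Dict[str, List[int]] = {
    "mouse":    [1, 1, 1, 1],
    "rat":      [1, 0, 1, 1],
    "human":    [1, 1, 0, 1],
    "outgroup": [1, 0, 0, 1],
}


def fully_conserved(rows_by_species: Dict[str, List[int]]) -> List[int]:
    """Return indices of intron positions that are present in every species."""
    rows = list(rows_by_species.values())
    n_positions = len(rows[0])
    if any(len(row) != n_positions for row in rows):
        raise ValueError("all rows must cover the same intron positions")
    return [i for i in range(n_positions) if all(row[i] == 1 for row in rows)]


conserved = fully_conserved(matrix)  # positions 0 and 3 in this toy matrix
```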
Figure 2. Evolution of exon-intron structure in six Glires and primate species using positions of mouse introns as the reference gene structure. A total of 363 intron positions from mouse lincRNA genes were used for this reconstruction. A total of 124 conserved intron positions (intron positions that were present in the mouse lincRNA genes and in the orthologous position of primates and/or the out-group consensus sequence, underlined in Figure 1) were found. “-” means that loss of introns in mouse cannot be detected because mouse introns were used as a reference in our reconstructions.
Figure 3. Evolution of exon-intron structure in six primate and Glires species using positions of human introns as the reference gene structure. A total of 656 intron positions from human lincRNA genes were used for this reconstruction. A total of 509 conserved intron positions (intron positions that were present in the human lincRNA genes and in the orthologous position of Glires and/or the out-group consensus sequence, underlined in Figure 1) were found. “-” means that loss of introns in human cannot be detected because human introns were used as a reference in our reconstructions.