Zhaolei Zhang, Andy Wing Chun Pang, Mark Gerstein.
Abstract
BACKGROUND: Widespread transcription activity in the human genome was recently observed in high-resolution tiling array experiments, which revealed many novel transcripts that lie outside the boundaries of known protein or RNA genes. Termed "TARs" (Transcriptionally Active Regions), these novel transcribed regions represent "dark matter" in the genome, and their origin and functionality need to be explained. Many of these transcripts are thought to code for novel proteins or non-protein-coding RNAs. We have applied an integrated bioinformatics approach to investigate the properties of these TARs, including cross-species conservation and the ability to form stable secondary structures. The goal of this study is to identify a list of candidate sequences that are likely to code for functional non-protein-coding RNAs. We are particularly interested in discovering functional RNA candidates that are primate-specific, i.e. those that have homologs in rhesus but not in the mouse or dog genomes.
Year: 2007 PMID: 17288572 PMCID: PMC1796608 DOI: 10.1186/1471-2148-7-S1-S14
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Distribution of transcriptionally active regions (TARs), categorized by their distances from the nearest gene annotations
| | | Annotation on the opposite strand | | | | | |
| | | Distal | 10 kb | 1 kb | exon | intron | Total |
| Annotation on the same strand as TAR⁴ | Distal¹ | 3,695 (21.5%) | 537 (3.1%) | 149 (0.9%) | 861 (5.0%) | 1,414 (8.2%) | 6,656 (38.7%) |
| | 10 kb² | 799 (4.6%) | 224 (1.3%) | 59 (0.3%) | 265 (1.5%) | 146 (0.8%) | 1,493 (8.7%) |
| | 1 kb³ | 347 (2.0%) | 92 (0.5%) | 32 (0.2%) | 52 (0.3%) | 20 (0.1%) | 543 (3.2%) |
| | exon | 4,637 (27.0%) | 1,454 (8.4%) | 345 (2.0%) | 120 (0.7%) | 172 (1.0%) | 6,728 (39.1%) |
| | intron | 1,554 (9.0%) | 137 (0.8%) | 24 (0.1%) | 35 (0.2%) | 28 (0.2%) | 1,778 (10.3%) |
| | Total: | 11,032 (64.1%) | 2,444 (14.2%) | 609 (3.5%) | 1,333 (7.8%) | 1,780 (10.4%) | 17,198 |
¹ Distal: distance from the nearest annotated exon is greater than 10 kb
² 10 kb: distance from the nearest annotated exon is less than 10 kb but greater than 1 kb
³ 1 kb: distance from the nearest annotated exon is less than 1 kb, and the TAR does not overlap any exon
⁴ Genome annotation was based on NCBI assembly 34 downloaded from Ensembl.
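The distance categories defined in the footnotes above can be expressed as a small classifier. This is a minimal sketch, not the authors' code: the function name `classify_tar`, its arguments, the handling of the exact 1 kb and 10 kb boundaries, and the rule that an intronic TAR is labeled "intron" regardless of its distance to an exon are all assumptions made for illustration.

```python
def classify_tar(distance_to_exon_bp: int, in_intron: bool = False) -> str:
    """Assign one strand's annotation category to a TAR (hypothetical helper).

    distance_to_exon_bp: distance to the nearest annotated exon on that
        strand, in base pairs; 0 means the TAR overlaps an exon.
    in_intron: True if the TAR lies inside an annotated intron.
    """
    if distance_to_exon_bp < 0:
        raise ValueError("distance must be non-negative")
    if distance_to_exon_bp == 0:
        return "exon"
    if in_intron:  # assumed precedence over the distance bins
        return "intron"
    if distance_to_exon_bp < 1_000:
        return "1 kb"
    if distance_to_exon_bp <= 10_000:  # boundary handling is an assumption
        return "10 kb"
    return "Distal"


# Each TAR receives one label per strand, which selects a cell of the 5 x 5 table.
cell = (classify_tar(25_000), classify_tar(400))
```

Here `cell` pairs the same-strand label with the opposite-strand label, so a TAR far from any same-strand exon but within 1 kb of an opposite-strand exon would be counted in the Distal row, 1 kb column.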
Distribution of human genomic regions, categorized by annotations on both strands (Mb = megabases)
| | | Annotation on the antisense strand (Mb) | | | | | |
| | | Distal | 10 kb | 1 kb | exon | intron | Total |
| Annotation on the sense strand¹ | Distal | 1,655 (54%) | 130 (4.2%) | 17.6 (0.6%) | 37 (1.2%) | 438 (14.2%) | 2,279 (74%) |
| | 10 kb | 133 (4.3%) | 26 (0.9%) | 49 (0.1%) | 7 (0.2%) | 34 (1.1%) | 206 (7%) |
| | 1 kb | 17 (0.6%) | 4 (0.1%) | 1 (0.03%) | 1 (0.04%) | 2.8 (0.1%) | 27 (0.9%) |
| | exon | 44 (1.4%) | 7.6 (0.25%) | 1 (0.04%) | 1.1 (0.04%) | 3.8 (0.12%) | 58 (1.9%) |
| | intron | 450 (14.6%) | 35 (1.2%) | 3.0 (0.1%) | 4 (0.1%) | 12 (0.4%) | 505 (16%) |
| | Total: | 2,301 (75%) | 204 (6.7%) | 27 (0.9%) | 50.7 (1.6%) | 491 (16%) | 3,076 |
¹ Annotation is the same as in Table 1.
Total length and density of TARs in different types of genomic regions¹
| | | Annotation on the antisense strand | | | | | |
| | | Distal | 10 kb | 1 kb | exon | intron | Total |
| Annotation on the sense strand² | Distal | 950,963 (574) | 137,168 (1,048) | 38,856 (2,202) | 223,699 (6,012) | 364,308 (831) | 1,715,000 (753) |
| | 10 kb | 212,945 (1,596) | 61,991 (2,334) | 15,186 (3,584) | 69,413 (9,683) | 37,434 (1,077) | 396,969 (1,930) |
| | 1 kb | 97,974 (5,454) | 24,062 (5,631) | 8,793 (8,647) | 14,475 (13,391) | 4,960 (1,751) | 150,264 (5,570) |
| | exon | 1,370,991 (30,952) | 422,364 (55,666) | 103,263 (90,951) | 33,079 (29,331) | 53,984 (14,043) | 1,983,681 (34,200) |
| | intron | 398,063 (884) | 34,368 (970) | 6,230 (2,104) | 8,752 (2,107) | 7,059 (586) | 454,472 (900) |
| | Total: | 3,030,936 (1,320) | 679,953 (3,330) | 172,328 (6,380) | 349,418 (6,890) | 467,745 (953) | |
¹ Numbers in brackets are the normalized densities of the TARs, i.e. the number of transcribed nucleotides per megabase of genomic DNA
² Annotation is the same as in Table 1(A)
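The normalized density in the footnote above is a simple ratio, which the sketch below reproduces for two cells. The function name `tar_density` is hypothetical; the inputs are the transcribed nucleotide counts from this table and the region sizes (in Mb) from the preceding table. Because the region sizes are published in rounded form, recomputed densities may differ slightly from the bracketed values.

```python
def tar_density(transcribed_nt: int, region_mb: float) -> float:
    """Transcribed nucleotides per megabase of genomic DNA."""
    if region_mb <= 0:
        raise ValueError("region size must be positive")
    return transcribed_nt / region_mb


# Distal/Distal cell: 950,963 transcribed nt over 1,655 Mb.
distal_distal = tar_density(950_963, 1_655)

# Exon row total: 1,983,681 transcribed nt over 58 Mb.
exon_total = tar_density(1_983_681, 58)
```

The first value comes out at about 574.6, in line with the bracketed 574, and the second lands close to the bracketed 34,200.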
Noncoding RNAs that are known to have medical implications.
| H19 RNA | Tumor suppressor | [51] |
| BIC | Hodgkin lymphoma | [52, 53] |
| DLEU2 | lymphocytic leukaemia | [54] |
| NCRMS | rhabdomyosarcoma | [55] |
| 7H4 | postnatal development | [56] |
| NRSE/REST | found in adult neural stem cells | [57] |
Figure 1. A flow chart of the analysis pipeline.
BLAST matches in the other genomes and databases (E-value < 0.01)
| Human EST library | 12,021 | 70% |
| Human cDNA library | 7,546 | 44% |
| Chimpanzee ( | 15,757 | 92% |
| Macaque cDNA library | 5,955 | 35% |
| Mouse genome | 8,011 | 47% |
| Mouse EST library | 7,546 | 44% |
| Rat genome | 7,691 | 45% |
| * Rodents: mouse ∪ rat | 9,184 | 53% |
| Dog ( | 7,924 | 46% |
| Chicken ( | 3,600 | 21% |
| Frog ( | 2,279 | 13% |
| Japanese pufferfish ( | 2,254 | 13% |
| Green spotted pufferfish ( | 2,069 | 12% |
| ** Pufferfish: | 2,867 | 17% |
| Sea squirt ( | 607 | 4% |
| Rfam | 252 | 1% |
| RNAdb | 1,995 | 12% |
| mirBase | 138 | 1% |
| NONCODE | 379 | 2% |
| *** Rfam ∪ RNAdb ∪ mirBase ∪ NONCODE | 2,637 | 15% |
* Number of TARs that have hits in either the mouse or the rat genome. The symbol ∪ represents the union of two sets.
** Number of TARs that have hits in either species of pufferfish.
*** Number of TARs that have hits in any of the noncoding RNA databases.
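Because the union rows count each TAR only once, the overlap between two hit sets follows from inclusion-exclusion, and each percentage is the count divided by the 17,198 TARs in total. The sketch below is a worked check on the table's own numbers; the helper names `shared_hits` and `percent_of_tars` are hypothetical, and the derived overlaps are arithmetic consequences of the table rather than values reported by the authors.

```python
TOTAL_TARS = 17_198  # total number of TARs in the data set


def shared_hits(n_a: int, n_b: int, n_union: int) -> int:
    """|A ∩ B| = |A| + |B| - |A ∪ B| (inclusion-exclusion)."""
    return n_a + n_b - n_union


def percent_of_tars(n_hits: int) -> int:
    """Share of all TARs, rounded to a whole percent as in the table."""
    return round(100 * n_hits / TOTAL_TARS)


# Mouse genome (8,011), rat genome (7,691), rodent union (9,184).
mouse_and_rat = shared_hits(8_011, 7_691, 9_184)

# The two pufferfish genomes (2,254 and 2,069) and their union (2,867).
both_pufferfish = shared_hits(2_254, 2_069, 2_867)
```

Under these figures, 6,518 TARs would have hits in both rodent genomes and 1,456 in both pufferfish genomes.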
Conservation profiles of TARs among vertebrates
| Human ∩ chimp | 15,757 | 92% |
| Human ∩ chimp ∩ rodents | 8,828 | 51% |
| Human ∩ chimp ∩ rodents ∩ dog | 5,988 | 35% |
| Human ∩ chimp ∩ rodents ∩ dog ∩ chicken | 2,435 | 14% |
| Human ∩ chimp ∩ rodents ∩ dog ∩ chicken ∩ pufferfish | 1,244 | 7% |
| Human ∩ chimp ∩ rodents ∩ dog ∩ chicken ∩ pufferfish ∩ sea squirt | 276 | 2% |
| Human ∩ macaque | 5,955 | 35% |
| Human ∩ macaque ∩ chimp | 5,815 | 34% |
| Human ∩ rodents | 8,776 | 51% |
| Human ∩ rodents ∩ dog | 6,166 | 36% |
| Human ∩ rodents ∩ dog ∩ chicken | 2,493 | 14% |
| Human ∩ rodents ∩ dog ∩ chicken ∩ pufferfish | 1,270 | 7% |
| Human ∩ rodents ∩ dog ∩ chicken ∩ pufferfish ∩ sea squirt | 276 | 2% |
| Num. of TARs in human AND chimp, but NOT in rodents | 6,929 | 40% |
| Num. of TARs in human AND chimp, but NOT in any other vertebrates | 4,806 | 28% |
| Num. of TARs in human and chimp, but NOT in rodents, and NOT in databases (Rfam, RNAdb, mirBase, NONCODE) | 6,132 | 36% |
| Num. of TARs in human and chimp, but NOT in any other genomes, and NOT in databases (Rfam, RNAdb, mirBase, NONCODE) | 4,574 | 27% |
* ∩ represents the intersection of two or more sets; ∪ represents their union
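The "human AND chimp, but NOT in rodents" rows are set differences. The sketch below illustrates the operation on a toy example and then checks the corresponding count against the table. The TAR identifiers are invented for illustration, and reading the 6,929 row as the chimp-conserved set minus its rodent-conserved subset is an interpretation of the table, not a statement of the authors' procedure.

```python
# Hypothetical TAR identifiers with BLAST hits in each genome group.
chimp_hits = {"TAR_a", "TAR_b", "TAR_c", "TAR_d"}
rodent_hits = {"TAR_b", "TAR_d", "TAR_e"}

# Conserved in chimp but absent from rodents: set difference.
primate_only = chimp_hits - rodent_hits

# The size of a difference equals the set size minus the size of the overlap.
assert len(primate_only) == len(chimp_hits) - len(chimp_hits & rodent_hits)

# Same identity applied to the published counts:
# Human ∩ chimp (15,757) minus Human ∩ chimp ∩ rodents (8,828).
not_in_rodents = 15_757 - 8_828
```

The result, 6,929, matches the count reported for TARs found in human and chimp but not in rodents.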
Conservation of TARs in chimpanzee and rodents, categorized by their distance to known genes on both strands
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND chimp (15,757) | Distal | | | | | |
| | 10 kb | 724 (4.6%) | 197 (1.3%) | 56 (0.4%) | 232 (1.5%) | 123 (0.8%) |
| | 1 kb | 327 (2.1%) | 87 (0.6%) | 30 (0.2%) | 46 (0.3%) | 18 (0.1%) |
| | exon | 4,492 (28.5%) | 1,368 (8.7%) | 323 (2.0%) | 116 (0.7%) | 165 (1.0%) |
| | intron | 1,358 (8.6%) | 115 (0.7%) | 21 (0.1%) | 31 (0.2%) | 25 (0.2%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND mouse (8,776) | Distal | | | | | |
| | 10 kb | 340 (3.9%) | 79 (0.9%) | 20 (0.2%) | 198 (2.3%) | 33 (0.4%) |
| | 1 kb | 136 (1.5%) | 42 (0.5%) | 9 (0.1%) | 25 (0.3%) | 10 (0.1%) |
| | exon | 3,398 (38.7%) | 1,028 (11.7%) | 217 (2.5%) | 96 (1.1%) | 125 (1.4%) |
| | intron | 419 (4.8%) | 29 (0.3%) | 6 (0.1%) | 24 (0.3%) | 10 (0.1%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND chimp AND rodents ( | Distal | | | | | |
| | 10 kb | 330 (3.9%) | 75 (0.9%) | 19 (0.2%) | 180 (2.1%) | 30 (0.4%) |
| | 1 kb | 133 (1.6%) | 42 (0.5%) | 8 (0.1%) | 21 (0.2%) | 10 (0.1%) |
| | exon | 3,306 (39.1%) | 983 (11.6%) | 206 (2.4%) | 93 (1.1%) | 121 (1.4%) |
| | intron | 393 (4.7%) | 26 (0.3%) | 6 (0.1%) | 22 (0.3%) | 10 (0.1%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND in chimp NOT in rodents ( | Distal | | | | | |
| | 10 kb | 394 (5.4%) | 122 (1.7%) | 37 (0.5%) | 52 (0.7%) | 93 (1.3%) |
| | 1 kb | 194 (2.7%) | 45 (0.6%) | 22 (0.3%) | 25 (0.3%) | 8 (0.1%) |
| | exon | 1,186 (16.2%) | 385 (5.3%) | 117 (1.6%) | 23 (0.3%) | 44 (0.6%) |
| | intron | 965 (13.2%) | 89 (1.2%) | 15 (0.2%) | 9 (0.1%) | 15 (0.2%) |
Predictions of RNA secondary structures by RNAz (prediction probability > 90%)
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND chimp (1,436) | Distal | 418 (29.1%) | 64 (4.5%) | 17 (1.2%) | 62 (4.3%) | 184 (12.8%) |
| | 10 kb | 70 (4.9%) | 19 (1.3%) | 3 (0.2%) | 19 (1.3%) | 24 (1.7%) |
| | 1 kb | 30 (2.1%) | 8 (0.6%) | 4 (0.3%) | 3 (0.2%) | 1 (0.1%) |
| | exon | 164 (11.4%) | 61 (4.2%) | 15 (1.0%) | 6 (0.4%) | 6 (0.4%) |
| | intron | 226 (15.7%) | 20 (1.4%) | 3 (0.2%) | 5 (0.3%) | 4 (0.3%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND mouse (241) | Distal | 32 (13.3%) | 2 (0.8%) | 2 (0.8%) | 39 (16.2%) | 11 (4.6%) |
| | 10 kb | 4 (1.7%) | 0 (0.0%) | 0 (0.0%) | 15 (6.2%) | 0 (0.0%) |
| | 1 kb | 2 (0.8%) | 1 (0.4%) | 1 (0.4%) | 2 (0.8%) | 0 (0.0%) |
| | exon | 71 (29.5%) | 32 (13.3%) | 5 (2.1%) | 2 (0.8%) | 3 (1.2%) |
| | intron | 9 (3.7%) | 3 (1.2%) | 1 (0.4%) | 3 (1.2%) | 1 (0.4%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND chimp AND rodents (234) | Distal | 31 (13.2%) | 2 (0.9%) | 2 (0.9%) | 38 (16.2%) | 10 (4.3%) |
| | 10 kb | 4 (1.7%) | 0 (0.0%) | 0 (0.0%) | 14 (6.0%) | 0 (0.0%) |
| | 1 kb | 2 (0.9%) | 1 (0.4%) | 1 (0.4%) | 1 (0.4%) | 0 (0.0%) |
| | exon | 70 (29.9%) | 32 (13.7%) | 5 (2.1%) | 2 (0.9%) | 3 (1.3%) |
| | intron | 9 (3.8%) | 2 (0.9%) | 1 (0.4%) | 3 (1.3%) | 1 (0.4%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND in chimp NOT in rodents (1,202) | Distal | 387 (32.2%) | 62 (5.2%) | 15 (1.2%) | 24 (2.0%) | 174 (14.5%) |
| | 10 kb | 66 (5.5%) | 19 (1.6%) | 3 (0.2%) | 5 (0.4%) | 24 (2.0%) |
| | 1 kb | 28 (2.3%) | 7 (0.6%) | 3 (0.2%) | 2 (0.2%) | 1 (0.1%) |
| | exon | 94 (7.8%) | 29 (2.4%) | 10 (0.8%) | 4 (0.3%) | 3 (0.2%) |
| | intron | 217 (18.1%) | 18 (1.5%) | 2 (0.2%) | 2 (0.2%) | 3 (0.2%) |
| | | Distal | 10 kb | 1 kb | exon | intron |
| Found in human AND in chimp NOT in rodents, and NOT present in databases (1,073) | Distal | | 56 (5.2%) | 13 (1.2%) | 22 (2.1%) | 151 (14.1%) |
| | 10 kb | 57 (5.3%) | 15 (1.4%) | 2 (0.2%) | 5 (0.5%) | 20 (1.9%) |
| | 1 kb | 24 (2.2%) | 7 (0.7%) | 3 (0.3%) | 1 (0.1%) | 1 (0.1%) |
| | exon | 83 (7.7%) | 24 (2.2%) | 9 (0.8%) | 2 (0.2%) | 3 (0.3%) |
| | intron | 196 (18.3%) | 17 (1.6%) | 2 (0.2%) | 2 (0.2%) | 3 (0.3%) |
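The Distal row of the final sub-table (1,073 TARs) shows only four values, so one cell of the 5 x 5 grid is absent here. The sketch below estimates that cell, assuming the header total equals the sum of all 25 cells; the same check is run first on the 1,202-TAR sub-table, where every cell is present. The variable names are hypothetical, the placement of the four surviving values in the 10 kb to intron columns is inferred from their percentages, and the derived number is an inference, not a value reported in the source.

```python
# Complete sub-table (human AND chimp, NOT in rodents): header total 1,202.
complete = [
    [387, 62, 15, 24, 174],  # Distal
    [66, 19, 3, 5, 24],      # 10 kb
    [28, 7, 3, 2, 1],        # 1 kb
    [94, 29, 10, 4, 3],      # exon
    [217, 18, 2, 2, 3],      # intron
]
complete_sum = sum(sum(row) for row in complete)  # should match 1,202

# Incomplete sub-table (also NOT in databases): header total 1,073.
# None marks the absent Distal/Distal cell (assumed position).
incomplete = [
    [None, 56, 13, 22, 151],
    [57, 15, 2, 5, 20],
    [24, 7, 3, 1, 1],
    [83, 24, 9, 2, 3],
    [196, 17, 2, 2, 3],
]
known_sum = sum(v for row in incomplete for v in row if v is not None)
missing_cell = 1_073 - known_sum
missing_pct = round(100 * missing_cell / 1_073, 1)
```

Under that assumption the absent cell would hold 355 TARs, or about 33.1% of the 1,073.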
Figure 2. Examples of secondary structures of three candidate novel primate-specific non-protein-coding RNAs, as predicted by the program RNAz. The genomic coordinates (NCBI build 34) of the three TARs are chromosome 2:10,415,689–10,415,908; chromosome 11:113,423,510–113,423,729; and chromosome 7:56,614,161–56,614,382.
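The coordinates in the caption can be parsed to obtain the length of each candidate region. This is a minimal sketch with hypothetical helper names (`parse_region`, `region_length`); it assumes the coordinates are 1-based and inclusive, and it reads the chromosome 11 start position as 113,423,510.

```python
import re
from typing import Tuple

# Matches strings such as "chromosome 2:10,415,689–10,415,908"
# (en dash or hyphen between start and end).
_COORD = re.compile(r"chromosome\s+(\w+):\s*([\d,]+)\s*[–-]\s*([\d,]+)")


def parse_region(text: str) -> Tuple[str, int, int]:
    """Return (chromosome, start, end) from a caption-style coordinate."""
    match = _COORD.search(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def region_length(start: int, end: int) -> int:
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    return end - start + 1


regions = [
    "chromosome 2:10,415,689–10,415,908",
    "chromosome 11:113,423,510–113,423,729",
    "chromosome 7:56,614,161–56,614,382",
]
lengths = [region_length(*parse_region(r)[1:]) for r in regions]
```

With these assumptions the three regions span 220, 220, and 222 nucleotides.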