Yuta Sakakibara, Takuma Irie, Yutaka Suzuki, Riu Yamashita, Hiroyuki Wakaguri, Akinori Kanai, Joe Chiba, Toshihisa Takagi, Junko Mizushima-Sugano, Shin-ichi Hashimoto, Kenta Nakai, Sumio Sugano.
Abstract
To obtain an overview of promoter activities intrinsic to primary DNA sequences in the human genome within a particular cell type, we carried out systematic quantitative luciferase assays of DNA fragments corresponding to putative promoters of 472 human genes that are expressed in HEK (human embryonic kidney epithelial) 293 cells. Their promoter activities were distributed in a bimodal manner; putative promoters in the first group (with strong promoter activities) were designated P1 and those in the second group (with weak promoter activities) P2. The frequencies of TATA boxes and CpG islands and the overall G + C content differed significantly between these two populations, indicating that there are two separate groups of promoters. Interestingly, a similar analysis using 251 randomly isolated genomic DNA fragments showed that P2-type promoters occasionally occur within the human genome. Furthermore, 35 DNA fragments corresponding to putative promoters of non-protein-coding transcripts (ncRNAs) shared similar features with P2 in both promoter activity and sequence composition. At least some of the ncRNAs, which have been identified in large numbers by full-length cDNA projects with no functional relevance inferred, may have originated from such sporadic promoter activities of primary DNA sequences inherent to the human genome.
Year: 2007 PMID: 17522093 PMCID: PMC2779894 DOI: 10.1093/dnares/dsm006
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Luciferase activities of the PPRs. Luciferase activities of the PPRs (A) and the randomly isolated genomic fragments (B). Error bars indicate the standard deviation of each assay. (C) Distribution of the luciferase activities of the PPRs (gray bars), random genomic fragments (blank bars) and PPRs of the ‘ncRNAs’ (solid bars). (D) The distribution of the PPRs that are supported by more than three oligo-cap cDNAs is shown by blank bars. The average luciferase activity of the random genomic fragments was set to 1 for all of the analyses. Details of the methods are provided as supporting information.
Luciferase activities and the classification of the PPRs
| Luciferase activity | P | P1 | P2 | G | G1 | G2 | ncRNA |
|---|---|---|---|---|---|---|---|
| >10³ | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| 10³–10² | 175 | 175 | 0 | 0 | 0 | 0 | 1 |
| 10²–10¹ | 217 | 217 | 0 | 3 | 3 | 0 | 3 |
| 10¹–10⁰ | 63 | 17 | 46 | 43 | 30 | 13 | 16 |
| 10⁰–10⁻¹ | 15 | 0 | 15 | 196 | 0 | 196 | 15 |
| <10⁻¹ | 0 | 0 | 0 | 9 | 0 | 9 | 0 |
| Total | 472 | 411 | 61 | 251 | 33 | 218 | 35 |
Statistical significances of the marked positions are shown in the margin.
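The classification table above groups relative luciferase activities into decade-wide bins. A minimal Python sketch of that binning, together with a consistency check of the column totals, might look as follows. The names `activity_bin`, `BIN_LABELS`, `BIN_LOWER_EDGES` and `COUNTS` are hypothetical, and the handling of values that fall exactly on a bin edge is an assumption, since the source does not state it.

```python
# Bin labels follow the rows of the table, from strongest to weakest activity.
BIN_LABELS = [">10^3", "10^3-10^2", "10^2-10^1", "10^1-10^0", "10^0-10^-1", "<10^-1"]
# Lower edge of every bin except the last one.
BIN_LOWER_EDGES = [1e3, 1e2, 1e1, 1e0, 1e-1]


def activity_bin(relative_activity):
    """Return the decade-bin label for a relative luciferase activity.

    Assumption (not stated in the source): a value exactly on an edge
    falls into the lower of the two neighbouring bins.
    """
    for label, edge in zip(BIN_LABELS, BIN_LOWER_EDGES):
        if relative_activity > edge:
            return label
    return BIN_LABELS[-1]


# Counts per bin, copied column by column from the table (top row first).
COUNTS = {
    "P":     [2, 175, 217, 63, 15, 0],
    "P1":    [2, 175, 217, 17, 0, 0],
    "P2":    [0, 0, 0, 46, 15, 0],
    "G":     [0, 0, 3, 43, 196, 9],
    "G1":    [0, 0, 3, 30, 0, 0],
    "G2":    [0, 0, 0, 13, 196, 9],
    "ncRNA": [0, 1, 3, 16, 15, 0],
}

# Column totals, to compare with the last row of the table.
totals = {name: sum(column) for name, column in COUNTS.items()}

# In every bin, P should split into P1 + P2 and G into G1 + G2.
p_splits = all(p == p1 + p2 for p, p1, p2 in zip(COUNTS["P"], COUNTS["P1"], COUNTS["P2"]))
g_splits = all(g == g1 + g2 for g, g1, g2 in zip(COUNTS["G"], COUNTS["G1"], COUNTS["G2"]))

print(totals)
print(p_splits, g_splits)
```

The check only confirms that the tabulated counts are internally consistent; it says nothing about how the P1/P2 or G1/G2 boundaries were chosen.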
Sequence features of the PPRs and genomic fragments
| Sequence feature | P (%) | P1 (%) | P2 (%) | G (%) | G1 (%) | G2 (%) |
|---|---|---|---|---|---|---|
| CpG island | 267 (57) | 259 (63)* | 8 (13) | 0 (0) | 0 (0) | 0 (0) |
| TATA box: strict | 34 (7) | 30 (7) | 4 (7) | 47 (19) | 7 (21) | 40 (18) |
| TATA box: less strict | 103 (22) | 81 (20)** | 22 (36) | 140 (56) | 20 (61) | 120 (55) |
| Average G + C content | 0.53 | 0.54*** | 0.47*** | 0.45 | 0.43 | 0.45 |
| Total | 472 | 411 | 61 | 251 | 33 | 218 |
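The asterisks in the table above mark statistically significant differences, but the test behind them is not stated here. As a rough illustration only, the sketch below recomputes two of the tabulated percentages and applies a two-sided Fisher's exact test to the CpG island counts for P1 and P2. The choice of Fisher's exact test is an assumption, not necessarily the authors' method, and the function names `percent` and `fisher_exact_two_sided` are hypothetical.

```python
from math import comb


def percent(count, total):
    """Percentage of `count` in `total`, rounded to the nearest integer."""
    return round(100 * count / total)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts from the table: CpG islands in P1 (259 of 411) and P2 (8 of 61).
p1_cpg, p1_total = 259, 411
p2_cpg, p2_total = 8, 61

p1_percent = percent(p1_cpg, p1_total)
p2_percent = percent(p2_cpg, p2_total)

p_value = fisher_exact_two_sided(
    p1_cpg, p1_total - p1_cpg,
    p2_cpg, p2_total - p2_cpg,
)

print(p1_percent, p2_percent)
print(p_value)
```

Under this assumed test the P1 versus P2 difference in CpG island frequency is far below conventional significance thresholds, which is in line with the marked entry in the table.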
Figure 2. G + C content of the PPRs. Box plot of the G + C content of the indicated populations of PPRs.
Sequence features of the PPRs of ncRNAs and the orphan cDNAs
| Sequence feature | ncRNAcore (%) | ncRNA (293positive) (%) | Orphan cDNAs (%) |
|---|---|---|---|
| CpG island | 87 (11) | 3 (9) | 1872 (20) |
| TATA box: strict | 117 (15) | 6 (23) | 1411 (15) |
| TATA box: less strict | 413 (54) | 22 (63) | 4584 (49) |
| Average G + C content | 0.45 | 0.44 | 0.47 |
| Total | 768 | 35 | 9377 |